Importing csv file with ragged right format and ” text qualifier

csvssis

I'm importing csv files in ssis that are in a ragged right format, I've been able to create the columns perfectly fine but for some of the data rows it has included a " text qualifier into the data in the first column which makes the character count of each column out sync and pushes some characters over to the next columns, the " text qualifier is also found in the last column of the same row by itself instead of the value being blank. These " text qualifiers are also not found in the original csv file, the flat file connection manager also does not let you use the text qualifier option when working with ragged right so i can't take them out that way.

enter image description here

Best Answer

The options I can think of are.

  • Preprocess the file to remove the double quotes using an 'execute process' task. An easy powershell script that could do this for you would be something like :
    powershell -Command {cat \yourpath\stuff.txt | %{if($_ -match '"?(\w+)'){$Matches[1]} }>\yourpath\stuffout.txt} 
    
  • Process the file as a single column of variable text. Use an execute script task to process each record and apply a regular expression as the one above ("?(\w+)) or one that better serves your business rules. Redirect each non-compliant record to a file to review later.

The second option is more "correct" as your etl will be prepared to report on non-compliant rows. You can expand on your regular expression to make it more specific.