Sql-server – Escape a quote in SSIS Flat File Conn Manager

sql serverssis

I am trying to import a simple text file, that has a few "extra" quotation marks in it, and I am trying to see if there is a simple way to accommodate this.

To reproduce the issue, I have a simple three line file I want to import:

"FirstName","LastName","ZIPCODE"
"Tim","Harkin","11111"
"Tim","Harkin"","11111"
"George","Washington","12345"

The extra quote in the third line is throwing off the import. Looking at the file connection manager, I have the following settings:

enter image description here

However, looking at the preview, the extra quote then ends up "hiding" the column delimiter:

enter image description here

Any suggestions to get the connection manager to ignore that extra quote, and interpret each line as they exist in the flat file?

Best Answer

First, I'll assume you can't simply ask for a correctly formatted csv file. If the last name should really be Harkin", then the entry in the csv file should have escaped the double quote character by doubling it (i.e., "Harkin"""). If this is a recurring process, then the best solution will definitely be to require a correctly formatted file.

I don't know of a good way to force SSIS to read in an incorrectly formatted file and figure out how it should be formatted.

If I were in your shoes, I'd try one of two approaches:

  1. Repair the file manually: Open the file up with a text editor, preferably one that can handle regular expressions. Replace the string "," in the file with a unique token (say, --COLBREAK-- - anything that won't be in your data will work [check first if in doubt]). Then search for double quotes between two characters (i.e., any remaining double quotes not at the start and end of the line). Check these, if appropriate replacing " with "". you might also want to confirm that all lines have the expected number of column breaks, to make sure that you don't have the ","embedded in an existing string, or rows that are missing columns altogether. When done, change your token back to "," and try importing the file again.

  2. Load the data in two phases: If there too much data to manipulate in an editor, rewrite your initial load process to import the entire row as one column. You can process this data in SSIS or load it to a SQL table. Next, check for valid rows (start with ", have "," twice, end with "; and, does not have seven or more instances of " in it), and write those rows to a "valid" file. Write the rest of the rows to an "invalid" file. Process the "valid" file as you originally planned; again, manually resolve the rows in the "invalid" file so they are valid, then process them as well.