I'm importing a large set (126M rows) of file data from CSVs. Import works fine, except I'm trying to derive a column as I import rather than trying to script it out afterward (due to the number of rows).
SSIS is not terribly helpful with string manipulation functions, and the possibilities I have are pretty complicated as well.
The derived column needs to reflect the file extensions. File extensions range from 1-10 characters (anything longer can be truncated at 10), and are (of course) separated from the file name and path by a period. They don't correspond to a concise list of file types (something like "docx, xlsx, accdb, msg" are Office types). The file path may also have periods in it. And in some cases it will not have files at the end of the path, just a folder name itself.
Some examples:
\\Server\Share\named resource
\\Server\Share\this.looks.like.a.file_but.it.isnt\its_a_directory
\\Server\Share\Group\Year.Month.Day.Subfolder\File.ext
\\Server\Share\Team Folder 1.404\another.file.here.extension
So I need to be able to extract the extensions up to 10 characters.
I tried TOKEN() but with folder names possibly having periods, that didn't work well – adding \
as a token delimiter helped some, but I still got odd extensions from folder names.
I can't find a combination of SUBSTRING(), RIGHT(), FINDSTRING(), and/or TOKEN() that will meet the rules and the derived column definition doesn't allow logic like IF or IIF.
Some false positives are expected and I plan to sort them out after importing. If it helps here, I have a second column in the CSVs which is extracted by SSIS, it's the size of the file (for folders it populates a 0). I haven't gotten this to matter either, because of the lack of IF or IIF in derived column definition.
Best Answer
Firstly, the IF/IIF you are looking for is the "?" Conditional Operator
? Conditional
One way to implement the Derived Column would be to parse the string like this
In this case, "name" is the column that holds the string to be parsed.
What we are doing here is:
RTRIM( name )
REVERSE()
TOKEN()
TOKENCOUNT()
?
TOKEN()
REVERSE()