SSIS 2012 Ftp parallel/asynchronous multiple file download

c# parallelism ssis ssis-2012

Any help would be appreciated with either scenario.

if

Are there any settings available to download multiple files in parallel from an FTP server?

else

If not, I guess I'll just roll my own in a C# Script source component, using these as a starting point: List Directory Contents with FTP, FtpWebRequest Class, and C# Multiple Download from FTP using parallel task – Duplicate Download issue.
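For the listing step, here is a minimal sketch using Python's standard `ftplib`. Python is used only for illustration; in a C# Script component the equivalent call is `FtpWebRequest` with `WebRequestMethods.Ftp.ListDirectory`. The sketch assumes the six sibling folders are known up front and that the server supports NLST. The host, credentials, and folder names in the usage comment are hypothetical.

```python
import ftplib


def list_remote_files(ftp, folders, suffix=".txt"):
    """Return folder/name paths for every *.txt file in the given folders.

    `ftp` is a logged-in ftplib.FTP (or any object with a compatible nlst()).
    """
    paths = []
    for folder in folders:
        try:
            entries = ftp.nlst(folder)
        except ftplib.error_perm as exc:
            # Some servers answer NLST on an empty folder with a 550 reply;
            # treat that as "no files" and re-raise anything else.
            if not str(exc).startswith("550"):
                raise
            entries = []
        for entry in entries:
            # Servers differ: some return bare names, others full paths.
            name = entry.rsplit("/", 1)[-1]
            if name.lower().endswith(suffix):
                paths.append(f"{folder.rstrip('/')}/{name}")
    return paths


# Usage (hypothetical host, credentials, and folders):
# with ftplib.FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     paths = list_remote_files(ftp, ["/feeds/AAAA", "/feeds/BBBB"])
```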

other details

  • SSIS 2012 saving to a local Windows Server 2012 directory
  • 100+ ASCII text files
  • 6 sibling folders
  • Filename convention: AAAA_YYYYMMDD.txt
  • File size ranges from 5 KB to 15 MB
  • Currently only able to download one file at a time
  • Run manually once per day via SSDT at this stage (still prototyping, so not on a production system yet; I plan to have SQL Server Agent run the packages in the future)
  • Downstream packages load the text file data into data warehouse staging tables (if a Script component is used, perhaps its response stream could feed a Multicast, with one output writing to text files and the other writing to the staging tables?)
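Because the names follow AAAA_YYYYMMDD.txt, one day's files can be picked out of a directory listing by parsing the date stamp instead of relying on string comparisons. The minimal Python sketch below assumes that AAAA is an alphanumeric prefix, such as a feed code, and that the stamp is the business date. The sample names are made up.

```python
import re
from datetime import date, datetime

# Assumption: AAAA is an alphanumeric prefix; YYYYMMDD is the file's date.
FILENAME_RE = re.compile(r"(?P<prefix>[A-Za-z0-9]+)_(?P<stamp>\d{8})\.txt")


def parse_filename(name):
    """Return (prefix, date) for names like ABCD_20140131.txt, else None."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return None
    try:
        stamp = datetime.strptime(match.group("stamp"), "%Y%m%d").date()
    except ValueError:
        # Eight digits that are not a real calendar date, e.g. 20141399.
        return None
    return match.group("prefix"), stamp


def files_for_date(names, wanted):
    """Keep only the listing entries stamped with the wanted date."""
    return [n for n in names if (p := parse_filename(n)) is not None and p[1] == wanted]


if __name__ == "__main__":
    listing = ["ABCD_20140131.txt", "ABCD_20140130.txt", "WXYZ_20140131.txt", "readme.txt"]
    print(files_for_date(listing, date(2014, 1, 31)))
```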

Best Answer

Native, no.

Heck, I barely classify the out-of-the-box FTP component as a real component: it technically implements FTP, but only the most rudimentary operations. I've usually either called ftp.exe from an Execute Process Task with a parameter file or just used the .NET libraries.
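With the ftp.exe route, the Execute Process Task only needs the executable and an argument string such as `-n -s:<command file>`. The real work happens in the command file. The hedged Python sketch below shows how that file could be generated; inside a package, a Script Task would more likely write it. The host, folders, and file names are placeholders. Note that the password is stored in plain text in the file, so restrict access to the folder that holds it.

```python
def build_ftp_script(host, user, password, remote_dir, local_dir, filenames):
    """Return the text of a command file for: ftp.exe -n -s:<file>

    -n turns off auto-login, so the "user" line supplies the credentials.
    Assumes remote_dir and local_dir contain no spaces.
    """
    lines = [
        f"open {host}",
        f"user {user} {password}",
        "ascii",  # the feeds are ASCII text files
        f"cd {remote_dir}",
        f"lcd {local_dir}",
    ]
    # One explicit "get" per file, so only the wanted files are transferred.
    lines += [f"get {name}" for name in filenames]
    lines.append("bye")
    return "\n".join(lines) + "\n"


# Usage (hypothetical values): write the text to C:\ssis\ftp_commands.txt,
# then set the Execute Process Task's Executable to ftp.exe and its
# Arguments to: -n -s:C:\ssis\ftp_commands.txt
```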

I had never thought about parallelizing FTP downloads, but this question over on Stack Overflow looks like a valid implementation of it: https://stackoverflow.com/questions/18117536/c-sharp-multiple-download-from-ftp-using-parallel-task-duplicate-download-issu
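The general shape of a parallel download can be sketched in Python with `ftplib` and a thread pool. This is not the linked answer's C# code; it illustrates the same idea. Each worker opens its own FTP session because one connection handles one transfer at a time. Each distinct remote path becomes exactly one task, and the path is passed to the task as an argument. In C#, a common cause of "duplicate download" bugs in parallel loops is a task closure that captures a shared loop variable. The host, credentials, and landing folder are hypothetical.

```python
import ftplib
import os
import posixpath
from concurrent.futures import ThreadPoolExecutor, as_completed


def ftp_fetch(host, user, password, remote_path, local_path):
    """Download one file over its own FTP session (sessions are not shared)."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user, password)
        with open(local_path, "wb") as out:
            # Binary mode copies the bytes unchanged (no line-ending translation).
            ftp.retrbinary(f"RETR {remote_path}", out.write)
    return local_path


def download_all(remote_paths, local_dir, fetch, max_workers=4):
    """Run fetch(remote, local) once per distinct remote path, in parallel.

    Assumes file names are unique across folders, since all files land in
    the same local_dir.
    """
    # One task per distinct path; dict.fromkeys drops repeats, keeps order.
    unique = list(dict.fromkeys(remote_paths))
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The path is an argument to submit(), so each task is bound to its own file.
        futures = {
            pool.submit(fetch, path, os.path.join(local_dir, posixpath.basename(path))): path
            for path in unique
        }
        for future in as_completed(futures):
            results[futures[future]] = future.result()  # re-raises download errors
    return results


# Usage (hypothetical values):
# from functools import partial
# fetch = partial(ftp_fetch, "ftp.example.com", "user", "password")
# download_all(paths, r"D:\landing", fetch, max_workers=4)
```

Keeping `max_workers` small is deliberate: many FTP servers limit concurrent sessions per user, so a handful of parallel downloads is usually the practical ceiling.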

Depending on how you design things, I've come to the conclusion that having a package download files and then operate on whatever file(s) it found is a painful pattern. Instead, I've had better results separating those activities and assuming that the consuming package will only run if the data already exists. This allowed me to make radical changes to how I acquired the data (we went from SFTP to FTP with no change to the core package) without having to revalidate or retest the processing of the data. That might not be an issue for you, but it simplified my compliance life.

The net result was that my SQL Agent job went from a single "run package" step to "run FTP package", "test for existence", and then either "run processing package" or "alert that no file was found". Modularization gave us more flexibility and let several people work on the problem instead of one person working on a monolithic package.
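The split job reduces to simple control flow. This Python version only illustrates the step ordering. In the setup described above, each callable would be a separate SQL Agent job step, for example one that runs dtexec against the FTP package. The glob pattern is an assumption based on the question's AAAA_YYYYMMDD.txt convention.

```python
from pathlib import Path


def landed_files(landing_dir, pattern="*_????????.txt"):
    """Files roughly matching AAAA_YYYYMMDD.txt in the landing folder."""
    return sorted(Path(landing_dir).glob(pattern))


def run_job(download, landing_dir, process, alert):
    """Download, then test for existence, then either process or alert."""
    download()  # the FTP step; it knows nothing about processing
    files = landed_files(landing_dir)
    if not files:
        alert(f"No file found in {landing_dir}")
        return False
    process(files)  # the processing step assumes the data already exists
    return True
```

Because `download` is just a step that fills a folder, swapping FTP for SFTP changes only that step, and the processing step stays untouched.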