SSIS Performance – Push and Pull Optimization

business-intelligenceperformancesql serverssist-sql

After reading Which one is more efficient: select from linked server or insert into linked server? I wonder if this is true also for SSIS?

For example, I have 2 SQL Servers (A and B) and SSIS solution which extract data from A and load it to B (A->B). What's the difference in performance, between deploying SSIS package on server A or B?

Best Answer

SSIS is going to execute outside of SQL Server's address space. The data flow transformation within SSIS derives its performance through in-memory transformation to data. The more ram you can give it, the better theoretical performance you can obtain.

All things being equal between servers, I'd be surprised if there was an appreciable difference no reference sited

Things are rarely equal, even between servers that are spec'ed the same

If A has 32GB of RAM with SQL Server sized to take 24GB, that leaves 8 gigs for the OS and any other applications, like SSIS, to run in. If B is similarly allocated, things may still not be exactly the same - do people regularly RDP into one over the other? Do they do copy files around (#3 Check task manager) Lots of little things like that will skew the amount of memory left to the OS. Moral of the story, run it on the box that likely has the most amount of RAM available.

Unless you profile your SSIS packages and determine that you're CPU bound, in which case you need to see which server is less CPU intensive, which may vary based on the ETL schedule. I've had large servers that are basically idle during "working hours." When the ETL fires off, the CPU's go to 100 percent for few hours.

There is a destination within the Data Flow Task, the SQL Server destination. It only works if the package is executed on the destination server, B. When SSIS was brand new, everyone wanted to use this as it was screaming cheetah wheelies fast. Under the covers, they do this awesome memory space mapping thing into SQL Server - sort of like a partition switch. The data is in SSIS space and poof it's now in SQL Server's space. Made for really fast destination. Except it's fragile as hell. Not only does it have the aforementioned limitation (which you now understand due to the whole memory mapping exercise going cross server is not an option) but you would regularly encounter errors that basically said "oops, that didn't work, try again" Restart the package with all the same data and now it saves fine. No code changes, no data changes, nothing. Just randomly fails, or succeeds depending on how you view the situation. Moral of the story, use the OLE DB Destination - performance is just as good as SQL Server destination but without the baggage.

If A or B have different disk configurations and/or performance signatures, Spinning Rust on A while B has the better-than-SSD-memory-thing do your processing on the one with better resources, especially if you are pulling LOB types or see the "Buffers Spooled" performance counter being non-zero then your in memory ETL is now disk based file processing and you will feel the pain.