Sql-server – Using OLE DB Source SQL command vs desinging the query using the object tools

sql serversql-server-2008-r2ssis

In SSIS, we can use the OLE DB Source SQL command to run a T-SQL script on the database and then use the output as the source. For example the a complex query can be wrapped in the OLD DB source SQL command and the output can be directly written to a file or target table. This way, we don't have to design the dataflow to do the Merge joins, lookups, etc etc.

I'm wondering if there is any disadvantage on this approach vs, defining the sql query in SSIS design tool (using the available objects in toolbox). It seems, it takes less time to get the job done and possibly faster to run.

Appreciate if you share your idea on pros and cons of this approach.

Thanks.

Best Answer

I'm having a hard time thinking about any advantage to laying it out in SSIS other than being able to visualize the logic behind the query. Which is something I would expect developers to be able to do by looking at a query, not a design surface, anyway.

The main disadvantage of doing that is that you're effectively overriding/bypassing the Query Optimizer by dictating the physical operations yourself. If cardinality estimates change behind the scenes, the physical operations that exist in the SSIS design surface may no longer be optimal to service the query.

Had the query been serviced all at once through the normal pipeline:

  • You gain the insights and dynamicism of the Query Optimizer
  • It's much easier to develop, debug, and test
  • It's easy to get a sense of what the process is doing (I don't want to have to open 1,000 Property dialogs to figure out where the data is coming from)
  • Plan cache bloat is minimized, since fewer plans will end up there
  • The number of roundtrips to the server is minimized, thus improving performance
  • This is the expected way to do it, which greatly reduces the WTF Factor

I would really only encourage using the SSIS operators to do joins if the data is coming from multiple disparate sources (the real power of SSIS!), or if you can prove that the performance will be better doing it that way (subject to the caveats above).

Even so, I would still attempt to group things as much as possible into single queries (i.e., one query for sales data, one for inventory, etc.). Depending on your environment, all the business and query logic you need could be placed inside stored procedures, views, etc., of the source database(s); this greatly simplifies the amount of SQL code that ends up in SSIS, and I highly recommend doing this if you can.