In SSIS, we can use the OLE DB Source SQL command to run a T-SQL script on the database and then use the output as the source. For example the a complex query can be wrapped in the OLD DB source SQL command and the output can be directly written to a file or target table. This way, we don't have to design the dataflow to do the Merge joins, lookups, etc etc.
I'm wondering if there is any disadvantage on this approach vs, defining the sql query in SSIS design tool (using the available objects in toolbox). It seems, it takes less time to get the job done and possibly faster to run.
Appreciate if you share your idea on pros and cons of this approach.
Thanks.
Best Answer
I'm having a hard time thinking about any advantage to laying it out in SSIS other than being able to visualize the logic behind the query. Which is something I would expect developers to be able to do by looking at a query, not a design surface, anyway.
The main disadvantage of doing that is that you're effectively overriding/bypassing the Query Optimizer by dictating the physical operations yourself. If cardinality estimates change behind the scenes, the physical operations that exist in the SSIS design surface may no longer be optimal to service the query.
Had the query been serviced all at once through the normal pipeline:
I would really only encourage using the SSIS operators to do joins if the data is coming from multiple disparate sources (the real power of SSIS!), or if you can prove that the performance will be better doing it that way (subject to the caveats above).
Even so, I would still attempt to group things as much as possible into single queries (i.e., one query for sales data, one for inventory, etc.). Depending on your environment, all the business and query logic you need could be placed inside stored procedures, views, etc., of the source database(s); this greatly simplifies the amount of SQL code that ends up in SSIS, and I highly recommend doing this if you can.