I have a source table that looks essentially like this:
- EmployeeCode
- WeekStartDate
- HoursWorkedDay1
- HoursWorkedDay2
- HoursWorkedDay3
- HoursWorkedDay4
- HoursWorkedDay5
- HoursWorkedDay6
- HoursWorkedDay7
The actual table has something like 500 numbered columns (didn't really count them – there are various and numerous fields numbered 1-7, and then another handful numbered 1-25, times 7) per weekday (no, that's not my design), and there are currently something like 38,600 rows (growing every week).
I have an SSIS package that's trying to normalize this data… that currently looks like this:
Each "source" is selecting one set of numbered columns from the same source table, and the UNION ALL component combines the 7 sources into one, resulting in some 258,900 rows.
The rest of the workflow adds some calculated columns, looks up surrogate keys (e.g. EmployeeCode
is used to lookup an EmployeeId
, and then the date is computed and used for looking up a TimeId
), and then the "modified" rows get updated and the "new" ones get inserted into a normalized table; unchanged rows end up nowhere.
Is there any better way (e.g. a bit less heavy on memory pressure) to normalize the source data?
Best Answer
Without the full table definition, it is difficult to provide a perfect answer. However, in an attempt to show the differences in a limited repro, with a very small amount of data, I've created the following testbed:
Below we are comparing the two queries; the first uses the
CROSS APPLY
method, detailed by me at SQLServerScience.com, and the second uses theUNION ALL
method.First thing to note, the
CROSS APPLY
is easier to look at. This already makes me happy.Lets check the execution plans for the two variants:
The
UNION ALL
variant scans the source table 7 times, whereas theCROSS APPLY
uses a single table scan. By using the cross apply, we're #Winning.Let's add more data:
On my system the above code generated around 85,000 rows. The plans for the two queries are now:
SQL Sentry Plan Explorer shows the following summary information, which is invaluable:
This says the CPU is used more intensively by the
CROSS APPLY
, however there is 7 times more I/O used by theUNION ALL
variant.