Sql-server – Sql server table insert performance optimization

insertjoin;optimizationsql server

Setting

In a datawarehouse, I am joining a fact table to 20 dimensions. The fact table has 32 million rows and 30 columns. This is a temporary staging table so I don't have to deal with other users reading or writing to the table. I select 10 columns from the base table and 20 columns from the respective dimensions. The dimension tables are small (between 3 and 15.000 rows). The fields on which are joined are both integers and nvarchars. I use a SELECT … INTO statement. There are no indexes on the tables.

The execution speed of this query is too slow to be useful.

Tried-out solutions

Because the query takes too long to process, I tried out following solutions:

Split the 20 joins into 4 joins on 5 tables. The query performance remains low however.
Put indexes on the foreign key columns. No significant time decrease.
Make sure the fields of the join condition are integers. I noticed a performance increase of 25%. Not quite what I am searching for.
Use an insert into statement instead of select into. Worse performance because of log file growth although the database is in simple recovery mode.

These findings led me to including the actual execution plan which shows that 89% of the cost lies in the table insert. The other costs are 8% table scan on the fact table and 2% on hash matching for the inner joins.

Questions

What are the possible reasons of the slow table insert?
What are ways to identify this bottleneck without the execution plan?
What actions can I take to reduce the cost of the table insert?

Best Answer

What are the possible reasons of the slow table insert? What are ways to identify this bottleneck without the execution plan?

Read How to analyse SQL Server performance, specially the part about Analyzing individual query execution wait times.

What actions can I take to reduce the cost of the table insert?

That would depend largely on the result of the performance analysis. First and foremost, ensure the SELECT part is as fast as possible. Assuming that problem is the single threaded fully logged insert, some solutions are:

Use partition switch to move 'in' the data. This is, by far, the best solution. Prepare the staging data in a separate staging table, then switch this staging table into the DW table. Read Transferring Data Efficiently by Using Partition Switching.
Make sure the INSERT is minimally logged. Read Operations That Can Be Minimally Logged and Prerequisites for Minimal Logging. Even if you use partition switch operations, is still worth making sure the build of the staging table is minimally logged.
Make sure your IO subsystem is capable of driving a fast load. Read Introducing SSDs.

Related Solutions

Sql-server – Sql server full text search performance with additional conditions

Kin made a couple of suggestions which should be helpful.

First: Make sure that your statistics for the CallTime index are up to date.

Your plan shows that the time filter is being made by seeks to the clustered index. So, for some reason the CallTime index is not being used. What is the definition of that index? If it is a multicolumn index, be sure that the most specific column is first. Example:

IDX_Calls_CallTime
  NOT: CallID, CallTime
  USE: CallTime, CallID

Second: Consider using CONTAINSTABLE.

I doubt that you will ever want to return 1.200.000 rows for "efendim". By using CONTAINSTABLE and ranking you can set the top number of rows that you want, thus reducing the number of FullText results to process to 10, 100, 1000, or whatever fits your needs.

Sql-server – Different execution plans during INSERT at @table variable and #temporary table

The first thing that I noticed is that the query plan compilation time was over 3 seconds for each query. Wow, this is a really complex query!

Because the solution space of potential execution plans is so large (it grows exponentially with the number of the number of objects involved in the query), SQL Server is only going to be able to explore a tiny fraction of the potential query plans when coming up with a plan for these queries. Remember that SQL Server's job isn't to create the best query plan possible, but instead to create a query plan that is good enough and to do so as quickly as possible.

I have often found that small changes in the way a query is formulated, even if they don't impact the logic of the query, can have a significant impact on the query plan. Anecdotally, this grows more and more likely as the query grows more and more complex. One possible reason that this could happen is that a tweak to the query might cause SQL Server to begin cost-based optimization with a different initial plan. As cost-based optimization proceeds, this different starting point could yield a different exploration of the space of potential query plans--kind of like a different random seed impacts random number generation. Note that the query plans you provided are significantly different (compare images of the plan shape below!) and SQL Server actually does estimate that the @table variable plan is slightly cheaper.

In terms of why the table variable vs. temp table would have such an impact on cost-based optimization, I'll hazard an only-partially-educated guess: inserting into a table variable forces a serial plan (see the NonParallelPlanReason of CouldNotGenerateValidParallelPlan that appears in the table variable plan, but not the temp table plan), and this may impact the code path that the query optimizer takes either generating an initial plan or in some phase of plan optimization.

If possible, the next step that I would try is to simplify the query so that fewer tables are used and/or the query is split into multiple queries (with intermediate #temp tables) so that each query is simpler and has better statistics available. If that's not possible, you could also try more hacky options such as using query hints (e.g., force MAXDOP 1 on the temp table query, and see if the plan comes out more like the table variable query).

Query plan with #temp table: enter image description here

Query plan with @table variable: enter image description here

And finally, if you are interested in going a little bit deeper into how the query optimizer works, I have found Paul White's blog to be a great resource!