Let's run two hypothetical databases (H1 and H2) in parallel, with the same total amount of RAM for bufferpools (R).
- Let H1 have a single bufferpool of size R.
- Let H2 have two bufferpools: one of size I for index pages, the other of size D for data pages. (And D + I == R, of course.)
The question is:
- How do you determine the right values for I and D to make H2 perform better than H1?
My answer is that, in general, you can't. The database engine for H1 has more room to optimize its cache than H2's does. If there are times of day when more index pages would lead to better performance, it can discard unused data pages and run with a "mostly index pages" cache. If, later on, data pages become hotter, it can evict more index pages and run with a "mostly data pages" cache.
H2 can't do that. Once it has I pages' worth of index pages cached, it can't cache any more, even if that is what would serve the workload best right now. It's stuck with a sub-optimal use of its RAM.
The only way for H2 to perform as well as H1 is if the D/I split initially chosen is ideal and the workload is very stable. That can certainly happen, but I'd wager it is not a very common database scenario. If that's not your case, think of H1 as being the same as H2, but with the partitioning of the cache between data and index pages managed dynamically by the thing that knows the most about how to optimize its I/O: the database engine itself.
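The dynamic-vs-static argument can be sketched with a toy simulation. This is pure illustration: the LRU policy, the page counts, and the morning/afternoon workload shift below are all invented for the example, not how any particular engine behaves.

```python
from collections import OrderedDict

class LRUCache:
    """A minimal LRU page cache that counts hits and misses."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict the least recently used
            self.pages[page] = True

def workload():
    # Morning: a hot set of 150 index pages; afternoon: 150 data pages.
    for i in range(5000):
        yield ("idx", i % 150)
    for i in range(5000):
        yield ("dat", i % 150)

R = 200
h1 = LRUCache(R)                                        # H1: one pool of size R
h2_index, h2_data = LRUCache(R // 2), LRUCache(R // 2)  # H2: I == D == R/2

for page in workload():
    h1.access(page)
    (h2_index if page[0] == "idx" else h2_data).access(page)

print("H1 hits:", h1.hits)
print("H2 hits:", h2_index.hits + h2_data.hits)
```

In this contrived run, H1's single 200-page pool holds each 150-page hot set entirely, while each fixed 100-page pool of H2 thrashes on a cyclic 150-page hot set and barely hits at all. Real engines and workloads are far messier, but the shape of the problem is the same: the fixed split can't follow the workload.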
This is not to say that maintaining different bufferpools is never a good idea.
One scenario I've encountered for isolating specific pages in a dedicated cache was a critical "report" that (obviously) needed to run fast at all times, and that happened to use a few tables that were pretty much never touched elsewhere. Because those pages were cold between runs, they kept getting evicted, and that "report" had runtimes that varied a lot from run to run. Moving that specific set of tables (and their indexes) to a dedicated pool removed most of the "non-deterministic" part of its performance.
But that's sub-optimal for the database as a whole; it's more of a kludge, closer to voodoo optimization IMO. (And that wasn't on DB2, but I believe that's irrelevant here.)
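That eviction-interference effect can be mimicked with another toy model. Again purely illustrative: the random-eviction cache, page counts, and traffic mix are invented, and real bufferpool behaviour is more subtle.

```python
import random

random.seed(42)

def report_hit_rate(cache_size, report_pages, background_pages, steps):
    """Interleave background churn with a report's small fixed working set;
    return the report's cache hit rate."""
    cache = set()

    def touch(page):
        if page in cache:
            return True
        if len(cache) >= cache_size:
            cache.discard(random.choice(sorted(cache)))  # evict a random page
        cache.add(page)
        return False

    hits = attempts = 0
    for _ in range(steps):
        # Background queries churn through many distinct pages.
        for _ in range(20 if background_pages else 0):
            touch(("bg", random.randrange(background_pages)))
        # The report re-reads its small, fixed set of pages.
        for p in range(report_pages):
            attempts += 1
            hits += touch(("rpt", p))
    return hits / attempts

# Shared pool: report pages compete with heavy background traffic.
shared = report_hit_rate(cache_size=500, report_pages=50,
                         background_pages=5000, steps=200)
# Dedicated pool: only the report's 50 pages occupy their own 50 slots.
dedicated = report_hit_rate(cache_size=50, report_pages=50,
                            background_pages=0, steps=200)
print(f"shared pool: {shared:.3f}  dedicated pool: {dedicated:.3f}")
```

The dedicated pool stays warm after its first pass, while the shared pool keeps losing report pages to background evictions — the same mechanism behind those wildly varying runtimes, and also the reason the split wastes RAM the rest of the day.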
So my advice is: if you have X GB of RAM available for cache, use a single bufferpool, put everything in it, and let the database engine do its magic in there.
If you run into a corner case that would appear to benefit from cache segregation, try it out, benchmark it, think about the overhead of having to maintain the magic numbers for each cache size, and go tune the queries, schema or disk layout instead :)
Through some offline conversations, Zane was able to determine that keeping a connection open would hasten the validation. To that end, I suggested they flip the RetainSameConnection property on the Connection Manager to True from its default.
If it works for you then it's a good practice. There are basically no hard rules for databases.