The nonclustered index very likely has a different order than the base table, so inserts/updates to that column will fragment the index. FILLFACTOR (and PAD_INDEX) settings only apply when the index is created or rebuilt, not during normal DML operations.
What I would suggest is to drop all the nonclustered indexes on the table before loading the new data, and then recreate them once you're done modifying the data they cover.
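A minimal sketch of that pattern (the table, column, and index names here are hypothetical; substitute your own):

```sql
-- 1. Drop the nonclustered index before the load:
DROP INDEX IX_BigTable_LoadColumn ON dbo.BigTable;

-- 2. Perform the bulk load / data modifications here ...

-- 3. Recreate the index afterward. A fresh build is unfragmented,
--    so a 100% fill factor wastes no space on empty page slack:
CREATE NONCLUSTERED INDEX IX_BigTable_LoadColumn
    ON dbo.BigTable (LoadColumn)
    WITH (FILLFACTOR = 100);
```

Dropping first also speeds up the load itself, since SQL Server no longer has to maintain the index row-by-row during the inserts.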
Whether or not the fragmentation is an issue in your case is debatable. While the table is fairly wide, eliminating fragmentation of the nonclustered index only matters if you need to scan that index from disk, which won't happen if the index is already in memory, and the nonclustered index is very small compared to the table. You would need to measure on your system whether there's a meaningful difference in query performance before and after rebuilding the index, but as I mentioned above, you may be better off dropping and recreating the index anyway, which eliminates the fragmentation and also lets you use a 100% FILLFACTOR.
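Before investing in before/after timing tests, you can check whether the index is actually fragmented at all using the sys.dm_db_index_physical_stats DMV (the table name below is hypothetical):

```sql
-- Report fragmentation for each index on the table:
SELECT  i.name AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.BigTable'), NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON  i.object_id = ips.object_id
        AND i.index_id  = ips.index_id;
```

If page_count is small (a few thousand pages or less), fragmentation is unlikely to be measurable in query performance regardless of the percentage reported.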
There are many general solutions to the problem. Here are three that quickly come to mind.
- SSAS
- VoltDB
- Map Reduce Framework
What you're describing would historically fall under an OLAP solution. If you can tolerate some lag in your data, then a tool such as SQL Server Analysis Services (SSAS) could work here. You could read data from these tables, perhaps via a secondary reporting server so as not to affect your operational database, and then perform the aggregations and processing you described.
However, this problem also has some newer solutions from the Big Data realm with a specific focus on "Velocity" rather than "Volume" of data. Velocity can mean either quickly processing relatively static data or processing dynamically changing data as it is actively being ingested by the database.
A second option is to let a contemporary "NewSQL" database engine, such as VoltDB, do this for you. VoltDB handles a lot of this "indexing" (as you referred to it), and it does so in real time as opposed to working from pre-aggregated data that may not be current. There are always performance tradeoffs when aggregating across many tables rather than a single table, but in my experience an in-memory DB engine such as Volt can beat traditional OLAP solutions on many metrics, including processing speed and manageability.
Another option is to use the MapReduce framework many DB engines support today. This implies that you're happy with offline processing of a problem/query as opposed to real-time processing.
It's not an exhaustive list, but hopefully it gives you some other ideas. I know that the triggers you mention can easily spiral out of control due to hotspots in your tables, especially if your SQL is not optimized. The one recommendation I will strongly make is to get this processing off of your operational server and run it on a secondary replica/reporting server.