Sql-server – Why would I not run “index reorganization” regularly

azure-sql-databasedisk-spaceindexsql server

I tried index reorganization queries several times so far. They look like this:

ALTER INDEX ALL ON TableName REORGANIZE

Such query may take hours but it looks like it doesn't interfere with other database operations. For some table its effect is next to nothing but for some tables it actually saves like 15% of overall database space.

What if I just run this query for all indices every week or so? Is there any reason not to do so?

Best Answer

Reorganizing an Index should be done when you have elevated amounts of white space within your index (i.e. avg_page_space_used_in_percent column in the sys.dm_db_index_physical_stats DMV). Many people (and even solutions) in the SQL Server community say you need to look at Index Fragmentation levels (e.g. the avg_fragmentation_in_percent in that same DMV), but this is not as much a concern depending on the type of underlying storage (e.g. anything that's RAIDed fragments your data for redundancy purposes regardless). Others also have come to the same conclusion.

The downsides to Reorganizing your Indexes when not needed is that this operation will purge portions of your cache so it can have sufficient room to perform the reorg within memory. Reorgs aren't as destructive to your cache as Rebuild operations, but they will reduce your Page Life Expectancy (PLE) and potentially purge useful data that you will have to re-read from disk (e.g. increased I/O operations). Additionally, Reorg operations do NOT update Index Statistics. Updated Statistics do more to produce optimal executions plans than defragmented indexes, so if you're going to mindlessly run a nightly operation I would say Updating Stats is the better choice over Reorging instances.

The right answer though is to have a solution check your White Space thresholds and reorg/rebuild indexes when needed. Have another process check your statistics and update when needed. There are a few solutions out there, such as Ola's maintenance solution, Minionware's Reindex solution, etc. But I often find myself customizing solutions to fit my needs.

Related Solutions

Sql-server – Index defrag takes forever to run

You should approach index reorganize performance troubleshooting problems the same way you approach any performance troubleshooting problem: identify the waits and resources, using a methodology like waits and queues for example. Once you do the due diligence and collect the data (perf counters, wait stats etc), if you are still unable to find the issue we can help, rather than ask us to roll the 8 ball and guess (and more importantly, rather than you working by guess).

Next, there are many maintenance scripts out there that are a heck of a lot better than reorganizing ad-nauseam every night, like Ola Hallengren or Michelle Ufford just to give two examples.

Mysql – Why did adding a secondary index not increase select performance

I'll skip my comments about that CREATE TABLE statement that would be classified from wrong to insulting, like having all columns declared NULL (what would a row with (NULL, NULL, NULL, NULL) mean?) and concentrate on the question asked.

The difference between 8.6 and 8.9 seconds is negligible. For all we know, the running times of the 2 tests are almost identical. If you want to test properly, you should run longer tests and with larger size of the table (try with 15K, 150K, 1500K, 15M rows) and see whether the efficiency changes or stays the same between having or not the specific index.

You should also examine the execution plans created for the queries in the 2 cases (at various table sizes), to see what is happening behind the scenes, what indexes are used, if any, etc.

My prediction would be that as the table grows larger, the index will get more useful, as the queries will require an index seek to find the (IDs of the) rows in the wanted time range and then some additional seeks from the clustering PK index (I assume you use InnoDB.)

Without the index, a whole table scan will have to be done each time. The larger the table gets, the biggest the difference you'll see between the 2 test cases (with and without the index).

If the vast majority of your queries is similar to the following, getting rows from a small time range:

SELECT <columns_wanted>
FROM mssg
WHERE created_at >= @start_timestamp
  AND created_at < @end_timestamp ;

then I would suggest (assuming you use InnoDB) an alternative design: make the (created_at) the clustering key of the table. Since you may have 2 or more rows with exact same timestamp, you can set the primary key to (create_at, id) for this effect (as InnoDB allows only unique keys for clustering). You will also need an additional index on (id) for the AUTO_INCREMENT to be allowed. This can be declared UNIQUE or not, it's your choice:

CREATE TABLE mssg (
  mssg_id INTEGER NOT NULL AUTO_INCREMENT,
  body MEDIUMTEXT NULL,
  length VARCHAR(20) NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (created_at, mssg_id),
  -- UNIQUE
    INDEX mssg_id_ix (mssg_id)             -- this is needed for the AutoIncrement
);

This way, all the rows that your queries need will be stored in consecutive pages of the clustering (PK) index and the efficiency of the queries - as long as they ask for a small time range - will not be affected by the size of the table.

Further comments, partially related to the question;

Why are all columns declared as NULL? I suggest you change them all to NOT NULL except for those columns that there is a specific reason to allow nulls.
Why mediumtext for the message? Are you going to store so large messages? It might be better for performance, if you could reduce the size, to say VARCHAR(250).
Why is the length declared as VARCHAR and not INTEGER?
Further, if it's going to be storing the length of the text, it is not needed at all. You can get the length at any time using the proper text function.
You could declare the TIMESTAMP column to get a default CURRENT_TIMESTAMP if you want (if it is not already provided by the application.)

Best Answer

Related Solutions

Sql-server – Index defrag takes forever to run

Mysql – Why did adding a secondary index not increase select performance

Related Question