This isn't fragmentation.
Fragmentation is generated, of course, but deletes simply create "islands" of remaining pages, which is less evil than GUID/clustered-key INSERT fragmentation. If your PK is an IDENTITY, then CreationDate should roughly track it, so you're actually deleting chunks of contiguous rows anyway.
- Do you have an index on CreationDate?
- Do you have delete cascades?
- Is the TOP 1000 in a single transaction?
For point 3: running the whole loop inside a single transaction is pointless, since the batches no longer commit independently. Is that what's happening?
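If each batch should commit on its own, a rough T-SQL sketch looks like this (dbo.MyTable, CreationDate, and the one-year cutoff are placeholder assumptions, not your actual schema):

```sql
-- Delete in batches of 1000; with no outer transaction, each DELETE
-- commits on its own, so locks and log space are released between batches.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    DELETE TOP (1000)
    FROM dbo.MyTable
    WHERE CreationDate < DATEADD(year, -1, GETDATE());

    SET @rows = @@ROWCOUNT;  -- 0 once nothing is left to delete
END;
```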
At some point, a statistics update may be needed if you delete enough rows but I don't think it's that.
Other options:
- why not use TRUNCATE TABLE, wrapped in a stored procedure with EXECUTE AS OWNER
- use SYNONYMs for poor man's partitioning
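For the first option, a minimal sketch of the wrapper (procedure and table names are placeholders):

```sql
-- Callers need only EXECUTE on the procedure, not ALTER TABLE rights,
-- because the body runs with the owner's permissions.
CREATE PROCEDURE dbo.TruncateMyTable
WITH EXECUTE AS OWNER
AS
BEGIN
    TRUNCATE TABLE dbo.MyTable;
END;
```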
According to your comment on a deleted post, you load all rows into a Java module to conduct the search there. But searching is better done in the database itself - that's what a database is good at. Only return the rows you actually need.
If you really need all rows, there are many little things to make this faster. 1M rows will never be very fast, though.
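For example, if the Java code is only looking for certain invoices, push the condition into the query so Postgres can filter (and potentially use an index) instead of shipping all rows over the wire. The WHERE clause here is a made-up illustration:

```sql
SELECT fcv_id, fcv_fecha_comprobante, fcv_numero_comprobante
FROM   factura_venta
WHERE  fcv_fecha_comprobante >= '2020-01-01'  -- hypothetical filter
ORDER  BY fcv_fecha_comprobante, fcv_numero_comprobante
LIMIT  100;                                   -- only the rows you need
```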
Postgres 9.2 or later
You can make the index covering by appending fcv_id:
CREATE INDEX factura_venta_orden
ON factura_venta (fcv_fecha_comprobante, fcv_numero_comprobante, fcv_id);
This way, provided the table isn't updated too much, Postgres can retrieve results with an index-only scan.
The additional column comes last since it does not contribute to the sort order.
In Postgres 11 or later you could make that:
CREATE INDEX factura_venta_orden
ON factura_venta (fcv_fecha_comprobante, fcv_numero_comprobante) INCLUDE (fcv_id);
CLUSTER / pg_repack
I see you already found CLUSTER. You are aware that this is a one-time operation that should help your cause, but needs to be re-run after enough updates?
There is also the community tool pg_repack as a replacement for VACUUM FULL / CLUSTER.
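For reference, a rough invocation of each (the index name matches the one suggested above; pg_repack is installed separately and its flags may vary by version):

```sql
-- One-time physical reorder along the index; takes an exclusive lock:
CLUSTER factura_venta USING factura_venta_orden;

-- pg_repack runs from the shell instead, roughly:
--   pg_repack --table=factura_venta your_database
```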
work_mem
This line in your EXPLAIN output:
Sort Method: external merge Disk: 2928kB
tells us that sorting spills to disk instead of being done in RAM, which is expensive. You could probably improve performance by tuning the relevant setting, work_mem. The manual:
work_mem (integer)
Specifies the amount of memory to be used by internal sort operations and hash tables before writing to temporary disk files. ...
Setting this too high may have adverse effects. Read the manual carefully. Consider increasing the setting only for the transaction with the big query:
BEGIN;
SET LOCAL work_mem = '50MB';
SELECT ...;
COMMIT;
50 MB is an estimate based on your EXPLAIN ANALYZE output for 73k rows. Test with 1M rows to get the actual amount you need.
Best Answer
There really isn't even any guarantee that the optimizer will use the index in the first place. The difference of only one column between the two indexes is (for the most part) trivial. And even if it did use it, any performance gain you would see would be trivial.
The reason for this is in how the B-Tree index is implemented by SQL Server. Both indexes are equally capable of satisfying the predicate (WHERE clause) and therefore locating only the rows that meet the search parameters. The only difference between the two would be how many pages in the index SQL Server would have to read in order to locate the required rows. And how many pages that is will ultimately be determined by the size of the index rows (the storage size in bytes of (TransactionDate, ClientID) vs. the alternative), with only the size of the State column being the difference between the two.

Just as an example, if a single index row was 60 bytes, SQL Server would only have to read 2 pages in order to locate a row up until there were around 2.5 million rows in the index. Then it would only require 3 pages to be read until the table reached more than 300 million rows.
So, what does that mean? Whether it is looking through an index of 2 or all 3 columns, SQL Server is still only looking through the same number of pages to locate the rows, unless the State column is so large as to create a significant difference in the size of a single row, thus causing the index pages to fill considerably faster. Any noticeable performance difference would only be caused if SQL Server had to read through more levels of the index to satisfy the query (3 levels vs. 4 levels, etc.), and a single column of text, especially if State is a 2-char state abbreviation, simply won't be enough to make a significant difference.

If you really want to boost performance of the query, and the table is quite large, you might be better off exploring table partitioning, or if not on Enterprise Edition perhaps a partitioned view, all depending on whether there are certain ranges that are primarily searched and others that are searched much less often.