Given the table description, I see
- 66 bytes per row of data
- 4 bytes per row for the primary key
- 7 bytes per row for the country code index
  - 3 bytes for the country code
  - 4 bytes for the clustered key attached to the country code
- Total of 77 bytes of data and keys
- This does not factor in housekeeping for BTREEs or tablespace fragmentation
For a million rows, that would be 77,000,000 bytes (73.43 MB).
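As a quick sanity check, you can let MySQL do that arithmetic for you (the byte counts are just the per-row estimates from above):

SELECT FORMAT((66 + 4 + 3 + 4) * 1000000 / POWER(1024,2), 2) AS estimated_mb;  -- roughly 73.43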
As for measuring actual table size: for a given table mydb.mytable, you can run this query:
SELECT
CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',SUBSTR(units,pw1*2+1,2)) DATSIZE,
CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',SUBSTR(units,pw2*2+1,2)) NDXSIZE,
CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',SUBSTR(units,pw3*2+1,2)) TBLSIZE
FROM
(
SELECT DAT,NDX,TBL,IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
FROM
(
SELECT data_length DAT,index_length NDX,data_length+index_length TBL,
FLOOR(LOG(IF(data_length=0,1,data_length))/LOG(1024)) px,
FLOOR(LOG(IF(index_length=0,1,index_length))/LOG(1024)) py,
FLOOR(LOG(data_length+index_length)/LOG(1024)) pz
FROM information_schema.tables
WHERE table_schema='mydb'
AND table_name='mytable'
) AA
) A,(SELECT 'B KBMBGBTB' units) B;
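If you only need the raw byte counts without the unit formatting, a plain query against information_schema is enough (same mydb.mytable placeholders as above):

SELECT data_length AS data_bytes,
       index_length AS index_bytes,
       data_length + index_length AS total_bytes
FROM information_schema.tables
WHERE table_schema = 'mydb'
AND table_name = 'mytable';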
To measure all tables grouped by Database and Storage Engine, run this query:
SELECT
IF(ISNULL(DB)+ISNULL(ENGINE)=2,'Database Total',
CONCAT(DB,' ',IFNULL(ENGINE,'Total'))) "Reported Statistic",
LPAD(CONCAT(FORMAT(DAT/POWER(1024,pw1),2),' ',
SUBSTR(units,pw1*2+1,2)),17,' ') "Data Size",
LPAD(CONCAT(FORMAT(NDX/POWER(1024,pw2),2),' ',
SUBSTR(units,pw2*2+1,2)),17,' ') "Index Size",
LPAD(CONCAT(FORMAT(TBL/POWER(1024,pw3),2),' ',
SUBSTR(units,pw3*2+1,2)),17,' ') "Total Size"
FROM
(
SELECT DB,ENGINE,DAT,NDX,TBL,
IF(px>4,4,px) pw1,IF(py>4,4,py) pw2,IF(pz>4,4,pz) pw3
FROM
(SELECT *,
FLOOR(LOG(IF(DAT=0,1,DAT))/LOG(1024)) px,
FLOOR(LOG(IF(NDX=0,1,NDX))/LOG(1024)) py,
FLOOR(LOG(IF(TBL=0,1,TBL))/LOG(1024)) pz
FROM
(SELECT
DB,ENGINE,
SUM(data_length) DAT,
SUM(index_length) NDX,
SUM(data_length+index_length) TBL
FROM
(
SELECT table_schema DB,ENGINE,data_length,index_length FROM
information_schema.tables WHERE table_schema NOT IN
('information_schema','performance_schema','mysql')
AND ENGINE IS NOT NULL
) AAA GROUP BY DB,ENGINE WITH ROLLUP
) AAA) AA) A,(SELECT ' BKBMBGBTB' units) B;
Run these queries and you can track changes in database/engine disk usage.
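One way to do that tracking (just a sketch; the dbstats schema and size_history table are made-up names, and you would have to create the schema and schedule the INSERT yourself, e.g. from cron or a MySQL event) is to snapshot the per-database/per-engine totals into a history table:

CREATE TABLE IF NOT EXISTS dbstats.size_history
(
    sampled_at  DATETIME     NOT NULL,
    db          VARCHAR(64)  NOT NULL,
    engine      VARCHAR(64)  NOT NULL,
    data_bytes  BIGINT UNSIGNED,
    index_bytes BIGINT UNSIGNED,
    PRIMARY KEY (sampled_at, db, engine)
);

INSERT INTO dbstats.size_history
SELECT NOW(), table_schema, ENGINE,
       SUM(data_length), SUM(index_length)
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema','performance_schema','mysql')
AND ENGINE IS NOT NULL
GROUP BY table_schema, ENGINE;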
Give it a Try !!!
Explanation
My question is: why does this not use the index amplifier_saturation_start?

Even with 30,000,000 rows and only 3,500 in the date range, it can be faster to read tuples from the top of the index amplifier_saturation_lddate on lddate. The first row that passes the filter on start can be returned as is. No sort step needed. With a perfectly random distribution, a little below 9000 index tuples would have to be checked on average.

Using amplifier_saturation_start, Postgres would still have to determine max(lddate) after fetching all 3,500 qualifying rows. Close call. The decision depends on gathered statistics and your cost settings. Depending on data distribution and other details, one or the other will be faster, and one or the other will be expected to be faster.
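If you want to see which of the two plans the planner actually picks for your data, and at what cost, you can compare estimates and actual run times with something like this (using the date range from your query):

EXPLAIN (ANALYZE, BUFFERS)
SELECT max(lddate)
FROM   bss.amplifier_saturation
WHERE  start >= '1987-12-31 00:00:00'
AND    start <  '1988-04-09 00:00:00';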
Better index
This can be substantially faster with a multicolumn index on (lddate, start), like you found yourself already. This way Postgres can use an index-only scan and not touch the heap (the table) at all.
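A minimal version of that index could look like this (the index name is only an example, not something that already exists in your schema):

CREATE INDEX amplifier_saturation_lddate_start
ON bss.amplifier_saturation (lddate, start);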
But there is another minor thing you can improve. Did you wonder about this detail in your EXPLAIN output?
Index Cond: ((lddate IS NOT NULL) AND ...
Why does Postgres have to exclude NULL values? Because NULL sorts after the greatest value in ASCENDING order, or before it in DESCENDING order. The maximum non-null value, which is what the aggregate function max() returns, is not at the beginning / end of the index if there are NULL values. Adding NULLS LAST | FIRST adjusts the sort order to the characteristic of max() (and makes the opposite min() more expensive). Since we are mostly interested in the latest timestamp, DESC NULLS LAST is the better choice.
CREATE INDEX tmp_as ON bss.amplifier_saturation (lddate DESC NULLS LAST, start);
Now, your table column lddate obviously doesn't have NULL values, being defined NOT NULL. The effect on performance will be negligible in this particular case. Still worth mentioning for cases that can have NULL.
The other index option would be on (start, lddate), basically a pimped amplifier_saturation_start index, which also would allow index-only scans. Depending on data distribution and actual parameter values in your query, one or the other will be faster.
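Again as a sketch, with a made-up index name:

CREATE INDEX amplifier_saturation_start_lddate
ON bss.amplifier_saturation (start, lddate);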
Two notes concerning timestamp
- Your table columns are timestamptz, but your query predicates use timestamp literals. Postgres derives the time zone from your current timezone setting and adjusts accordingly. This may or may not be as intended. It certainly makes the query volatile, depending on a setting of your session. It would be problematic for a call that could be made from different time zones (with differing session settings). Then you'd rather use an explicit offset or the AT TIME ZONE construct to make it stable.
- You typically would want to exclude the upper bound for correctness: < instead of <=.
select max(lddate)
from bss.amplifier_saturation
where start >= '1987-12-31 00:00:00'::timestamp AT TIME ZONE 'PST'
and start < '1988-04-09 00:00:00 PST'::timestamptz; -- shorter
PST (Pacific Standard Time) being a random example time zone.
Best Answer
To "dive" is to drill down the BTree from the 'root' down to the leaf node for a particular key.
Let's say you have a non-unique INDEX(x). Further, let's say that there are 100 rows with x=123. For WHERE x = 123, the two dives would be to "find the first index entry for x=123" and to find the "last" such row.

His first use of "accurate" (in the article) was to say that the dives are more accurate than using statistics that may be stale.
I suspect his use of "accurate" is an exaggeration in the other uses. I say this because I cannot believe that counting the rows between the first and last dive is anything more than an estimation. To get the exact number would require fetching each block -- this is too costly.
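A tiny illustration (a hypothetical table, not anything from the article) of where the two dives come into play:

CREATE TABLE t (
    id INT AUTO_INCREMENT PRIMARY KEY,
    x  INT,
    INDEX idx_x (x)   -- non-unique index
);

-- The "rows" estimate in the plan comes from two dives into idx_x:
-- one to the first entry for x=123 and one to the last.
EXPLAIN SELECT * FROM t WHERE x = 123;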
Note that the 100 rows might be in the same block, or might (if the rows are wide) be spread across multiple blocks. For the purposes of EXPLAIN and query planning, an estimate is considered "good enough". The exact ("accurate") number can be quickly counted if the start and end are in the same block, but, if spread across multiple blocks, I think it would then only estimate the row counts for the intermediate blocks.

The "dives", even if they involve a disk read, are usually not wasted. This is because you will (presumably) then proceed to perform the query and need those block(s) anyway.
Side note: Since Jørgen Løland wrote that article, the Optimizer has employed a cost-based algorithm. However, the "dives" are still important in it.