According to your current explanation, indexes are not going to help much (if at all) with your current query.

So there are 3.25M rows with a project key.

That's also the total number of rows, so this predicate is true for (almost) every row ... and not selective at all. But there is no useful index for the jsonb column "references". Including it in the btree index on ("createdAt", "references", type) is just pointless.
Even if you had a generally more useful GIN index on "references" like:

CREATE INDEX stats_references_gix ON stats USING gin ("references");

... Postgres would still have no useful statistics about individual keys inside the jsonb column.
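An expression index on the extracted key is different in this regard: Postgres does gather statistics on the indexed expression after ANALYZE, so the planner gets usable estimates for it. A minimal sketch (the index name is made up):

CREATE INDEX stats_project_expr_idx ON stats (("references" ->> 'project'));
ANALYZE stats;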
There are only 5 distinct values for the type
Your query selects all of one type and an unknown fraction of another type. That's an estimated 20 - 40 % of all rows. A sequential scan is most certainly going to be the fastest plan. Indexes start to make sense for around 5 % of all rows or less.
To test, you can force the use of a possible index for debugging purposes in your session:
SET enable_seqscan = off;
Reset with:
RESET enable_seqscan;
You'll see slower queries ...
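A typical debugging round-trip might look like this, with an example query standing in for yours:

SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM stats WHERE "references" ? 'project';  -- substitute your query

RESET enable_seqscan;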
You group by project values:
GROUP BY "references"->> 'project'
And:
There are exactly 400,000 distinct values for the project reference.
That's 8 rows per project on average. Depending on value frequencies, we would still have to retrieve an estimated 3 - 20 % of all rows even if we only picked min and max per project in a LATERAL subquery ...
Try this index; it makes more sense than what you have now:
CREATE INDEX stats_special_idx ON stats (type, ("references" ->> 'project'), "createdAt")
WHERE "references" ? 'project';
Postgres might still fall back to a sequential scan ...
More might be done with a normalized schema, more selective criteria, or a smarter query that only picks min and max "createdAt" ...
Query
I would write your query like this:
SELECT EXTRACT(EPOCH FROM (MAX("createdAt") - MIN("createdAt")))
FROM   stats
WHERE  "references" ? 'project'
AND   (type = 'event1'
    OR type = 'event2'
       AND "createdAt" >= '2015-11-02 00:00:00+08:00'  -- I guess you want this
       AND "createdAt" <  '2015-12-04 00:00:00+08:00')
GROUP  BY "references" ->> 'project';  -- don't cast
Notes
Don't cast here:

stats."references"::jsonb ? 'project'

The column is jsonb already; you gain nothing. If the predicate were selective, index usage might be prohibited by the cast.
Your predicates on "createdAt" are probably incorrect at the lower and upper bound. To include whole days, consider my suggested alternative.
references is a reserved word, so you always have to double-quote it. Don't use it as an identifier. The same goes for double-quoted CaMeL-case names like "createdAt": allowed, but error-prone and a needless complication.
type
type character varying(255) NOT NULL,
The type is a simple string no longer than 50 characters.
There are only 5 distinct values for the type field; this would probably be no more than a few hundred in production.
None of this seems to make sense.
- varchar(255) in itself hardly ever makes any sense. 255 characters is an arbitrary limit without significance in Postgres.
- If it's no longer than 50 characters, then the limit of 255 makes even less sense.
- In a properly normalized design you would have a small integer column type_id (referencing a small type table) which occupies only 4 bytes per row and makes indexes smaller and faster. See the sketch below.
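A minimal sketch of that normalization; all names are assumptions for illustration:

-- hypothetical lookup table for the handful of distinct type values
CREATE TABLE type (
   type_id serial PRIMARY KEY
 , type    text NOT NULL UNIQUE
);

-- stats then carries a small FK column instead of varchar(255)
ALTER TABLE stats ADD COLUMN type_id int REFERENCES type;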
Ideally, you would have a project table, listing all projects, and another small integer FK column project_id in stats. That would make any such query faster. And for selective criteria, much faster queries would be possible - even without the suggested normalization. Along these lines:
Optimize GROUP BY query to retrieve latest record per user
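For illustration, a hypothetical sketch along those lines, assuming the suggested project table with a text column project matching "references" ->> 'project' (LATERAL requires Postgres 9.3+; the time-range predicate is simplified here):

SELECT p.project
     , EXTRACT(EPOCH FROM (s.max_ts - s.min_ts)) AS duration
FROM   project p
CROSS  JOIN LATERAL (
   SELECT min("createdAt") AS min_ts
        , max("createdAt") AS max_ts
   FROM   stats
   WHERE  "references" ->> 'project' = p.project
   AND    type IN ('event1', 'event2')
   ) s;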
I think you found the main culprit alright. There are more ways to miss rows with this setup. Consider D3, which I added to your time diagram:
D1 |st1....................|
D2      |st2......|
D3           |st3......|
R                |.......|
                 ^ -- read query starts here
Transaction D3 starts after D2, but since it's not committed when your read query in transaction R starts, it is invisible as well. It will be included in the next page, but it may still be confusing that it's missing this time. That's how the default transaction isolation level Read Committed is defined. You seem to be aware of it, but for the general public:
When a transaction uses this isolation level, a SELECT query (without a FOR UPDATE/SHARE clause) sees only data committed before the query began; it never sees either uncommitted data or changes committed during query execution by concurrent transactions. In effect, a SELECT query sees a snapshot of the database as of the instant the query begins to run.
There is more. Read the manual.
With concurrent access to the DB, various conflicts might stall a DELETE query - which will patiently wait until it can get an exclusive lock on the rows to delete. 10 minutes of missed rows would indicate a (possibly unrelated) problem; I agree with your suspicion. Transactions that remain "idle in transaction" are prime suspects. To diagnose, look for (or log) long-running statements or transactions. Some pointers:
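For instance, a quick check for lingering transactions (column names as in pg_stat_activity since Postgres 9.2):

SELECT pid, state, xact_start, query_start, query
FROM   pg_stat_activity
WHERE  state = 'idle in transaction'
ORDER  BY xact_start;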
Database design
You seem to be using data type timestamp. Your INSERT with statement_timestamp() AT TIME ZONE 'UTC' should work correctly for your purpose, but you shouldn't rely on INSERT statements to get this right. Rather, move the logic to the table itself: use the data type timestamptz (timestamp with time zone) for tombstone.delete_timestamp to begin with. This rules out any discrepancies with time zones or DST (daylight saving time). Details:

Also, don't rely on the INSERT statement to provide the correct time. Put the logic in the column default (or even in a trigger BEFORE INSERT to overwrite any provided timestamp).
CREATE TABLE tombstone (
id int NOT NULL PRIMARY KEY -- assuming PK
, delete_timestamp timestamptz DEFAULT statement_timestamp()
);
Since timestamps are not visible until the transaction is committed anyway, the cheaper current_timestamp (transaction timestamp) may be just as useful as statement_timestamp() - depends on unknown details.
The simpler INSERT statement in the trigger to go with the above:
INSERT INTO tombstone (id) VALUES (OLD.id);
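And a sketch of the BEFORE INSERT trigger mentioned above, overwriting any timestamp the client might provide (function and trigger names are made up):

CREATE OR REPLACE FUNCTION tombstone_set_ts()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   NEW.delete_timestamp := statement_timestamp();  -- ignore client-provided value
   RETURN NEW;
END
$func$;

CREATE TRIGGER tombstone_set_ts
BEFORE INSERT ON tombstone
FOR EACH ROW EXECUTE PROCEDURE tombstone_set_ts();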
Query
Your current SELECT query can be faster with a row value comparison, which is also index-backed in Postgres:
SELECT id, delete_timestamp
FROM tombstone
WHERE (delete_timestamp, id) > (@sinceUTC, @id)
ORDER BY delete_timestamp, id
LIMIT @count;
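For the row comparison to actually be index-backed, you'd want a matching multicolumn index (the index name is an assumption):

CREATE INDEX tombstone_ts_id_idx ON tombstone (delete_timestamp, id);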
Details:
Secure alternative
All this aside, timestamps are just not completely reliable for queuing or paging in a multi-user environment. If you need it to be completely reliable, consider a different approach: lock unread rows before retrieving and mark them as read. The new FOR UPDATE SKIP LOCKED in Postgres 9.5 may prove useful for this:
Or, if the reading is done by a single user, you don't need locks. Just mark rows you have read. A partial index for unmarked rows makes reading fast.
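A sketch combining both ideas; the boolean processed flag and the index name are assumptions, and FOR UPDATE SKIP LOCKED requires Postgres 9.5+:

-- partial index covering only unread rows
CREATE INDEX tombstone_unread_idx ON tombstone (delete_timestamp, id)
WHERE  NOT processed;

-- grab a batch of unread rows, skipping rows locked by concurrent readers,
-- and mark them as read in the same statement
WITH batch AS (
   SELECT id
   FROM   tombstone
   WHERE  NOT processed
   ORDER  BY delete_timestamp, id
   LIMIT  100
   FOR    UPDATE SKIP LOCKED
   )
UPDATE tombstone t
SET    processed = true
FROM   batch b
WHERE  t.id = b.id
RETURNING t.id, t.delete_timestamp;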
Upgrade to current Postgres
Postgres 9.2 is getting old. Upgrading to the current version might help for various reasons.
Best Answer
So, I got it figured out and I'm leaving the answer for posterity.
Essentially, remove the * from the query and add a LIMIT 1, and it all appears to work: the query then returns a single, latest time. In addition, as @ypercubetm stated, you can use AND in place of OR, but that depends on your use case.