PostgreSQL – Analyzing Slow Query with Low Memory Usage and High Writes

performancepostgresqlpostgresql-10query-performance

I'm having issues with the following query in PostgreSQL 10.5.

Using pg_activity while the query runs show that only ~400 MB of RAM is used.

I also noticed that the planner estimates seem to be pretty far off.

This is the query:

explain analyze SELECT 
    Event.data AS "Event_data", Event_entity_identifiers.identifier AS "Event_rootIdentifier", 
    expeditions.expedition_code AS "expeditionCode", expeditions.project_id AS "projectId", 
    Sample.data AS "Sample_data", Sample_entity_identifiers.identifier AS "Sample_rootIdentifier", 
    fastqMetadata.data AS "fastqMetadata_data", fastqMetadata_entity_identifiers.identifier AS "fastqMetadata_rootIdentifier" 
FROM network_1.Event AS Event 
    JOIN expeditions ON expeditions.id = Event.expedition_id 
    LEFT JOIN network_1.Sample AS Sample ON Sample.parent_identifier = Event.local_identifier and Sample.expedition_id = Event.expedition_id 
    LEFT JOIN network_1.Tissue AS Tissue ON Tissue.parent_identifier = Sample.local_identifier and Tissue.expedition_id = Sample.expedition_id 
    LEFT JOIN network_1.fastqMetadata AS fastqMetadata ON fastqMetadata.parent_identifier = Tissue.local_identifier and fastqMetadata.expedition_id = Tissue.expedition_id 
    LEFT JOIN entity_identifiers AS Event_entity_identifiers ON Event_entity_identifiers.expedition_id = Event.expedition_id and Event_entity_identifiers.concept_alias = 'Event' 
    LEFT JOIN entity_identifiers AS Sample_entity_identifiers ON Sample_entity_identifiers.expedition_id = Sample.expedition_id and Sample_entity_identifiers.concept_alias = 'Sample' 
    LEFT JOIN entity_identifiers AS fastqMetadata_entity_identifiers ON fastqMetadata_entity_identifiers.expedition_id = fastqMetadata.expedition_id and fastqMetadata_entity_identifiers.concept_alias = 'fastqMetadata' 
WHERE (expeditions.project_id = 2 AND Event.data ? 'urn:decimalLatitude' AND Event.data ? 'urn:decimalLongitude') AND expeditions.public = true 
ORDER BY Event.local_identifier, Event.expedition_id 
OFFSET 0 LIMIT 10000;

Explain analyze results:

Limit  (cost=3158.61..3158.64 rows=12 width=1658) (actual time=104656.385..104692.387 rows=10000 loops=1)
  ->  Sort  (cost=3158.61..3158.64 rows=12 width=1658) (actual time=104656.383..104682.873 rows=10000 loops=1)
        Sort Key: event.local_identifier, event.expedition_id
        Sort Method: external merge  Disk: 40616kB
        ->  Nested Loop Left Join  (cost=1627.71..3158.39 rows=12 width=1658) (actual time=6.389..104128.516 rows=38826 loops=1)
              ->  Nested Loop Left Join  (cost=20.42..40.63 rows=1 width=625) (actual time=1.329..105.572 rows=5650 loops=1)
                    ->  Nested Loop  (cost=20.15..32.26 rows=1 width=607) (actual time=1.300..61.179 rows=5650 loops=1)
                          ->  Bitmap Heap Scan on event  (cost=20.00..24.02 rows=1 width=588) (actual time=1.286..17.407 rows=6025 loops=1)
                                Recheck Cond: ((data ? 'urn:decimalLatitude'::text) AND (data ? 'urn:decimalLongitude'::text))
                                Heap Blocks: exact=793
                                ->  Bitmap Index Scan on idx_network_1_event_data  (cost=0.00..20.00 rows=1 width=0) (actual time=1.168..1.169 rows=6025 loops=1)
                                      Index Cond: ((data ? 'urn:decimalLatitude'::text) AND (data ? 'urn:decimalLongitude'::text))
                          ->  Index Scan using expeditions_pkey on expeditions  (cost=0.14..8.16 rows=1 width=23) (actual time=0.004..0.004 rows=1 loops=6025)
                                Index Cond: (id = event.expedition_id)
                                Filter: (public AND (project_id = 2))
                                Rows Removed by Filter: 0
                    ->  Index Scan using entitiy_identifiers_expediton_id_concept_alias_uniq on entity_identifiers event_entity_identifiers  (cost=0.28..8.30 rows=1 width=22) (actual time=0.004..0.004 rows=1 loops=5650)
                          Index Cond: ((expedition_id = event.expedition_id) AND (concept_alias = 'Event'::text))
              ->  Nested Loop Left Join  (cost=1607.29..3117.75 rows=1 width=1060) (actual time=3.372..18.399 rows=7 loops=5650)
                    ->  Nested Loop Left Join  (cost=1607.01..3117.36 rows=1 width=1046) (actual time=3.369..18.374 rows=7 loops=5650)
                          ->  Nested Loop Left Join  (cost=1606.74..3117.06 rows=1 width=1028) (actual time=3.360..18.316 rows=7 loops=5650)
                                ->  Nested Loop Left Join  (cost=1606.46..3109.90 rows=1 width=480) (actual time=3.325..18.079 rows=7 loops=5650)
                                      ->  Bitmap Heap Scan on sample  (cost=1606.05..2651.62 rows=1 width=476) (actual time=2.389..2.498 rows=7 loops=5650)
                                            Recheck Cond: (expedition_id = event.expedition_id)
                                            Filter: (parent_identifier = event.local_identifier)
                                            Rows Removed by Filter: 430
                                            Heap Blocks: exact=392809
                                            ->  Bitmap Index Scan on idx_network_1_sample_local_identifier_expedition_id  (cost=0.00..1606.05 rows=312 width=0) (actual time=2.334..2.334 rows=437 loops=5650)
                                                  Index Cond: (expedition_id = event.expedition_id)
                                      ->  Index Scan using idx_network_1_tissue_local_identifier_expedition_id on tissue  (cost=0.41..458.27 rows=1 width=22) (actual time=1.067..2.262 rows=1 loops=38826)
                                            Index Cond: (expedition_id = sample.expedition_id)
                                            Filter: (parent_identifier = sample.local_identifier)
                                            Rows Removed by Filter: 484
                                ->  Index Scan using idx_network_1_fastqmetadata_local_identifier_expedition_id on fastqmetadata  (cost=0.28..7.15 rows=1 width=572) (actual time=0.031..0.031 rows=0 loops=38826)
                                      Index Cond: (expedition_id = tissue.expedition_id)
                                      Filter: (parent_identifier = tissue.local_identifier)
                                      Rows Removed by Filter: 3
                          ->  Index Scan using entitiy_identifiers_expediton_id_concept_alias_uniq on entity_identifiers sample_entity_identifiers  (cost=0.28..0.30 rows=1 width=22) (actual time=0.005..0.005 rows=1 loops=38826)
                                Index Cond: ((expedition_id = sample.expedition_id) AND (concept_alias = 'Sample'::text))
                    ->  Index Scan using entitiy_identifiers_expediton_id_concept_alias_uniq on entity_identifiers fastqmetadata_entity_identifiers  (cost=0.28..0.39 rows=1 width=22) (actual time=0.001..0.001 rows=0 loops=38826)
                          Index Cond: ((expedition_id = fastqmetadata.expedition_id) AND (concept_alias = 'fastqMetadata'::text))
Planning time: 6.090 ms
Execution time: 104704.794 ms

Edit:

CREATE TABLE public.expeditions (
    id integer NOT NULL,
    project_id integer NOT NULL,
    expedition_code text NOT NULL,
    public boolean DEFAULT true NOT NULL,
);
Indexes:
    "expeditions_pkey" PRIMARY KEY, btree (id)
    "expeditions_code_project_id_uniq" UNIQUE CONSTRAINT, btree (expedition_code, project_id)
    "expeditions_project_id_idx" btree (project_id)

CREATE TABLE public.entity_identifiers (
    id integer NOT NULL,
    expedition_id integer NOT NULL,
    concept_alias text NOT NULL,
    identifier text NOT NULL
);
Indexes:
    "entitiy_identifiers_expediton_id_concept_alias_uniq" UNIQUE CONSTRAINT, btree (expedition_id, concept_alias)
    "entity_identifiers_expedition_id" btree (expedition_id)

CREATE TABLE network_1.event (
    id integer NOT NULL,
    local_identifier text NOT NULL,
    expedition_id integer NOT NULL,
    data jsonb NOT NULL,
);
Indexes:
    "event_pkey" PRIMARY KEY, btree (id)
    "idx_network_1_event_local_identifier_expedition_id" UNIQUE CONSTRAINT, btree (local_identifier, expedition_id)
    "idx_network_1_event_data" gin (data)
Foreign-key constraints:
    "event_expedition_id_fkey" FOREIGN KEY (expedition_id) REFERENCES expeditions(id) ON DELETE CASCADE

CREATE TABLE network_1.sample (
    id integer NOT NULL,
    local_identifier text NOT NULL,
    expedition_id integer NOT NULL,
    data jsonb NOT NULL,
    parent_identifier text NOT NULL
);
Indexes:
    "sample_pkey" PRIMARY KEY, btree (id)
    "idx_network_1_sample_local_identifier_expedition_id" UNIQUE CONSTRAINT, btree (local_identifier, expedition_id)
    "idx_network_1_sample_data" gin (data)
Foreign-key constraints:
    "network_1_sample_parent_fkey" FOREIGN KEY (parent_identifier, expedition_id) REFERENCES network_1.event(local_identifier, expedition_id) ON DELETE CASCADE
    "sample_expedition_id_fkey" FOREIGN KEY (expedition_id) REFERENCES expeditions(id) ON DELETE CASCADE

CREATE TABLE network_1.tissue (
    id integer NOT NULL,
    local_identifier text NOT NULL,
    expedition_id integer NOT NULL,
    data jsonb NOT NULL,
    parent_identifier text NOT NULL
);
Indexes:
    "tissue_pkey" PRIMARY KEY, btree (id)
    "idx_network_1_tissue_local_identifier_expedition_id" UNIQUE CONSTRAINT, btree (local_identifier, expedition_id)
    "idx_network_1_tissue_data" gin (data)
Foreign-key constraints:
    "network_1_tissue_parent_fkey" FOREIGN KEY (parent_identifier, expedition_id) REFERENCES network_1.sample(local_identifier, expedition_id) ON DELETE CASCADE
    "tissue_expedition_id_fkey" FOREIGN KEY (expedition_id) REFERENCES expeditions(id) ON DELETE CASCADE

CREATE TABLE network_1.fastqmetadata (
    id integer NOT NULL,
    local_identifier text NOT NULL,
    expedition_id integer NOT NULL,
    data jsonb NOT NULL,
    parent_identifier text NOT NULL
);
Indexes:
    "fastqmetadata_pkey" PRIMARY KEY, btree (id)
    "idx_network_1_fastqmetadata_local_identifier_expedition_id" UNIQUE CONSTRAINT, btree (local_identifier, expedition_id)
    "idx_network_1_fastqmetadata_data" gin (data)
Foreign-key constraints:
    "fastqmetadata_expedition_id_fkey" FOREIGN KEY (expedition_id) REFERENCES expeditions(id) ON DELETE CASCADE
    "network_1_fastqmetadata_parent_fkey" FOREIGN KEY (parent_identifier, expedition_id) REFERENCES network_1.tissue(local_identifier, expedition_id) ON DELETE CASCADE

Best Answer

I don't think there is much you can do about the bad estimates, because of the lack of statistics gathered about the internal structure of jsonb data. In particular, the prevalence of 'urn:decimalLatitude' and 'urn:decimalLongitude' and the correlation between them.

But you should be able to speed up the query while still having bad estimates by adding an index:

->  Index Scan using idx_network_1_tissue_local_identifier_expedition_id on tissue  (cost=0.41..458.27 rows=1 width=22) (actual time=1.067..2.262 rows=1 loops=38826)
    Index Cond: (expedition_id = sample.expedition_id)
    Filter: (parent_identifier = sample.local_identifier)
    Rows Removed by Filter: 484

This index scan should much more efficient if you had an index on (parent_identifier, expedition_id), or on (expedition_id, parent_identifier)