Postgresql query speed

postgresql

This is my first time seriously attempting to get into the guts of the speed of a query on Postgres, and I was hoping for some more explanation of exactly what is going on. I am using Postgres v 9.1 on Windows x64 with 8 GB of RAM. I've updated my postgres.conf, changing the following relevant lines

effective_cache_size = 2GB
shared_buffers = 2GB            # min 128kB
work_mem = 256MB                # min 64kB
default_statistics_target = 100 # range 1-10000

My query looks like the following

SELECT MAX(fraction) AS last_fraction FROM (
SELECT object_mqs.id as object_id, dose_hsts.fractions_tx as fraction, MIN(multi_record_statuses.sanction_dttm) AS approved,
MIN(dose_hsts.tx_dttm) AS treated FROM "dose_hsts"
LEFT JOIN sites ON dose_hsts.site_id = sites.id
LEFT JOIN tx_plan_assocs ON sites.record_set_id = tx_plan_assocs.record_set_id
LEFT JOIN object_mqs ON tx_plan_assocs.object_mq_id = object_mqs.id
LEFT JOIN multi_record_statuses ON object_mqs.id = multi_record_statuses.object_mq_id
WHERE "dose_hsts"."institution_id" = 4 AND "dose_hsts"."was_qa_mode" = 'f'
AND ("dose_hsts"."tx_dttm" BETWEEN '2012-04-13 07:24:14.000000' AND '2012-04-20 07:24:14.000000')
AND (dose_hsts.fractions_tx > 0) AND (multi_record_statuses.staff_type = 'Doctor' OR multi_record_statuses.staff_type IS NULL)
AND (object_mqs.status_enum != 1) AND "object_mqs"."file_format" = 7
GROUP BY object_mqs.id, dose_hsts.fractions_tx ORDER BY object_mqs.id
) AS dose_hsts WHERE (approved >= treated OR approved IS NULL) AND (treated IS NOT NULL) GROUP BY object_id

I've already gone ahead and run explain (analyze, buffers), and the output can be seen, parsed here. I'm happy to supply any more information that is requested.

Since this is my first time really trying to squeeze performance out of Postgres, I'm hoping to get more information on exactly what is happening in my query. It looks like the most time is spent in an index scan of the dose_hsts table, and I really want to know if I can make this faster. If it makes any difference, the dose_hsts table has 170976 rows in it, but it will be growing by a lot, how much of a difference does this make? Also this query takes about 100 ms to run and returns only 5 rows.

EDIT

here is the output with EXPLAIN (ANALYZE, VERBOSE) on Depesz, and below is the raw output:

"GroupAggregate  (cost=716.73..716.85 rows=2 width=6) (actual time=55.117..55.194 rows=5 loops=1)"
"  Output: max(public.dose_hsts.fractions_tx), object_mqs.id"
"  ->  GroupAggregate  (cost=716.73..716.80 rows=2 width=22) (actual time=55.013..55.187 rows=17 loops=1)"
"        Output: object_mqs.id, public.dose_hsts.fractions_tx, min(multi_record_statuses.sanction_dttm), min(public.dose_hsts.tx_dttm)"
"        Filter: (((min(multi_record_statuses.sanction_dttm) >= min(public.dose_hsts.tx_dttm)) OR (min(multi_record_statuses.sanction_dttm) IS NULL)) AND (min(public.dose_hsts.tx_dttm) IS NOT NULL))"
"        ->  Sort  (cost=716.73..716.74 rows=2 width=22) (actual time=54.992..55.005 rows=618 loops=1)"
"              Output: object_mqs.id, public.dose_hsts.fractions_tx, multi_record_statuses.sanction_dttm, public.dose_hsts.tx_dttm"
"              Sort Key: object_mqs.id, public.dose_hsts.fractions_tx"
"              Sort Method: quicksort  Memory: 73kB"
"              ->  Nested Loop  (cost=581.41..716.72 rows=2 width=22) (actual time=11.445..54.589 rows=618 loops=1)"
"                    Output: object_mqs.id, public.dose_hsts.fractions_tx, multi_record_statuses.sanction_dttm, public.dose_hsts.tx_dttm"
"                    ->  Hash Join  (cost=581.41..648.17 rows=12 width=16) (actual time=8.551..10.224 rows=1734 loops=1)"
"                          Output: sites.id, object_mqs.id, multi_record_statuses.sanction_dttm"
"                          Hash Cond: (sites.record_set_id = tx_plan_assocs.record_set_id)"
"                          ->  Seq Scan on public.sites  (cost=0.00..56.66 rows=2666 width=8) (actual time=0.004..0.277 rows=2666 loops=1)"
"                                Output: sites.id, sites.patient_id, sites.site_name, sites.technique, sites.dose_ttl, sites.dose_tx, sites.fractions, sites.record_set_id, sites.created_at, sites.updated_at"
"                          ->  Hash  (cost=581.28..581.28 rows=10 width=16) (actual time=8.527..8.527 rows=1328 loops=1)"
"                                Output: tx_plan_assocs.record_set_id, object_mqs.id, multi_record_statuses.sanction_dttm"
"                                Buckets: 1024  Batches: 1  Memory Usage: 63kB"
"                                ->  Nested Loop Left Join  (cost=448.80..581.28 rows=10 width=16) (actual time=1.974..8.001 rows=1422 loops=1)"
"                                      Output: tx_plan_assocs.record_set_id, object_mqs.id, multi_record_statuses.sanction_dttm"
"                                      Filter: (((multi_record_statuses.staff_type)::text = 'Doctor'::text) OR (multi_record_statuses.staff_type IS NULL))"
"                                      ->  Hash Join  (cost=448.80..487.77 rows=110 width=8) (actual time=1.963..2.887 rows=1630 loops=1)"
"                                            Output: tx_plan_assocs.record_set_id, object_mqs.id"
"                                            Hash Cond: (tx_plan_assocs.object_mq_id = object_mqs.id)"
"                                            ->  Seq Scan on public.tx_plan_assocs  (cost=0.00..29.58 rows=1658 width=8) (actual time=0.002..0.164 rows=1658 loops=1)"
"                                                  Output: tx_plan_assocs.id, tx_plan_assocs.object_mq_id, tx_plan_assocs.record_set_id, tx_plan_assocs.pat_c_plan_id, tx_plan_assocs.created_at, tx_plan_assocs.updated_at"
"                                            ->  Hash  (cost=417.15..417.15 rows=2532 width=4) (actual time=1.950..1.950 rows=2521 loops=1)"
"                                                  Output: object_mqs.id"
"                                                  Buckets: 1024  Batches: 1  Memory Usage: 89kB"
"                                                  ->  Bitmap Heap Scan on public.object_mqs  (cost=51.97..417.15 rows=2532 width=4) (actual time=0.287..1.354 rows=2521 loops=1)"
"                                                        Output: object_mqs.id"
"                                                        Recheck Cond: (object_mqs.file_format = 7)"
"                                                        Filter: (object_mqs.status_enum <> 1)"
"                                                        ->  Bitmap Index Scan on index_object_mqs_on_file_format  (cost=0.00..51.34 rows=2545 width=0) (actual time=0.230..0.230 rows=2575 loops=1)"
"                                                              Index Cond: (object_mqs.file_format = 7)"
"                                      ->  Index Scan using index_multi_record_statuses_on_object_mq_id on public.multi_record_statuses  (cost=0.00..0.82 rows=2 width=21) (actual time=0.001..0.002 rows=2 loops=1630)"
"                                            Output: multi_record_statuses.id, multi_record_statuses.sanction_dttm, multi_record_statuses.staff_type, multi_record_statuses.is_required, multi_record_statuses.object_mq_id, multi_record_statuses.site_setup_hst_id, multi_record_statuses.status_enum, multi_record_statuses.created_at, multi_record_statuses.updated_at"
"                                            Index Cond: (object_mqs.id = multi_record_statuses.object_mq_id)"
"                    ->  Index Scan using index_dose_hsts_on_site_id on public.dose_hsts  (cost=0.00..5.70 rows=1 width=14) (actual time=0.024..0.025 rows=0 loops=1734)"
"                          Output: public.dose_hsts.id, public.dose_hsts.patient_id, public.dose_hsts.institution_id, public.dose_hsts.site_id, public.dose_hsts.tx_field_id, public.dose_hsts.tx_dttm, public.dose_hsts.was_qa_mode, public.dose_hsts.partially_treated, public.dose_hsts.was_overridden, public.dose_hsts.meterset, public.dose_hsts.is_port_field, public.dose_hsts.fractions_tx, public.dose_hsts.fx_number_in_effect, public.dose_hsts.created_at, public.dose_hsts.updated_at, public.dose_hsts.was_verified, public.dose_hsts.meterset_prev_delivered"
"                          Index Cond: (public.dose_hsts.site_id = sites.id)"
"                          Filter: ((NOT public.dose_hsts.was_qa_mode) AND (public.dose_hsts.tx_dttm >= '2012-04-13 07:24:14'::timestamp without time zone) AND (public.dose_hsts.tx_dttm <= '2012-04-20 07:24:14'::timestamp without time zone) AND (public.dose_hsts.fractions_tx > 0) AND (public.dose_hsts.institution_id = 4))"
"Total runtime: 55.416 ms"

Best Answer

At 55ms given the complexity of that query and the number of rows sorted through I doubt you will get it much faster.

On one hand the query plan is long and complex because of the large number of joins and we do see nested loops, but they are nested loops against bitmapped index scans, not the actual tables.

Unless you can simplify the question there isn't much you can do to make it faster. However at 55 ms/query that is approx 150 queries of this complexity per second on a single core (minus planning overhead, data transfer, etc).