PostgreSQL – How to increase performance for a select-in-select statement

performance postgresql query-performance

The items table looks like this:

id    category_id    name    registered

I have to show the total number of items from the beginning. Suppose the range is 2013-05-01 to 2013-05-05: I have to count all the items registered before 2013-05-01 and add that number to the count of items registered on 2013-05-01. The result has to be grouped by registered and category_id, and I have to do this in one query. This is my query:

SELECT
    registered,
    me.category_id,
    COUNT(*) + (SELECT COUNT(*)
                FROM items
                WHERE me.category_id = category_id
                  AND registered < '2013-05-01 00:00:00') AS Value
FROM items me
WHERE category_id IS NOT NULL
  AND me.registered BETWEEN '2013-05-01 00:00:00' AND '2013-05-02 00:00:00'
GROUP BY registered, category_id

It takes almost 28 seconds to generate the result. How can I improve the performance?

Update
I am using PostgreSQL. The registered and category_id columns are indexed. There are 568,814 rows. I would also be very grateful if anyone could tell me whether my SQL is correct.

Update
Sorry for the late reply. Here is the table definition:

CREATE TABLE items
(
  id uuid NOT NULL DEFAULT uuid_generate_v4(),
  registered timestamp without time zone NOT NULL,
  modified timestamp without time zone NOT NULL,
  published timestamp without time zone,
  owner_id uuid,
  status_id integer NOT NULL,
  category_id uuid NOT NULL,
  location_id uuid,
  title text,
  description text,
  quantity integer NOT NULL DEFAULT 1,
  unit text,
  price_low bigint,
  price bigint,
  currency text NOT NULL DEFAULT 'BDT'::text,
  ip text,
  attributes_json text,
  from_mobile boolean NOT NULL DEFAULT false,
  CONSTRAINT items_pkey PRIMARY KEY (id),
  CONSTRAINT items_fk_category_id FOREIGN KEY (category_id)
      REFERENCES categories (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT items_fk_location_id FOREIGN KEY (location_id)
      REFERENCES locations (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT items_fk_owner_id FOREIGN KEY (owner_id)
      REFERENCES users (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE,
  CONSTRAINT items_fk_status_id FOREIGN KEY (status_id)
      REFERENCES item_statuses (id) MATCH SIMPLE
      ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY IMMEDIATE
)

Best Answer

  1. What kind of column is registered: datetime or varchar?

  2. Why are you grouping by the full value of the registered column?
    For example, if you have the following values in this column:

        2013-05-01 00:00:00
        2013-05-01 00:00:01
        2013-05-01 00:00:02
        2013-05-01 00:00:03
        2013-05-01 00:00:04
    

    ... and you group by them, you will have 5 rows in the output.

    Grouping at the minute or hour level will give you fewer rows, and therefore a faster query (see the date_trunc sketch after this list). If you have 568,814 rows, how many rows does your query return? I suspect the result set is very close to 568,814 rows :)

  3. In its current form, the subquery has to be run once for every row generated by the primary query. Try to avoid that.
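
If grouping at a coarser granularity is acceptable, a query along these lines would shrink the result set. This is a minimal sketch, assuming registered is a timestamp; it illustrates only the grouping change, not the cumulative total:

SELECT
  date_trunc('hour', registered) AS registered_hour,  -- collapse per-second timestamps into hourly buckets
  category_id,
  COUNT(*) AS value
FROM items
WHERE registered BETWEEN '2013-05-01 00:00:00'
                     AND '2013-05-02 00:00:00'
GROUP BY date_trunc('hour', registered), category_id;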

Here is my version of your query. Given the same input, the output will be the same:

SELECT
  t1.registered,
  t1.category_id,
  (COUNT(*) + COALESCE(q2.c2,0)) AS Value
FROM
  items t1
  LEFT JOIN
  (
    SELECT
      t2.category_id,
      COUNT(*) AS c2
    FROM
      items t2
    WHERE 
      t2.registered < '2013-05-01 00:00:00'
    GROUP BY
      t2.category_id
  ) q2
  ON  t1.category_id = q2.category_id
WHERE
      t1.registered BETWEEN '2013-05-01 00:00:00' 
                        AND '2013-05-02 00:00:00'
GROUP BY
  t1.registered,
  t1.category_id,
  q2.c2 ;
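
With this form, the per-category totals before 2013-05-01 are computed once in the derived table q2 and joined in, instead of being recomputed for every row of the outer query.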

Output

REGISTERED              CATEGORY_ID     VALUE
---------------------------------------------
May, 01 2013 00:00:00+0000  1           5
May, 01 2013 01:00:00+0000  1           5
May, 01 2013 02:00:00+0000  2           6
May, 01 2013 07:00:00+0000  3           3
May, 01 2013 08:00:00+0000  5           3
May, 01 2013 09:00:00+0000  1           5

Query Plan

HashAggregate  (cost=35.27..35.32 rows=4 width=20)
  ->  Hash Right Join  (cost=33.63..35.23 rows=4 width=20)
        Hash Cond: (t2.category_id = t1.category_id)
        ->  HashAggregate  (cost=20.82..21.49 rows=67 width=4)
              ->  Bitmap Heap Scan on items t2  (cost=6.27..19.52 rows=260 width=4)
                    Recheck Cond: (registered < '2013-05-01 00:00:00'::timestamp without time zone)
                    ->  Bitmap Index Scan on idx_reg  (cost=0.00..6.20 rows=260 width=0)
                          Index Cond: (registered < '2013-05-01 00:00:00'::timestamp without time zone)
        ->  Hash  (cost=12.76..12.76 rows=4 width=12)
              ->  Bitmap Heap Scan on items t1  (cost=4.29..12.76 rows=4 width=12)
                    Recheck Cond: ((registered >= '2013-05-01 00:00:00'::timestamp without time zone) AND (registered <= '2013-05-02 00:00:00'::timestamp without time zone))
                    ->  Bitmap Index Scan on idx_reg  (cost=0.00..4.29 rows=4 width=0)
                          Index Cond: ((registered >= '2013-05-01 00:00:00'::timestamp without time zone) AND (registered <= '2013-05-02 00:00:00'::timestamp without time zone))
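
The plan hits the idx_reg index twice. If that is still too slow on the full 568,814 rows, a composite index might help. The statement below is an assumption to test rather than a guaranteed win: the index name is made up, and whether (category_id, registered) or (registered, category_id) is the better column order depends on the data, so compare EXPLAIN ANALYZE output with and without it:

-- Hypothetical composite index; verify with EXPLAIN ANALYZE before keeping it.
CREATE INDEX idx_items_category_registered
    ON items (category_id, registered);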

For anyone interested, here is the SQL Fiddle page with sample data: SQL-Fiddle-2