PostgreSQL 9.3 – Performance Issue: Counting table entries slower with the JDBC Postgres driver

jdbc, performance, postgresql, postgresql-9.3, query-performance

I'm using PostgreSQL 9.3 and want to ask if it is possible to improve the performance of my queries based on the following database schema:

CREATE TABLE "entities" (
  "id" BIGSERIAL,
  "type" INT NOT NULL,
  "name" VARCHAR(255) NOT NULL,
  "sentence_id" BIGINT NOT NULL, -- REFERENCES sentences(sentence_id)
  "date" DATE NOT NULL
);

CREATE TABLE "relationships" (
  "id" BIGSERIAL,
  "type" INT NOT NULL,
  "entity1" BIGINT NOT NULL, -- REFERENCES entities(id)
  "entity2" BIGINT NOT NULL, -- REFERENCES entities(id)
  "sentence_id" BIGINT NOT NULL, -- REFERENCES sentences(sentence_id)
  "date" DATE NOT NULL
);

The primary key for entities is (id, name, sentence_id), and I have indexes on the columns "id" and "date". The primary key for relationships is (id, entity1, entity2, sentence_id), and I also have indexes on "id" and "date".
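For reference, a minimal sketch of what these constraints and indexes could look like as DDL. The names e_pkey_index and r_pkey_index show up in the plans below; the remaining index names are my own assumptions:

ALTER TABLE entities      ADD PRIMARY KEY (id, name, sentence_id);
ALTER TABLE relationships ADD PRIMARY KEY (id, entity1, entity2, sentence_id);

-- index names other than e_pkey_index / r_pkey_index are assumptions
CREATE INDEX e_pkey_index ON entities (id);
CREATE INDEX e_date_index ON entities (date);
CREATE INDEX r_pkey_index ON relationships (id);
CREATE INDEX r_date_index ON relationships (date);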

I will briefly explain the domain below. Maybe some of you know a better approach.

There are sentences containing Persons on a certain date:

sent_id, date, sentence
1, 27.09.1990, "Franz Beckenbauer bedankt sich bei Angela Merkel"
2, 27.09.1990, "Angela Merkel feiert ihren Abschluss in Physik"

From these sources I extract the following entries for the table "entities":

entity_id, name, sent_id, date
1, Angela Merkel, 1, 27-09-1990
1, Angela Merkel, 2, 27-09-1990
2, Franz Beckenbauer, 1, 27-09-1990

I use this query to get the frequency of entities within a given time period:

SELECT COUNT(*) FROM entities AS e 
WHERE e.id = {id} AND (e.date BETWEEN {from} AND {to})

and this one for the frequency of the relationship between them (how often they occurred together):

SELECT COUNT(*) FROM relationships AS r
WHERE r.id = {id} AND (r.date BETWEEN {from} AND {to})

Here is a working example with EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT COUNT(*) from entities as e 
WHERE e.id = 474118 AND 
(e.date BETWEEN '2010-01-01' AND '2010-12-01');

Aggregate  (cost=741.92..741.93 rows=1 width=0) (actual time=0.077..0.077 rows=1 loops=1)
  ->  Bitmap Heap Scan on entity e  (cost=6.66..741.46 rows=183 width=0) (actual time=0.067..0.072 rows=4 loops=1)
        Recheck Cond: (id = 474118)
        Filter: (("date" >= '2010-01-01'::date) AND ("date" <= '2010-12-01'::date))
        ->  Bitmap Index Scan on e_pkey_index  (cost=0.00..6.61 rows=190 width=0) (actual time=0.053..0.053 rows=4 loops=1)
              Index Cond: (id = 474118)
 Total runtime: 0.119 ms
(7 rows)
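Since the slow behaviour shows up through the JDBC driver, which sends parameterized statements (see the $1/$2/$3 placeholders in the logs further down), the same form can be reproduced in psql with a server-side prepared statement. A minimal sketch, with a statement name of my own choosing:

PREPARE entity_freq (bigint, date, date) AS
  SELECT COUNT(*)
  FROM entities AS e
  WHERE e.id = $1
    AND e.date BETWEEN $2 AND $3;

EXPLAIN ANALYZE EXECUTE entity_freq(474118, '2010-01-01', '2010-12-01');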

Is there a possibility to speed up this query, or can I change the structure according to my domain to improve performance? I use this sort of query very extensively in my application to sort entities and relationships based on their individual frequency. In total the run time is roughly 6 seconds. If I can decrease the time of a single query, the overall time should improve as well.

EDIT:

EXPLAIN ANALYZE after creating a combined index on (id, date), run with a different id:

Aggregate  (cost=707.84..707.85 rows=1 width=0) (actual time=0.108..0.108 rows=1 loops=1)
  ->  Bitmap Heap Scan on entities e  (cost=6.74..707.39 rows=181 width=0) (actual time=0.081..0.104 rows=4 loops=1)
        Recheck Cond: ((id = 957604) AND (date >= '2010-01-01'::date) AND (date <= '2010-12-01'::date))
        ->  Bitmap Index Scan on e_id_date_index  (cost=0.00..6.70 rows=181 width=0) (actual time=0.073..0.073 rows=4 loops=1)
              Index Cond: ((id = 957604) AND (date >= '2010-01-01'::date) AND (date <= '2010-12-01'::date))
 Total runtime: 0.168 ms
(6 rows)
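For reference, the combined index behind e_id_date_index in the plan above was presumably created along these lines (the exact definition is an assumption on my part):

CREATE INDEX e_id_date_index ON entities (id, date);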

EDIT 2

Found the culprit. It seems that some of the queries (here the entity query) take more than a second. This would explain why my application function, which calls the relationship and entity queries n times, takes more than 6 seconds. I found this with the "auto_explain" module of PostgreSQL. It automatically logs all queries from my application that take more than a second and runs EXPLAIN ANALYZE on them.
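The module was enabled with settings roughly like these (a sketch written as session-level SQL; the parameter names are real auto_explain settings, the exact values are my assumption, and in practice the module is usually loaded through postgresql.conf so that application connections pick it up). The log excerpt below is what it produced:

LOAD 'auto_explain';                          -- usually loaded via postgresql.conf instead
SET auto_explain.log_min_duration = '1s';     -- log everything slower than one second
SET auto_explain.log_analyze = on;            -- include actual row counts (EXPLAIN ANALYZE)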

2014-08-04 00:02:25 CEST LOG:  execute <unnamed>: SELECT COUNT(*)            FROM entities AS e           WHERE e.id = $1           AND           (e.date   BETWEEN $2 AN$
2014-08-04 00:02:25 CEST DETAIL:  parameters: $1 = '879', $2 = '2010-02-09', $3 = '2010-11-24'
2014-08-04 00:02:26 CEST LOG:  duration: 1266.621 ms
2014-08-04 00:02:26 CEST LOG:  duration: 1266.547 ms  plan:
        Query Text: SELECT COUNT(*)            FROM entities AS e           WHERE e.id = $1           AND           (e.date BETWEEN $2 AND $3)
        Aggregate  (cost=85495.58..85495.59 rows=1 width=0) (actual rows=1 loops=1)
          ->  Bitmap Heap Scan on entities e  (cost=4704.17..84977.27 rows=207326 width=0) (actual rows=219714 loops=1)
                Recheck Cond: (id = 879::bigint)
                Rows Removed by Index Recheck: 6437814
                Filter: ((date >= '2010-02-09'::date) AND (date <= '2010-11-24'::date))
                Rows Removed by Filter: 44104
                ->  Bitmap Index Scan on e_pkey_index  (cost=0.00..4652.34 rows=251720 width=0) (actual rows=263818 loops=1)
                      Index Cond: (id = 879::bigint)

Sorry that it is a different id, but it's hard to hit the same one twice. It seems the problem is ids that have many entries in the entities table. After recreating the index (sketched after the plan below) it looks like this:

2014-08-04 22:38:23 CEST LOG:  duration: 697.026 ms
2014-08-04 22:38:23 CEST LOG:  duration: 697.002 ms  plan:
        Query Text: SELECT COUNT(*)            FROM entities AS e           WHERE e.id = $1           AND           (e.date BETWEEN $2 AND $3)
        Aggregate  (cost=83520.53..83520.54 rows=1 width=0) (actual rows=1 loops=1)
          ->  Bitmap Heap Scan on entities e  (cost=1775.72..83319.78 rows=80297 width=0) (actual rows=81936 loops=1)
                Recheck Cond: (id = 514::bigint)
                Rows Removed by Index Recheck: 3651177
                Filter: ((date >= '2010-02-02'::date) AND (date <= '2010-11-25'::date))
                Rows Removed by Filter: 11888
                ->  Bitmap Index Scan on e_pkey_index  (cost=0.00..1755.64 rows=94828 width=0) (actual rows=93824 loops=1)
                      Index Cond: (id = 514::bigint)
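"Recreating the index" above means roughly the following; whether it was a REINDEX or a full drop-and-create is not stated, so this is only a sketch:

REINDEX INDEX e_pkey_index;
-- or, equivalently, drop and rebuild it:
-- DROP INDEX e_pkey_index;
-- CREATE INDEX e_pkey_index ON entities (id);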

Best Answer

There are too many different topics in one question, and even after many updates not everything is clear. I'll just pick the elephant in the room and ignore the rest:

Query 1

->  Index Scan using r_pkey_index on relationships r (cost=0.43..57.53 rows=13 width=0)
                                                     (actual rows=1 loops=1)
         Index Cond: (id = 947367::bigint)
         Filter: ((date >= '2010-02-09'::date) AND (date <= '2010-11-24'::date))

Query 2

Filter: ((date >= '2010-02-09'::date) AND (date <= '2010-11-24'::date))
   Rows Removed by Filter: 44104
      ->  Bitmap Index Scan on e_pkey_index (cost=0.00..4652.34 rows=251720 width=0)
                                            (actual rows=263818 loops=1)
          Index Cond: (id = 879::bigint)

Obviously, a single row matches your criteria in the first query, while more than 200,000 match in the second. Obviously, the first can be much faster, leaving aside that we are comparing two queries on two different tables.
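To see how skewed the data is, i.e. which ids contribute hundreds of thousands of rows and are therefore bound to count slowly, a grouped count gives a quick overview (a minimal sketch):

SELECT id, count(*) AS entries
FROM entities
GROUP BY id
ORDER BY entries DESC
LIMIT 10;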