PostgreSQL – Speed up 180M+ rows returned by simple SELECT with PostgreSQL

google-cloud-sql, performance, postgresql, postgresql-9.6, query-performance

I have performance issues with a SELECT statement which returns 180+ million out of 220+ million rows on my managed PostgreSQL instance. The table structure is the following:

CREATE TABLE members (
    id bigserial NOT NULL,
    customer varchar(64) NOT NULL,
    community varchar(64) NOT NULL,
    member varchar(64) NOT NULL,
    has_connected bool NULL DEFAULT false,
    CONSTRAINT members_customer_community_members_key UNIQUE (customer, community, member),
    CONSTRAINT members_pkey PRIMARY KEY (id)
);

The "guilty" SELECT query is:

SELECT
    community,
    member,
    has_connected
FROM
    members
WHERE
    customer = :customer;

I have already indexed the table:

CREATE INDEX members_idx ON members USING btree (customer, community, has_connected, member);

and the query behaves well for most customer values. However, I have one customer, let's call it 1234, which represents 80% of the table, so the query planner prefers to scan the whole table, as the following `EXPLAIN ANALYZE` result shows:

Seq Scan on public.members  (cost=0.00..5674710.80 rows=202271234 width=55) (actual time=0.018..165612.655 rows=202279274 loops=1)
  Output: community, member, has_connected
  Filter: ((members.customer)::text = '1234'::text)
  Rows Removed by Filter: 5676072
Planning time: 0.106 ms
Execution time: 175174.714 ms

As I said earlier, my PostgreSQL is a managed instance of PostgreSQL 9.6.14 hosted on Google Cloud Platform with 10 vCPUs and 30 GB RAM. I am rather limited in which flags I can change, so the only PostgreSQL options tuned on this instance are:

max_connections: 1000
work_mem: 131072 KB
maintenance_work_mem: 2000000 KB
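
For reference, these are the effective values as reported by the standard pg_settings view (nothing Cloud SQL specific here):

SELECT name, setting, unit
FROM pg_settings
WHERE name IN ('max_connections', 'work_mem', 'maintenance_work_mem');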

What are my options to solve this issue and strongly reduce the query time, preferably to below 30 seconds if possible?

Best Answer

As you mentioned, you are trying to get 80% of the rows of your table, and it is normal that the PostgreSQL optimizer decides to scan the table rather than get the data from the index. Unlike MS SQL Server, PostgreSQL does not support index hints, so you cannot force an index scan outright; the closest thing is to discourage sequential scans for a test session, as sketched below. That said, I am absolutely sure that you will not get better results with an index scan here.
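
If you want to verify this yourself, you can discourage sequential scans for the current session and re-run the plan. This is only a planner cost trick, not a real hint, and the query below is the one from your question:

-- Discourage sequential scans for this session only; the planner can
-- still fall back to one if no other plan is possible.
SET enable_seqscan = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT community, member, has_connected
FROM members
WHERE customer = '1234';

-- Restore the default afterwards.
RESET enable_seqscan;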

Another question is why you need 180 million rows in the first place. What do you want to do with this data? I assume this is an analytical query, so you could try some column-store technology. PostgreSQL has the cstore_fdw extension available, so you can try it out.
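
A minimal sketch of what that could look like, assuming cstore_fdw is installed on the server (which is usually not possible on a managed Cloud SQL instance, so this implies a self-managed PostgreSQL; the table name members_cstore is just an example):

-- Requires the cstore_fdw extension to be installed on the server.
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;

-- Columnar copy holding only the columns the query reads.
CREATE FOREIGN TABLE members_cstore (
    customer      varchar(64),
    community     varchar(64),
    member        varchar(64),
    has_connected bool
) SERVER cstore_server
  OPTIONS (compression 'pglz');

-- One-off load; cstore_fdw also supports loading via COPY.
INSERT INTO members_cstore
SELECT customer, community, member, has_connected
FROM members;

Reading 180 million rows from a compressed, column-oriented store touches far less I/O than scanning the 55-byte-wide heap rows shown in your plan, which is where the potential speedup would come from.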