PostgreSQL – How to speed up counting and filtering on a table with many columns and filter conditions

Tags: count, index, postgresql, postgresql-performance, where

I have TABLE_A in my database that has a lot of columns.

Let's say there are 100 columns: COLUMN_1, COLUMN_2, COLUMN_3, .. COLUMN_100

There are so many columns not because of a denormalized design, but because each row represents an entity with a lot of properties (in this example, 99 properties; the first column is just an id).

My application should do the following tasks:

  • Receive a set of filter conditions from the user
  • Execute a query in the database that filters the data on the given conditions and then counts the resulting rows

There are no limits on the number of filter conditions:

  • there may be no filter conditions
  • there may be filter conditions on every column of the table
  • there may be filter conditions on some columns of the table (for example, on 50 columns out of all 100)

So below is an example query that my application executes:

SELECT 
    COUNT(*)
FROM 
    TABLE_A 
WHERE 
    COLUMN_1 = 'test'
    AND COLUMN_2 != 'q'
    AND COLUMN_45 > 5
    AND COLUMN_45 < 511
    AND COLUMN_92 LIKE '%ddd%'
    AND COLUMN_98 > 1000

TABLE_A doesn't have any indexes: any column can be used for filtering, and there is no set of queries that runs much more frequently than the rest.

I also don't use any kind of caching at any level: insert and update operations are not very frequent, but they still happen more often than two queries with the same filter conditions occur, so cached results would rarely be reused.

So every query ends up running a sequential scan. This was not a problem earlier, but now the query run times have become unacceptable (the number of rows in the table has grown a lot).
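
For what it's worth, EXPLAIN ANALYZE on one of the generated queries shows the sequential scan directly; a minimal sketch reusing a subset of the filters from the example above:

EXPLAIN (ANALYZE, BUFFERS)
SELECT 
    COUNT(*)
FROM 
    TABLE_A 
WHERE 
    COLUMN_1 = 'test'
    AND COLUMN_45 > 5
    AND COLUMN_45 < 511;
-- Without any indexes, the plan is expected to contain a "Seq Scan on table_a" node.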

So..

Is there any solution on how to speed up the execution of the task?

Any suggestions would be appreciated. I can change anything except one thing: I have to use SQL. I currently use PostgreSQL, but I would switch to another DBMS if it would help.

Best Answer

I feel your pain, nik. We have a similar use case here, and have been struggling through as best we can with a metric ton of indexes.
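
To make the "metric ton of indexes" concrete: in PostgreSQL that usually means one single-column B-tree index per filterable column, which the planner can combine via bitmap AND scans when several conditions are selective. A rough sketch against the table from the question (the trigram index for the LIKE '%ddd%' filter assumes the pg_trgm extension is available):

-- One B-tree index per commonly filtered column; the planner can
-- combine several of them with a BitmapAnd node.
CREATE INDEX idx_table_a_column_1  ON TABLE_A (COLUMN_1);
CREATE INDEX idx_table_a_column_45 ON TABLE_A (COLUMN_45);
CREATE INDEX idx_table_a_column_98 ON TABLE_A (COLUMN_98);

-- LIKE '%ddd%' cannot use a plain B-tree index; a trigram GIN index can.
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_table_a_column_92_trgm
    ON TABLE_A USING gin (COLUMN_92 gin_trgm_ops);

The obvious downside is write amplification and index maintenance on every insert and update, which is exactly why it feels like struggling.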

I know you said you need to stick with SQL, but I really think you might want to consider a column store (https://en.wikipedia.org/wiki/Column-oriented_DBMS), perhaps one SQL-like enough for you to still find workable.

MariaDB seems to be embracing this now too: https://mariadb.com/resources/blog/why-columnstore-important
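
If you want to experiment while staying in SQL, the switch can be as small as the storage engine declaration. A minimal sketch of the same table in MariaDB ColumnStore (column list abbreviated and data types assumed, since the question doesn't give them):

-- Same logical table, stored column-wise: a filtered COUNT(*)
-- only has to read the columns referenced in the WHERE clause.
CREATE TABLE TABLE_A (
    COLUMN_1  VARCHAR(100),
    COLUMN_2  VARCHAR(100),
    COLUMN_45 INT,
    COLUMN_92 VARCHAR(255),
    COLUMN_98 INT
    -- ... remaining columns ...
) ENGINE=ColumnStore;

SELECT COUNT(*)
FROM TABLE_A
WHERE COLUMN_1 = 'test'
  AND COLUMN_45 > 5
  AND COLUMN_45 < 511;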