Having trouble indexing and executing a query in O (log n)
time.
The query includes an INNER JOIN
, an ORDER BY
, and an equality operation. If I understand the laws of databases correctly, a query can be indexed and executed in O (log n)
time (or thereabouts) if a non-equality operator is not used on more than one column. In this case, I believe the INNER JOIN
does count as an equality operator and the non-equality operator would be the ORDER BY
portion of the query. This table has upwards of 10,000,000 rows and needs to handle several reads and writes per second.
Using PostgreSQL. This is what the table looks like. As you can see, the column 'Names' is a list property and it is the column that the INNER JOIN
goes against:
Age Names Date
34 ['carla', 'john', 'sam'] 3/13/2011
26 ['json', 'cindy', 'joel'] 3/13/2011
72 ['beth', 'amber', 'susie'] 3/13/2011
14 ['john', 'jim', 'debie'] 3/13/2011
This is the query that we are trying to do:
SELECT * FROM the_table WHERE Age==26 AND Names=='john' ORDER BY Date
My background is from using App Engine's Big Table, so I've used equality operators here to indicate that 'john'
should be one of the names in the Names
column. This would be an acceptable query in GAE's big table, it would execute in O (log N)
time as all Big Table queries are reqyured to do. I am assuming there is a way to do this in PostgreSQL as well since PostgreSQL accepts list data types as columns.
Is this possible to do in PostgreSQL?
If so, how should the index be set up (we can't figure out how to set up an index that takes into account the three properties)?
Best Answer
This is possible for PostgreSQL 8.4 or later. In order to create a multi-column GIN index, you need to install the additional module btree_gin.
Simple types like
integer
ordate
are usually better off with a (default) B-Tree index, so that's not installed with standard PostgreSQL. But for a case like this a multi-column index is fastest, so you need the additional index methods for the plain types.In PostgreSQL 9.1 or later you can install it with
CREATE EXTENSION
:In older versions you would run as privileged system user (like postgres):
For a PostgreSQL 8.4 installation on Debian,this might be:
Then, given this table:
... you can create this multi-column GIN index:
... which can be used by a query like:
Note that a GIN index caries a non-trivial cost for write operations.
But that's hardly worth mentioning in comparison to what a
SELECT
on 10 million rows without index would do to you.