Query cleanup
First, let's rewrite your query to be readable, using table aliases, qualified field names, and an ANSI join:
select t.userID, t.date, t.time, t.servID, t.timestamp,
l.servID_HEX, l.SERV_LOCY, l.SERV_LOCX
from test t
inner join locations l on (t.servID=l.servID_HEX)
where t.userID='<someusers>'
and extract(dow from t.timestamp)=2;
What the query needs
To start breaking this problem down, the first thing you'll want to do is look at the fields used and where they're used:
Tables: locations, test
Fields output: test.userID, test.date, test.time, test.servID, test.timestamp, locations.servID_HEX, locations.SERV_LOCY, locations.SERV_LOCX
Terms used in filter predicates: test.servID, locations.servID_HEX, test.userID, extract(dow from test.timestamp)=2
Indexes
Now, the most important thing is to make sure that you have indexes that target high-selectivity columns used in predicates.
"high selectivity" just means that there aren't tons of the same value - there's no point indexing a column if 50% of the time it's 'a' and 50% of the time it's 'b', because you just don't gain much benefit from the index.
Assuming that all these columns have lots of well-distributed distinct values - so they're highly selective and don't have any values that are massively common - you'll want to create an ordinary b-tree index. That's enough for locations, because only one of its columns is used in a predicate:
CREATE INDEX locations_servid_idx ON locations(servID_HEX);
For test it's more complex. You could create indexes separately for each column, but it's more efficient to create a single composite index covering all the columns this query uses in predicates for a given table. So you want servID, userID and timestamp. However, you don't use timestamp directly - you use it to get the day of week, and an index on timestamp can't be used to look up extract(dow from timestamp).
This is where expression indexes come in. Indexes don't just cover columns; they can also cover arbitrary expressions. In this case, we'll create an index that includes extract(dow from timestamp).
So the index for test will look like:
CREATE INDEX test_custom_idx ON test (
servID,
extract(dow from timestamp),
userID
);
As you'll be doing different kinds of timestamp-based queries, I'd consider creating multiple indexes for different timestamp expressions.
Which order to put the columns in depends on how selective each column is. Even though I'm sure the day-of-week will be more selective, I've put the server ID first because it's used in a join condition that lets us completely disregard rows of the other table.
Since you didn't provide sample tables or sample data I can't really test it and I'm not keen on dummying some up. Adjustments may be required.
Partial indexes
Partial indexes can also be a big win: the index contains entries for only a subset of the rows, e.g.:
CREATE INDEX test_partial_idx ON test (
servID, userID
) WHERE (extract(dow from timestamp) = 2);
This index contains data for only that day, so it's a lot faster to search for queries involving just that day, but it cannot be used at all for queries that might involve different days, or for which the day is determined at runtime from the output of another part of the query.
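To illustrate which shapes of query can use the partial index above (the day value shown is just the one from this query):

-- Can use the partial index: the constant matches its WHERE clause
select userID, servID from test
where userID='<someusers>'
and extract(dow from timestamp)=2;

-- Cannot use it: a different day, or a day supplied at runtime
select userID, servID from test
where userID='<someusers>'
and extract(dow from timestamp)=3;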
Index-only scans and covering indexes
Another thing you can consider is that all those unused columns still have to be read from disk to fetch your rows. In PostgreSQL 9.2 and newer there's a way to avoid that using an index-only scan if all columns you are fetching are in an index.
So you could create a covering index, where you include columns that aren't really required for the actual search but are there so they can be output without reading the main table. The first thing to do for that is to get rid of useless values in your SELECT list, so we'll delete the redundant t.date and t.time from your query. Then create two new indexes:
CREATE INDEX locations_covering_idx
ON locations(
servID_HEX,
SERV_LOCY,
SERV_LOCX
);
CREATE INDEX test_covering_idx ON test (
servID,
extract(dow from timestamp),
userID,
timestamp
);
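With those in place, and after a VACUUM so the visibility map is up to date, the trimmed query should be able to run as index-only scans. You can verify with EXPLAIN (the exact plan depends on your data, so this is just what to look for):

explain (analyze, buffers)
select t.userID, t.servID, t.timestamp,
l.servID_HEX, l.SERV_LOCY, l.SERV_LOCX
from test t
inner join locations l on (t.servID=l.servID_HEX)
where t.userID='<someusers>'
and extract(dow from t.timestamp)=2;

Look for Index Only Scan nodes with a low Heap Fetches count in the output.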
Reducing repetition
Since your dataset is static, having tons and tons of indexes isn't a big problem.
Each index has a cost for insert/update/delete. So for non-static data you want to minimize that by getting rid of indexes that don't provide enough improvement to be worth the cost.
One way to do that can be to split up composite indexes. In your case, for example, I might take the timestamp expression out of the main index on test and put it in a set of new indexes instead. That way all the different timestamp expression indexes can be updated without also rewriting a whole lot of unrelated data for userID and servID. PostgreSQL can still often use the indexes together, combining them with a bitmap index scan.
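Such a split might look like this (index names are illustrative; which expressions to index depends on the queries you actually run):

CREATE INDEX test_user_serv_idx ON test (servID, userID);
CREATE INDEX test_dow_idx ON test (extract(dow from timestamp));
CREATE INDEX test_month_idx ON test (extract(month from timestamp));

A query filtering on userID and day-of-week can then scan two of these and combine the results with a BitmapAnd.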
Not a concern for your current dataset, but worth learning for later.
Data cleanup
This is a static data set. So if you don't need a column, drop it. Or better yet, create a new table with just the subset of data you need, pre-joined, using something like:
CREATE TABLE reporting AS
select t.userID, t.servID, t.timestamp, l.SERV_LOCY, l.SERV_LOCX
from test t
inner join locations l on (t.servID=l.servID_HEX)
ORDER BY t.userID, t.servID, t.timestamp;
which creates a new table that has the data pre-joined and sorted for the most useful grouping of access.
You can then write much simpler queries that'll be much, much faster, against the new table, and your indexes will be a lot simpler too.
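For example, the original query collapses to a single-table lookup against reporting, with one matching index (the index definition here is just a suggestion):

CREATE INDEX reporting_custom_idx ON reporting (userID, extract(dow from timestamp));

select userID, servID, timestamp, SERV_LOCY, SERV_LOCX
from reporting
where userID='<someusers>'
and extract(dow from timestamp)=2;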
I gave up on making the ODBC driver work in Access to calculate this sum properly.
My solution was to create a view in PostgreSQL which calculated the sum of the hours. See below:
CREATE VIEW emp_hours AS -- view name is illustrative
SELECT employees.empid,
employees.startdate,
employees.payrate,
employees.surname,
employees.first_name,
emphours.weekending,
emphours.department,
emphours.paytype,
sum(date_part('epoch'::text, (emphours.endtime - emphours.starttime) / 3600::double precision)) AS hours,
sum(date_part('epoch'::text, (emphours.endtime - emphours.starttime) / 3600::double precision)) * employees.payrate AS totalpay
FROM employees
JOIN emphours ON employees.empid = emphours.empid
GROUP BY employees.empid, employees.startdate, employees.payrate, employees.surname, employees.first_name, emphours.weekending, emphours.department, emphours.paytype;
Although I needed to change some Access queries slightly, this seemed to be an effective workaround.
I also considered creating pass-through queries, but that seemed to be more work than creating the view.
Best Answer
That's not a date column, but a timestamp (date and time). Assuming type timestamp, not timestamptz? (You should always disclose actual table definitions.) See:
Or:
The first includes the bounds, and you can adjust as needed. The second uses BETWEEN, which always includes the bounds - so the negated expression excludes them. You can still use a hack to include the bounds, building on the inside knowledge that Postgres stores times and timestamps with microsecond resolution (6 fractional decimal digits) in its current implementation.
But I strongly advise against the latter: building on implementation details is brittle and ugly.
Casting the string literals to the right type is optional, as their type is derived from the typed column.
If we are, in fact, dealing with timestamptz, you need to define where in the world it's supposed to be 22:00 etc. (You may need to think about that with timestamp, too.) See:
About BETWEEN and including lower and upper bounds:
If you run this kind of query a lot, consider an expression index:
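For example, to make lookups on the time-of-day component indexable (table and column names are placeholders here, since the actual definitions weren't disclosed):

-- tbl and ts are placeholder names
CREATE INDEX tbl_ts_time_idx ON tbl (cast(ts AS time));

This only works for plain timestamp, where the cast is immutable; for timestamptz you'd have to pin down a time zone in the indexed expression first.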