Query cleanup
First, let's rewrite your query to be readable, using table aliases, qualified field names, and an ANSI join:
select t.userID, t.date, t.time, t.servID, t.timestamp,
l.servID_HEX, l.SERV_LOCY, l.SERV_LOCX
from test t
inner join locations l on (t.servID=l.servID_HEX)
where t.userID='<someusers>'
and extract(dow from t.timestamp)=2;
What the query needs
To start breaking this problem down, the first thing you'll want to do is look at the fields used and where they're used:
Tables: locations, test
Fields output: test.userID, test.date, test.time, test.servID, test.timestamp, locations.servID_HEX, locations.SERV_LOCY, locations.SERV_LOCX
Terms used in filter predicates: test.servID, locations.servID_HEX, test.userID, extract(dow from test.timestamp)=2
Indexes
Now, the most important thing is to make sure that you have indexes that target high-selectivity columns used in predicates.
"high selectivity" just means that there aren't tons of the same value - there's no point indexing a column if 50% of the time it's 'a' and 50% of the time it's 'b', because you just don't gain much benefit from the index.
Assuming that all these columns have lots of well-distributed distinct values - so they're highly selective and don't have any values that are massively common - you'll want to create an ordinary b-tree index. That's enough for locations, because only one of its columns is used in a predicate:
CREATE INDEX locations_servid_idx ON locations(servID_HEX);
For test it's more complex. You could create indexes separately for each column, but it's more efficient to create a single composite index covering all the columns this query uses in predicates for a given table. So you want servID, userID and timestamp. However, you don't use timestamp directly - you use it to get the day of week, and an index on timestamp can't be used to look up extract(dow from timestamp).
This is where expression indexes come in. Indexes don't just cover columns; they can also cover arbitrary expressions. In this case, we'll create an index that includes extract(dow from timestamp).
So the index for test will look like:
CREATE INDEX test_custom_idx ON test (
servID,
extract(dow from timestamp),
userID
);
As you'll be doing different kinds of timestamp-based queries, I'd consider creating multiple indexes for different timestamp expressions.
Which order to put the columns in depends on how selective each column is. Even though I'm sure the day-of-week will be more selective, I've put the server ID first because it's used in a join condition that lets us completely disregard rows of the other table.
Since you didn't provide sample tables or sample data I can't really test it and I'm not keen on dummying some up. Adjustments may be required.
Partial indexes
Partial indexes can also be a big win: the index contains entries for only a subset of the rows, e.g.:
CREATE INDEX test_partial_idx ON test (
servID, userID
) WHERE (extract(dow from timestamp) = 2);
This index contains data for only that day, so it's a lot faster to search for queries involving just that day, but it cannot be used at all for queries that might involve different days, or for which the day is determined at runtime from the output of another part of the query.
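To illustrate which shapes of query can use the partial index above (the day value shown is just the one from this query):

-- Can use the partial index: the constant matches its WHERE clause
select userID, servID from test
where userID='<someusers>'
and extract(dow from timestamp)=2;

-- Cannot use it: a different day, or a day supplied at runtime
select userID, servID from test
where userID='<someusers>'
and extract(dow from timestamp)=3;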
Index-only scans and covering indexes
Another thing you can consider is that all those unused columns still have to be read from disk to fetch your rows. In PostgreSQL 9.2 and newer there's a way to avoid that using an index-only scan if all columns you are fetching are in an index.
So you could create a covering index, where you include columns that aren't really required for the actual search but are there so they can be output without reading the main table. The first thing to do for that is to get rid of useless values in your SELECT list, so we'll delete the redundant t.date and t.time from your query. Then create two new indexes:
CREATE INDEX locations_covering_idx
ON locations(
servID_HEX,
SERV_LOCY,
SERV_LOCX
);
CREATE INDEX test_covering_idx ON test (
servID,
extract(dow from timestamp),
userID,
timestamp
);
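With those in place, and after a VACUUM so the visibility map is up to date, the trimmed query should be able to run as index-only scans. You can verify with EXPLAIN (the exact plan depends on your data, so this is just what to look for):

explain (analyze, buffers)
select t.userID, t.servID, t.timestamp,
l.servID_HEX, l.SERV_LOCY, l.SERV_LOCX
from test t
inner join locations l on (t.servID=l.servID_HEX)
where t.userID='<someusers>'
and extract(dow from t.timestamp)=2;

Look for Index Only Scan nodes with a low Heap Fetches count in the output.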
Reducing repetition
Since your dataset is static, having tons and tons of indexes isn't a big problem.
Each index has a cost for insert/update/delete. So for non-static data you want to minimize that by getting rid of indexes that don't provide enough improvement to be worth the cost.
One way to do that can be to split up composite indexes. In your case, for example, I might take the timestamp expression out of the main index on test and put it in a set of new indexes instead. That way all the different timestamp expression indexes can be updated without also rewriting a whole lot of unrelated data for userID and servID. PostgreSQL can still often use the indexes together, combining them with a bitmap index scan.
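Such a split might look like this (index names are illustrative; which expressions to index depends on the queries you actually run):

CREATE INDEX test_user_serv_idx ON test (servID, userID);
CREATE INDEX test_dow_idx ON test (extract(dow from timestamp));
CREATE INDEX test_month_idx ON test (extract(month from timestamp));

A query filtering on userID and day-of-week can then scan two of these and combine the results with a BitmapAnd.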
Not a concern for your current dataset, but worth learning for later.
Data cleanup
This is a static data set. So if you don't need a column, drop it. Or better yet, create a new table with just the subset of data you need, pre-joined, using something like:
CREATE TABLE reporting AS
select t.userID, t.servID, t.timestamp, l.SERV_LOCY, l.SERV_LOCX
from test t
inner join locations l on (t.servID=l.servID_HEX)
ORDER BY t.userID, t.servID, t.timestamp;
which creates a new table that has the data pre-joined and sorted for the most useful grouping of access.
You can then write much simpler queries that'll be much, much faster, against the new table, and your indexes will be a lot simpler too.
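For example, the original query collapses to a single-table lookup against reporting, with one matching index (the index definition here is just a suggestion):

CREATE INDEX reporting_custom_idx ON reporting (userID, extract(dow from timestamp));

select userID, servID, timestamp, SERV_LOCY, SERV_LOCX
from reporting
where userID='<someusers>'
and extract(dow from timestamp)=2;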
I gave up on making the ODBC driver work in Access to calculate this sum properly.
My solution was to create a view in PostgreSQL which calculated the sum of the hours. See below:
CREATE VIEW emp_hours AS -- view name is illustrative
SELECT employees.empid,
employees.startdate,
employees.payrate,
employees.surname,
employees.first_name,
emphours.weekending,
emphours.department,
emphours.paytype,
sum(date_part('epoch'::text, (emphours.endtime - emphours.starttime) / 3600::double precision)) AS hours,
sum(date_part('epoch'::text, (emphours.endtime - emphours.starttime) / 3600::double precision)) * employees.payrate AS totalpay
FROM employees
JOIN emphours ON employees.empid = emphours.empid
GROUP BY employees.empid, employees.startdate, employees.payrate, employees.surname, employees.first_name, emphours.weekending, emphours.department, emphours.paytype;
Although I needed to change some Access queries slightly, this seemed to be an effective workaround.
I also considered creating pass-through queries, but that seemed to be more work than creating the view.
Best Answer
That's not a date column, but a timestamp (date and time). Assuming type timestamp, not timestamptz? (You should always disclose actual table definitions.) See:
Or:
The first includes the bounds, and you can adjust as needed. The second uses BETWEEN, which always includes the bounds - so the negated expression excludes them. You can still use a hack to include the bounds, building on the inside knowledge that Postgres stores times and timestamps with microsecond resolution (6 fractional decimal digits) in its current implementation.
But I strongly advise against the latter: building on implementation details is brittle and ugly.
Casting the string literals to the right type is optional, as their type is derived from the typed column.
If we are, in fact, dealing with timestamptz, you need to define where in the world it's supposed to be 22:00 etc. (You may need to think about that with timestamp, too.) See:
About BETWEEN and including lower and upper bounds:
If you run this kind of query a lot, consider an expression index:
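For example, to make lookups on the time-of-day component indexable (table and column names are placeholders here, since the actual definitions weren't disclosed):

-- tbl and ts are placeholder names
CREATE INDEX tbl_ts_time_idx ON tbl (cast(ts AS time));

This only works for plain timestamp, where the cast is immutable; for timestamptz you'd have to pin down a time zone in the indexed expression first.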