PostgreSQL – What Does POSIX/Europe/Paris Stand For?

postgresql timezone

If you execute something like the following query on Postgres:

SELECT * FROM pg_timezone_names() WHERE name LIKE '%Rome%';

you get back a bunch of "weird" timezones:

postgres=# SELECT * FROM pg_timezone_names() WHERE name LIKE '%Rome%';
       name        | abbrev | utc_offset | is_dst
-------------------+--------+------------+--------
 Europe/Rome       | CET    | 01:00:00   | f
 posix/Europe/Rome | CET    | 01:00:00   | f
(2 rows)

I'm wondering why the posix/ entries exist, what they stand for (they are not official IANA timezones, right?), and when they are used. They look like a blend of the POSIX standard and the Olson DB, but POSIX should have -1 as utc_offset, right? I'm considering using the pg_timezone_names() function to let users choose their own timezone, but I can't understand the meaning of this specific type of timezone in Postgres.

Best Answer

Generally, you shouldn't use the posix/ or Etc/ timezones if they're on your system. This isn't a PostgreSQL thing; it comes from your distribution's libc timezone database (sometimes packaged as tzdata/zoneinfo), which backs the internal timezone functions. Most of that POSIX stuff is nasty and old. The PostgreSQL documentation mentions it.

I guess to answer the question specifically,

Two different versions are provided:
- The "posix" version is based on the Coordinated Universal Time (UTC).
- The "right" version is based on the International Atomic Time (TAI), and it includes the leap seconds.

There really is no reason that I know of not to use the IANA names.
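
For the timezone picker mentioned in the question, one option is to drop the posix/ and right/ variants before showing the list to users. The following is only a minimal sketch: picker_zones, ALIAS_PREFIXES, and PICKER_QUERY are hypothetical names introduced for illustration, and the SQL string assumes the list is read from the pg_timezone_names view.

```python
# Hypothetical helper names; the prefix list is an assumption based on the
# "posix" and "right" variants described in the quote above.
ALIAS_PREFIXES = ("posix/", "right/")

# Server-side equivalent, assuming the pg_timezone_names view is queried directly.
PICKER_QUERY = """
SELECT name
FROM pg_timezone_names
WHERE name NOT LIKE 'posix/%'
  AND name NOT LIKE 'right/%'
ORDER BY name;
"""


def picker_zones(names):
    """Return the sorted zone names without the posix/ and right/ variants."""
    return sorted(n for n in names if not n.startswith(ALIAS_PREFIXES))


# The two names returned by the query in the question.
rows = ["Europe/Rome", "posix/Europe/Rome"]
print(picker_zones(rows))  # prints ['Europe/Rome']
```

Filtering in SQL keeps the client simple, while filtering in application code is handy when the list is already cached.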

Related Solutions

PostgreSQL – Return Unique Column Combinations Based on Where Clause

The execution plan shown does not seem to match the big SELECT DISTINCT query, because the Sort and Unique steps are missing. Anyway, you are correct that when retrieving ~50% of a table, indexes don't help. The best strategy is a big sequential scan of the main table, and only fast hardware helps with that.

For the 2nd part of the question:

How would I go about selecting only the unique combinations of adjacent columns? Is this too complicated a task to perform through a database query? Would it speed up the query?

To remove duplicate combinations of adjacent columns, the structure of the result set should be changed so that each output row has only one pair of adjacent columns, along with their corresponding dimensions in the parallel coordinates graph. The dimension for the 2nd column is not actually necessary, since it's always the dimension of the first column plus one.

In one single query, this could be written like this:

WITH logs as (
  SELECT log_time_mapped, syslog_priority_mapped, 
     operation_mapped, message_code_mapped, protocol_mapped, 
     source_ip_mapped, destination_ip_mapped, 
     source_port_mapped, destination_port_mapped, 
     destination_service_mapped, direction_mapped, 
     connections_built_mapped, connections_torn_down_mapped, 
     hourofday_mapped, meridiem_mapped 
  FROM firewall_logs_mapped 
  WHERE operation = 'Built')
SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped FROM logs
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped FROM logs
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped FROM logs
UNION ALL
...etc...
SELECT DISTINCT 14,  hourofday_mapped, meridiem_mapped FROM logs
;

The first SELECT DISTINCT subquery extracts the lines to draw between dimensions 1 and 2, the next subquery between dimensions 2 and 3, and so on. DISTINCT eliminates duplicates, so the client side doesn't have to do it. The UNION ALL concatenates the results without any further processing.
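
To see the effect of the DISTINCT plus UNION ALL pattern on a small scale, here is a sketch that runs the same shape of query against an in-memory SQLite database instead of PostgreSQL. The table name logs, the columns a, b, c, and the function adjacent_pairs are all made up for the illustration; they stand in for three of the *_mapped columns.

```python
import sqlite3


def adjacent_pairs(rows):
    """Return the distinct (dimension, left, right) triples for columns a-b and b-c."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical three-column stand-in for the mapped firewall log table.
    conn.execute("CREATE TABLE logs (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany("INSERT INTO logs VALUES (?, ?, ?)", rows)
    query = """
        SELECT DISTINCT 1 AS dim, a AS left_value, b AS right_value FROM logs
        UNION ALL
        SELECT DISTINCT 2, b, c FROM logs
    """
    # Sorting is only for a stable display order; the query itself does not order.
    result = sorted(conn.execute(query).fetchall())
    conn.close()
    return result


sample = [(1, 10, 100), (1, 10, 200), (2, 10, 100)]
for row in adjacent_pairs(sample):
    print(row)
```

With these three input rows, each pair of adjacent columns has only two distinct combinations, so the query returns four rows in total rather than six.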

However, it's a heavy query, and it's doubtful that it would be any faster than what you're already doing.

The WITH subquery is likely to get materialized on disk, which is slow, so it might be interesting to compare the execution time with this other form, which repeats the same condition in each branch:

SELECT DISTINCT 1, log_time_mapped, syslog_priority_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 2, syslog_priority_mapped, operation_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
UNION ALL
SELECT DISTINCT 3, operation_mapped, message_code_mapped
   FROM firewall_logs_mapped WHERE operation = 'Built'
...etc...
;

PostgreSQL what happens if TimeZone is not set

It seems like PostgreSQL 10 still:

1. Looks at the environment variable TZ first.
2. Proceeds to use the localtime library function.
3. Looks under the shared directory or the system directory for matching timezone information.

So it seems like the 9.1 docs are still accurate as of PostgreSQL 10.

Internally, you can see the list of timezone information by using SELECT * FROM pg_timezone_names(); (here in the source), which calls pg_tzenumerate_start, which then reads the shared directory or system directory mentioned above.
