PostgreSQL – How to Get OS User in Postgres 9.4

postgresqlpostgresql-9.4

The title says it all. In Oracle, we can easily get the OS user using:

select sys_context( 'userenv', 'os_user' ) from dual;

I usually use that for logging purposes (know who inserted/updated what and when).
I searched for something similar for Postgres (I am new to it) but I could not find. Isn't that possible?

In general, if Postgres does not allow me to get the OS user, does it allow retreiving any kind of information outside the DB itself (e.g.: IP of host, .. etc.)?

Best Answer

PostgreSQL has few functions reporting information on the client environment, listed at https://www.postgresql.org/docs/current/static/functions-info.html

Concerning the remote IP address, it provides inet_client_addr().

Concerning the client username, I don't think it has this information at all, because it's not passed at connection time, unless it happens to be the same as the database user name, but the backend can't know that in general.

If the SQL functions need that information,it could be passed up front by your applications in the form of a SET statement. Current versions of PostgreSQL allow to instantiate any namespace for custom variables.

Example with version 9.3:

test=> set session.osuser to 'daniel';
SET

test=> show session.osuser;
 session.osuser 
----------------
 daniel
(1 row)

test=> select current_setting('session.osuser');
 current_setting 
-----------------
 daniel
(1 row)

Basic answers

Since you select a couple of big columns an index-only scan is probably not a viable option.

This code works (if no NULL values in data!)

While the column isn't defined NOT NULL, add NULLS LAST to the sort order to make it work in any case, even with NULL values. Ideally, use the clause in the corresponding index as well:

SELECT <some big columns>
FROM   my_table_ 
ORDER  BY when_ DESC NULLS LAST
LIMIT  1;

PostgreSQL sort by datetime asc, null first?

Without any index on when_ column, does this statement require a full scan of all rows?

Yes. Without index, there is no other option left. (Well, there is also table partitioning where an index on key columns(s) is not strictly required, and it could assist with partition pruning. But you would typically have an index on key columns there, too.)

With an index on when_ column, should I change this SQL to use some other approach/strategy of query?

Basically, this is the perfect query. There are options in combination with advanced indexing:

Advanced technique

Assuming a NOT NULL column. Else, add NULLS LAST to index and queries as suggested above.

You have a constant influx of rows with later when_. Assuming the latest _when constantly increases and never (or rarely) decreases (latest rows deleted / updated), you can use a very small partial index.

Basic implementation:

Run your query once to retrieve the latest when_, subtract a safe margin (to be safe against losing the latest rows) and create an IMMUTABLE function based on it. Basically a "fake global constant":
```
CREATE OR REPLACE FUNCTION f_when_cutoff()
  RETURNS timestamptz LANGUAGE sql COST 1 IMMUTABLE PARALLEL SAFE AS
$$SELECT timestamptz '2015-07-25 01:00+02'$$;
```
PARALLEL SAFE only in Postgres 9.6 or later.
Create a partial index excluding older rows:
```
CREATE INDEX my_table_when_idx ON my_table_ (when_ DESC)
WHERE when_ > f_when_cutoff();
```
With millions of rows, the difference in size can be dramatic. And this only makes sense with a much smaller index. Just half the size or something would not cut it. Index access itself is not slowed much by a bigger index. It's mostly the sheer size of the index, which needs to be read and cached. (And possibly avoiding additional index writes, but hardly in your case.)
Use the function in all related queries. Include the same WHERE condition (even if logically redundant) to convince the query planner the index is applicable. For the simple query:
```
SELECT <some big columns>
FROM   my_table_ 
WHERE  when_ > f_when_cutoff()
ORDER  BY when_ DESC
LIMIT  1;
```

The size of the index grows with new (later) entries. Recreate the function with a later timestamp and REINDEX from time to time with no or little concurrent access. Only reindex after a relevant number of rows has been added. A couple of thousand entries won't matter much. We are doing this to cut off millions.
The beauty of it: queries don't change.

Implementation with function to update the partial index automatically:

Get latest child per parent from big table - query is too slow

More general advice:

Index optimization with dates

Postgresql – Large batch INSERTs / UPDATEs and DELETEs Performance problem

Is vacuuming slow because you have inadequate IO throughput, or just because it is throttled too much?

The default throttling of autovacuum is not suitable for very write-intensive servers. You should probably decrease autovacuum_vacuum_cost_delay or increase vacuum_cost_limit. And I routinely set vacuum_cost_page_hit and vacuum_cost_page_miss to zero. Page misses are inherently self-limiting as the vacuum process can't proceed until the page is delivered; so there is little reason to add intentional throttling on top of that.

Best Answer

Related Solutions

PostgreSQL Query Performance – How to Get Row with Latest Timestamp

Basic answers

Advanced technique

Postgresql – Large batch INSERTs / UPDATEs and DELETEs Performance problem

Related Question