> [Can I] be 100% sure that the DB server has no memory of successfully executed queries, and therefore no one else can get access to this information?
No, you can never be 100% sure of that.
PostgreSQL maintains an internal plan cache. Currently this cache is only used for prepared statements; one-shot statements are not cached. If you do use a prepared statement, the plan is cached, and when it is no longer required the cached plan is invalidated but not necessarily freed until the PostgreSQL backend that cached it exits.
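The lifecycle looks something like this (hypothetical table name; a minimal sketch):

```sql
-- Creates an entry in this backend's plan cache:
PREPARE get_account (int) AS
    SELECT * FROM accounts WHERE id = $1;

EXECUTE get_account(42);

-- DEALLOCATE invalidates the cached plan, but the memory that held it
-- is not necessarily zeroed until this backend exits.
DEALLOCATE get_account;
```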
PostgreSQL stores the current query string in an internal debug variable. When a new query replaces the old one, the variable is overwritten; but if the new query is shorter than the old one, part of the old query text remains.
A query could be running in a backend that has pages of memory written out to swap because of memory pressure. That memory might contain the query string and/or parameters. This swap space isn't necessarily overwritten even after the query has finished and the backend has exited. If the server is powered off abruptly, the hard drive could be forensically analyzed, which might reveal query text.
When PostgreSQL frees internal query planner/executor objects, it does not zero them. It just marks the memory as free for re-use. This memory remains mapped until the PostgreSQL backend exits, and may contain query text, parameter values, etc.
When a PostgreSQL backend exits, the kernel doesn't generally zero its memory, just marks it as freed. An attacker who can exploit the kernel to gain raw memory read access could then see some past (and, of course, all current) queries.
So if:

- you have encrypted swap space with a passphrase supplied interactively on boot;
- you accept that anyone with physical access to the machine will probably be able to exploit it to retrieve recent query history;
- you understand that anyone with root access, or access as the `postgres` system user, can trace PostgreSQL's execution and capture query data (just not retroactively); and
- you only care about query text being recoverable after the individual PostgreSQL backend that ran it has exited,

then turning off logging is probably good enough. You also need to avoid using extensions like `pg_stat_plans` and `pg_stat_statements` that might capture query text and record it in the database.
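Turning off statement logging amounts to something like the following in `postgresql.conf` (a sketch; adapt to your existing configuration):

```
log_statement = 'none'            # don't log statement text
log_min_duration_statement = -1   # don't log slow statements (which includes their text)
log_min_error_statement = panic   # don't attach statement text to error log entries

# Make sure pg_stat_statements / pg_stat_plans are not loaded:
shared_preload_libraries = ''
```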
Note that if PL/Python, PL/Perl, etc. are installed, or certain contrib modules are present, it might be fairly easy to escalate from the `postgres` database user (or another superuser) to the `postgres` shell user. From there an attacker can reconfigure the database to enable logging, install extensions, inject C code, etc. There is no guarantee you would detect such an exploit, so the attacker might be logging activity for some time.
An attacker who gets access even as the table owner might install a `FOR EACH STATEMENT` trigger on tables of interest that logs queries affecting those tables. That won't work for `SELECT`, but it will for everything else, and they don't even need to be a database superuser to do it.
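As an illustration, such a trigger could look something like this (hypothetical table names; a sketch, not a specific known exploit):

```sql
-- Hypothetical audit table owned by the attacker.
CREATE TABLE snooped_queries (at timestamptz DEFAULT now(), query text);

CREATE OR REPLACE FUNCTION snoop() RETURNS trigger AS $$
BEGIN
    -- current_query() returns the text of the statement that fired the trigger
    INSERT INTO snooped_queries (query) VALUES (current_query());
    RETURN NULL;  -- return value is ignored for statement-level triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER snoop_writes
    AFTER INSERT OR UPDATE OR DELETE ON accounts
    FOR EACH STATEMENT EXECUTE PROCEDURE snoop();
```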
Generally it is your query parameter data, rather than the query text itself, that is security sensitive. You can greatly reduce the chances of it being captured or logged by using proper bind parameters, e.g. `libpq`'s `PQexecParams`, PgJDBC's `PreparedStatement` class, etc.
Most importantly, though: if you want to maintain database security, don't lose privileged access credentials. Don't use a superuser account except from the DB server itself, and set up `pg_hba.conf` so the superuser can't log in remotely. Don't have applications connect to the database as the same account that owns the tables they use; use a different account and `GRANT` it only the rights it requires, etc.
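A sketch of that least-privilege pattern (hypothetical role, table, and sequence names):

```sql
-- Tables stay owned by a separate owner role; the application
-- gets its own login with only the rights it needs.
CREATE ROLE app_user LOGIN PASSWORD 'changeme';

GRANT SELECT, INSERT, UPDATE ON orders TO app_user;
GRANT USAGE ON SEQUENCE orders_id_seq TO app_user;
-- No ALTER, DROP, TRUNCATE, or ownership rights for app_user.
```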
IMO PostgreSQL needs a protocol extension that lets you mark query parameters as "sensitive". Such parameters would never be emitted in log files, shown in `pg_stat_activity`, or otherwise recorded. This feature does not currently exist, though, and I'm not aware of anybody working on it.
At the scope at which you're working, I think JSONB is ideal. It handles deeply nested structures and structures with array keys. It's also standardized: JSON support is in the SQL:2016 spec.
In addition, as I answered here, there is an extension called ZSON that should help you with space consumption:
> ZSON is a PostgreSQL extension for transparent JSONB compression. Compression is based on a shared dictionary of strings most frequently used in specific JSONB documents (not only keys, but also values, array elements, etc).
>
> In some cases ZSON can save half of your disk space and give you about 10% more TPS. Memory is saved as well. See docs/benchmark.md. Everything depends on your data and workload, though. Don't believe any benchmarks, re-check everything on your data, configuration, hardware, workload and PostgreSQL version.
You may want to look into ZSON.
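Based on the ZSON README, usage looks roughly like this (re-check against the current docs; table and column names here are placeholders):

```sql
CREATE EXTENSION zson;

-- Build the shared compression dictionary from existing data:
SELECT zson_learn('{{"mytable", "mycolumn"}}');

-- Store documents using the zson type instead of jsonb:
ALTER TABLE mytable ALTER COLUMN mycolumn TYPE zson;
```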
Best Answer
To start, and this is just to get you somewhere: we serialize the purchases into `cumpurchased`. This seems simple; now we can blow this up by expanding `cumpurchased` into the individual transactions. You should be able to see where I'm going with this...

From this point, you need to filter out the -2: we wrap the query again, put that conditional in there, and then just select from that.
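The steps above might look something like this (a sketch over an assumed schema; `cumpurchased` and the `-2` sentinel come from the description above, everything else is hypothetical):

```sql
-- Hypothetical schema: purchases(customer_id int, item int), with -2 as a sentinel.
SELECT customer_id, item
FROM (
    -- expand cumpurchased back into the individual transactions
    SELECT customer_id, unnest(cumpurchased) AS item
    FROM (
        -- serialize the purchases into cumpurchased
        SELECT customer_id, array_agg(item) AS cumpurchased
        FROM purchases
        GROUP BY customer_id
    ) serialized
) expanded
-- the outermost wrapper filters out the -2
WHERE item <> -2;
```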