Postgresql – Why does PostgreSQL require a database connection

postgresql

A more precise version of my question might be, does the fact that PostgreSQL requires a database connection (as opposed to some sort of generic database cluster connection) follow directly from some basic design decisions that could be understood by a novice database user?

I am one such novice database user; I don't have a good understanding of database internals. The SQL database I'm most familiar with is SQLite (although I wouldn't say I'm very familiar with how SQLite works either) and I also use PostgreSQL occasionally.

Something that I find conceptually distressing about PostgreSQL is that it requires you to connect to a specific database in order to execute commands even if the commands in question are database agnostic. For example, even if I all I want to do is to print a list of the databases in a database cluster (i.e. psql -l) I'm still required to connect to a specific database within that cluster; why?

Another case which I find somewhat more upsetting is that some operations require a connection to one database in order to perform operations on another, such as database restoration with the pg_restore utility. From the pg_restore man page:

-C, --create

    Create the database before restoring into it. If --clean is also
    specified, drop and recreate the target database before connecting to it.

    When this option is used, the database named with -d is used only to
    issue the initial DROP DATABASE and CREATE DATABASE commands. All data
    is restored into the database name that appears in the archive.

Apparently the connection database does not matter if the -C flag is used, but does matter if the -C flag is not used. This just strikes me as a needlessly confusing and roundabout way of doing things.

Best Answer

I don't think there's a fundamental limitation there, and it's something I'd like to see addressed. I agree with you that it's a bit ugly.

That said, the convention to just use the postgres database as a maintenance database is so strong that the real world impact is negligible.

PostgreSQL has shared catalogs, things like pg_authid and pg_database. It's already possible to start a backend or background worker that connects to no particular database by passing InvalidOID as the database oid to InitPostgres(...) or the bgworker function wrappers. Such a backend can query and AFAIK modify shared catalogs, so I don't see why the executor couldn't be used to service frontend requests instead. Transaction state, user identity, databases, etc are all global.

It might be issues with pg_class and similar relations. Despite the fact that pg_database is a global shared catalog relation, it's listed in pg_class, which gets cloned from the template database. There's no global shared variant of pg_class.

So it's possible the relation cache doesn't handle the "no database" case. I haven't investigated closely. I'll be doing more in 9.5, since some fixes to the background worker API will let me avoid doing ugly hacks relating to shared catalog access by starting a shared-relations-only background worker.

Best Answer

Related Solutions

Postgresql – How does PostgreSQL handle Checkpoints in the middle of a WAL-enabled backup

PostgreSQL Replication for small, low-usage database

Related Question