Postgresql – the best way to store X509 certificate in PostgreSQL database

postgresqlpython

I'm working on web authenticate system where users will digitally sign random tokens and this will be checked with X509 certificates stored on the server.

Therefore I have to store several X509 certificates (PEM or DER format) in PostgreSQL database. Sounds easy, but I want to have possibility to search certificates with subject, issuer, notBefore, notAfter and similar criteria.

My idea is to have following columns in database: X509data, notAfter, notBefore, subject, issuer etc. Than I will create object (in SQL alchemy) representing X509 certificate with methods like add_new_X509(), find_X509(search criteria) etc. So whenever I will add new certificate with method add_new_X509() it will automatically reads from certificate all data and fill up rest of the columns and put raw certificate into X509data column.

Unfortunately this solution have two disadvantages:

I will store the same information twice (in X509 certificate itself and in separate columns for easy searching)
Whenever I want to read X509 certificate, my application will have to crosscheck notAfter, notBefore, subject, issuer with information stored in original certificate (this is for security reasons, in case someone would try to modify this fields).

So.. anybody have better idea, or suggestion? Maybe somebody see any other security issue that can arise with this solution?

Best Answer

Duplication isn't ideal, but in this case is probably the best choice. Set the table permissions so that the table owner is not the operational day to day database user your app runs as, and only GRANT your app the ability to write to the certificate data column, not the "cache" columns with expiry etc. Have a SECURITY DEFINER trigger function intercept writes to the certificate field and as a privileged user update the indexed cache columns by using an X.509 library to extract the fields from the certificate after verifying it.

Alternately, you could write a PL/Python, PL/Perl, or even a C SQL function that calls an X.509 certificate parser library to extract fields and return them. So you'd say extract_x509_field(cert, 'subject'). Or perhaps even a row-returning form like SELECT subject, issuerName FROM (SELECT extract_x509_fields(cert) FROM the_table) where extract_x509_fields returns a row of all relevant cert data. With this approach you could create functional indexes like CREATE INDEX cert_issuer ON certificate_table( extract_x509_field(cert,'issuer') ); that could be used to match WHERE expressions. You wouldn't need to have table columns for the extracted data at all. The downside is that this would likely be slower as the cert would get parsed multiple times during index creation, during index re-checks, etc.

Either way, it's vital that your application operate as a PostgreSQL user that is not the database owner, not a superuser, and not the owner of the tables and indexes of concern. It should be GRANTed the minimum rights necessary and no more. If you have quite separate tasks (say, read-only vs write-and-update) consider using different database users for them so that even if the "read" part of your app is tricked into trying to write a field/update a cert/etc, it doesn't have permission to.

Custom `DOMAIN`s

I don't think using citext (case-insensitive) is enough^[1]. Using PostgreSQL we can create a custom domain which is essentially some defined constraints over a type. We can create a domain for instance over the citext type, or over text.

Using HTML5 `type=email` spec

Currently the most correct answer to the question what is an e-mail address is specified in RFC5322. That spec is insanely complex^[2], so much so that everything breaks it. HTML5 contains a different spec for email,

This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here. [...] The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/

This is likely what you want, and if it's good enough for HTML5, it's probably good enough for you. We can make use of that directly in PostgreSQL. I also use citext here (which technically means you can simply the regex a bit visually by removing either the upper-case or lower-case).

CREATE EXTENSION citext;
CREATE DOMAIN email AS citext
  CHECK ( value ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' );

Now you can do...

SELECT 'asdf@foobar.com'::email;

But not

SELECT 'asdf@foob,,ar.com'::email;
SELECT 'asd@f@foobar.com'::email;

Because both of those return

ERROR:  value for domain email violates check constraint "email_check"

Because this is based on citext too

SELECT 'asdf@foobar.com'::email = 'ASdf@fooBAR.com';

returns true by default.

Using `plperlu`/`Email::Valid`

As an important note, there is a more correct method of doing this that is far more complex using plperlu. If you need this level of correctness you do not want citext. Email::Valid can even check if the domain has an MX record (example in docs of Email::Valid)! First, add plperlu (requires superuser).

CREATE EXTENSION plperlu;

Then create the function, notice we mark at as an IMMUTABLE:

CREATE FUNCTION valid_email(text)
  RETURNS boolean
  LANGUAGE plperlu
  IMMUTABLE LEAKPROOF STRICT AS
$$
  use Email::Valid;
  my $email = shift;
  Email::Valid->address($email) or die "Invalid email address: $email\n";
  return 'true';
$$;

Then create the domain,

CREATE DOMAIN validemail AS text NOT NULL
  CONSTRAINT validemail_check CHECK (valid_email(VALUE));

Footnotes

Using citext is technically wrong. SMTP defines local-part as being case sensitive. But, again, this is a case of the spec being stupid. It contains its own identity crises. The spec says local-part (the part before the @) "MAY be case-sensitive" ... "MUST BE treated as case sensitive" ... and yet "exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged."
The spec for an email address is so complex, it's not even self-contained. Complex is truly an understatement, those making the spec don't even understand it.. From the docs on regular-expression.info

Neither of these regexes enforce length limits on the overall email address or the local part or the domain names. RFC 5322 does not specify any length limitations. Those stem from limitations in other protocols like the SMTP protocol for actually sending email. RFC 1035 does state that domains must be 63 characters or less, but does not include that in its syntax specification. The reason is that a true regular language cannot enforce a length limit and disallow consecutive hyphens at the same time.

Postgresql – Guidelines on best indexing strategy for varying searches on 20+ columns

Capture the queries that people actually make.
Analyze data from #1; then devise short, multi-column, indexes (2, maybe 3, columns each)
No. Especially not for yes/no flags. It's ok to combine such, as in #2.
See #3.
The order of columns is important in an index. A 5-column index is (roughly) equivalent to the 1-col index on its first column, but is not useful if you don't query on that column.
Probably.
Forcing into memory is counter-productive because it takes away space for caching other stuff.

More discussion: http://mysql.rjweb.org/doc.php/eav

Best Answer

Related Solutions

PostgreSQL – Best Way to Store an Email Address

Custom DOMAINs

Using HTML5 type=email spec

Using plperlu/Email::Valid

Footnotes

Postgresql – Guidelines on best indexing strategy for varying searches on 20+ columns

Related Question

Custom `DOMAIN`s

Using HTML5 `type=email` spec

Using `plperlu`/`Email::Valid`