Custom DOMAIN
s
I don't think using citext
(case-insensitive) is enough[1]. Using PostgreSQL we can create a custom domain which is essentially some defined constraints over a type. We can create a domain for instance over the citext
type, or over text
.
Using HTML5 type=email
spec
Currently the most correct answer to the question what is an e-mail address is specified in RFC5322. That spec is insanely complex[2], so much so that everything breaks it. HTML5 contains a different spec for email,
This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here. [...] The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/
This is likely what you want, and if it's good enough for HTML5, it's probably good enough for you. We can make use of that directly in PostgreSQL. I also use citext
here (which technically means you can simply the regex a bit visually by removing either the upper-case or lower-case).
CREATE EXTENSION citext;
CREATE DOMAIN email AS citext
CHECK ( value ~ '^[a-zA-Z0-9.!#$%&''*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$' );
Now you can do...
SELECT 'asdf@foobar.com'::email;
But not
SELECT 'asdf@foob,,ar.com'::email;
SELECT 'asd@f@foobar.com'::email;
Because both of those return
ERROR: value for domain email violates check constraint "email_check"
Because this is based on citext too
SELECT 'asdf@foobar.com'::email = 'ASdf@fooBAR.com';
returns true by default.
As an important note, there is a more correct method of doing this that is far more complex using plperlu
. If you need this level of correctness you do not want citext
. Email::Valid
can even check if the domain has an MX record (example in docs of Email::Valid)! First, add plperlu (requires superuser).
CREATE EXTENSION plperlu;
Then create the function, notice we mark at as an IMMUTABLE
:
CREATE FUNCTION valid_email(text)
RETURNS boolean
LANGUAGE plperlu
IMMUTABLE LEAKPROOF STRICT AS
$$
use Email::Valid;
my $email = shift;
Email::Valid->address($email) or die "Invalid email address: $email\n";
return 'true';
$$;
Then create the domain,
CREATE DOMAIN validemail AS text NOT NULL
CONSTRAINT validemail_check CHECK (valid_email(VALUE));
Footnotes
- Using
citext
is technically wrong. SMTP defines local-part
as being case sensitive. But, again, this is a case of the spec being stupid. It contains its own identity crises. The spec says local-part
(the part before the @
) "MAY be case-sensitive" ... "MUST BE treated as case sensitive" ... and yet "exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged."
- The spec for an email address is so complex, it's not even self-contained. Complex is truly an understatement, those making the spec don't even understand it.. From the docs on regular-expression.info
Neither of these regexes enforce length limits on the overall email address or the local part or the domain names. RFC 5322 does not specify any length limitations. Those stem from limitations in other protocols like the SMTP protocol for actually sending email. RFC 1035 does state that domains must be 63 characters or less, but does not include that in its syntax specification. The reason is that a true regular language cannot enforce a length limit and disallow consecutive hyphens at the same time.
Best Answer
Duplication isn't ideal, but in this case is probably the best choice. Set the table permissions so that the table owner is not the operational day to day database user your app runs as, and only
GRANT
your app the ability to write to the certificate data column, not the "cache" columns with expiry etc. Have aSECURITY DEFINER
trigger function intercept writes to the certificate field and as a privileged user update the indexed cache columns by using an X.509 library to extract the fields from the certificate after verifying it.Alternately, you could write a PL/Python, PL/Perl, or even a C SQL function that calls an X.509 certificate parser library to extract fields and return them. So you'd say
extract_x509_field(cert, 'subject')
. Or perhaps even a row-returning form likeSELECT subject, issuerName FROM (SELECT extract_x509_fields(cert) FROM the_table)
whereextract_x509_fields
returns a row of all relevant cert data. With this approach you could create functional indexes likeCREATE INDEX cert_issuer ON certificate_table( extract_x509_field(cert,'issuer') );
that could be used to matchWHERE
expressions. You wouldn't need to have table columns for the extracted data at all. The downside is that this would likely be slower as the cert would get parsed multiple times during index creation, during index re-checks, etc.Either way, it's vital that your application operate as a PostgreSQL user that is not the database owner, not a superuser, and not the owner of the tables and indexes of concern. It should be
GRANT
ed the minimum rights necessary and no more. If you have quite separate tasks (say, read-only vs write-and-update) consider using different database users for them so that even if the "read" part of your app is tricked into trying to write a field/update a cert/etc, it doesn't have permission to.