CHECK Constraint for ###.###.#### Pattern in SQL Server

constraintsql serversql-server-2012

I would like to have a check constraint that enforces this regex pattern for a multi-part code:

^\d{3}\.\d{3}\.\d{4}$

…three numbers, a period, three numbers, a period, four numbers.

Do I need to create a CLR function to enforce the pattern matching, and can the CLR function be referenced in DDL?

Is there a way to enforce the pattern in another way, using LIKE?

Best Answer

The regex in your question is not entirely unambiguous

In most flavors that support Unicode, \d includes all digits from all scripts. Notable exceptions are Java, JavaScript, and PCRE. These Unicode flavors match only ASCII digits with \d.

So in many flavours it would match ١١١.١١١.١١١١ (that character being Arabic-Indic Digit One)

I'm assuming that you just want to match latin 0-9. You can use

DECLARE  @T TABLE
(
Col CHAR(12) CHECK (
                   Col LIKE REPLACE('ddd.ddd.dddd','d','[0-9]') 
                                  COLLATE Latin1_General_100_BIN2
                   )
)

I'm using the binary collate clause so strings like ¾¾¾.¾¾¾.¾¾¾¾ or 10².10².1000 don't pass the check (as can happen on some collations).

Neither of these would match the desired CLR regex and although using the set syntax ([0123456789]) rather than the range syntax ([0-9]) resolves the first one it is still collation dependent whether 2 matches ² or not.

I'm using REPLACE rather than hard-coding the LIKE pattern because I find it more readable what the pattern is checking. It is easier to see three digits and a dot followed by three digits another dot then four digits in that format than parsing [0-9][0-9][0-9].[0-9] [0-9][0-9].[0-9][0-9][0-9][0-9].

It also makes it easier to swap out implementation.

I you do decide that you want the full blown semantics of the CLR regular expression after all you could just change '[0-9]' above to the below (and likely change the column to NCHAR(12) in that case)

'[' +
 /*48-57 Basic Latin*/
N'0-9'+
 /*1632-1641 Arabic-Indic*/
N'٠-٩'+
 /*1776-1785 Extended Arabic-Indic*/
N'۰-۹'+
 /*1984-1993 Nko*/
N'߀-߉'+
 /*2406-2415 Devanagari*/
N'०-९'+
 /*2534-2543 Bengali*/
N'০-৯'+
 /*2662-2671 Gurmukhi*/
N'੦-੯'+
 /*2790-2799 Gujarati*/
N'૦-૯'+
 /*2918-2927 Oriya*/
N'୦-୯'+
 /*3046-3055 Tamil*/
N'௦-௯'+
 /*3174-3183 Telugu*/
N'౦-౯'+
 /*3302-3311 Kannada*/
N'೦-೯'+
 /*3430-3439 Malayalam*/
N'൦-൯'+
 /*3558-3567 Sinhala*/
N'෦-෯'+
 /*3664-3673 Thai*/
N'๐-๙'+
 /*3792-3801 Lao*/
N'໐-໙'+
 /*3872-3881 Tibetan*/
N'༠-༩'+
 /*4160-4169 Myanmar*/
N'၀-၉'+
 /*4240-4249 Myanmar Shan*/
N'႐-႙'+
 /*6112-6121 Khmer*/
N'០-៩'+
 /*6160-6169 Mongolian*/
N'᠐-᠙'+
 /*6470-6479 Limbu*/
N'᥆-᥏'+
 /*6608-6617 New Tai Lue*/
N'᧐-᧙'+
 /*6784-6793 Tai Tham Hora*/
N'᪀-᪉'+
 /*6800-6809 Tai Tham Tham*/
N'᪐-᪙'+
 /*6992-7001 Balinese*/
N'᭐-᭙'+
 /*7088-7097 Sundanese*/
N'᮰-᮹'+
 /*7232-7241 Lepcha*/
N'᱀-᱉'+
 /*7248-7257 Ol Chiki*/
N'᱐-᱙'+
 /*42528-42537 Vai*/
N'꘠-꘩'+
 /*43216-43225 Saurashtra*/
N'꣐-꣙'+
 /*43264-43273 Kayah Li*/
N'꤀-꤉'+
 /*43472-43481 Javanese*/
N'꧐-꧙'+
 /*43504-43513 Myanmar Tai Laing*/
N'꧰-꧹'+
 /*43600-43609 Cham*/
N'꩐-꩙'+
 /*44016-44025 Meetei Mayek*/
N'꯰-꯹'+
 /*65296-65305 Fullwidth*/
N'0-9'+
N']'  COLLATE Latin1_General_100_BIN2