SQL Server – Why Object Names Can’t Start with a Number

identifiersql server

For example, if I'm creating view with a name '4aii', why does SQL Server care that it starts with a 4? I could call the table Fouraii or IVaii.

Additionally, what does [] do behind the scenes to allow for any string to be used as a name?

A string's a string, amirite?

Best Answer

A string's a string, amirite?

Yes and No: a string is a string, but object / item names are not strings. So while that statement is true, it is also not relevant to the behavior that you are seeing.

Ignoring the conceptual reasoning for the specific rules, the technical answer to "why one works and not the other" is that SQL Server follows (with minimal customization), the Unicode Standard's guidelines for identifiers. The Unicode documentation can be found here:

Unicode® Standard Annex #31: UNICODE IDENTIFIER AND PATTERN SYNTAX

Identifiers that are not enclosed in either [...] or "..." are "regular" identifiers, while those that are enclosed are "delimited" identifiers. Regular identifiers are names that are valid in all contexts (i.e. these are the rules for naming things in this language, software, etc). Delimited identifiers are everything else: names that are not valid and should not work, however, they are given an exemption if you wrap them in either of those delimiters. Most identifiers can be delimited; it is only GOTO labels and variables (including table variables) / parameters that cannot be delimited. The distinction seems to be that identifiers that exist purely for use in the T-SQL language (i.e. not a name that will ever be stored in a data file or log file as meta-data) cannot be delimited (much as you would expect in any language).

Now, the SQL Server documentation isn't exactly complete / correct, but it is correct about the classification of what is a valid "identifier" character (both starting and continuing) coming from the Unicode 3.2. If you want the actual list of rules for both regular and delimited identifiers, I have them documented here:

Completely Complete List of Rules for T-SQL Identifiers

To see the research that proves the relationship between the Unicode 3.2 categorizations and what SQL Server accepts for regular identifiers, please visit:

  1. The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers, Part 1
  2. The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers, Part 2

Addressing concerns noted in comments on this answer:

  1. Yes, even allowing for non-delimited identifiers to start with _, #, and @ is accounted for in the Unicode spec. Section 1.2 addresses customizations to the base rules, and even provides four example customizations: _, #, @, and $. Those 4 four "potential" customizations are the exact same 4 that SQL Server uses. Hence, SQL Server allowing @Variable and #TempTable does not point away from this Unicode document as being the source of the rules.
  2. As noted above, the SQL Server documentation does state that the categorizations used are from version 3.2 of the Unicode Character Database, and they are currently on version 10. You cannot use the current definitions of Ident_*, as found on the Unicode website, as indicating valid / invalid characters. Characters are added to Ident_Start and Ident_Continue in each new version of the Unicode Standard. The only way to see the correct set of characters matching these properties is to download Unicode Version 3.2.
  3. Both of the above two points are dealt with in the two blog posts noted directly above (named "The Uni-Code: The Search for the True List of Valid Characters for T-SQL Regular Identifiers"). Please read those two posts before dismissing this answer as incorrect. There is a lot of nuance behind what is actually going on here which I address in those two posts, showing step-by-step how to match up the list of valid characters.

ALSO, with regards to the question as stated in the title, it depends on how loosely you define "number". Meaning, if you follow along with the research steps as shown in the two posts noted directly above, such that you have created a table to hold the Unicode Character Database v3.2 and a few additional properties, you can get a list of 52 non-letters (mostly "numbers") that are valid characters for starting an identifier via the following query:

SELECT ucd.*
FROM   [v3-2].UnicodeCharacterDatabase ucd
WHERE  ucd.[IDStart] = 1
AND    ucd.[GeneralCategory] NOT LIKE 'L%';

Choosing a few of those characters to test, we can see that they do indeed work:

USE [tempdb];
CREATE TABLE dbo.Ⅳaii ([Col1] INT); -- ROMAN NUMERAL FOUR (U+2163)

CREATE TABLE dbo.ↂaii ([Col1] INT); -- ROMAN NUMERAL TEN THOUSAND (U+2182)

CREATE TABLE dbo.〤aii ([Col1] INT); -- HANGZHOU NUMERAL FOUR (U+3024)

And, just to show that they are "numbers" in more than just their names, the following query proves that they are assigned a numeric value (as shown in the NumericValue column of the [v3-2].UnicodeCharacterDatabase table:

SELECT 1 WHERE N'〤' LIKE N'[3-5]'; -- HANGZHOU NUMERAL FOUR (U+3024)
-- 1

However, they are not numbers that can be used in numeric operations:

SELECT 〤 + 0;
/*
Msg 207, Level 16, State 1, Line 23
Invalid column name '〤'.
*/

Regarding the issue of parsing and needing to be able to determine if 3e2 is a number or identifier: while this is a consideration, and possibly why numbers are excluded from the "Ident_start" Unicode general category, it is not a universal, and not necessarily why SQL Server excludes them. Three points to consider:

  1. While 3e2 by itself is ambiguous, if it were qualified with at least a schema name then it would not be: dbo.3e2
  2. The name 4aii is not really ambiguous at all. Internal parsing would be able to identify that easily enough as not being a potential number
  3. MySQL / MariaDB do not have this restriction. They allow for non-delimited identifiers such as 4aii and 3e, but not 3e2 or 300. I was able to successfully execute the following in MySQL:

    create table 4aii (3e int);
    

So again, the reason that you cannot do this in SQL Server is because SQL Server adheres to the Unicode Standard's recommendation for identifiers. Why those characters were chosen by the Unicode Consortium is not specifically stated, but seems to be at least "best practice". Still, as proven with MySQL, it is possible to parse identifiers that start with a number.