What do you call numerical coding systems that are left-aligned

database-designnumber formatting

I'm writing an importing mechanism for the NAICS database. I have a few questions about this code-format. I've seen it before and I like it the setup. I'm going to ask some other questions about best-practices and navigation of this data, and I'd like to simply refer to it by the right name.

Essentially this is an example of the data

CODE, TITLE
"21","Mining, Quarrying, and Oil and Gas Extraction"
"212","Mining (except Oil and Gas)"
"2121","Coal Mining"
"21211","Coal Mining"
"212111","Bituminous Coal and Lignite Surface Mining "
"212112","Bituminous Coal Underground Mining "

So if Bituminous Coal Underground Mining was your organization type, your code would be 212112. You could look up to find you were in the business-genre of Coal Mining, and up again to find you were in the business of Mining (except Oil and Gas), and up again to find you were in the business of "Mining, Quarrying, and Oil and Gas Extraction".

What is such a scheme called, is there a term to refer to this kind of organization of data?

I want to call something like recursive-base10 or recursive-decimal is there a name of it though?

Best Answer

That sort of coding format is simply a representation of a tree or forest - each code represents the node's location in a hierarchy where each node (except the root node(s)) has exactly one parent.

That hierarchy could be directly modelling the physical world (as Dewey Decimal Classification does for books in a library) or it could be something more "virtual".

I don't think this sort of coding has a generally recognised name, so if you need a concise phrase for use in your documentation just pick something short that results in a relatively unambiguous acronym, be consistent with its use, and make sure it is included in your terms of reference. Perhaps "Hierarchical Encoded Identifiers", "Hierarchical Classification", "Business Location Codes", or similar.

Don't be hung up on it being numerical: this is not significant as there are no mathematical operations that are meaningful when applied to these codes. The codes could be using any set of characters, even multiple characters per level in the hierarchy, and for this system someone has chosen to limit the alphabet of the encoding to the characters 1, 2, 3, ... It would be just as meaningful for the encoding to use other characters or to have separators between each position, so your last two examples could be MQE/EOG/CM/CM/BaL & MQE/EOG/CM/CM/BUM or MXCCL & MXCCU in similar schemes with different alphabets - all operations would be equally meaningful (searching, lexical sorting, extracting the meaning back from the code, and so forth) or not (mathematical operations) in these encodings.

Related Solutions

Sql-server – Many:Many with Shared Relation

The question I would ask is whether the direct relationship between Anthology and Composer is "important" to the system? There are all kinds of incidental relationships between tangible things that are recorded in any system. However, only certain of these are important for the purposes of the system itself. These are the ones that belong in a relational schema.

If anthologies will always be made up of compositions, and compositions will always have composers, then you can always derive the relationship between anthologies and composers using a query. If you did it that way there would be no risk of inconsistent relationships.

This model would look like this:

ERD

You would have table definitions something like this:

create table Composer
( composer_id int IDENTITY
, composer_name nvarchar(50)
, biography nvarchar(1000) null
, other_composer_info nvarchar(1000) null
, constraint pk_composer primary key(composer_id)
);

create table Composition
( composition_id int IDENTITY
, composer_id int
, composition_name nvarchar(50)
, composed_on datetime
, constraint pk_composition primary key(composition_id)
, constraint fk_composition_composer 
    foreign key (composer_id) references Composer(composer_id)
);

create table Anthology
( anthology_id int IDENTITY
, anthology_name nvarchar(50)
, constraint pk_anthology primary key (anthology_id)
);

create table AnthologyItem
( anthology_id int
, composition_id int
, constraint pk_anthology_item primary key 
   (anthology_id, composition_id)
, constraint fk_item_anthology (anthology_id)
   references Anthology(anthology_id)
, constraint fk_item_composition (composition_id)
   references Composition(composition_id)
);

The good thing about this is that it has no redundant data to get out of synch. The problem is that it doesn't quite fit Shannon's requirement that an anthology be about one composer and that all of the compositions in that anthology must be from the same composer.

Unfortunately, this is not an easy problem to solve with declarative referential constraints. Declarative constraints are great for making sure that everything within a row makes sense. What they aren't built to do is enforce rules between rows.

There is a declarative way to solve this problem, but it involves a trade-off that many people wouldn't like, because it smells an awful lot like violating normalization. Some people would argue (Mike Sherril comes to mind) that this solution doesn't literally violate normalization rules, but people who are less well attuned to the actual rules of normalization will probably look at this solution with skepticism.

So what is this controversial solution? It looks like this:

ERD2

Note that the primary keys of some of these tables have been modified. Here is the SQL DDL for the solution: (You need to scroll it to the bottom to see the magic.)

create table Composer
( composer_id int IDENTITY
, composer_name nvarchar(50)
, biography nvarchar(1000) null
, other_composer_info nvarchar(1000) null
, constraint pk_composer primary key(composer_id)
);

create table Composition
( composition_id int IDENTITY
, composer_id int
, composition_name nvarchar(50)
, composed_on datetime
, constraint pk_composition 
    primary key(composer_id, composition_id) -- NOTE CHANGE!
, constraint fk_composition_composer 
    foreign key (composer_id) references Composer(composer_id)
);

create table Anthology
( anthology_id int IDENTITY
, composer_id int                            -- THIS IS NEW!
, anthology_name nvarchar(50)
, constraint pk_anthology 
    primary key (composer_id, anthology_id)  -- THIS IS DIFFERENT
);

create table AnthologyItem
( composer_id int                            -- THIS IS NEW!
, anthology_id int
, composition_id int
, constraint pk_anthology_item primary key   -- THIS HAS CHANGED.
   (composer_id, anthology_id, composition_id)
, constraint fk_item_anthology 
    foreign key (composer_id, anthology_id)
    references Anthology(composer_id, anthology_id)
, constraint fk_item_composition
    foreign key (composer_id, composition_id)
    references Composition(composer_id, composition_id)
);

Note that the way this works is you have to impose a single composer on an anthology by making the composer part of the anthology's primary key. You do the same thing with composition. Then, when you create an intersection between composition and anthology, you have the composer ID twice. You can then use a check constraint to declaratively enforce that compositions and anthologies have not only a single composer, but the same one.

NOTE: I'm not saying you should do this, I'm just saying you could do this. YMMV etc.

Database Design – When to Use Business Application Layer vs Database Constraint

My 2 cents, although I am not nearly as experienced as many of the others on this site.

If it is related to data integrity (either from a relationship standpoint OR a standardized formatting standpoint), then I firmly believe that it should be handled by the database. If it's simply for a nice presentation, then I think that should be handled by the presentation layer.

In the example you site where the value caused a division by 0 error in a calculation .. if that calculation happened outside the database, then I do not believe a constraint on the data in the database is valid at that point. The database shouldn't need to know what the code does with its values and the code shouldn't need to know what the database does with its values...

The short answer (again, in my opinion) - the database should be responsible for delivering clean and reliable data in a standardized format ... what happens WITH that data is up to the person/code that is requesting it.

Best Answer

Related Solutions

Sql-server – Many:Many with Shared Relation

Database Design – When to Use Business Application Layer vs Database Constraint

Related Question