What do you call numerical coding systems that are left-aligned?

database-design, number formatting

I'm writing an import mechanism for the NAICS database, and I have a few questions about its code format. I've seen this format before and I like the setup. I'm going to ask some other questions about best practices and navigation of this data, and I'd like to refer to it by the right name.

Essentially, this is an example of the data:

CODE, TITLE
"21","Mining, Quarrying, and Oil and Gas Extraction"
"212","Mining (except Oil and Gas)"
"2121","Coal Mining"
"21211","Coal Mining"
"212111","Bituminous Coal and Lignite Surface Mining "
"212112","Bituminous Coal Underground Mining "

So if Bituminous Coal Underground Mining was your organization type, your code would be 212112. You could look up to find you were in the business-genre of Coal Mining, and up again to find you were in the business of Mining (except Oil and Gas), and up again to find you were in the business of "Mining, Quarrying, and Oil and Gas Extraction".
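The "look up" step described above can be sketched in a few lines of Python. This is only an illustration under the assumption that a parent's code is always the child's code with trailing characters dropped, as in the sample rows; `NAICS_SAMPLE` and `ancestors` are hypothetical names, not part of any NAICS tooling.

```python
# Sample rows copied from the excerpt above (trailing spaces trimmed).
NAICS_SAMPLE = {
    "21": "Mining, Quarrying, and Oil and Gas Extraction",
    "212": "Mining (except Oil and Gas)",
    "2121": "Coal Mining",
    "21211": "Coal Mining",
    "212111": "Bituminous Coal and Lignite Surface Mining",
    "212112": "Bituminous Coal Underground Mining",
}


def ancestors(code, table):
    """Return (code, title) pairs from the given code up to the top level."""
    chain = []
    # Drop one trailing character at a time and keep every prefix
    # that exists in the table.
    for end in range(len(code), 0, -1):
        prefix = code[:end]
        if prefix in table:
            chain.append((prefix, table[prefix]))
    return chain


for code, title in ancestors("212112", NAICS_SAMPLE):
    print(code, title)
```

The codes are kept as strings here so that taking a prefix is a plain slice.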

What is such a scheme called? Is there a term for this kind of organization of data?

I want to call it something like recursive-base10 or recursive-decimal. Is there an established name for it, though?

Best Answer

That sort of coding format is simply a representation of a tree or forest - each code represents the node's location in a hierarchy where each node (except the root node(s)) has exactly one parent.
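To make the tree reading concrete, here is a minimal sketch that rebuilds the parent/child links from the codes alone. It assumes a node's parent is the longest proper prefix of its code that is itself a known code; `build_children` is a hypothetical helper name chosen for this illustration.

```python
from collections import defaultdict


def build_children(codes):
    """Map each code to its direct children; roots are listed under None."""
    known = set(codes)
    children = defaultdict(list)
    for code in sorted(known):
        # The parent is the longest proper prefix that is itself a known code.
        parent = None
        for end in range(len(code) - 1, 0, -1):
            if code[:end] in known:
                parent = code[:end]
                break
        children[parent].append(code)
    return dict(children)


codes = ["21", "212", "2121", "21211", "212111", "212112"]
tree = build_children(codes)
print(tree[None])      # root node(s) of the forest
print(tree["21211"])   # the two six-digit leaves from the sample
```

Every code ends up under exactly one parent (or under `None` if it is a root), which is the "exactly one parent" property described above.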

That hierarchy could be directly modelling the physical world (as Dewey Decimal Classification does for books in a library) or it could be something more "virtual".

I don't think this sort of coding has a generally recognised name, so if you need a concise phrase for use in your documentation just pick something short that results in a relatively unambiguous acronym, be consistent with its use, and make sure it is included in your terms of reference. Perhaps "Hierarchical Encoded Identifiers", "Hierarchical Classification", "Business Location Codes", or similar.

Don't get hung up on the codes being numerical: that is not significant, because no mathematical operation is meaningful when applied to them. The codes could use any set of characters, even multiple characters per level of the hierarchy; for this system someone has simply chosen to limit the alphabet of the encoding to the characters 1, 2, 3, ... It would be just as meaningful for the encoding to use other characters or to have separators between positions, so your last two examples could be MQE/EOG/CM/CM/BaL & MQE/EOG/CM/CM/BUM, or MXCCL & MXCCU, in similar schemes with different alphabets. In all of these encodings the same operations remain meaningful (searching, lexical sorting, extracting the meaning back from the code, and so forth) and the same ones do not (mathematical operations).
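As a rough illustration of that point, the sketch below runs the same "is this code under that one?" check and the same lexical sort on the digit codes and on the separator-based variant. The helper names `to_path` and `is_under` are made up for this example, and the `/` separator is just the one used in the alternative codes above.

```python
def to_path(code, sep=None):
    """Split a code into its hierarchy levels (one character each if no separator)."""
    return tuple(code.split(sep)) if sep else tuple(code)


def is_under(code, ancestor, sep=None):
    """True if `code` equals `ancestor` or sits somewhere below it."""
    path, anc = to_path(code, sep), to_path(ancestor, sep)
    return path[:len(anc)] == anc


# Digit alphabet, one character per level.
print(is_under("212112", "2121"))
# Letter alphabet with separators, several characters per level.
print(is_under("MQE/EOG/CM/CM/BUM", "MQE/EOG/CM", sep="/"))

# Lexical sorting keeps each parent directly ahead of its descendants.
print(sorted(["212112", "21", "212111", "2121"]))
```

Comparing whole levels rather than raw string prefixes matters once a level can span several characters; otherwise a hypothetical level "CMX" would wrongly match the ancestor level "CM".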