SQL Server and Entity Framework – Self Referencing Tables

entity-frameworksql serversql-server-2008

I have many tables in a database and the tables are nearly identical. There are limitations currently on what objects reference other objects. For example I have company,contact and project. Contact is a child of company but no relation to project although some of our users would like there to be.

Since the tables are all similar I wondered if it would be a good idea to create a self referencing table with a type property on each record. Each record in the table would be a specific type ie company,contact etc and have X number of foreign key fields which would reference the same table.

This gives our users the ability to create their own record types and link them however they choose.

Is this going to cause me any problems in the future that I haven't thought of? Our system allows users to create their own fields on an object so this seems perfect. But they can't also report on this data. Performance wise I am thinking this should be OK as data would only ever be in 1 table.

Best Answer

Relational modelling starts from the identification of classes of "things." To these we apply normalisation to remove data update anomolies. The results are implemented as tables with a column for each attribute we identify. We do not start from a bunch of domains, look around for "things" with the same bunch of domains, shoehorn them into the same table and call it an Entity. This is an anti-pattern generally called "One True look-up Table". Implementing it compromises what constraints can be defined & how foreign keys can be established. Tables can become bloated with NULL-able columns. This increases the IO required to fulfil queries. Index depth increases, slowing even key lookups.

Now, at some stage in every project the science of normalisation bumps up against the art of model design. Decisions have to be made and compromises have to be accepted. In your case I can see one of three models emerging.

The first is the normalised one you have currently. This is how RDBMS's are supposed to work. Mappings are obvious; queries are fast. Exansion, however, requires schema changes and these can be slow to implement, depending on your company's abilities.

The second is a super type / sub type model. You treat Company, Contact etc. as specialisations of another type, say LegalPerson. This super type contains the common attriubtes. Each sub type has the attributes specific to it. There is a one-to-one relationship between the super type and the sub type i.e. a given "thing" must have a row in the super type and exactly one sub type. This can be difficult to enforce in current SQL products. This allows recognition of common columns between entity types. Relationships between the types are held in specific foreign key columns or intersection entities. The insertion and manipulation of data is more complex. Again, expansion of the schema will reqire development work.

Finally there is the entity-attribute-value (EAV) model. Here you are effectively building a database-within-a-databse. The objects you model are "Table" and "Column" instead of "Customer" and "Company". Thus your application can ammend the logical schema at will. The price is application complexity and run-time performance.

EAVs have a bad reputation. There are many posts explaining why perdition will descend upon you simply by uttering the name aloud. Others defend the practice. I myself have an EAV systems holding 300M values. It works well for us. But I did this intentionally, in full understanding of the implications, with a DBA team to support me, to solve a specific problem we could not address otherwise. It was not done because some of the tables looked a bit like others.