Sql-server – Database design considerations for unused columns with every table has same schema

database-designschemasql serversql-server-2012

I am a web developer so I don't know a lot about databases. The company I joined recently has a very mature Desktop ERP built in .NET and SQL Server, they are providing services to huge corporate clients and there design is working fine. But they didn't develop any web based system. But there database design is quite unusual. Let me explain it and then I will post my questions.

So, Now I joined them to develop them a web based ERP (a replica of there desktop system in web). Since I am building the application from scratch, they have given me liberty to revamp any thing which I think would effect positively.

Now the design is,

  • They have around 150 tables in database.
  • Every table has the same schema definition.
  • They have divided the fields in three categories.
    • Strings (so they assign 50 varchar(250) fields in database).
    • DateTime (so they assign 15 smalldatetime fields in database).
    • Numeric (so they assign 30 Numeric() fields in database).
  • All columns have names as (these names don't terrify the developers, in a week or two they are accustomed and even remembered many of the fields associations):
    • Strings (S1, S2, S3, S4 and so on).
    • DateTime (D1, D2, D3, D4 and so on).
    • Numerics (N1, N2, N3, N4 and so on).
  • As I have told you the schema. Every table consist of 95 columns. And only 15-20 columns are actually been used. The remaining 75-80 columns are NULL.
  • The tables are well normalized and indexes are maintained.
  • Number of rows in most of the tables are less than 1000. Only the transaction table records touches several hundred thousands.
  • The precision for numeric columns are by default (1, 0). When any field is selected to be used then the precision is adjusted as per the requirement.
  • An empty database is of ~4MB.
  • This design makes there development quite easy. Since they have a number of columns and whenever they need a field. They just select the data type, i.e. String or Numeric or DateTime and the next available column is assigned.

I think this information is quite enough. Now I want to ask

  • Since, I don't know a lot about SQL. Is this design viable for web environment (web API which will be called from web client as well as from mobile)?
  • Since, every table has 75-80 NULL columns. Does they cost us a lot of memory in future, when transaction records will touch millions.
  • What are your suggestions to improve this design?

THANKS.

Best Answer

1) Yes. It's a maintenance/documentation nightmare but technically there's no reason it wouldn't work.

2) In general each null will be one bit of storage. So 80 null fields might be 10 bytes per row. The full answer is that it varies depending on data type but with varchar for the most part it's a good rule of thumb.

Some alternatives where you expect fields to be null, are column sets (see https://msdn.microsoft.com/en-AU/library/cc280521.aspx); where the engine munges those often unused columns into a single XML column. I don't think it's a very good fit for your purposes because it messes up indexing and introduces a bunch of limitations - but it's something to be aware of.

I don't think performance is going to be affected though. It will mean that you'll have additional lookups if queries are doing select * instead of just the columns they need (and so it would be hard to create proper covering indexes). But that's a generic problem and not specific to this design.

3) I think you should probably try to forget refactoring it and just go ahead building your application as best as you can with what you have.

A database developer would probably start by importing one of the existing databases into SSDT (a Visual Studio database project) which will create the schema, then they'd do any minor fixes required to get a build going, and see that it can create a usable empty database for use with the software.

After that they'd go over the source code and start using SSDT's right click refactor functionality to give the tables and columns proper names; which will rename all the procedures/functions etc that use it.

This still leaves any dynamic SQL broken, external reports broken, and anything that lives outside of the database broken (the application, web services, queries from customers with direct access!) This means it's a pretty big and collaborative effort to fix.

I'm almost certain they will run across fields which have multiple meanings - because that's what happens when people build software this way. Luckily if they were doing all of this then they can use that opportunity to split dual-usage fields into separate fields.

But can you do that? I imagine it's outside of your scope, and businesses generally don't like paying for such things when "why would you need to do that, the software works as-is!"

Related Question