Sql-server – the best approach to managing daily excel flat file imports into SQL Server with varying column counts per file

sql server

I need to load excel spreadsheets into SQL Server on a daily basis. It's simple enough to build an SSIS package but, the problem that I am facing is that the client is allowed to name the column headers as they like, the columns can be in whatever order, and the column count can vary.

This is the best I can get the owner to agree to.

  1. Each client's excel file will have one column that is the same no matter what. This column will be the Tax ID Number.
  2. Each client must use the same format on their excel spreadsheet each time they submit their file.

This is what the final product must look like.

  1. I need to be able to load automate loading all the excel data and headers into a SQL Server database without losing any data with each client having varying column counts.
  2. the asp.net mvc application must be able to query a tax id that is entered by the customer. The query must return the headers from the excel spreadsheet and all the corresponding data associated with that record.

I tried to get better consistency but, this is the best I can get. I have normally worked with data that had predefined parameters so, I would appreciate any help with suggestions to help me move this forward.

Best Answer

You're dealing with unstructured data here I'm afraid and honestly, I feel your pain. I don't think you can really do what you want in SSIS - partially because it will expect your Excel format to be fixed. Even if you get it fixed, with SSIS and Excel - you'll have problems with how it guesses the metadata types for columns based on a sample of the data in the sheet - with the Excel provider, you only have 2 choices for text data for example - NVARCHAR(255) and NVARCHAR(MAX). It can get ugly pretty fast if you try to enforce data-typing at a data level even with fixed columns but varying lengths.

As such, I think you'll need to embrace the unstructured nature of the data and store it as such - storing it as an XML/JSON data types and converting the data from the spreadsheet and letting the MVC application parse the XML. The only other way I can suggest is using something like a key-value store. There's huge drawbacks to this from a relational perspective though. This is commonly known as EAV (Entity-Attribute-Value)

You might also want to consider non-SQL Server alternatives that can take unstructured data and search on it as well.