Ms-access – Normalizing a database without access to source data

ms access

I've started a new role which deals with large amount of related data. Our source of all this data is various Excel dumps pulled from databases we do not have access to. The previous person filling this role used a dozen or so Excel files to gather these data files, manipulate them, and create reports.

I have started moving the dumps to an Access database. I have noticed a lot of the Excel data is related and should probably be normalized. What I have currently been doing is creating a table for every data dump, and importing them to Access, and using a number of queries to replicate the dozens of data manipulations and reporting.

Are there still benefits to normalizing data where my only source are Excel dumps out of a warehouse?

How would I normalize data, when I do not have the ability to change the format of how the dumps are sent to me?

Additionally, my plan (budget-depending) is to move from Access to a MS SQL Database.

Best Answer

Yes there are significant benefits to data normalization if you are willing to do the work to achieve and maintain it. The two fundamental benefits to normalization are:

  • Data Integrity
  • Query Flexibility

The simple approach to normalization is to create a table for each person, place, thing, concept, or event. By doing this, you have each characteristic of each person, place, thing, concept, or event of interest to you in one and only one place in the database. The benefit of data integrity is achieved for when you insert and update data. Because you have normalized you don't have redundant copies of the same characteristic in many places across the database, each with a different value, that you have to remember to update and keep in sync. Second, you have only one place in the database to program checks when inserting or updating data to make sure the data is valid. Since your data is coming in from many excel spreadsheets, normalizing your data gives you the opportunity to implement data integrity checks on the load to make sure you analysis is based on accurate data. The benefit of query flexibility is achieved for when you want to read and analyze the data. Because you have normalized the data you can connect up the tables in a flexible way based upon the question about the data you want answered, including only what you need to answer the specific question. Second, this enables the database to return the answers to your questions much faster than if it had to scan through all the data, including the data not relevant to your question, in your un-normalized tables .

Access is a simplified DBMS and does include a basic SQL processor that allows you to write queries and thus take advantage of the benefits of normalized data. If you are eventually going to move to SQL Server, which is a full featured DBMS, then normalizing your data now will ease the transition and let you take advantage of the full capabilities of SQL Server and its very rich implementation of SQL.

As I mentioned in the beginning, to achieve these benefits you must be willing to do the up front programming to translate the data coming in from your various excel dumps and map the rows and columns in those spreadsheets to your normalized tables. This is not a trivial exercise but doable using Access programming. One approach would be to create tables that replicate the data as in the source and load the data into them. These are known as stage tables. Once you have the un-normalized data in Access tables you can then more easily write access code using SQL to extract the data from those stage tables, normalize it, identify data quality issues (say the same characteristic in two different excel dumps that should have the same value but do not), and load it into your normalized tables. This is the common method for normalizing data coming from an un-normalized source very common in subject area based data warehouses.

You will find this additional level of effort to be well worth it though once you have high quality, normalized data in your access database. You report consumers will see that you are a real data professional when you show them examples where data quality was poor and you discovered that fact so it can be corrected in the sources. Likewise, when they ask for a new report that analyzes the data in a very different way, you can rapidly create the new report using SQL to combine the data in the normalized tables in this very different way that wasn't originally anticipated. They will be very impressed that you are able to do this quickly and easily!

I hope this helps explain why normalization would be of benefit to you.

Related Question