SQL – How to Design Database Tables Using Normalization

denormalizationnormalizationsql server

I have designed my tables in below manner.

Countries Table:-

PkId --- + Country Name --- + StatusFlag
1        | India            | L
2        | China            | L
3        | Sri Lanka        | L

StateNames Table:-

PkId --- + CountryFkId --- + State Name --- + StatusFlag
1        | 1               | Maharastra     | L
2        | 1               | Madhya Pradesh | L 
3        | 1               | Utter Pradesh  | L

Cities Table:-

PkId --- + StateFkId --- + City Name --- + StatusFlag
1        | 1             | Mumbai        | L
2        | 1             | Pune          | L 
3        | 1             | Nagpur        | L

PinCodes Table:-

PkId --- + CityFkId --- + PinCode --- + Area ---       + StatusFlag
1        | 1            | 400037      | Antop Hill     | L
2        | 2            | 412206      | Ambade         | L 
3        | 3            | 441108      | Ashta          | L

Customers Table:-

 PkId --- + CustomerId --- + CustomerName --- + PinCodeFkId--- + StatusFlag
 1        | C00001         | John             | 1              | L
 2        | C00002         | Ram              | 2              | L 
 3        | C00003         | Anwar            | 3              | L

Query:-

Select C.CustomerId, C.CustomerName, P.Area, CT.CityName, S.StateName, CNT.CountyName
From dbo.Customers C
Inner Join dbo.PinCodes P On C.PinCodeFkId = P.PkId
Inner Join dbo.Cities CT On P.CityFkId = CT.PkId
Inner Join dbo,StateNames S On CT.StateFkId = S.PkId
Inner Join dbo.Countries CNT On S.CountryFkId = CNT.PkId

But my colleagues told that if I write like this, then huge logical reads will occur. Also they told that these table structures follow too much normalisation. Instead of designing Customers table like above, they have given other suggestion. i.e., Maintaining CountryName, StateName, CityName, PinCode and Area in Customers table itself.

Restructured Customers Table:-

PkId --- + CustomerId --- + CustomerName --- + PinCode ---   + Area ------- + CityName --- + StateName --- + CountryName--- + StatusFlag
1        | C00001         | John             | 400037        | Antop Hill   | Mumbai       | Maharastra    | India          | L
2        | C00002         | Ram              | 412206        | Ambade       | Pune         | Maharastra    | India          | L     
3        | C00003         | Anwar            | 441108        | Ashta        | Nagpur       | Maharastra    | India          | L

Restructured Query:-

Select CustomerId, CustomerName, Area, CityName, StateName, CountyName
From dbo.Customers

In above query, I didn't put any join conditions with other relevant tables. Which kind of table design best suitable for normalisation and better performance when table has huge number of records?

Best Answer

The table structure that your colleagues have suggested might be appropriate for a data warehouse, but it would typically not reduce I/O in an OLTP database.

Consider this--if you run a query for all of the customers in a city that returns 1,000 rows, with the denormalized table, the database engine will have to read the CityName, StateName, and CountryName 1,000 times. Fewer rows will fit on a database page, so it will end up reading more pages to fulfill the query.

With normalized tables, it will only have to read the IDs for those columns (which is very small compared to text fields), and read the actual CityName, StateName, and CountryName only once. More rows will fit on a database page, so fewer pages will need to be read to fulfill the query.

Additionally, think what the denormalized table does to the buffer. If there are 1,000 customer records in the buffer, then the city, state, and country names are also in the buffer 1,000 times. With normalized tables, those names are only in the buffer one time, using much less space. So you'll have more data in buffer with normalized tables.

Related Question