Database normalization, 3 mistake [stucked] – Homework

database-designnormalizationoracleschema

I've been given a homework at university, which says I should make a database with 7 tables [see picture below]. The scheme I made is good, except some normalization is needed, as my tutor told me. (Side note: I have already done normalization with CONTINENTS and COUNTRIES tables)

After I have contacted with my tutor, he told me I'm missing two normalization in this schema and there is a database modelling mistake. However, this is all I know about my mistakes.

My thoughts about my possible mistakes:

  1. In CUSTOMERS table COUNTRY_NAME should be a FK, referencing COUNTRY table's COUNTRY_NAME.

    (In this case the CONTINENT_ID is not necessary in my opinion)

  2. In ORDERS table the TOTAL_AMOUNT is redundant because of ITEM table's QUANTITY. Is it a problem, and if so, how can I get rid of it?

  3. There are lots of 'CITY' columns with VARCHAR2 type. This can be a problem when 1) the user misspell them in other tables or 2) VARCHAR2 is taking up too much storage. Should I make a ZIPCODES table with zipcode and city columns and use it instead?

  4. Repeating records: In WORKERS table there is a column called EMAIL. In my opinion a worker can have multiple e-mail addresses, so in this case: Can this cause me trouble, i.e. causing repeating records?

  5. CUSTOMER/WORKER table and DIVISION/WORKER table have a N:M (?, not so sure) relationship. So is it necessary to make another table between them?

enter image description here

I would be really thankful if somebody can find the missing normalizations and answer my questions.

Responses to clarification requests

About the modelling mistake: I don't have much information about business rules. It's because of this lesson is about normalization, so this is kind of a cherry on the cake. Not a big problem if I won't find it.

About the normalization: I'm pretty sure, that I should decompose the current structure up as mentioned via comments. The question is, what I'm missing here. I'm starting to go blind because of staring too much at them.

Best Answer

Note: my university days are long behind me - I'm coming at this from a real-world perspective. Doesn't mean the answers are the expected ones, but may help give you an argument for your choices. A good instructor would consider such thing....

  1. Agreed - a customer logically would be in a country, not a continent.

  2. While TOTAL_AMOUNT is presumably the sum of QUANTITY_SHIPPED * PRICE for all Items in an Order, I wouldn't consider it a normalization issue. It's a stored calculation. In a real system, the total amount could be impacted by discounts or coupons, and (in the other direction) by sales tax. So, it's arguably appropriate.

  3. First, the CITY fields in customer, country, and division wouldn't necessarily match each other anyway. Second, a given city can have multiple zip codes, and a given zip code can cover multiple cities (at least in the United States). If you repeated the customer's address in the order, that would probably be considered a normalization error. (Arguably - I have had Amazon orders shipped to several different addresses....)

  4. While a person certainly can have multiple email addresses, unless the assignment specified that you had to account for multiple email addresses, I would leave that be.

  5. I assume each customer has one and only one sales rep, whereas each sales rep (worker_id) might have multiple customers; and, that each worker belongs to one and only one division. No need for a table in between, then. Many-to-many relationships are where you normally need some sort of mapping table to define the connections.

I assume that the tool you are using points the connections at essential random points on each table's box; if so, it's hard to be certain all the relationships are indeed set up with the correct directionality; if you haven't, I'd double-check that.

You have a salesrep_id in both customer and order. Again, real-world experience tells me that's wise (Real sales reps often move from one territory to another, and, if customers have real sales reps, you might be very interested in just who sold Customer X what, years later). However, in the context of your assignment, that might be considered a denormalization.

The only other thing that leapt out at me as I looked at the diagram was that there was no product table. I find that somewhat surprising. If there was a separate product table, then of course some of the fields in the items table would move there.