Can a 2NF relation be M:M

database-design

I have recently learned about 1NF, 2NF, and 3NF. I understand the definitions and the differences. I also had learned earlier about removing M:M relationships from a conceptual model by using bridging 1:M entities (but at that time I was not aware of normal forms). The cool thing that I see now about normalization is that if you start with one big messy relation, the normalization steps automatically take care of the M:M for you, so you don't have to consciously think "ok I am bridging away my M:M."

However, I decided to ask myself hypothetically WHICH level of normalization specifically is responsible for the removal of M:M relationships. I know it is not 1NF, because I could easily come up with examples with are both 1NF and M:M. However, in all the simple examples I contrived, bringing them to 2NF made the M:M go away. But, I am not sure this is definitive since I seem to be not very creative in coming up with exhaustive examples. So I pose this question: is there a 2NF relation which exists and is still M:M, which needs to go through 3NF to have the M:M removed? Or does 2NF consistently disallow M:M by itself?

Thanks!

UPDATE: Let me try to explain myself better. Consider this simple example:

Table BOOK_AUTHOR

| ISBN       | Title     | AuthorID  | AuthorName|
|------------|-----------|-----------|-----------|
|  0001      |  Book1    | A01       | King      |
|  0001      |  Book1    | A02       | Tolkien   |
|  0002      |  Book2    | A01       | King      |
|  0003      |  Book3    | A02       | Tolkien   |
|------------|-----------|-----------|-----------|

This is in 1NF and the PK is {ISBN, AuthorID}.

To go to 2NF we remove the partial dependencies ISBN->Title and AuthorID->AuthorName and end up with:

BOOK_AUTHOR (ISBN, AuthorID)
BOOK (ISBN, Title)
AUTHOR (AuthorID, AuthorName)

Now we have two real "entity" tables, BOOK and AUTHOR, plus the artificial bridge entity BOOK_AUTHOR. And we got there just by going to 2NF.

However, if I had started in a different manner and made an ER with BOOK and AUTHOR and a M:M relationship between them, I'd have had to create the artificial BOOK_AUTHOR table myself, which happened above automatically at 2NF. My question: does it always happen at 2NF, or do you sometimes need to get to 3NF to create the bridge?

However, now that I spell my own question out more I think I see a huge error in it. I was asking at which xNF does the bridge get created, but the fact of the matter is that my 1NF example above is nothing more than a massive bridge. Going to 2NF doesn't "create the bridge"… it's more like it creates the land on either side of the bridge by pulling the real entities out of the bogus huge table. So going to 2NF (and higher) seems to be less about bridging and more about removing data redundancies (which is of course how it was presented to me in the first place)!

Best Answer

The normal forms don't really have anything to do with many-to-many relationships. If you happen to lose some as a byproduct of the normalization process, that's fine, but you won't generally do so. If we consider that we have two tables: Salesman and Product which each have ID fields as their primary key and we have a third table called Specializes which shows which Salesmen specialize in selling which products. This Specializes table would represent the many-to-many relationship since each salesman can specialize in multiple products and each product can be specialized in by more than one salesmen. It would probably look something like this (excuse the awkward formatting, we can't do real tables on StackExchange):

| SalesmanID | ProductID |
|------------|-----------|
|  1         |  1        |
|  1         |  2        |
|  2         |  1        |
|  2         |  2        |
|  2         |  3        |
|------------|-----------|

Obviously, the lack of nulls and repeated rows means that this table is in 1NF. In this table, the only candidate key is {SalesmanID, ProductID} and as such, there are no non-prime attributes. It also contains no non-trivial functional dependencies. Thus, it is necessarily in 2NF, 3NF, and BCNF. I'm also going to assert without proof that it's in 4NF, 5NF, 6NF, and DKNF (to avoid having to explain all of the details thereof). So really, no normal form removes many-to-many relationships, nor are they meant to. The purpose of normal forms is not to remove many-to-many relationships (and I am not actually clear on why you would want to) but rather to remove potential insert anomalies, update anomalies, and deletion anomalies. The primary role of the normal forms is to ensure that each piece of information is represented in a database table precisely once. Having the same information embedded in multiple places leads to problems. But that has nothing to do with many-to-many relationships.

I think that you mean to be asking a slightly different question about something like situations where a many-to-many relationship is embedded in a table which also tries to contain other information, such as if the table above contained the product name in addition to the product number (where name is functionally determined by number). A table like that would either violate 2NF (if the name did not also functionally determine the number) or Boyce Codd Normal Form (if the name did functionally determine the number).

You could also perhaps be thinking of a different situation: when we have two unrelated 1:M relationships in the same table, such as if we were to add a third column to identify which language or languages each salesman speaks.

| SalesmanID | ProductID | Language |
|------------|-----------|----------|
|  1         |  1        | English  |
|  1         |  2        | English  |
|  2         |  1        | Spanish  |
|  2         |  2        | Spanish  |
|  2         |  3        | Spanish  |
|  2         |  1        | French   |
|  2         |  2        | French   |
|  2         |  3        | French   |
|------------|-----------|----------|

As you can see, that table is quite problematic, since we need 6 entries to express that Salesman 2 specializes in 3 products and speaks 2 languages. This is a fourth normal form violation.

Edit:

Upon clarification, it's clear that what he's asking about is a table like the Specializes table, but with extra information about the salesmen and the products, essentially, a table which contains two entity sets and their many-to-many relationship in a single table. So to answer that question directly, yes, you can have lousy tables like that which are in 3NF. The normal form which guarantees that that won't happen is Boyce-Codd Normal Form (BCNF). Here's an example of a lousy table like that which is vulnerable to all kinds of anomalies (insert, update, and delete), but is in 2NF and 3NF.

| SalesmanName | SalesmanID | ProductID | ProductName |
|--------------|------------|-----------|-------------|
|   Alex       |  1         |  1        |  Thingy     |
|   Alex       |  1         |  2        |  Whatsit    |
|   Barb       |  2         |  1        |  Thingy     |
|   Barb       |  2         |  2        |  Whatsit    |
|   Barb       |  2         |  3        |  Whoosit    |
|--------------|------------|-----------|-------------|

So, looking at this table, it's obviously in 1NF. Further, we can identify the non-trivial functional dependencies very straightforwardly. SalesmanName -> SalesmanID. SalesmanID -> SalesmanName. ProductID -> ProductName. ProductName -> ProductID. Next we need to identify the candidate keys. There are four: {SalesmanName,ProductID}, {SalesmanName,ProductName}, {SalesmanID, ProductID}, and {SalesmanID, ProductName}. As such, we have no non-prime attributes. Thus, we are necessarily in 2NF (no functional dependencies between non-prime attributes) and 3NF (no non-trivial functional dependencies where the left-hand-side is not a super key and the right hand side contains a non-prime attribute). However, we are not in BCNF because there do exist non-trivial functional dependencies whose left-hand-side is not a superkey.

Any similar situation will also always not be in Boyce-Codd Normal form because there will be some non-trivial functional dependency whose left-hand-side is not a superkey. Any table like this will essentially have two entity sets each of whom have some attributes. Basically, it will have a left entity set and a right entity set. The left entity set will have some attributes which uniquely identify each left entity and the right entity set will have some attributes which uniquely identify each right entity. Those will be involved in functional dependencies. However, they will each not be candidate keys because you'll have to combine them to get a candidate key for the whole table. As such, they won't be superkeys and there will be a Boyce-Codd Normal Form violation. So BCNF will stop it cold. Anything less than that will only catch some cases. Really, if you only remember one normal form, it should be BCNF.