Sql-server – Minimizing/avoiding data repetition and choosing between compound keys and unique indexes

database-designsql serversql server 2014unique-constraint

I am absolutely new to database development and i'm trying to build a question bank basically.

The questions have to be categorized based on Class, Subject and Topic.
Since each Class can have many Subjects and each Subject can be in many Classes I made a junction table called tb_SubjectClass with Subject and Class being primary keys.

To uniquely identify any Chapter I would need a combination of Subject and Class so I created tb_Chapters with Chapter, Subject and Class as primary keys with Subject and Class coming from tb_SubjectClass. I then repeated the same thing with tb_Topics as each topic can only be identified with a combination of Subject, Class and Chapter.

I have placed the relationship diagram below. It may also be relevant to know that I am expecting around 50000 questions (with equations and diagrams) to be stored in this database.

db relationship

Main question: If I decide to stick to compound/composite keys, is my design proper or can it be improved further ?

Side question: what would be the best approach in my case, with the following considerations in mind:

  • expected records around 50000
  • questions table (tb_QnA) is going to be the most queried table (for selecting and for adding/deleting data)
  • the tables tb_Class, tb_Subjects and tb_SubjectClass are not going to change at all
  • ease of writing queries when connecting the database to a c# winform application (for data entry and generating reports)
  • the project is going to be desktop based

I have mainly been following the Database design tutorial on tekstenuitleg.net and video tutorials by WiseOwlTutorials. I have also recently started reading the book Relational Database Design Clearly Explained by Jan L. Harrington.

Best Answer

Does making foreign keys lead to data duplication / repetition or is it just a reference / pointer to data stored in the parent table ?

Foreign key restricts the possible values in one table's column accordingly to the presence of the same values in another table's column. Values are stored in the both tables so they are duplicated. To avoid (in fact - to reduce) duplications normalization should be used. That's means you have the reference table where all distinct values are stored and all other tables contains only IDs of that values.

Should compound keys be avoided wherever possible ?

No. Even more, compound keys are the only way to speed up the query with complex conditions that involve multiple columns. Compound keys can be avoided by the price of performance.