Design / Normalization of a Database

database-designsqlite

I’ve got a question concering the design of a SQLite database. I’m building a database for analyzing purposes that constists of works of literature and metadata concering those entities.
The thing I’m asking is wether it makes sense to split my table into several tables, e.g. buliding a relational database. Right now I’ve got one main table in which my entites are stored. The columns are a unique title, one for the author (who may quite well not be qunique in the database), a publication year, publisher (also not unique) and a lot of specialized fields containing tags connected with the work. Those tags are organized in three categories and each entity can be connected with multiple tags (therefore Tag A 1 and so on). One entity would look like this:

Title | Author | Year | Publisher | Tag A 1 | Tag A 2 | Tag A 3 | Tag B 1 | Tag B 1 | Tag B 3 | Tag C 1 | ...

Is it useful to build a table with authors, one for publishers and than a table for each category of tags and than link those back with foreign keys? What would be the advantages besides no more doubling of data? I’m going to query the database for things like "Titles with more than 2 manifestations of Tag A from 1980 to 1990" or "Authors using Tags A and B".

All my columns are filled with atomic values. If I’m correct I’m looking at brining my database into the first normal form by trying to "eliminate repeating groups in individual tables" – is that correct?

Thanks for your help!

Best Answer

Is really up to you if you want to normalize or not, depends largely on the requirements of your implementation and how are you displaying (or not) the data...

I think an alternative approach to the dataset you present would benefit a lot of a structure such as:

      ----------------
      |   Authors    |
      ----------------
      | AuthorId     |<-.
      | Name         |  |
      ----------------  |
                        |
      ----------------  |
      |  Publishers  |  |
      ----------------  |
      | PublisherId  |<-+-.
      | Name         |  | |
      | IsActive     |  | |
      ----------------  | |
                        | |
      ----------------  | |
      |    Tags      |  | |
      ----------------  | |
  .-->|  TagId       |  | |
 | .--|  CategoryId  |  | |
 | |  |  TagValue    |  | |
 | |  ----------------  | |
 | |                    | |
 | |  ----------------  | |
 | |  |  Categories  |  | |
 | |  ----------------  | |
 | `->| CategoryId   |  | |
 |    | CategoryValue|  | |
 |    ----------------  | |
 |                      | |
 |    ----------------  | |
 |    |    Titles    |  | |
 |    ----------------  | |
 | .->| TitleId      |  | |
 | |  | AuthorId     |--` |
 | |  | PublisherId  |----`
 | |  | TitleName    |
 | |  | Year         |
 | |  ----------------
 | |                 
 | |  ----------------
 | |  |   TitleTags  |
 | |  ----------------
 | `--| TitleId      |
 `----| TagId        |
      ----------------

A junction table will store any amount of tags per title and an additional [tag]Category table could group this tags so you'll be able to query for entire groups instead of 'per tag' search.

Note that by, also, separating the Publishers you could add more fields (like IsActive in the example), so then you can have more granular queries, be more specific on what your looking for.