How to handle variations of the same data point

database-designrelational-theory

I have a database with three tables:

  • books,
  • authors and
  • books_authors

books_authors is a “junction” table (i.e., it represents a conceptual level many-to-many relationship), so if it has a row with book_id=1 and author_id=5, it means that author 5 is an author of book 1.

Now, the problem is that author names are sometimes written differently depending on the book. For example, in book 1 the name of author 5 is written “洋子”, but in book 2 the name of author 5 is written “ようこ”.

Sometimes these are variations of the same name (ようこ = 洋子) but sometimes they are completely different (e.g., an author changes his or her name after getting married.)

What's the best way to retain these name variations? Should I enter separate rows for the values “洋子” and “ようこ”, and then link them with another ID? Or should I make an array to keep all the variations in a single row?

Additional information

In response to some clarification requests made via comments, I am going to explain several significant aspects in order to contextualize the scenario more fully:

  • Authors can actually have six separate name values (First, First-reading, Last, Last-Reading, Full, Full-Reading) but the only one required is Full. The rest "can be" null.

  • An Author does not have a, say, "base name" in the business domain of interest.

  • I retain Author Names only when they are associated with a Book.

  • Authors can be Artists, Editors, etc. I use (what I call) the "junction" table to keep track of which Role they play for each Book.

  • I think that, usually, multiple Name Variations represent different values of the same domain, as in the case of “ようこ=洋子”, but I believe that sometimes they are different domains (e.g., as in the situation mentioned above, when an Author changes their Last Name after getting married).

  • I want to manipulate the corresponding tables via SELECT operations that, e.g., include the column(s) containing the Authors Names as (a) condition(s) in the WHERE clauses.

  • The full properties of interest regarding an Author are just the six Name (what I call) "fields", their Variants, and the Books with which they are associated.

  • The property whose values uniquely identify an Author instance is only the ID, as far as I can tell.


Note: I'm using Postgres but this question could apply to other relational database management systems.

Best Answer

I may just be repeating @jeffmc's answer, but I would add an author_names table that has a one-to-many relationship with authors, and then make your junction table (books_authors) a junction of books and author_names. (For clarity you may want to rename it books_author_names.) This way you are able to group authors by all their various identities/pseudonyms while being to tell exactly which moniker was used in the publication of a specific book.

I think this is a more precise way of doing it than using a time-based structure, as an author may use multiple identities simultaneously (e.g., Stephen King published both as Richard Bachman and Stephen King for several years).