MySQL – Storing a small array of many possibilities

MySQL, optimization, PHP

Let's say I need to store the text of a series of books, line by line, in a database and present them on a (CMS-based) website. I thought the correct setup would be the following (a rough sketch in code follows the list):

  • a 'library' table: containing the ISBN-type info (per book-ID: author, title, year, maybe blurb) plus an array with the empty pages of that book.
  • a 'lines-of-text' table for each book.
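
For concreteness, here is a rough sketch of that layout in PHP/PDO (connection details, table and column names are all placeholders; the empty_pages column is exactly the part I'm unsure how to fill):

```php
<?php
// Rough sketch of the layout described above; all names are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=cms;charset=utf8mb4', 'user', 'pass');

// The 'library' table: ISBN-type info per book, plus the empty-pages 'array'.
$pdo->exec("
    CREATE TABLE library (
        book_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        author      VARCHAR(255) NOT NULL,
        title       VARCHAR(255) NOT NULL,
        pub_year    SMALLINT,
        blurb       TEXT,
        empty_pages VARCHAR(1000) -- the 'array' in question; storage format TBD
    )
");

// One 'lines-of-text' table per book, e.g. for book_id 42.
$pdo->exec("
    CREATE TABLE lines_book_42 (
        page_number INT UNSIGNED NOT NULL,
        line_number INT UNSIGNED NOT NULL,
        line_text   TEXT NOT NULL,
        PRIMARY KEY (page_number, line_number)
    )
");
```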

The 'array' question here is different from 'Storing integers for efficient select' and various similar questions (like 'match elements to days of the week') in two respects: the set of (integer) options is not limited (it is sparse over a large and unbounded range), and the array is not used within SQL at all, only within PHP.

I now have my doubts:
[1] Should I put all the book-line tables together in one single table (adding a column containing the book-ID)? Or does it not really matter? There's cleanliness/readability, and there's performance (once you've chosen a 'book' you're not going to switch, so my idea is that you'd be searching a smaller table if there are many books). [Further info: I have to add (and occasionally delete) books. I also need to assign an owner to each book (who can edit the 'lines'), but that is outside the scope of MySQL, I think.]

[2] How should I save the 'empty pages' array of each book?
–a Convert a PHP array to a string (serialize/unserialize) and store it as VARCHAR (or TINYTEXT)? Or
–b Make another table of book-to-empty-page matches (sketched after this list)? Or
–c Use SET('1','2','3', …, '999') to foresee all possible empty pages [and swear/blaspheme/explode violently the day I have to add a 1001-page book with an empty 1000th page]? Or
–d Calculate the list on the fly from the database?
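
A rough sketch of option (b), again with placeholder names: a separate table holding one row per (book, empty page) pair, which keeps the list queryable from SQL as well as from PHP.

```php
<?php
// Option (b): one row per (book_id, empty page number) pair; placeholder names.
$pdo = new PDO('mysql:host=localhost;dbname=cms;charset=utf8mb4', 'user', 'pass');

$pdo->exec("
    CREATE TABLE book_empty_page (
        book_id     INT UNSIGNED NOT NULL,
        page_number INT UNSIGNED NOT NULL,
        PRIMARY KEY (book_id, page_number)
    )
");

// Reconstruct the 'array' of empty pages for one book.
$stmt = $pdo->prepare(
    'SELECT page_number FROM book_empty_page WHERE book_id = ? ORDER BY page_number'
);
$stmt->execute([42]);
$emptyPages = $stmt->fetchAll(PDO::FETCH_COLUMN); // e.g. ['4', '18', '230']
```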

I guess the 'SET' approach protects against impossible arrays entered by badly-intentioned 'owners' of books (which is rather improbable in my specific case); but for such a sparse list it carries a lot of overhead and can easily break on an extra-long book. I guess on-the-fly calculation is needlessly slow, since the list never changes (it should be computed once per 'book', not on every 'page' request!). I have the feeling solution (a) is the best fit for my circumstances, as it puts the array straight into PHP, which is what will be used to build the whole structure and linking (in HTML) anyway.

Best Answer

The MySQL SET datatype is a disaster - as is any kind of array datatype (unfortunately supported by many RDBMSs). Check out my answer to another question here. As I note there, MySQL's SET is a breach of Codd's second rule - no repeating-group datatypes. It is also completely non-portable should you ever wish to change RDBMS.

If you do wish to store the data this way, have a BOOK table with book_id, name, author, country_of_publication, subject.... Then a CHAPTER table with book_id, chapter_id, chapter_name, chapter_sub_subject...., then a PAGE table with book_id, chapter_id, page_id, page_number, ... images... other stuff..., and finally a LINE table with book_id, chapter_id, page_id, line_id, line_text....
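
A rough sketch of that structure, driven from PHP/PDO since that's what the CMS uses; the key columns follow the description above, everything else (types, extra columns, connection details) is purely illustrative:

```php
<?php
// Sketch of the BOOK / CHAPTER / PAGE / LINE structure described above.
// Key columns follow the text; everything else is illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=cms;charset=utf8mb4', 'user', 'pass');

$pdo->exec("
    CREATE TABLE book (
        book_id                INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        name                   VARCHAR(255) NOT NULL,
        author                 VARCHAR(255) NOT NULL,
        country_of_publication VARCHAR(100),
        subject                VARCHAR(100)
    )");

$pdo->exec("
    CREATE TABLE chapter (
        book_id             INT UNSIGNED NOT NULL,
        chapter_id          INT UNSIGNED NOT NULL,
        chapter_name        VARCHAR(255),
        chapter_sub_subject VARCHAR(100),
        PRIMARY KEY (book_id, chapter_id),
        FOREIGN KEY (book_id) REFERENCES book (book_id)
    )");

$pdo->exec("
    CREATE TABLE page (
        book_id     INT UNSIGNED NOT NULL,
        chapter_id  INT UNSIGNED NOT NULL,
        page_id     INT UNSIGNED NOT NULL,
        page_number INT UNSIGNED NOT NULL,
        PRIMARY KEY (book_id, chapter_id, page_id),
        FOREIGN KEY (book_id, chapter_id) REFERENCES chapter (book_id, chapter_id)
    )");

$pdo->exec("
    CREATE TABLE line (
        book_id    INT UNSIGNED NOT NULL,
        chapter_id INT UNSIGNED NOT NULL,
        page_id    INT UNSIGNED NOT NULL,
        line_id    INT UNSIGNED NOT NULL,
        line_text  TEXT NOT NULL,
        PRIMARY KEY (book_id, chapter_id, page_id, line_id),
        FOREIGN KEY (book_id, chapter_id, page_id) REFERENCES page (book_id, chapter_id, page_id)
    )");
```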

This conforms to the relational model. I'm not clear on why you'd want to store the text line by line in the first place, though - what's wrong with the TEXT datatype? Also, you might want to look into FTS (Full Text Search): MySQL's native FTS is now available on both MyISAM and InnoDB. There's also Lucene, Sphinx and Spider (there may well be others - Google is your friend).
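
For example, against the LINE table sketched above, native FTS would look roughly like this (index name and search term are arbitrary):

```php
<?php
// Native MySQL full-text search on the line text (InnoDB or MyISAM).
// Table/column names follow the sketch above; the index name is arbitrary.
$pdo = new PDO('mysql:host=localhost;dbname=cms;charset=utf8mb4', 'user', 'pass');

$pdo->exec('ALTER TABLE line ADD FULLTEXT INDEX ft_line_text (line_text)');

// Find every line in the corpus containing a given word.
$stmt = $pdo->prepare(
    'SELECT book_id, chapter_id, page_id, line_id, line_text
       FROM line
      WHERE MATCH(line_text) AGAINST (? IN NATURAL LANGUAGE MODE)'
);
$stmt->execute(['riverrun']);
$hits = $stmt->fetchAll(PDO::FETCH_ASSOC);
```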

[EDIT in response to OP's comment]

I had the feeling it was wrong to make identically-structured tables for each book

The whole point is that you should have one set of identically structured tables shared by all books - you make the structure general enough to cope with (virtually) every possibility.

but I also thought it would be slower to search through a larger table (containing ALL sentences, not just one book's worth) to (re)construct a page, as well as doing an extra WHERE book_id = 'N' match...

Use indexes on book_id, chapter_id and maybe even page_id. Also, if you're going to be searching your entire corpus for a particular word, then consider Full Text Indexing.
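
Assuming the one-big-LINE-table layout sketched earlier, something like this (the extra index is only needed if those columns aren't already the leading columns of the primary key):

```php
<?php
// Assumes the LINE table sketched earlier. If (book_id, chapter_id, page_id)
// are not already the leading primary-key columns, a secondary index
// keeps per-book lookups cheap.
$pdo = new PDO('mysql:host=localhost;dbname=cms;charset=utf8mb4', 'user', 'pass');

$pdo->exec('CREATE INDEX idx_line_book_page ON line (book_id, chapter_id, page_id)');

// Typical page reconstruction: the index means only one book's rows are touched.
$stmt = $pdo->prepare(
    'SELECT line_id, line_text
       FROM line
      WHERE book_id = ? AND chapter_id = ? AND page_id = ?
      ORDER BY line_id'
);
$stmt->execute([42, 3, 57]);
$lines = $stmt->fetchAll(PDO::FETCH_ASSOC);
```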

Presumably this extra overhead is near-identical to that of first pointing MySQL at a single-book table. This leaves me with the question of how to store a small array for use in PHP: I'm leaning towards VARCHAR with (un)serialize.

Be aware that the maximum size of a VARCHAR is 65,535 bytes - some of James Joyce's sentences might go beyond that :-), but it's unlikely that even Joyce went beyond the 4GB of a LONGTEXT datatype :-)
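
If you do go that route, it's only a couple of lines of PHP; a sketch using the library.empty_pages column from the question (json_encode()/json_decode() would work just as well and stays readable outside PHP):

```php
<?php
// Store / reload the empty-pages list as a serialized PHP array
// in library.empty_pages (names as used in the question).
$pdo = new PDO('mysql:host=localhost;dbname=cms;charset=utf8mb4', 'user', 'pass');

$emptyPages = [4, 18, 230];

// Write: PHP array -> string.
$stmt = $pdo->prepare('UPDATE library SET empty_pages = ? WHERE book_id = ?');
$stmt->execute([serialize($emptyPages), 42]);

// Read: string -> PHP array.
$stmt = $pdo->prepare('SELECT empty_pages FROM library WHERE book_id = ?');
$stmt->execute([42]);
$raw = $stmt->fetchColumn();
$emptyPages = $raw ? unserialize($raw) : [];
```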

I Googled "storing books in MySQL" and came up with this, which you may find of interest; interestingly, this also popped up on the first page of the same search.