MySQL – Should I normalize the string arrays, or just keep them as a simple string

array, like, MySQL, normalization, performance

I am working on a project involving cards from the Magic: The Gathering game. To be of any use, I must load the basic information about each card into my database, but some of the fields may contain arrays of text, that is, they may contain multiple values.

For example, a card can be classified to be both "green" and "blue". Or it can have several different creature types specified, and so on.

My first thought was to simply store the values as a string, and then search using the LIKE keyword. In combination with the NOT keyword, it should be possible to search for any desired combination of words, and also exclude certain words.
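As a rough illustration of this first idea, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for MySQL. The table layout, the card names and the comma-delimited format (with a leading and trailing comma so that whole words can be matched) are assumptions made for the example, not part of the actual project.

```python
import sqlite3

# In-memory database; SQLite is only a stand-in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE card (id INTEGER PRIMARY KEY, name TEXT, colors TEXT)")

# Hypothetical cards. Each value is wrapped in commas so that a pattern
# like '%,green,%' matches the whole word only.
conn.executemany(
    "INSERT INTO card (name, colors) VALUES (?, ?)",
    [
        ("Card A", ",green,blue,"),
        ("Card B", ",green,"),
        ("Card C", ",red,blue,"),
    ],
)

# Cards that are green but not blue: LIKE combined with NOT LIKE.
rows = conn.execute(
    "SELECT name FROM card "
    "WHERE colors LIKE '%,green,%' AND colors NOT LIKE '%,blue,%' "
    "ORDER BY name"
).fetchall()
like_result = [row[0] for row in rows]
print(like_result)
```

The query itself stays simple, but every pattern starts with a wildcard, so the database has to examine the colors string of every row.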

However, LIKE is going to be (relatively) slow: a pattern with a leading wildcard, such as '%green%', cannot use an ordinary index and has to pattern-match the text of every row, and it probably has other drawbacks as well. With normalization I could improve speed by assigning an integer key to each keyword and then creating a table that connects each card with the relevant keys (see EBrown's answer for a clearer description).
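The normalized layout described above could look roughly like the following sketch, again using Python's sqlite3 module as a stand-in for MySQL. The table names (card, color, card_color), the helper function cards_with_without and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card  (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE color (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Junction table: one row per (card, color) pair.
CREATE TABLE card_color (
    card_id  INTEGER NOT NULL REFERENCES card(id),
    color_id INTEGER NOT NULL REFERENCES color(id),
    PRIMARY KEY (card_id, color_id)
);
-- Index for lookups that start from the color.
CREATE INDEX idx_card_color_color ON card_color (color_id, card_id);
""")
conn.executemany("INSERT INTO card (id, name) VALUES (?, ?)",
                 [(1, "Card A"), (2, "Card B"), (3, "Card C")])
conn.executemany("INSERT INTO color (id, name) VALUES (?, ?)",
                 [(1, "green"), (2, "blue"), (3, "red")])
conn.executemany("INSERT INTO card_color (card_id, color_id) VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1), (3, 3), (3, 2)])


def cards_with_without(conn, include, exclude):
    """Return names of cards that have color `include` but not `exclude`."""
    sql = """
    SELECT c.name FROM card c
    WHERE EXISTS (SELECT 1 FROM card_color cc
                  JOIN color k ON k.id = cc.color_id
                  WHERE cc.card_id = c.id AND k.name = ?)
      AND NOT EXISTS (SELECT 1 FROM card_color cc
                      JOIN color k ON k.id = cc.color_id
                      WHERE cc.card_id = c.id AND k.name = ?)
    ORDER BY c.name
    """
    return [row[0] for row in conn.execute(sql, (include, exclude))]


green_not_blue = cards_with_without(conn, "green", "blue")
blue_not_green = cards_with_without(conn, "blue", "green")
print(green_not_blue, blue_not_green)
```

The SQL is longer than the LIKE version, but the lookups run against integer keys that can be indexed, and a keyword can never partially match another keyword.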

But with normalized tables I have to deal with joins (severely increased code complexity, not only in the SQL part), transactions (to update all relevant tables or roll back everything), and possibly decreased performance due to joins (see To normalize or not).
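To give a feel for the transaction part, here is a minimal sketch of an "insert into all tables or roll back everything" helper. It uses Python's sqlite3 module as a stand-in for MySQL; the add_card function, the simplified two-table schema and the sample data are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE card (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE card_color (
    card_id INTEGER NOT NULL REFERENCES card(id),
    color   TEXT NOT NULL,
    PRIMARY KEY (card_id, color)
);
""")


def add_card(conn, name, colors):
    """Insert a card and its colors as one unit of work."""
    # Using the connection as a context manager commits if the block
    # succeeds and rolls back if it raises an exception.
    with conn:
        cur = conn.execute("INSERT INTO card (name) VALUES (?)", (name,))
        conn.executemany(
            "INSERT INTO card_color (card_id, color) VALUES (?, ?)",
            [(cur.lastrowid, color) for color in colors],
        )


add_card(conn, "Card A", ["green", "blue"])

try:
    # The duplicate color violates the primary key, so the whole insert,
    # including the card row itself, is rolled back.
    add_card(conn, "Card B", ["red", "red"])
except sqlite3.IntegrityError:
    pass

card_count = conn.execute("SELECT COUNT(*) FROM card").fetchone()[0]
color_count = conn.execute("SELECT COUNT(*) FROM card_color").fetchone()[0]
print(card_count, color_count)
```

Once the multi-table write is wrapped in a helper like this, the rest of the application code does not need to know how many tables are involved.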

AFAIK there are roughly 20,000-25,000 different Magic cards in existence (with more being released every year). With one row for each card, is it worth it overall to normalize the data, or is the benefit not worth the costs? Why/Why not? And how do you decide this question in general?

EDIT: I am currently using a MySQL database, but I would also appreciate answers that apply to any of the other non-commercial/open source databases out there. Who knows – I might switch to a different database at some point.

Answers should not provide a technical solution only, but explain why it would/might be the right solution in terms of best/good enough performance for least amount of effort.

Best Answer

A couple of things you might want to consider:

  1. Maintenance of the data.

    If the data is going to change often then it would be easier and quicker to have the data normalised so you only have to change it in one place and have all the usages of it automatically update. Conversely, if the data hardly ever changes then this is not a consideration.

  2. Full text searching.

    This should be quicker than doing LIKE '%string%' searches and, depending on your database and what's available, may dictate the format you have to store your data in.

    In addition, storing the same text repeatedly in the table(s) will obviously increase the size of your database. This may affect performance, or cost you more to make sure you have enough disk space and/or memory available on your server.

    Wikipedia has an article on the basics of full text search. It's not an area I'm terribly familiar with; I only know that a colleague is looking into it and that there are trade-offs in how the data is organised when doing other searches. Other links: MSDN, MySQL and PostgreSQL.
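To make the maintenance point (item 1) concrete, here is a small sketch of correcting a value in one place in a normalized layout. It uses Python's sqlite3 module as a stand-in for whatever database you end up with; the table names, the misspelled type and the card names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE creature_type (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE card (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE card_type (
    card_id INTEGER NOT NULL REFERENCES card(id),
    type_id INTEGER NOT NULL REFERENCES creature_type(id),
    PRIMARY KEY (card_id, type_id)
);
-- Hypothetical data with a deliberately misspelled type name.
INSERT INTO creature_type (id, name) VALUES (1, 'Elf'), (2, 'Wizzard');
INSERT INTO card (id, name) VALUES (1, 'Card A'), (2, 'Card B');
INSERT INTO card_type (card_id, type_id) VALUES (1, 1), (1, 2), (2, 2);
""")

# The spelling is fixed in exactly one row; every card that references
# type 2 sees the corrected name without being touched.
with conn:
    conn.execute("UPDATE creature_type SET name = 'Wizard' WHERE name = 'Wizzard'")

wizards = [row[0] for row in conn.execute("""
    SELECT c.name FROM card c
    JOIN card_type ct ON ct.card_id = c.id
    JOIN creature_type t ON t.id = ct.type_id
    WHERE t.name = 'Wizard'
    ORDER BY c.name
""")]
print(wizards)
```

With the values stored as a delimited string in each card row, the same correction would mean rewriting the string of every affected card.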