Mysql – Storing arrays of integers in database (for efficient select)

MySQLnormalization

I am creating a database that will store 100.000 (and probably more in the future) users. While this obviously happens in a table with 1 row per user, every user can (and will) store hundreds of items. In programming language this would mean the user has 2 arrays (or one 2-dimensional array) of integers: a column for the itemid's and a column for the amounts.

My instincts tell me to create a table to hold all these items, with rows like (userid, itemid, amount). However this would result in a huge table. 200.000 users with 250 items each… that's 50 million entries in one table. This, plus the fact that the table will undergo continuous and rapid change, frightens me. (How rapid? I estimate up to 100 modifications per second.)

Typically there will be anywhere between 100 and 2000 users, all adding and removing items, and modifying amounts. These actions can and will happen in programming code. It would go as follows:

  • User starts session, program loads all the users items from the database
  • User modifies the item list
  • Every few minutes, the changes are saved into the database
  • When the user ends the session, it is also saved into the database

It is worth noting that there is a maximum to the number of items a user can store.

Are there any alternatives to using a separate table? Perhaps save the values in a formatted text string? Or is this one of the instances where using a MySQL database is actually a Bad Idea™?

Thank you for your time and insights.

Best Answer

RDBMs were built to handle these kinds of relations and operations. Plus, storing the users and items separately will help with detailed reporting (eg: number of items added today).

With optimized indexes, reading the many items per user is just one lookup away. The items table can be indexed to store the related items (per user) consecutively, thereby giving a array like reading scenario. Just make sure to use page padding to allow for the future updates/inserts/deletes.

RDBMs are pretty fast and can handle hundred-thousand operations per second, so this may not be an issue. When your items are on a different table, updates done per item will perform better than if the items were stored in a text column (as an array) in the same rows as the user. This will give you faster updates, lesser memory required for the operation, quicker locks and releases (and you may not have to lock your respective user), smaller read/writes, etc...