MySQL – Storing arrays in MySQL

mysql, nosql

I've googled, but I keep getting answers involving a PHP function named "serialize". I won't be working with PHP, so sadly that solution won't work for me.

I've worked with a NoSQL db (MongoDB, I think) before, where I could basically define a collection like this:

images [
    {"title": "Cute puppy", "file"="cute.png", "tags": ["cute", "puppy", "summer", "pool"]}
]

I want to store multiple values in the tags column of one row (honestly, I don't know if I said that right - I've only just started reading about MySQL):

+------------+----------+--------------------------+
| title      | file     | tags                     |
+------------+----------+--------------------------+
| Cute Puppy | cute.png | cute,puppy,summer,pool   |
+------------+----------+--------------------------+

The only idea that comes to me is to read the row with MySQL and then split the tags on the commas in my application. That would most likely work, but I could be dealing with many more tags - possibly millions.

My question really is, is MySQL a good choice if you want to store multiple values in one column?

Best Answer

My question really is, is MySQL a good choice if you want to store multiple values in one column?

I would suggest that the answer to this question is no - for all three ways of treating arrays in a database server (native array types, JSON, and classic relational design), there is a better choice.

1. Simple arrays:

MySQL doesn't really have an array datatype - the closest thing it has is the SET datatype, whose functionality is very limited. As you note in your question, a search leads to lots of links to PHP code that implements array functionality in the app rather than in the db.
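To make the limitation concrete, here is a minimal sketch of a SET column for your example (the table and column names are taken from your mock-up; the rest is hypothetical). Note that every permitted value has to be declared up front, which is why SET is rarely a good fit for open-ended tags:

CREATE TABLE images (
    title VARCHAR(100),
    file  VARCHAR(100),
    tags  SET('cute', 'puppy', 'summer', 'pool')
);

INSERT INTO images (title, file, tags)
VALUES ('Cute Puppy', 'cute.png', 'cute,puppy,summer,pool');

-- FIND_IN_SET() tests for a single element:
SELECT title FROM images WHERE FIND_IN_SET('puppy', tags) > 0;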

PostgreSQL has an ARRAY datatype with a large range of functions for manipulating, comparing and finding elements in arrays.
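As a hedged sketch (same hypothetical table, PostgreSQL this time), the ARRAY equivalent looks like this:

CREATE TABLE images (
    title TEXT,
    file  TEXT,
    tags  TEXT[]
);

INSERT INTO images (title, file, tags)
VALUES ('Cute Puppy', 'cute.png', ARRAY['cute', 'puppy', 'summer', 'pool']);

-- The @> "contains" operator finds rows whose array includes a value:
SELECT title FROM images WHERE tags @> ARRAY['puppy'];

-- unnest() expands the array into rows for ordinary SQL processing:
SELECT title, unnest(tags) AS tag FROM images;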

2. JSON:

Nowadays, arrays have been largely superseded by JSON - hardly surprising, since JSON is basically an array++ (but an XML-- :-)). This is true for both MySQL and PostgreSQL.

There are a large number of JSON functions available on both systems - MySQL and PostgreSQL - but more on PostgreSQL. I'll use the terms JSON and JSONB (binary JSON - the linked example is MongoDB's, but it applies to binary JSON generally) interchangeably. MySQL stores JSON only in a binary form; PostgreSQL has both, but JSONB is generally regarded as much more efficient. See this post for a good discussion of what PostgreSQL's "NoSQL types" are good for. "NoSQL" is a misnomer here; you can do SQL JOINs using these types.
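A minimal sketch of the tags-as-JSON approach (hypothetical names again; the statements are PostgreSQL using JSONB, with the MySQL equivalent of the search shown as a comment):

CREATE TABLE images (
    title TEXT,
    file  TEXT,
    tags  JSONB
);

INSERT INTO images (title, file, tags)
VALUES ('Cute Puppy', 'cute.png', '["cute", "puppy", "summer", "pool"]');

-- PostgreSQL: the ? operator tests for a top-level array element:
SELECT title FROM images WHERE tags ? 'puppy';

-- MySQL (with tags declared as JSON): JSON_CONTAINS() does the same job:
-- SELECT title FROM images WHERE JSON_CONTAINS(tags, '"puppy"');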

With respect to indexing, MySQL essentially implements it by means of a GENERATED column hack, whereas PostgreSQL offers three sorts of index for JSONB (GIN, B-tree and Hash - see section 8.14.4, "jsonb Indexing", here).
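A sketch of both approaches, assuming the JSON/JSONB table above:

-- PostgreSQL: a GIN index accelerates the ? and @> operators:
CREATE INDEX idx_images_tags ON images USING GIN (tags);

-- MySQL: extract a value into a generated column, then index that:
ALTER TABLE images
    ADD COLUMN first_tag VARCHAR(50)
        GENERATED ALWAYS AS (JSON_UNQUOTE(JSON_EXTRACT(tags, '$[0]'))) STORED,
    ADD INDEX idx_first_tag (first_tag);

(MySQL 8.0.17+ also offers multi-valued indexes over JSON arrays, which ease the pain somewhat, but the generated-column route above is the classic workaround the text refers to.)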

Joins can be done between JSON and tables in both MySQL and PostgreSQL.
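For example (a hedged sketch; tag_info is a hypothetical lookup table holding one row per tag):

-- PostgreSQL: explode the JSONB array into rows, then join as usual:
SELECT i.title, t.description
FROM images i
CROSS JOIN LATERAL jsonb_array_elements_text(i.tags) AS e(tag)
JOIN tag_info t ON t.name = e.tag;

-- MySQL 8.0: JSON_TABLE() turns the array into a derived table:
SELECT i.title, jt.tag
FROM images i,
     JSON_TABLE(i.tags, '$[*]' COLUMNS (tag VARCHAR(50) PATH '$')) AS jt;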

Overall, it is fair to say that JSON support has been around longer and is more mature in PostgreSQL than in MySQL.

3. Classic SQL:

You can of course implement "multi-element" functionality using the classic Associative Entity approach. Associative entities are also known as joining, bridging, many-to-many or m-to-n tables. This strategy is outlined here with an example of students enrolling in university courses: 1 course can have many students and 1 student can attend many courses. The joining table contains the PRIMARY KEY from both the course and student tables, and the PK of the joining table itself is the two together - a student can't enrol in the same course twice!
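Applied to your images-and-tags case, a minimal sketch (all names hypothetical) looks like this:

CREATE TABLE image (
    image_id INT PRIMARY KEY,
    title    VARCHAR(100),
    file     VARCHAR(100)
);

CREATE TABLE tag (
    tag_id INT PRIMARY KEY,
    name   VARCHAR(50) UNIQUE
);

-- The joining (associative) table: its PK is both FKs together,
-- so an image can't carry the same tag twice:
CREATE TABLE image_tag (
    image_id INT,
    tag_id   INT,
    PRIMARY KEY (image_id, tag_id),
    FOREIGN KEY (image_id) REFERENCES image (image_id),
    FOREIGN KEY (tag_id)   REFERENCES tag (tag_id)
);

-- Find all images tagged 'puppy' - this indexes and scales cleanly,
-- even with millions of tags:
SELECT i.title
FROM image i
JOIN image_tag it ON it.image_id = i.image_id
JOIN tag t        ON t.tag_id = it.tag_id
WHERE t.name = 'puppy';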

In favour of the classic approach, Erwin Brandstetter (a guy well worth listening to about databases!) pointed out in his answer referring to a JSON solution: "Aside: A normalized DB design with basic data types would be way more efficient for this." JSON is not a panacea, and judgement must be exercised in its use. There's been a lot of bandwagon-hopping in recent years with the NoSQL "Cambrian Explosion" of roughly 2008 to 2015 (MongoDB especially springs to mind). Now virtually all of the NoSQL systems have put, or are trying to put, SQL interfaces and instrumentation into their systems. For my money, if your arrays are relatively simple, stick with the classic approach; otherwise go with JSONB.

SQL wasn't designed for, and is not good at, manipulating comma-separated lists of values. In any case, they are a breach of Rule 2 of Codd's rules (the guaranteed access rule), which states that "each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name". With CSV lists, a further positional parameter is required which SQL doesn't provide, though various systems have "tricks" to get round this (see the sketch below). SQL's forte is "slicing and dicing" records made up of small, simple, atomic values.
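To see why, here is what the CSV mock-up from the question forces you into (MySQL syntax; a hedged illustration, not a recommendation):

-- Getting at the 2nd element needs nested string functions:
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX(tags, ',', 2), ',', -1) AS second_tag
FROM images;

-- And searching needs FIND_IN_SET(), which can't use an index,
-- so every row is scanned - hopeless with millions of tags:
SELECT title FROM images WHERE FIND_IN_SET('puppy', tags) > 0;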

PostgreSQL is the F/LOSS (Free/Libre and Open Source Software) system which comes closest to offering the best of all three array methods. MySQL is more widespread and arguably has better built-in and third-party replication and/or sharding, but PostgreSQL is more standards-compliant, has been making strides in those areas recently, and has superior JSONB functionality.

p.s. - oh, yes, I forgot - welcome to the forum! :-)