SQL Server – Benefits of Storing Multiple Values in One Field

database-designdenormalizationsql server

During our last weekly meeting, a person that has no background experience in Database Administration brought up this question:

"Would there be a scenario that justifies storing data in-line (string) instead of several lines?"

Let us assume a table called countryStates where we want to store the states of a country; I'll use USA for this example and will not list all the States for the sake of laziness.

There we would have two columns; one called Country and the other called States. As discussed here, and proposed by @srutzky's answer, the PK will be the code defined by ISO 3166-1 alpha-3.

Our table would look like this:

+---------+-----------------------+-------------------------------------------------------+
| Country | States                | StateName                                             |
+---------+-----------------------+-------------------------------------------------------+
| USA     | AL, CA, FL,OH, NY, WY | Alabama, California, Florida, Ohio, New York, Wyoming |
+---------+-----------------------+-------------------------------------------------------+

When asking this same question to a friend developer, he said that from the data traffic size point of view, this might be useful, but not if we need to manipulate this data. In this case there would have to be an intelligence on the application code which could transform this string in a list (let's say that the software that has access to this table needs to create a combo box).

We concluded that this model is not very useful, but I got suspicious that there might be a way to make this useful.

What I'd like to ask is if any of you already saw, heard or done something like this in a way that really works.

Best Answer

To start with, the current Question title referring to "storing data as string instead of columns" is a little confusing. When speaking of storing data as strings instead of something else, that usually refers to serializing everything to a string format instead of a proper / strong datatype (e.g. INT or DATETIME). But if asking about storing data as multiple values in a single field as opposed to separate rows, that is a bit different. And to be fair, while concatenating values is most easily done with strings, it can also be done with INT and BINARY types as well, either by bit-masking or similarly reserving certain positions to have different meanings. Since the second interpretation is what is actually being asked about, based on the text of the Question, let's address that.

In a word: No. If you are storing actual data points then it will only bring pain (in terms of code and performance) as it is unnecessary complication. If it is a value that will only ever be stored as a single unit, updated as a single unit, and never disassembled within the database, then that could be ok as it is roughly analogous to storing an image or PDF. Otherwise, any attempt to parse the data will invalidate using any indexes (e.g. using LIKE '%something%', or CHARINDEX, or PATINDEX, or SUBSTRING, etc).

If you need to store separate values in a single field of a single row then there are more appropriate means of doing that: XML or JSON. These are parseable formats (XML / JSON) and XML can even be indexed. But ideally this data would be stored in properly-typed fields so that it can be truly useful.

And please do not forgot that the purpose of an RDBMS is to store data such that it can be retrieved and manipulated as efficiently as possible, within the constraints imposed by being ACID-compliant. Retrieving concatenated values is bad enough due to the need to parse the values first, and that is not indexable. But manipulating often means replacing the entire blob just to update a part of it (assuming that no pattern exists to use with a REPLACE function). The XML datatype at least allows for XML DML for simplistic updates, though those are still not as fast as a simple update of properly modeled data.

Also, given a scenario such as what is shown in the Question above, by concatenating all of the StateCodes together, you would be unable to Foreign Key (in either direction) those values.

And what if the business requirements change over time and you need to track additional properties of these items? In terms of "states", what about the capitals, or population, or a sort-order, or anything else? Stored properly as rows you can add more columns for additional properties. Sure, you can have multiple levels of parsable data, such as |StateCode,Capital,Population |StateCode,Capital,Populate|... but hopefully anyone can see the problem growing exponentially out of control. Of course, this particular issue is rather easily dealt with the XML and JSON formats, and that is their value as mentioned above. But you would still need a very good reason for using either of those as an initial means of modeling as neither will ever be as efficient as using discrete fields in separate rows.