Database Design – Databases for Representing Ordered Sets

database-design database-recommendation

In a SQL database, the only way I know to represent an arbitrarily ordered set is to give every record an "Order" value. Every time you move an item around in this ordering, you have to update the ranks of the other records, or otherwise maintain the entire list of ranks, for example with a nightly job.

For example, I can represent the ordered set [C, B, D, A] in this way in a SQL database:

ID  Name   Order
1   A      4
2   B      2
3   C      1
4   D      3

If I want to move an item to a different position in the set, or prepend a new item, I may have to update a lot of items. In general there is a lot of maintenance overhead with this approach.
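To make the overhead concrete, here is a minimal sketch of moving one item with the rank-column approach. It uses Python's built-in sqlite3 module and assumes the ranks are dense integers starting at 1; the table name `items` and the helper `move_item` are made up for illustration.

```python
import sqlite3

def move_item(conn, item_id, new_pos):
    """Move one row to rank new_pos, shifting every row in between by one."""
    (old_pos,) = conn.execute(
        'SELECT "Order" FROM items WHERE ID = ?', (item_id,)
    ).fetchone()
    if new_pos == old_pos:
        return
    with conn:  # commit both updates together, or roll both back
        if new_pos < old_pos:
            # Moving toward the front: rows in [new_pos, old_pos) slide back.
            conn.execute(
                'UPDATE items SET "Order" = "Order" + 1 '
                'WHERE "Order" >= ? AND "Order" < ?',
                (new_pos, old_pos),
            )
        else:
            # Moving toward the end: rows in (old_pos, new_pos] slide forward.
            conn.execute(
                'UPDATE items SET "Order" = "Order" - 1 '
                'WHERE "Order" > ? AND "Order" <= ?',
                (old_pos, new_pos),
            )
        conn.execute(
            'UPDATE items SET "Order" = ? WHERE ID = ?', (new_pos, item_id)
        )

def names_in_order(conn):
    return [row[0] for row in conn.execute('SELECT Name FROM items ORDER BY "Order"')]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE items (ID INTEGER PRIMARY KEY, Name TEXT, "Order" INTEGER)')
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [(1, "A", 4), (2, "B", 2), (3, "C", 1), (4, "D", 3)],
)
conn.commit()

before = names_in_order(conn)   # ['C', 'B', 'D', 'A']
move_item(conn, 1, 1)           # move A to the front
after = names_in_order(conn)    # ['A', 'C', 'B', 'D']
print(before, after)
```

Moving A to the front rewrites the rank of every other row, which is the maintenance cost I am describing.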

Querying the data once it is in the database is not an issue for SQL; the issue is the significant maintenance overhead of reordering the set. There is no simple operation in SQL to move an item to a new position in the set. The ordering is arbitrary and user-defined.

I realize that this can be accomplished using SQL; it's just very clunky to perform certain operations, like prepending items or moving an item to a new position. Even an operation like reversing the order of the set requires a fairly lengthy, complex query. The type of database I'm looking for might support such an operation natively, or at least more elegantly.

So, if I am designing an application (like Trello, for example) that very heavily involves ordered sets, it seems SQL is not the ideal database technology for me. Are there any databases whose syntax supports ordered sets in a more natural way?

These are some CQL3 queries from the Cassandra documentation that seem close to what I'm looking for. This prepends an item to an ordered set.

UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE user_id = 'frodo';

This one will set the value of the item at position 2 in the set. I suspect I could use this to easily perform arbitrary swaps/reorders.

UPDATE users SET top_places[2] = 'riddermark' WHERE user_id = 'frodo';

Unfortunately, the documentation also states:

And while we may (or may not) relax that rule a bit in the future,
this still means that collections are not meant to be excessively
large. They are not a replacement for a proper modelisation into
tables.

which seems to suggest that ordered sets are not (yet) first class citizens in CQL3.

Best Answer

Most of the major relational DBMSs support structured types in the form of XML or JSON. XML element sequences and JSON arrays are order-preserving. Typically the corresponding programming language (T-SQL, PL/SQL) will have built-in features to manipulate these types, much as SQL manipulates columns and rows.

Some relational stores also support ARRAY data types (one example). An item will retain its before-and-after relationship to the items adjacent to it, irrespective of what happens in other parts of the array. Unlike, say, JSON, the array itself cannot contain complex types, so holding an array of surrogate IDs and pulling the remaining data on demand may be necessary.
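As a rough sketch of the "array of surrogate IDs" idea: the ordering lives in a single value on one row, and the item data is fetched by ID when needed. SQLite has no ARRAY type, so this example stores the ID list as JSON text instead, which is an assumption made purely to keep it runnable with Python's standard library. The table names `items` and `lists` and all function names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
# One row per ordered list; item_ids holds the ordering as a JSON array of IDs.
conn.execute("CREATE TABLE lists (list_id INTEGER PRIMARY KEY, item_ids TEXT)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C"), (4, "D")]
)
conn.execute("INSERT INTO lists VALUES (1, ?)", (json.dumps([3, 2, 4, 1]),))
conn.commit()

def load_order(conn, list_id):
    (raw,) = conn.execute(
        "SELECT item_ids FROM lists WHERE list_id = ?", (list_id,)
    ).fetchone()
    return json.loads(raw)

def reposition(conn, list_id, item_id, new_index):
    """Reorder in application code, then write the whole array back once."""
    ids = load_order(conn, list_id)
    ids.remove(item_id)
    ids.insert(new_index, item_id)
    with conn:
        conn.execute(
            "UPDATE lists SET item_ids = ? WHERE list_id = ?",
            (json.dumps(ids), list_id),
        )

def names_in_order(conn, list_id):
    """Pull the remaining item data on demand and arrange it by the stored IDs."""
    ids = load_order(conn, list_id)
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    rows = conn.execute(
        f"SELECT id, name FROM items WHERE id IN ({placeholders})", ids
    )
    by_id = dict(rows)
    return [by_id[i] for i in ids]

initial = names_in_order(conn, 1)   # ['C', 'B', 'D', 'A']
reposition(conn, 1, 1, 0)           # move item 1 (A) to the front
moved = names_in_order(conn, 1)     # ['A', 'C', 'B', 'D']
print(initial, moved)
```

A move touches only one row, but the reordering logic has moved into the application, and the whole array is rewritten on every change.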

If you choose to adopt structured types, why not go the whole hog and use a DBMS designed around them? This is the realm of NoSQL. There are any number of products, each with advantages and drawbacks.

Finally I'd mention graph stores. Their schtick is to focus on the connectedness of items. This is appropriate for ordered lists as the defining characteristic is how one item follows another. So the items could be modelled as graph nodes with edges, specifically "follows" edges, linking nodes in the desired sequence.
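The "follows" edge idea can be illustrated without any particular graph product. The sketch below is plain Python, not a real graph store API: the class `FollowsChain` and its methods are invented for illustration, and it assumes each item appears exactly once in the chain.

```python
class FollowsChain:
    """Ordered list held as 'follows' edges: successor[x] is the item after x."""

    def __init__(self, items):
        self.head = items[0] if items else None
        self.successor = {a: b for a, b in zip(items, items[1:])}

    def _predecessor(self, item):
        # Linear scan for clarity; a graph store would index edges in both directions.
        for a, b in self.successor.items():
            if b == item:
                return a
        return None

    def move_after(self, item, target):
        """Move item (assumed present) to just after target; target=None means front."""
        if item == target:
            return
        # Unlink: connect the item's predecessor directly to its successor.
        pred = self._predecessor(item)
        nxt = self.successor.pop(item, None)
        if pred is None:
            self.head = nxt
        elif nxt is None:
            del self.successor[pred]
        else:
            self.successor[pred] = nxt
        # Relink at the new position.
        if target is None:
            if self.head is not None:
                self.successor[item] = self.head
            self.head = item
        else:
            after = self.successor.get(target)
            self.successor[target] = item
            if after is not None:
                self.successor[item] = after

    def to_list(self):
        out, node = [], self.head
        while node is not None:
            out.append(node)
            node = self.successor.get(node)
        return out


chain = FollowsChain(["C", "B", "D", "A"])
chain.move_after("A", None)    # prepend A
step1 = chain.to_list()        # ['A', 'C', 'B', 'D']
chain.move_after("C", "D")     # move C to just after D
step2 = chain.to_list()        # ['A', 'B', 'D', 'C']
print(step1, step2)
```

Each move rewrites at most three edges regardless of list length. The trade-off is that reading the list in order means walking the chain rather than sorting on a column.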

Having worked with each of these to a greater or lesser extent, it is my opinion that none of them is significantly less work in application programming terms than the straight-up SQL approach which you dislike.