Database Design – ‘Lists’ storage

database-design

I'm "about" to implement a feature on one of my apps which requires storing a lot of data (PlayLists).

Assume the following scenario:

  • A user can have multiple playlists
  • These playlists have up to 500k entries (maybe even more)
  • I must be able to get, remove, re-sort or query the whole playlist or by using indexes
  • Currently a playlist for each user; but
  • I'm about to implement a feature that will allow users to have more than 1 playlist.
  • About 3500 entries per playlist. Some can go as high as 590k.

Playlist entries are represented by the following class:

public class Item {
  private Long owner; // User
  private String data;
  private String name;
  private String author;
  private int index; // Index of this item on the list
}

TL;DR: List< Item > for every user (more than 1 per user) where this list can extend to 500k+ entries.

Currently, I have SQL-based storage for this, however, it's slow as heck, specially when I need to shift indexes – slow up to the point of my db driver thinking that the connection failed and leaked.

To explain "shift indexes": Let's say I'm removing an item at the 3rd index (i = 3), this means that all other items after it need to shift one index down, so I can later on request index 3 and get the item that was previously index 4.

Currently, I decrement the column value Order for that playlist for all elements after the one I deleted.

Current PlayList Items Table design

Id (Ordering/Index) | PlayList (Id) | Owner (User) | Name | Author | Data

Question

Which would be a better/more efficient way of storing this kind of "bulk" data, while still being able to query and modify it?

I'm ok with changing/implementing a new database software for this kind of data, since it's really big (I currently use Redis and MariaDB). However, I would love to avoid having to.

Best Answer

Hmmm... 1 person can have 590k tracks? Difficult to imagine, however, I have come up with a schema which might make a good starting point. You may want to have a playlist table for each user depending on how many users you have.

I have used PostgreSQL - you should be able to readily translate to another RDBMS - but, if you are going the Open Source route, I would recommend PostgreSQL.

I have also provided a minimal amount of DML (INSERT statements) which should let you see where I'm coming from.

The playlist table is an example of an Associative Entity - otherwise known as a joining table or a many-to-many table - they are very useful.

Obviously, I don't have a real dataset, but I believe that the SQL updating a user's order by 1

UPDATE playlist SET pl_order = pl_order + 1 
WHERE pl_user = 123 
AND pl_order > order_to_be_deleted

should not be too onerous. Get back to us if you're still having problems. Note, that it should be within a single transaction!

Note also that you can put a lot of your application logic into the database creation script itself - various rules can be enforced without having to write a word of application code!

CREATE TABLE my_user  -- "user" is a keyword in PostgreSQL
(
  user_id SERIAL, CONSTRAINT user_pk PRIMARY KEY (user_id),
  user_name VARCHAR(25)
  -- other fields?
  --
);

CREATE TABLE song
(
  song_id SERIAL, CONSTRAINT song_pk PRIMARY KEY (song_id),
  song_name VARCHAR(50) NOT NULL,
  song_artist VARCHAR(50) NOT NULL,
  song_writer VARCHAR(50), -- may be unknown, i.e. NULL
  song_label VARCHAR(50) 
);


CREATE TABLE playlist
(
  pl_user_id INTEGER, -- NOT NULL not required, see CONSTRAINTs.
  pl_song_id INTEGER, 
  pl_order   INTEGER,

  CONSTRAINT playlist_pk PRIMARY KEY (pl_user_id, pl_song_id),
  CONSTRAINT playlist_user_fk FOREIGN KEY (pl_user_id) REFERENCES my_user (user_id), 
  CONSTRAINT playlist_song_fk FOREIGN KEY (pl_song_id) REFERENCES song (song_id),

  CONSTRAINT pl_user_song_uq UNIQUE (pl_user, pl_song) 
  -- can't have same user and song combo!

  CONSTRAINT pl_user_order_uq UNIQUE (pl_user, pl_order) 
  -- can't have ties of preference

);

CREATE INDEX pl_user_ix ON playlist (pl_user_id);
CREATE INDEX pl_song_ix ON playlist (pl_song_id);

CREATE INDEX pl_user_song_ix on playlist (pl_user_id, pl_song_id); 
-- this is an index which may help with the UDATE query above.


INSERT INTO user (user_name) VALUES ('Fred'), ('Bill'), ('Mary');

INSERT INTO song VALUES ('U2',          'Streets have no name',  NULL, NULL); 
INSERT INTO song VALUES ('Thin Lizzy',  'Boys are back in town', NULL, NULL); 
INSERT INTO song VALUES ('Cranberries', 'Something else',        NULL, NULL); 


INSERT INTO playlist VALUES 
(1, 1, 1),
(1, 2, 2),
(1, 3, 3),
(2, 2, 1),
(2, 3, 2),
(2, 1, 3),
(3, 3, 1),
(3, 2, 2),
(3, 1, 3);