I would link at all/most levels. This denormalized star means that, yes, the data is redundant, but it typically makes reporting and analysis a lot easier. Note that this is very different from OLTP normalization, and you don't typically have to worry about redundant data getting out of sync, because in a DW scenario the data never changes: new facts get added, and dimensions get expired and new ones created.
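A minimal sketch of what that kind of denormalized star might look like, using SQLite in Python for brevity. The FACT_Folder/DIM_* names echo the queries further down; the Server level and the specific columns are my own illustrative assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical star: the fact row carries a surrogate key for EVERY level
# (farm, server, folder) plus the snapshot date -- redundant on purpose,
# so each level can be queried without traversing the others.
cur.executescript("""
CREATE TABLE DIM_Farm   (DimID INTEGER PRIMARY KEY, Farm_Name TEXT);
CREATE TABLE DIM_Server (DimID INTEGER PRIMARY KEY, Server_Name TEXT);
CREATE TABLE DIM_Folder (DimID INTEGER PRIMARY KEY, FolderPath TEXT);
CREATE TABLE DIM_Date   (DateID INTEGER PRIMARY KEY, Date TEXT);

CREATE TABLE FACT_Folder (
    SnapshotDateID INTEGER REFERENCES DIM_Date(DateID),
    FarmDimID      INTEGER REFERENCES DIM_Farm(DimID),
    ServerDimID    INTEGER REFERENCES DIM_Server(DimID),
    FolderDimID    INTEGER REFERENCES DIM_Folder(DimID),
    bytes_on_disk  INTEGER   -- the additive measure
);
""")
```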
I don't see a Dim_Folder. I would expect the actual path of the folder to be an attribute of Dim_Folder. Only the numeric quantity and any degenerate dimensions (http://en.wikipedia.org/wiki/Degenerate_dimension) would be in the fact table. I wouldn't treat the folder path as a degenerate dimension, because it keeps coming back in each snapshot (and a folder isn't a transaction).
So you could do something like this:
SELECT AVG(bytes_on_disk)
FROM FACT_Folder
INNER JOIN DIM_Folder
ON FACT_Folder.FolderDimID = DIM_Folder.DimID
INNER JOIN DIM_Date
ON FACT_Folder.SnapshotDateID = DIM_Date.DateID
WHERE DIM_Date.Date BETWEEN '20120101' AND '20121231'
AND DIM_Folder.FolderPath = '/usr/bin/'
See how the DIM_Folder lookup narrows the query down to a small set of dimension ids; we're assuming some kind of index on snapshot date and then folder dim id (or vice versa).
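The kind of composite index I have in mind could be sketched like this (again via SQLite in Python; the index name and column order are illustrative assumptions, and which column should lead depends on your predicates):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE FACT_Folder (
    SnapshotDateID INTEGER, FolderDimID INTEGER, bytes_on_disk INTEGER)""")

# Composite index leading on snapshot date, then folder dim id
# (or vice versa, depending on which predicate is more selective).
cur.execute("""CREATE INDEX IX_FACT_Folder_Date_Folder
               ON FACT_Folder (SnapshotDateID, FolderDimID)""")
```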
See how you also now don't need to join on folder at all if you just want the data at a higher level. Since you usually know all this at ETL time, the motivation is different from OLTP systems, where you want everything to move together when something is changed (leg bone connected to the thigh bone, etc.). In a DW scenario, you really don't want anything to move.
So, bam! - total Farm usage analysis:
SELECT DIM_Farm.Farm_Name, SUM(bytes_on_disk)
FROM FACT_Folder
INNER JOIN DIM_Farm
ON FACT_Folder.FarmDimID = DIM_Farm.DimID
INNER JOIN DIM_Date
ON FACT_Folder.SnapshotDateID = DIM_Date.DateID
WHERE DIM_Date.Date BETWEEN '20120101' AND '20121231'
GROUP BY DIM_Farm.Farm_Name
Remember stars are really simple for analysis. You NEVER need to worry about inadvertent cross joins in a single non-snowflaked star. When linking different stars, you DO have to watch out. So queries in MOST cases are MUCH simpler in star-schemas. No network traversal and worrying about many-many relationships like in a normalized model.
You could treat your reference tables as slowly changing dimensions and perform type 2 maintenance, i.e. add a new reference row when any reference data changes. Over time you may choose to horizontally partition your reference tables into "active" and "archive" parts, depending on the data churn you experience. This can be achieved using SQL Server's built-in partitioning functionality or a roll-your-own approach with two separate tables. Your needs at the time will dictate which.
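Type 2 maintenance boils down to "expire the current row, insert a replacement". A runnable sketch using SQLite in Python (the REF_Status table and its columns are hypothetical, but the ValidFrom/ValidTo/IsCurrent pattern is the standard type 2 shape):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical reference table with type 2 columns: every change expires
# the current row and inserts a new one, so history is preserved.
cur.execute("""
CREATE TABLE REF_Status (
    SurrogateID INTEGER PRIMARY KEY AUTOINCREMENT,
    StatusCode  TEXT,
    StatusName  TEXT,
    ValidFrom   TEXT,
    ValidTo     TEXT,      -- NULL means "still current"
    IsCurrent   INTEGER
)""")

def type2_update(code, new_name, as_of):
    """Expire the current row for `code` and insert a replacement row."""
    cur.execute("""UPDATE REF_Status
                   SET ValidTo = ?, IsCurrent = 0
                   WHERE StatusCode = ? AND IsCurrent = 1""", (as_of, code))
    cur.execute("""INSERT INTO REF_Status
                   (StatusCode, StatusName, ValidFrom, ValidTo, IsCurrent)
                   VALUES (?, ?, ?, NULL, 1)""", (code, new_name, as_of))

cur.execute("""INSERT INTO REF_Status
               (StatusCode, StatusName, ValidFrom, ValidTo, IsCurrent)
               VALUES ('A', 'Active', '2012-01-01', NULL, 1)""")
type2_update('A', 'Active (renamed)', '2012-06-01')

rows = cur.execute("""SELECT StatusName, IsCurrent FROM REF_Status
                      ORDER BY SurrogateID""").fetchall()
# rows -> [('Active', 0), ('Active (renamed)', 1)]
```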
There is no good reason why your active OLTP table and your logging table should look the same. They perform different roles and have different read and write requirements. If the logging table needs to be wide and sparse then that's what it needs to be. Create the objects to solve the problem you have.
And one last suggestion, which is definitely from the "clutching at straws" bucket: define TASK_DONE (id int, reference_values xml). Extract your pertinent values at that point in time and save them away. This allows for changes in the reference data schema without having to bring existing log records up to the new schema. That will make historical searching more complicated, of course.
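A sketch of that "snapshot the values as XML" idea, using SQLite and Python's standard library (the TASK_DONE columns follow the definition above; the element names and helper function are my own illustrative choices):

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE TASK_DONE (id INTEGER PRIMARY KEY, reference_values TEXT)")

def log_task_done(task_id, ref_values):
    """Snapshot the pertinent reference values as XML at completion time,
    so later schema changes to the reference tables don't force a rewrite
    of existing log records."""
    root = ET.Element("refs")
    for name, value in ref_values.items():
        ET.SubElement(root, name).text = str(value)
    cur.execute("INSERT INTO TASK_DONE (id, reference_values) VALUES (?, ?)",
                (task_id, ET.tostring(root, encoding="unicode")))

log_task_done(1, {"status": "Closed", "priority": "High"})
xml_blob = cur.execute(
    "SELECT reference_values FROM TASK_DONE WHERE id = 1").fetchone()[0]
# xml_blob -> '<refs><status>Closed</status><priority>High</priority></refs>'
```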
Best Answer
I would approach it with this structure:
This way you can have a more normalized structure, non-repeating data and simple tables.
In this method you would also be using WHERE PlaylistID = [Bob's playlist id] instead.
With this structure's simplicity, you won't have to worry about any size issues for a long time. This structure will quickly handle millions of records. If you get bigger than that, you can look into partitioning; that's not necessary for now.
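The structure itself isn't reproduced here, but a typical normalized layout for this kind of problem would be a playlist table plus a junction table, queried by PlaylistID as above. This is my guess at the intent; all table and column names are illustrative (sketched with SQLite in Python):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Illustrative normalized layout: one row per playlist, and one row per
# (playlist, song) pair in a junction table -- no repeating groups.
cur.executescript("""
CREATE TABLE Playlist (
    PlaylistID INTEGER PRIMARY KEY,
    UserID     INTEGER,
    Name       TEXT
);
CREATE TABLE PlaylistSong (
    PlaylistID INTEGER REFERENCES Playlist(PlaylistID),
    SongID     INTEGER,
    Position   INTEGER,
    PRIMARY KEY (PlaylistID, SongID)
);
""")

cur.execute("INSERT INTO Playlist VALUES (1, 42, 'Road trip')")
cur.executemany("INSERT INTO PlaylistSong VALUES (?, ?, ?)",
                [(1, 101, 1), (1, 102, 2)])

# Fetch one user's playlist by its id -- a narrow, indexable lookup
songs = cur.execute(
    "SELECT SongID FROM PlaylistSong WHERE PlaylistID = ? ORDER BY Position",
    (1,)).fetchall()
# songs -> [(101,), (102,)]
```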