Database Schema – Design for Video Statistics

database-designMySQL

I'm searching for the best way to store the following information:

  • There are videos with different length
  • There are users
  • Users watch videos and may stop watching videos at some point or only watch a part of the video or skip a part

I want to analyse/store which part of a video is seen more often and which are skipped.

Here are the two ideas I have

  1. Create a table with VideoId and section1, section2, … Each section could be 10 seconds and in every row I store the videoId and a int for each section representing the number of views.
    The problem is, that I don't know which is the longest video I have to store statistics for. If it is 1h, I need 361 columns.

  2. Create a table with VideoId, section and views. This would allow videos with all lengths but will result in a really long table.

Bonus question: How to also store which user saw which video at which positions?
My idea: Don't store an int for views but an array of userIds?

Which option is better or perhaps my ideas are completely wrong and you have a better idea?

Best Answer

Idea-1 does not sound good as you have realized yourself, as each video will probably have a different length. You would have numerous rows with empty/null columns.

Idea-2 sounds better and is a more appropriate solution for an SQL database. Also regarding your bonus question, this approach would allow you to create a table with many-to-many relations users<->video_sections, thus you could have the per user, per video section data.

Also, for me, a good fit for this problem seems to be using a NoSQL database (for example MondoDB). NoSQL is better suited to data with non-concrete structure, and also with NoSQL db it would be easier to scale horizontally in case the data grows exponentially (a concern you have identified in your Idea-2). Here is a quick example structure that came off the top of my head:

videos: [
    {
        id: 1,
        name: Video1,
        length: 238,
        views: 3125,
        sections: [
            {
                seq_id: 1   
                views: 10
            },
            {
                seq_id: 2   
                views: 5
            },
            { ... },
            { ... }
        ]

    },
    {
        id: 2,
        name: Video2,
        ...
    }
]

users: [
    {
        id: 1,
        name: User1,
        videos_watched: [
            {
                video_id: 1
                sections_watched: [
                    1, 2, 3, ...
                ]
            },
            {
                video_id: 2
                sections_watched: [
                    1, 2, 3, 4, 5, ...
                ]
            }
        ]

    },
    {
        id: 2,
        name: User2,
        videos_watched: [
            ...
        ]

    },
    ...
]

With NoSQL db and this structure you will be able add sections to each specific video dynamically when needed. As each user goes through a video, section, your application logic will just increment the views for each particular section and add the video/section ID to the list of watched videos/sections for that particular user.

Related Question