Track daily view count of YouTube videos

database-design sqlite

I am making a program that searches YouTube for a given keyword and needs to get links to active videos. "Active" here means a video gets at least 1000 views per day. The YouTube API doesn't provide such statistics, only the total view count. So I want to run a Java program every day that searches for the given term and saves the data into a database. I made the database in SQLite and want to ask whether my database model is OK.
So, I have just one table, VIDEO. The primary key is a "speaking" key based on the date. For example: 201108040001, 201108040002, 201108040003, 201108050001, 201108050002, 201108050003.
The second attribute is Address, for example: http://www.youtube.com/watch?v=pqha-F3KYHk&feature=youtube_gdata_player, http://www.youtube.com/watch?v=FAIgWGCYGJU&feature=youtube_gdata_player, etc.
The third attribute is Date.
The fourth attribute is TotalViewCount.
The fifth attribute is SearchTerm.

Now I wonder whether this database model is good for getting statistical data. For example, I want to get videos that have more than 1000 views per day. To do this I have to subtract the TotalViewCount values from two dates, and do so for every row in the table.

Here is table DDL:

CREATE TABLE VIDEO (
 VideoID        INTEGER         PRIMARY KEY AUTOINCREMENT NOT NULL,
 Address        VARCHAR( 100 )  NOT NULL,
 Date           DATE            NOT NULL,
 TotalViewCount INT             NOT NULL,
 SearchTerm     VARCHAR( 100 )  NOT NULL
);
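For illustration, here is a sketch of that per-day subtraction against the table above, using Python's sqlite3 module. The column names come from the DDL, but the self-join approach and the sample rows are my own assumptions, not part of the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE VIDEO (
        VideoID        INTEGER        PRIMARY KEY AUTOINCREMENT NOT NULL,
        Address        VARCHAR(100)   NOT NULL,
        Date           DATE           NOT NULL,
        TotalViewCount INT            NOT NULL,
        SearchTerm     VARCHAR(100)   NOT NULL
    )
""")
# Two snapshots per video on consecutive days (made-up numbers).
conn.executemany(
    "INSERT INTO VIDEO (Address, Date, TotalViewCount, SearchTerm) "
    "VALUES (?, ?, ?, ?)",
    [
        ("http://www.youtube.com/watch?v=pqha-F3KYHk", "2011-08-04", 5000, "cats"),
        ("http://www.youtube.com/watch?v=pqha-F3KYHk", "2011-08-05", 6500, "cats"),
        ("http://www.youtube.com/watch?v=FAIgWGCYGJU", "2011-08-04", 100,  "cats"),
        ("http://www.youtube.com/watch?v=FAIgWGCYGJU", "2011-08-05", 300,  "cats"),
    ],
)

# Self-join each row to the same video's row from the previous day and
# keep only the videos whose daily gain exceeds 1000 views.
active = conn.execute("""
    SELECT cur.Address, cur.TotalViewCount - prev.TotalViewCount AS DailyViews
    FROM VIDEO cur
    JOIN VIDEO prev
      ON prev.Address = cur.Address
     AND prev.Date = date(cur.Date, '-1 day')
    WHERE cur.TotalViewCount - prev.TotalViewCount > 1000
""").fetchall()
print(active)  # [('http://www.youtube.com/watch?v=pqha-F3KYHk', 1500)]
```

Note that this requires a full self-join of the table, which is exactly the cost the answer below tries to avoid.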

Best Answer

Well, my biggest thoughts are about the application and data collection. For example, why are you storing the entire address? You really should only store the "v=" parameter of the querystring; that's the part of the web address that tells YouTube which video to go to. If you store the entire address, you may end up with one video stored multiple times when new attributes are added to the querystring (for example, &feature=youtube_gdata_player changes to &feature=embedded_player or something). I would just pull out the string between "v=" and the next "&" to get a "VideoId" column.
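A quick sketch of that extraction in Python (the helper name and the use of urllib are my own choices, not part of the answer):

```python
from urllib.parse import urlparse, parse_qs

def extract_video_id(address):
    """Return the value of the 'v=' querystring parameter, or None."""
    query = parse_qs(urlparse(address).query)
    values = query.get("v")
    return values[0] if values else None

print(extract_video_id(
    "http://www.youtube.com/watch?v=pqha-F3KYHk&feature=youtube_gdata_player"
))  # pqha-F3KYHk
```

Using a proper querystring parser instead of string slicing also handles addresses where "v=" is not the first parameter.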

Secondly, I would include a main table "Video" like this:

CREATE TABLE Video
(
    VideoPK INTEGER PRIMARY KEY IDENTITY,
    VideoId VARCHAR(100),
    PreviousViewCount BIGINT,
    CurrentViewCount BIGINT
)

This would allow you to do the screen scrape from YouTube, get the VideoId (the "v=" parameter from the querystring), set PreviousViewCount to CurrentViewCount, and set CurrentViewCount to whatever the scrape returns. If you wanted to get fancy, you could add a "SnapshotDate DATETIME" column to ensure that at least 20 hours have passed since the last snapshot.
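That shift-and-store step could look like the following sketch, using SQLite-compatible DDL (AUTOINCREMENT rather than IDENTITY) and Python's sqlite3 module; the function name and the fall-back INSERT for first-seen videos are my assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Video (
        VideoPK INTEGER PRIMARY KEY AUTOINCREMENT,
        VideoId VARCHAR(100) UNIQUE,
        PreviousViewCount BIGINT,
        CurrentViewCount BIGINT
    )
""")

def record_snapshot(conn, video_id, view_count):
    """Shift CurrentViewCount into PreviousViewCount, then store the new count."""
    updated = conn.execute(
        """UPDATE Video
           SET PreviousViewCount = CurrentViewCount,
               CurrentViewCount = ?
           WHERE VideoId = ?""",
        (view_count, video_id),
    ).rowcount
    if updated == 0:  # first time we see this video
        conn.execute(
            "INSERT INTO Video (VideoId, PreviousViewCount, CurrentViewCount) "
            "VALUES (?, 0, ?)",
            (video_id, view_count),
        )

record_snapshot(conn, "pqha-F3KYHk", 5000)   # day 1
record_snapshot(conn, "pqha-F3KYHk", 6500)   # day 2
row = conn.execute(
    "SELECT PreviousViewCount, CurrentViewCount FROM Video WHERE VideoId = ?",
    ("pqha-F3KYHk",),
).fetchone()
print(row)  # (5000, 6500)
```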

Then you could run a query against the Video table, something like:

SELECT VideoId 
FROM Video
WHERE CurrentViewCount - PreviousViewCount > 1000

That would work out well for your purposes.

If you really wanted to log historical data, I would add a second table to do that:

CREATE TABLE VideoHistory
(
    HistoryID INTEGER PRIMARY KEY IDENTITY,
    VideoPK INTEGER 
        FOREIGN KEY REFERENCES Video(VideoPK),
    SnapshotDate DATETIME,
    ViewCount BIGINT
)

Then, whenever you do the screen scrape, add a new entry to the VideoHistory table.

At least, that's how I would approach it.


Edit: First, I didn't realize this is SQLite, so that syntax won't work for you. Secondly, I misunderstood the question.


Since you want to view all videos that have had 1000 or more views per day for the past N days, I would stick with something like this:

CREATE TABLE VideoHistory
(
    HistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
    VideoId VARCHAR(100),
    SnapshotDate DATETIME,
    ViewCount BIGINT
)

The query for pulling the data will be slow (due to the large amount of data and the correlated subqueries), but a covering index should help.
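One possible covering index for that access pattern, plus the daily-delta query, sketched against SQLite; the index name, the window-function approach (which needs SQLite 3.25 or newer), and the sample rows are my own suggestions, not part of the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE VideoHistory (
        HistoryID INTEGER PRIMARY KEY AUTOINCREMENT,
        VideoId VARCHAR(100),
        SnapshotDate DATETIME,
        ViewCount BIGINT
    )
""")
# A covering index on (VideoId, SnapshotDate, ViewCount) lets the query
# below be answered from the index alone, without touching the table.
conn.execute("""
    CREATE INDEX IX_VideoHistory_Covering
    ON VideoHistory (VideoId, SnapshotDate, ViewCount)
""")
conn.executemany(
    "INSERT INTO VideoHistory (VideoId, SnapshotDate, ViewCount) "
    "VALUES (?, ?, ?)",
    [
        ("pqha-F3KYHk", "2011-08-04", 5000),
        ("pqha-F3KYHk", "2011-08-05", 6500),
        ("pqha-F3KYHk", "2011-08-06", 8000),
        ("FAIgWGCYGJU", "2011-08-04", 100),
        ("FAIgWGCYGJU", "2011-08-05", 300),
    ],
)

# Compute each video's snapshot-to-snapshot delta with a window function,
# then keep only videos whose every daily gain is at least 1000 views.
active = conn.execute("""
    SELECT VideoId
    FROM (
        SELECT VideoId,
               ViewCount - LAG(ViewCount) OVER (
                   PARTITION BY VideoId ORDER BY SnapshotDate
               ) AS Delta
        FROM VideoHistory
    ) AS d
    WHERE Delta IS NOT NULL
    GROUP BY VideoId
    HAVING MIN(Delta) >= 1000
""").fetchall()
print(active)  # [('pqha-F3KYHk',)]
```

On SQLite versions without window functions, the same result needs a correlated subquery per row, which is where the slowness mentioned above comes from.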

Also, I wouldn't put SearchTerm in the history table, since your search term probably won't change from one video to another. I would add a secondary table, such as:

CREATE TABLE SearchTerm
(
    SnapshotDate DATETIME PRIMARY KEY,
    SearchTerm VARCHAR(100)
)

That will help remove the duplicate data.