Mysql – Can filtering be done more efficiently with/without sql joins

indexmariadbMySQLnosql

I have a simple problem that has a simple solution with SQL, but would like to explore alternative ways to solve it if they turn out to be more efficient on large scale.

Let's assume that we have a system where we have users, videos and list of videos users have viewed:

CREATE TABLE `video` (
  `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `rank` INT NULL,
  `created_at` DATETIME NULL,
  PRIMARY KEY (`id`),
  INDEX `idx_rank` (`rank` ASC));

CREATE TABLE `user` (
  `id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
  `created_at` DATETIME NULL,
  PRIMARY KEY (`id`));

CREATE TABLE `user_view` (
  `user_id` BIGINT NOT NULL,
  `video_id` BIGINT NOT NULL,
  PRIMARY KEY (`user_id`, `video_id`));

CREATE TABLE `user_friend` (
  `user_id` BIGINT NOT NULL,
  `friend_id` BIGINT NOT NULL,
  PRIMARY KEY (`user_id`, `friend_id`));

The question is: How would we find all the videos that friends of a user have not viewed?

In my solution, I would get the ids of friends (Let's assume user has 3 friends, with ids 1, 2, 3) and build a query:

SELECT
    v.id,
    uv1.video_id
FROM
    video v
    LEFT JOIN user_view uv1 ON (v.id = uv1.video_id AND uv1.user_id = 1)
    LEFT JOIN user_view uv2 ON (v.id = uv2.video_id AND uv2.user_id = 2)
    LEFT JOIN user_view uv3 ON (v.id = uv3.video_id AND uv3.user_id = 3)
WHERE 1
    AND v.id > 100
    AND uv1.video_id IS NULL
    AND uv2.video_id IS NULL
    AND uv3.video_id IS NULL
ORDER BY
    v.rank
LIMIT
    30

Above query will work, but more friends a user has, more joins we'd have to add to the query.

Let's assume we're dealing with 1 billion videos, 100 million users and on averahe user having 50 friends.

Is there a more efficient way to do this in SQL?

Is there a way to do this with non traditional SQL way? Perhaps with noSql with mongodb, cassandra, riak, redis, couchdb or anything else? I'm wondering if there is anything else more efficient purpose built for this.

Any other programming/processing technique that would prove to be more efficient?

I would really appreciate your input.

Best Answer

I don't understand your two choices, but I feel sure that a LEFT JOIN is involved.

Ponder something like this: Step 1:

SELECT uv.video_id
    FROM (
        SELECT uf.friend_id    -- all your friends
            FROM user_friend AS uf
            WHERE uf.user_id = u.id
              AND u.id = 123   -- you
         ) AS f
        LEFT JOIN user_view AS uv  ON uv.user_id = f.friend_id
        WHERE uv.id IS NULL    -- that they did not view

But, beware, the output could be very long.

I don't know if you need DISTINCT after SELECT.

Step 2: Now to peel off the 30 highest ranked ones:

SELECT v.id
    FROM ( the above query ) AS b
    JOIN video AS v  ON b.video_id = v.id
    ORDER BY rank DESC
    LIMIT 30

Unfortunately, the query must find all the un-viewed videos before it can sort to locate the top 30.