Mysql – Query optimization (from joining two tables)

join;MySQLunion

Alright so I have this query which regularly shows up in mysql slow log:

    (
        SELECT  user_visit_logs.date,user_visit_logs.user_id,invalid_hits,
                unique_hits,non_unique_hits,earned,sites_surfed,earnings
            FROM  user_visit_logs
            LEFT OUTER JOIN  surfer_stats
               ON  user_visit_logs.user_id = surfer_stats.user_id
              AND  user_visit_logs.date = surfer_stats.date
            WHERE  user_visit_logs.user_id = '218' 
    )
    UNION  
    (
        SELECT  surfer_stats.date,surfer_stats.user_id,invalid_hits,unique_hits,
                non_unique_hits,earned,sites_surfed,earnings
            FROM  user_visit_logs
            RIGHT OUTER JOIN  surfer_stats
               ON  user_visit_logs.user_id = surfer_stats.user_id
              AND  user_visit_logs.date = surfer_stats.date
            WHERE  surfer_stats.user_id = '218' 
    ) 

Which basically joins daily data from two tables, as follows:

+------------+---------+--------------+-------------+-----------------+--------++--------------+----------+
|           DATA FROM TABLE 1                                                  ||    DATA FROM TABLE 2    |
+------------+---------+--------------+-------------+-----------------+--------++--------------+----------+
|    date    | user_id | invalid_hits | unique_hits | non_unique_hits | earned || sites_surfed | earnings |
+------------+---------+--------------+-------------+-----------------+--------++--------------+----------+

As you can see, "date & user_id" are the common columns on which both tables are joined.

The data is then used to generate a HTML table through PHP using mysqli_fetch_assoc.

Mysql explain extended output:

+----------+-------------+-----------------+------+------------------+---------+---------+------------------------------------+------+----------+------------+
|    id    | select_type | table           | type | possible_keys    | key     | key_len | ref                                | rows | filtered | Extra
+----------+-------------+-----------------+------+------------------+---------+---------+------------------------------------+------+----------+------------+
|    1     | PRIMARY     | user_visit_logs | ref  | user_id,LinkDate | user_id | 8       | const                              | 30   | 100.00   | 
+----------+-------------+-----------------+------+------------------+---------+---------+------------------------------------+------+----------+------------+
|    1     | PRIMARY     | surfer_stats    | ref  | user_id,date     | user_id | 8       | const                              | 6    | 100.00   | USING WHERE
+----------+-------------+-----------------+------+------------------+---------+---------+------------------------------------+------+----------+------------+
|    2     | UNION       | surfer_stats    | ref  | user_id          | user_id | 8       | const                              | 6    | 100.00   | 
+----------+-------------+-----------------+------+------------------+---------+---------+------------------------------------+------+----------+------------+
|    2     | UNION       | user_visit_logs | ref  | user_id,LinkDate | LinkDate| 11      | const,rotate_ptp.surfer_stats.date | 1    | 100.00   | 
+----------+-------------+-----------------+------+------------------+---------+---------+------------------------------------+------+----------+------------+
|    NULL  | UNION RESULT| <union1,2>      | ALL  | NULL             | NULL    | NULL    | NULL                               | NULL | NULL     | 
+----------+-------------+-----------------+------+------------------+---------+---------+------------------------------------+------+----------+------------+

So question is, is there a way to further optimize this query?
Thanks.

Indexes for TABLE 1 (user_visit_logs):

+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    Keyname  | Type  | Unique | Packed | Column        | Cardinality | Collation | Null | 
+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    PRIMARY  | BTREE | Yes    | No     | id            | 29097       | A         | No   |
+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    user_id  | BTREE | No     | No     | user_id       | 4849        | A         | No   | 
+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    LinkDate | BTREE | No     | No     | user_id, date | 4849, 29097 | A         | No   | 
+-------------+-------+--------+--------+---------------+-------------+-----------+------+

Indexes for TABLE 2 (surfer_stats):

+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    Keyname  | Type  | Unique | Packed | Column        | Cardinality | Collation | Null | 
+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    PRIMARY  | BTREE | Yes    | No     | id            | 235428      | A         | No   |
+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    user_id  | BTREE | No     | No     | user_id       | 78476       | A         | No   | 
+-------------+-------+--------+--------+---------------+-------------+-----------+------+
|    date     | BTREE | No     | No     | date          | 1183        | A         | No   | 
+-------------+-------+--------+--------+---------------+-------------+-----------+------+

user_visit_logs

CREATE TABLE `user_visit_logs` (
 `id` int(255) NOT NULL AUTO_INCREMENT,
 `user_id` bigint(20) NOT NULL,
 `date` date NOT NULL,
 `invalid_hits` int(50) NOT NULL,
 `unique_hits` int(255) NOT NULL,
 `non_unique_hits` int(255) NOT NULL,
 `earned` varchar(120) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `user_id` (`user_id`),
 KEY `LinkDate` (`user_id`,`date`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=604253 DEFAULT CHARSET=latin1

surfer_stats

CREATE TABLE `surfer_stats` (
 `id` bigint(20) NOT NULL AUTO_INCREMENT,
 `user_id` bigint(20) NOT NULL,
 `date` date NOT NULL,
 `sites_surfed` int(11) NOT NULL,
 `earnings` decimal(10,8) NOT NULL,
 PRIMARY KEY (`id`),
 KEY `user_id` (`user_id`),
 KEY `date` (`date`)
) ENGINE=InnoDB AUTO_INCREMENT=235787 DEFAULT CHARSET=latin1

Best Answer

You have the intersection twice, then go through a de-dup process (UNION DISTINCT). Instead, see the code for FULL OUTER JOIN in

        SELECT  vl.date, vl.user_id, invalid_hits, unique_hits, non_unique_hits,
                earned, sites_surfed, earnings
            FROM  user_visit_logs AS vl
            LEFT OUTER JOIN  surfer_stats AS ss
               ON  vl.user_id = ss.user_id
              AND  vl.date = ss.date
            WHERE  vl.user_id = '218' 
    )
    UNION  ALL   -- avoid dedup pass
    (
        SELECT  ss.date, ss.user_id, invalid_hits, unique_hits, non_unique_hits,
                earned,sites_surfed, earnings
            FROM  user_visit_logs AS vl
            RIGHT OUTER JOIN  surfer_stats AS ss
               ON  vl.user_id = ss.user_id
              AND  vl.date = ss.date
            WHERE  ss.user_id = '218' 
              AND  vl.user_id IS NULL   -- avoid dup
    ) 

Also, have this on both tables:

INDEX(user_id, date)

Note that the (255) after INT means nothing. Any INT is a 4-byte (32-bit)` integer.

earnings cannot be bigger than 99.99999999 ?