Mysql – Computational power required for this query

gaps-and-islandsMySQLperformancequery-performance

I am not sure if the query I am trying to run is unreasonably large, in terms of computational power. The query will identify islands (and their duration) of a certain condition in my db. Things work great on the following "test" set. You will see that there are two wellsites, a timestamp, and 4 columns of *_sd values. Timestamps are mostly identical for each of the two wellsites.

CREATE TABLE t
    (`wellsite` int, `ts` varchar(19), `EngOilP_sd` int, `CompOilLVL_sd` int, `Brake_sd` int, `HiTemp_sd` int)
;

INSERT INTO t
    (`wellsite`, `ts`, `EngOilP_sd`, `CompOilLVL_sd`, `Brake_sd`, `HiTemp_sd`)
VALUES
  (1, '2015-06-24 20:00:00', 0, 0, 0, 0),
  (1, '2015-06-24 20:30:20', 1, 0, 1, 0),
  (1, '2015-06-24 20:36:47', 1, 1, 1, 0),
  (1, '2015-06-24 20:41:11', 1, 1, 1, 0),
  (1, '2015-06-24 20:43:29', 1, 1, 0, 1),
  (1, '2015-06-24 20:45:42', 1, 0, 0, 1),
  (1, '2015-06-24 20:47:51', 0, 0, 0, 1),
  (1, '2015-06-24 20:49:59', 0, 1, 0, 1),
  (1, '2015-06-24 20:52:01', 1, 1, 0, 1),
  (1, '2015-06-24 20:53:01', 1, 1, 0, 1),
  (1, '2015-06-24 20:54:01', 1, 1, 0, 0),
  (1, '2015-06-24 20:56:01', 1, 1, 1, 0),
  (1, '2015-06-24 20:57:01', 1, 1, 1, 0),
  (1, '2015-06-24 20:58:01', 1, 1, 1, 0),
  (1, '2015-06-24 21:02:01', 1, 1, 0, 0),
  (1, '2015-06-24 21:32:01', 0, 1, 0, 0),
  (1, '2015-06-24 21:52:01', 0, 0, 0, 0),
  (1, '2015-06-24 22:12:01', 0, 0, 1, 0),
  (1, '2015-06-24 22:20:01', 0, 0, 1, 0),
  (1, '2015-06-24 22:49:01', 0, 1, 1, 0),
  (1, '2015-06-24 23:52:01', 0, 1, 1, 0),
  (1, '2015-06-24 23:59:59', 0, 0, 0, 0),
  (2, '2015-06-24 20:00:00', 0, 0, 0, 0),
  (2, '2015-06-24 20:30:20', 1, 0, 0, 1),
  (2, '2015-06-24 20:36:47', 1, 0, 0, 1),
  (2, '2015-06-24 20:41:11', 1, 0, 0, 1),
  (2, '2015-06-24 20:43:29', 1, 0, 0, 1),
  (2, '2015-06-24 20:45:42', 1, 0, 0, 1),
  (2, '2015-06-24 20:47:51', 1, 0, 0, 1),
  (2, '2015-06-24 20:49:59', 1, 0, 0, 0),
  (2, '2015-06-24 20:52:01', 1, 1, 0, 0),
  (2, '2015-06-24 20:53:01', 1, 1, 0, 0),
  (2, '2015-06-24 20:54:01', 0, 1, 0, 0),
  (2, '2015-06-24 20:56:01', 0, 1, 0, 0),
  (2, '2015-06-24 20:57:01', 0, 1, 1, 0),
  (2, '2015-06-24 20:58:01', 0, 1, 1, 0),
  (2, '2015-06-24 21:02:01', 0, 1, 1, 1),
  (2, '2015-06-24 21:32:01', 0, 1, 1, 1),
  (2, '2015-06-24 21:52:01', 0, 1, 1, 1),
  (2, '2015-06-24 22:12:01', 0, 0, 1, 1),
  (2, '2015-06-24 22:20:01', 0, 0, 1, 0),
  (2, '2015-06-24 22:49:01', 0, 1, 1, 0),
  (2, '2015-06-24 23:52:01', 0, 1, 0, 0),
  (2, '2015-06-24 23:54:17', 0, 0, 0, 0)
;

Now, creating the following pivot table;

create view t_pivot as
    select wellsite, 'EngOilP_sd' as sd, ts, EngOilP_sd as val from t
    union all
    select wellsite, 'CompOilLVL_sd', ts, CompOilLVL_sd from t
    union all
    select wellsite, 'Brake_sd', ts, Brake_sd from t
    union all
    select wellsite, 'HiTemp_sd', ts, HiTemp_sd from t;

I can identify my islands with the following query; (The starttime and wellsite conditionals are commented out, but they work and will eventually be needed. Also, wellsites is a table elsewhere that gives the wellsite id in the pivot table a name)

SELECT ws.name, t.*
FROM (
  SELECT sd, wellsite, TIMEDIFF(MAX(ts), MIN(ts)) AS duration, MIN(ts) AS starttime, MAX(ts) AS endtime
  FROM (
         SELECT
           tp.*
           , @val_change := IF(@prev_val != tp.val, @val_change + 1, @val_change) AS vc
           , @prev_val := tp.val
         FROM
           t_pivot tp
           , (SELECT @prev_val := NULL, @val_change := 0) var_init_subquery
         ORDER BY wellsite, sd, ts
       ) sq
  WHERE val = 1
  GROUP BY sd, vc
) t
JOIN wellsites ws ON ws.id = wellsite
# AND wellsite = 1
# WHERE t.starttime > '2015-06-24 20:40:00'
# AND t.starttime < '2015-06-24 21:00:00';

This gives my desired result

+-------------------+---------------+----------+-----------------+---------------------+---------------------+
| name              | sd            | wellsite | duration        | starttime           | endtime             |
+-------------------+---------------+----------+-----------------+---------------------+---------------------+
| Charlesburg 4     | HiTemp_sd     |        2 | 01:10:00.000000 | 2015-06-24 21:02:01 | 2015-06-24 22:12:01 |
| Charlesburg 4     | HiTemp_sd     |        2 | 00:17:31.000000 | 2015-06-24 20:30:20 | 2015-06-24 20:47:51 |
| Charlesburg 4     | EngOilP_sd    |        2 | 00:22:41.000000 | 2015-06-24 20:30:20 | 2015-06-24 20:53:01 |
| Charlesburg 4     | CompOilLVL_sd |        2 | 01:03:00.000000 | 2015-06-24 22:49:01 | 2015-06-24 23:52:01 |
| Charlesburg 4     | CompOilLVL_sd |        2 | 01:00:00.000000 | 2015-06-24 20:52:01 | 2015-06-24 21:52:01 |
| Charlesburg 4     | Brake_sd      |        2 | 01:52:00.000000 | 2015-06-24 20:57:01 | 2015-06-24 22:49:01 |
| Yancer 8&6        | HiTemp_sd     |        1 | 00:09:32.000000 | 2015-06-24 20:43:29 | 2015-06-24 20:53:01 |
| Yancer 8&6        | EngOilP_sd    |        1 | 00:10:00.000000 | 2015-06-24 20:52:01 | 2015-06-24 21:02:01 |
| Yancer 8&6        | EngOilP_sd    |        1 | 00:15:22.000000 | 2015-06-24 20:30:20 | 2015-06-24 20:45:42 |
| Yancer 8&6        | CompOilLVL_sd |        1 | 00:42:02.000000 | 2015-06-24 20:49:59 | 2015-06-24 21:32:01 |
| Yancer 8&6        | CompOilLVL_sd |        1 | 00:06:42.000000 | 2015-06-24 20:36:47 | 2015-06-24 20:43:29 |
| Yancer 8&6        | CompOilLVL_sd |        1 | 01:03:00.000000 | 2015-06-24 22:49:01 | 2015-06-24 23:52:01 |
| Yancer 8&6        | Brake_sd      |        1 | 01:40:00.000000 | 2015-06-24 22:12:01 | 2015-06-24 23:52:01 |
| Yancer 8&6        | Brake_sd      |        1 | 00:02:00.000000 | 2015-06-24 20:56:01 | 2015-06-24 20:58:01 |
| Yancer 8&6        | Brake_sd      |        1 | 00:10:51.000000 | 2015-06-24 20:30:20 | 2015-06-24 20:41:11 |
+-------------------+---------------+----------+-----------------+---------------------+---------------------+

Here is the problem; The actual table I am working with (what is t above) is much larger and there are more sd values being examined. See the comparison below. MySQL (actually MariaDB) dies after fighting this query for about 30 minutes even when I limit the search range to a SINGLE DAY. From what I can tell, MySQL restarts itself because all the connections shown in show processlist; disappear.

+-------------------+---------------+----------------+-----------------+---------------------+ 
| table name        | # of rows     | # of wellsites | # of sd columns | # of total columns  | 
+-------------------+---------------+----------------+-----------------+---------------------+ 
| t (example)       | 44            |   2            | 4               | 4 (just example sd) | 
| dr (actual)       | 1.8 M         |   50           | 38              | 125                 | 
+-------------------+---------------+----------------+-----------------+---------------------+ 

The testing database server I am running this on has RAM=512Mb, #CPUs=1. Production is not much better at RAM=1Gb, #CPUs=1

Is this query a lost cause? I've got one of my guys spending almost 4 hrs / week doing these by hand and I miss him. I need him back.

I do not fully understand the output of EXPLAIN when added to the top of that query, but here is its output when run against the test table t;

+------+--------------+----------------+--------+---------------+------+---------+---------------+------+----------------------------------------------+
| id   | select_type  | table          | type   | possible_keys | key  | key_len | ref           | rows | Extra                                        |
+------+--------------+----------------+--------+---------------+------+---------+---------------+------+----------------------------------------------+
|    1 | PRIMARY      | ws             | index  | PRIMARY       | name | 53      | NULL          |   50 | Using index                                  |
|    1 | PRIMARY      | <derived2>     | ref    | key0          | key0 | 5       | ers_DEV.ws.id |   10 |                                              |
|    2 | DERIVED      | <derived3>     | ALL    | NULL          | NULL | NULL    | NULL          |  176 | Using where; Using temporary; Using filesort |
|    3 | DERIVED      | <derived4>     | system | NULL          | NULL | NULL    | NULL          |    1 | Using filesort                               |
|    3 | DERIVED      | <derived5>     | ALL    | NULL          | NULL | NULL    | NULL          |  176 |                                              |
|    5 | DERIVED      | t              | ALL    | NULL          | NULL | NULL    | NULL          |   44 |                                              |
|    6 | UNION        | t              | ALL    | NULL          | NULL | NULL    | NULL          |   44 |                                              |
|    7 | UNION        | t              | ALL    | NULL          | NULL | NULL    | NULL          |   44 |                                              |
|    8 | UNION        | t              | ALL    | NULL          | NULL | NULL    | NULL          |   44 |                                              |
| NULL | UNION RESULT | <union5,6,7,8> | ALL    | NULL          | NULL | NULL    | NULL          | NULL |                                              |
|    4 | DERIVED      | NULL           | NULL   | NULL          | NULL | NULL    | NULL          | NULL | No tables used                               |
+------+--------------+----------------+--------+---------------+------+---------+---------------+------+----------------------------------------------+
  • Many thanks to users @tombom and @Lennart for helping me develop these queries in the first place, here and here.

Clarifications from question comments

These servers have so little RAM because we are starting out. We started with only what we need, and we will add on as the need arises. My question here is "has the need surfaced?" I do not yet know the rules of thumb regarding power requirements.

This is not a virtual machine, it is a discrete server (droplet) from Digital Ocean. Going to 4Gb RAM and 2 CPU cores bumps the cost from $5/mo to $40/mo.

We have been growing quickly and sometimes things aren't as easy to change as one would like. I can likely get more hardware if we can justify it. So, to ask the question again, is this a hardware issue or the way the query is written? Would a proper machine solve this issue? If so, how would I spec the machine? What do the specs depend on?

This isn't a budget issue. I am wondering if upgrading the hardware will solve this or we need to re-examine the particular query. If upgrading is the route, what resources should I use in specing the hardware? I don't know if I need 4GB or 32GB RAM for this query. That's where I'm stuck. Obviously something needs to change, but I'm not sure what.

The database has 18 tables, of which dr is by far the largest. When we spec'd our hardware, we had no idea what kinds of queries would be run. We didn't even know if we'd be running next month. I mention this one because it is the biggest to date, and our current bottleneck. As far as performance, I'll let this run overnight as long as it runs. The rule of thumb I may be looking for is the "database fits in RAM", correct?

The machine I develop on has 32GB RAM. Would there be a way to throttle that down to better represent what a 4GB machine would do?

Best Answer

Given your sample data, your view t_pivot has the following EXPLAIN plan:

|     id |  select_type |          table | type | possible_keys |    key | key_len |    ref |   rows |           Extra |
|--------|--------------|----------------|------|---------------|--------|---------|--------|--------|-----------------|
|      1 |      PRIMARY |              t |  ALL |        (null) | (null) |  (null) | (null) |     44 |          (null) |
|      2 |        UNION |              t |  ALL |        (null) | (null) |  (null) | (null) |     44 |          (null) |
|      3 |        UNION |              t |  ALL |        (null) | (null) |  (null) | (null) |     44 |          (null) |
|      4 |        UNION |              t |  ALL |        (null) | (null) |  (null) | (null) |     44 |          (null) |
| (null) | UNION RESULT | <union1,2,3,4> |  ALL |        (null) | (null) |  (null) | (null) | (null) | Using temporary |

It's doing a full-table scan four times just to create the view.

Create the following indexes on your table t:

CREATE INDEX idx_t_ws_ts_engoil ON t(wellsite, ts, EngOilP_sd);
CREATE INDEX idx_t_ws_ts_compoil ON t(wellsite, ts, CompOilLVL_sd);
CREATE INDEX idx_t_ws_ts_brake ON t(wellsite, ts, Brake_sd);
CREATE INDEX idx_t_ws_ts_hitemp ON t(wellsite, ts, HiTemp_sd);

Then you should get an explain like this:

|     id |  select_type |          table |  type | possible_keys |                 key | key_len |    ref |   rows |           Extra |
|--------|--------------|----------------|-------|---------------|---------------------|---------|--------|--------|-----------------|
|      1 |      PRIMARY |              t | index |        (null) |  idx_t_ws_ts_engoil |      70 | (null) |     44 |     Using index |
|      2 |        UNION |              t | index |        (null) | idx_t_ws_ts_compoil |      70 | (null) |     44 |     Using index |
|      3 |        UNION |              t | index |        (null) |   idx_t_ws_ts_brake |      70 | (null) |     44 |     Using index |
|      4 |        UNION |              t | index |        (null) |  idx_t_ws_ts_hitemp |      70 | (null) |     44 |     Using index |
| (null) | UNION RESULT | <union1,2,3,4> |   ALL |        (null) |              (null) |  (null) | (null) | (null) | Using temporary |

Of course it still reads the same amount of rows, but it reads them from the indexes. The indexes are put into memory (if they fit), not the whole table (unless it fits in the query cache (I wouldn't recommend to increase the query cache to an absurd large value, just to make this query run faster. Especially since you don't run this query very often. It will hurt you in the long run...)).

Now you have to look at this query:

  SELECT sd, wellsite, TIMEDIFF(MAX(ts), MIN(ts)) AS duration, MIN(ts) AS starttime, MAX(ts) AS endtime
  FROM (
         SELECT
           tp.*
           , @val_change := IF(@prev_val != tp.val, @val_change + 1, @val_change) AS vc
           , @prev_val := tp.val
         FROM
           t_pivot tp
           , (SELECT @prev_val := NULL, @val_change := 0) var_init_subquery
         ORDER BY wellsite, sd, ts
       ) sq
  WHERE val = 1
  GROUP BY sd, vc

sqlfiddle.com stopped working right now somehow, so I can't show you the explain output, but I've seen it before and I'm not sure, if there's much more you can do about it. But there's an Using filesort which can be avoided. Add an ORDER BY NULL after GROUP BY sd, vc. The GROUP BY does an implicit ORDER BY which is not really needed here, but I guess this won't help much. See if the above mentioned indexes help already.