YOUR QUERY
SELECT post.postid, post.attach FROM newbb_innopost AS post WHERE post.threadid = 51506;
At first glance, that query should only touch 1.1597% of the table (62,510 out of 5,390,146 rows). It should be fast given the key distribution of threadid 51506.
REALITY CHECK
No matter which version of MySQL you use (Oracle, Percona, MariaDB), none of them can escape the one enemy they all have in common: the InnoDB architecture.
CLUSTERED INDEX
Please keep in mind that each threadid entry has a primary key attached. This means that when you read from the secondary index, InnoDB must do a primary key lookup within the ClusteredIndex (internally named gen_clust_index). In the ClusteredIndex, each InnoDB page contains both data and PRIMARY KEY index info. See my post Best of MyISAM and InnoDB for more info.
REDUNDANT INDEXES
You have a lot of clutter in the table because some indexes share the same leading columns. MySQL and InnoDB have to navigate through that index clutter to reach the needed BTREE nodes. You should reduce the clutter by running the following:
ALTER TABLE newbb_innopost
DROP INDEX threadid,
DROP INDEX threadid_2,
DROP INDEX threadid_visible_dateline,
ADD INDEX threadid_visible_dateline_index (`threadid`,`visible`,`dateline`,`userid`)
;
Why strip down these indexes?
- The first three indexes start with threadid
- threadid_2 and threadid_visible_dateline start with the same three columns
- threadid_visible_dateline does not need postid since it's the PRIMARY KEY and it's embedded
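To see the consolidated index at work, here is a minimal sketch using Python's sqlite3 as a stand-in for MySQL (the table is a made-up miniature of newbb_innopost; column contents are invented). It shows that an equality lookup on threadid is served by the leading column of the combined index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Miniature stand-in for newbb_innopost (columns invented for illustration)
cur.execute("""CREATE TABLE newbb_innopost (
    postid INTEGER PRIMARY KEY, threadid INT, visible INT,
    dateline INT, userid INT, attach INT)""")

# The single consolidated index from the ALTER TABLE above
cur.execute("""CREATE INDEX threadid_visible_dateline_index
    ON newbb_innopost (threadid, visible, dateline, userid)""")

cur.executemany("INSERT INTO newbb_innopost VALUES (?,?,?,?,?,?)",
                [(1, 51506, 1, 100, 7, 0),
                 (2, 51506, 1, 110, 8, 1),
                 (3, 99, 1, 120, 9, 0)])

plan = cur.execute("""EXPLAIN QUERY PLAN
    SELECT postid, attach FROM newbb_innopost
    WHERE threadid = 51506""").fetchall()
print(plan[0][3])  # the optimizer picks the index for the threadid= lookup
```

Because attach is not in the index, the engine still follows each index entry back to the row, which is exactly the ClusteredIndex lookup described above.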
BUFFER CACHING
The InnoDB Buffer Pool caches data and index pages. MyISAM only caches index pages.
Just in this area alone, MyISAM does not waste time caching data. That's because it's not designed to cache data. InnoDB caches every data page and index page (and its grandmother) it touches. If your InnoDB Buffer Pool is too small, you could be caching pages, invalidating pages, and removing pages all in one query.
TABLE LAYOUT
You could shave off some space from each row by reconsidering importthreadid and importpostid. You have them as BIGINTs, so together they take up 16 bytes per row in the ClusteredIndex.
You should run this
SELECT importthreadid,importpostid FROM newbb_innopost PROCEDURE ANALYSE();
This will recommend what data types these columns should be for the given dataset.
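As a rough back-of-envelope estimate (my own arithmetic, not output from PROCEDURE ANALYSE): if both columns fit in a 4-byte INT, shrinking them from 8-byte BIGINTs saves 8 bytes per row across the 5,390,146 rows:

```python
ROWS = 5_390_146           # row count quoted at the top of the answer
BIGINT, INTCOL = 8, 4      # storage bytes per value in MySQL
saved = ROWS * 2 * (BIGINT - INTCOL)   # two columns shrink by 4 bytes each
print(f"{saved} bytes = {saved / 2**20:.1f} MiB saved in the ClusteredIndex")
```

That is roughly 41 MiB less data for the Buffer Pool to cache, before even counting any secondary indexes on those columns.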
CONCLUSION
MyISAM has a lot less to contend with than InnoDB, especially in the area of caching.
While you revealed the amount of RAM (32GB) and the version of MySQL (Server version: 10.0.12-MariaDB-1~trusty-wsrep-log mariadb.org binary distribution, wsrep_25.10.r4002), there are still other pieces to this puzzle you have not revealed:
- The InnoDB settings
- The number of cores
- Other settings from my.cnf
If you can add these things to the question, I can further elaborate.
UPDATE 2014-08-28 11:27 EDT
You should increase threading
innodb_read_io_threads = 64
innodb_write_io_threads = 16
innodb_log_buffer_size = 256M
I would consider disabling the query cache (See my recent post Why query_cache_type is disabled by default start from MySQL 5.6?)
query_cache_size = 0
I would preserve the Buffer Pool
innodb_buffer_pool_dump_at_shutdown=1
innodb_buffer_pool_load_at_startup=1
Increase purge threads (if you do DML on multiple tables)
innodb_purge_threads = 4
GIVE IT A TRY !!!
First solution
Well, I came up with a solution. It's pretty ugly, but it works...
SELECT count(*)
FROM (
SELECT code, dates.selected_date
FROM appartments
INNER JOIN (select * from
(select adddate('2015-01-01',t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date from
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3) v
WHERE selected_date BETWEEN '2015-04-16' AND '2015-04-28') dates
WHERE (code, selected_date) NOT IN (
SELECT code, dates.selected_date
FROM appartments
INNER JOIN (select * from
(select adddate('2015-01-01',t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date from
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3) v
) dates ON dates.selected_date between date_arrival and date_departure)
GROUP BY code, dates.selected_date) available_dates_by_code
Change the date period in the line WHERE selected_date BETWEEN '2015-04-16' AND '2015-04-28') dates.
Remove the first-level SELECT count(*) FROM to get all dates of unoccupied apartments between '2015-04-16' and '2015-04-28'.
You may want to change the '2015-01-01' dates to something earlier (i.e. CURDATE() if you're only working with future dates). The generated calendar only covers 10,000 days (about 27 years) past '2015-01-01', so change it to something like CURDATE() - INTERVAL 1 YEAR if needed.
I'm very curious to see if someone has a better solution...
How it works
From the bottom to the top:
- The first SELECT gets all occupied dates for all apartments.
- The second SELECT gets all the wanted dates and removes every apartment/date pair that is occupied.
- The third SELECT counts the number of apartment/date pairs available between the provided dates.
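The same bottom-to-top logic can be sketched on a toy dataset with Python's sqlite3, using a recursive CTE in place of the digit-table calendar (the table contents are invented; column names match the query above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE apartments"
            " (code TEXT, date_arrival TEXT, date_departure TEXT)")
cur.executemany("INSERT INTO apartments VALUES (?,?,?)",
                [("A", "2015-04-16", "2015-04-18"),   # A occupied 3 days
                 ("B", "2015-04-20", "2015-04-28")])  # B occupied 9 days

free = cur.execute("""
WITH RECURSIVE dates(d) AS (
    SELECT '2015-04-16'
    UNION ALL SELECT date(d, '+1 day') FROM dates WHERE d < '2015-04-28'
)
SELECT COUNT(*)
FROM apartments, dates                   -- every apartment/date pair
WHERE (code, d) NOT IN (                 -- ...minus the occupied ones
    SELECT code, d FROM apartments
    JOIN dates ON d BETWEEN date_arrival AND date_departure)
""").fetchone()[0]
print(free)  # 2 apartments x 13 days = 26 pairs, minus 12 occupied = 14
```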
Second solution
SELECT
(
SELECT COUNT(*)
FROM apartments
INNER JOIN (select adddate('2015-01-01',t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date from
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3) v
WHERE selected_date BETWEEN '2015-04-16' AND '2015-04-28'
) -
(
SELECT COUNT(*)
FROM apartments
INNER JOIN (SELECT * FROM
(select adddate('2015-01-01',t3.i*1000 + t2.i*100 + t1.i*10 + t0.i) selected_date from
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t0,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t1,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t2,
(select 0 i union select 1 union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9) t3) v
) dates ON dates.selected_date between date_arrival and date_departure
WHERE selected_date BETWEEN '2015-04-16' AND '2015-04-28'
) AS 'days_of_availability'
This one is much simpler. The first subquery counts the number of days between the two dates multiplied by the number of apartments. The second subquery counts the number of occupied apartment/days. The top-level SELECT computes (number of apartment/days) minus (number of occupied apartment/days).
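The subtraction form can be sketched the same way with Python's sqlite3 and a recursive CTE standing in for the digit-table calendar (table contents are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE apartments"
            " (code TEXT, date_arrival TEXT, date_departure TEXT)")
cur.executemany("INSERT INTO apartments VALUES (?,?,?)",
                [("A", "2015-04-16", "2015-04-18"),
                 ("B", "2015-04-20", "2015-04-28")])

days_of_availability = cur.execute("""
WITH RECURSIVE dates(d) AS (
    SELECT '2015-04-16'
    UNION ALL SELECT date(d, '+1 day') FROM dates WHERE d < '2015-04-28'
)
SELECT (SELECT COUNT(*) FROM apartments, dates)  -- total apartment/days
     - (SELECT COUNT(*) FROM apartments          -- minus occupied ones
        JOIN dates ON d BETWEEN date_arrival AND date_departure)
""").fetchone()[0]
print(days_of_availability)  # total minus occupied
```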
Fun fact: it took me almost 30 minutes to get this query working. That's a shame.
Best Answer
Plan A: Change the schema to have two date columns. Then have the code INSERT a row when the item is "assigned" and UPDATE the other column when it is "unassigned". This essentially eliminates the entire problem. And it cuts in half the number of rows needed.

Plan B: Split the effort. Using the schema you currently have, SELECT ... ORDER BY ... to get the pairs of rows adjacent. Then have the application code finish the task by remembering the contents of the first row of each pair ("assigned") just long enough to blend it with the next row ("unassigned").
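Plan B's pairing step can be sketched in Python (the row shape and column names are my invention, not the asker's schema): walk the ordered rows, hold each "assigned" row until its "unassigned" partner arrives, then emit the blended interval:

```python
# Hypothetical event rows (item, state, ts), as they might come back
# from SELECT ... ORDER BY item, ts
rows = [("x1", "assigned", 1), ("x1", "unassigned", 5),
        ("x2", "assigned", 2), ("x2", "unassigned", 7)]

intervals = []
pending = {}                   # item -> ts of its open "assigned" row
for item, state, ts in rows:
    if state == "assigned":
        pending[item] = ts     # remember the first row of the pair
    else:                      # blend with its "unassigned" partner
        intervals.append((item, pending.pop(item), ts))

print(intervals)  # [('x1', 1, 5), ('x2', 2, 7)]
```

This keeps at most one remembered row per item in memory, so it scales with the number of concurrently assigned items rather than the table size.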