These are general recommendations, since you do not show the full extent of the queries you plan to run (that is, which kind of analytics you intend to do).
Assuming you do not need real-time results, you should denormalize your data at the end of each period, precalculate your aggregated results once for all necessary timeframes (by day, by week, by month), and work only with summary tables. Depending on the queries you intend to run, you may not even need to keep the original data.
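A minimal sketch of that summary-table approach; `events`, `daily_summary` and their columns are hypothetical names, not from your schema:

```sql
-- Hypothetical raw table: events(event_date DATE, site_id INT, val INT)
CREATE TABLE daily_summary (
  day     DATE   NOT NULL,
  site_id INT    NOT NULL,
  total   BIGINT NOT NULL,
  PRIMARY KEY (day, site_id)
);

-- Precalculate once, at the end of the period:
INSERT INTO daily_summary (day, site_id, total)
SELECT event_date, site_id, SUM(val)
FROM events
GROUP BY event_date, site_id;
```

Weekly and monthly rollups can then be built from daily_summary itself, which is far smaller than the raw data.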
If durability is not a problem (you can always recalculate statistics, since the raw data lives elsewhere), you can use a caching mechanism (external, or the memcached plugin included in MySQL 5.6), which works great for writing and reading key-value data in memory.
Use partitioning (it can also be done manually): with this kind of application, the most frequently accessed rows are usually also the most recent. Delete or archive old rows to other tables to use your memory efficiently.
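As a sketch, assuming a hypothetical events table, range partitioning by date makes dropping or archiving old data almost free:

```sql
-- Hypothetical table, partitioned by year. The partitioning column
-- (event_date) must be part of every unique key, so it is in the PK.
CREATE TABLE events (
  event_date DATE NOT NULL,
  site_id    INT  NOT NULL,
  val        INT  NOT NULL,
  PRIMARY KEY (event_date, site_id)
)
PARTITION BY RANGE (YEAR(event_date)) (
  PARTITION p2015 VALUES LESS THAN (2016),
  PARTITION p2016 VALUES LESS THAN (2017),
  PARTITION pmax  VALUES LESS THAN MAXVALUE
);

-- Removing a whole year is a quick metadata operation,
-- much cheaper than a huge DELETE:
ALTER TABLE events DROP PARTITION p2015;
```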
Use InnoDB if you want durability and highly concurrent writes, and your most frequently accessed data fits into memory. There is also TokuDB: it may not be faster in raw terms, but it scales better for insertions into huge, tall tables and supports on-disk compression. There are also analytics-focused engines such as Infobright.
Edit:
23 insertions/second is feasible on almost any storage, even with a bad disk, but:
You do not want to use MyISAM: it cannot do concurrent writes (except under very specific conditions), and you do not want huge tables that can become corrupted and lose data.
InnoDB is fully durable by default; for better performance you may want to reduce the durability guarantees or have a good storage backend (disk caches). InnoDB tends to get slower on insertion as tables become huge. The definition of "huge" here is: the upper parts of the primary key and other unique indexes must fit into the buffer pool so uniqueness can be checked in memory, and that threshold varies with the memory available. If you want scalability beyond that, you have to partition (as suggested above) or shard, or use one of the alternative engines mentioned before (TokuDB).
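For example, one common way to trade a little durability for insert throughput on InnoDB (a server-wide setting, so weigh it carefully):

```sql
-- Flush the redo log to disk once per second instead of at every
-- commit; a crash can lose up to about one second of transactions.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
```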
SUM()-style aggregate queries do not scale on the standard MySQL engines. An index increases performance, again because most of the operations can be done in memory, but one index entry per row still has to be read, in a single thread. I mentioned design alternatives (summary tables, caching) and alternative engines (column-based) as solutions to that. But if you do not need real-time results, only report-like queries, you shouldn't worry too much about it.
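If you go the summary-table route, the summary can also be kept current incrementally as rows arrive, so reports never scan the raw data; a sketch, with daily_summary and its columns as hypothetical names:

```sql
-- Upsert one event's contribution into the per-day rollup.
-- (42 and 17 are placeholder values for site_id and val.)
INSERT INTO daily_summary (day, site_id, total)
VALUES (CURRENT_DATE, 42, 17)
ON DUPLICATE KEY UPDATE total = total + VALUES(total);
```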
I suggest you do a quick load test with fake data. I've had many clients doing social-network analytics on MySQL without problems (well, at least after I helped them :-) ), but your decision may depend on your actual non-functional requirements.
First, the condition WHERE date_field >= (CURDATE()-INTERVAL 1 MONTH)
will not restrict your results to the current month. It will fetch all dates from 30-31 days ago up to the current date (and into the future, if there are rows with future dates in the table).
It should be:
WHERE date_field >= LAST_DAY(CURRENT_DATE) + INTERVAL 1 DAY - INTERVAL 1 MONTH
AND date_field < LAST_DAY(CURRENT_DATE) + INTERVAL 1 DAY
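For example, to sum a value over the current calendar month with that inclusive-exclusive range (my_table and val are placeholder names):

```sql
-- Inclusive start, exclusive end. The open-ended upper bound also
-- keeps the condition sargable, so an index on date_field can be used.
SELECT SUM(val) AS month_total
FROM my_table
WHERE date_field >= LAST_DAY(CURRENT_DATE) + INTERVAL 1 DAY - INTERVAL 1 MONTH
  AND date_field <  LAST_DAY(CURRENT_DATE) + INTERVAL 1 DAY ;
```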
Now, to the main question. To produce all 28-31 dates of the month, even when the table has no rows for some of them, you can use a Calendar table (with all dates, say for years 1900 to 2200) or create the dates on the fly, with something like this (the days table can be either a temporary table, or you can even make it a derived table, at the cost of a somewhat more complicated query):
CREATE TABLE days
( d INT NOT NULL PRIMARY KEY ) ;
INSERT INTO days
VALUES (0), (1), (2), ....
..., (28), (29), (30) ;
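For reference, the same 0-30 list can also be produced as a derived table, without creating a real days table (the somewhat more complicated variant hinted at above); one common way in MySQL:

```sql
-- Cross join a units table (0-9) with a tens table (0-3),
-- then keep only 0..30.
SELECT a.n + 10 * b.n AS d
FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2
      UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
      UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
      UNION ALL SELECT 9) AS a
CROSS JOIN
     (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2
      UNION ALL SELECT 3) AS b
WHERE a.n + 10 * b.n <= 30 ;
```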
SELECT
cal.my_date AS date_field,
COALESCE(t.val, 0) AS val
FROM
( SELECT
s.start_date + INTERVAL (days.d) DAY AS my_date
FROM
( SELECT LAST_DAY(CURRENT_DATE) + INTERVAL 1 DAY - INTERVAL 1 MONTH
AS start_date,
LAST_DAY(CURRENT_DATE)
AS end_date
) AS s
JOIN days
ON days.d <= DATEDIFF(s.end_date, s.start_date)
) AS cal
LEFT JOIN my_table AS t
ON t.date_field >= cal.my_date
AND t.date_field < cal.my_date + INTERVAL 1 DAY ;
The above should work for any type of the date_field column (DATE, DATETIME, TIMESTAMP). If the date_field column is of type DATE, the last join can be simplified to:
LEFT JOIN my_table AS t
ON t.date_field = cal.my_date ;
Best Answer
Before going into the main problem, there are a few more issues with the query:

- The use of string literals ('1907', '1') for values that are compared with columns that seem to have integer type (SiteID, UtilityID). If the columns are integers, use integers, not strings.
- Using BETWEEN and LAST_DAY() for datetime/timestamp comparisons. Assuming that the w.Timestamp column is indeed a TIMESTAMP, this will give you incorrect results unless all your timestamps have a 00:00:00 time part: LAST_DAY('2016-01-01') will be '2016-01-31 00:00:00', so you lose the whole last day of the month except its very first second. Changing that to BETWEEN '2016-06-01' AND '2016-07-01' is somewhat better, but still wrong, as you'll also get a few results from the next day (in July). One way that works correctly with all date types (DATE, DATETIME, TIMESTAMP) is to use inclusive-exclusive range checks: >= for the start of the range and < for the first moment after it. If the t.Timestamp column is of DATE type, then BETWEEN can be used, although I prefer the consistency of the inclusive-exclusive form.
- Using the old ANSI syntax without JOIN. This is not an error, but it is error-prone (we might forget a joining clause) and harder to debug when there are many tables. It's better (in my opinion) to use the "new" (since 1992 ;) explicit JOIN syntax. It also makes it easier to change an INNER join to a LEFT or RIGHT outer join.

Now for the main problem: the solution is to start from the Trasmitter table and then LEFT JOIN the details (Daily) table:
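Putting those fixes together, a sketch of such a query; note that TransmitterID and Value are assumed column names, since the original schema isn't shown:

```sql
-- Sketch only: TransmitterID and Value are guessed names.
SELECT t.TransmitterID,
       COALESCE(SUM(w.Value), 0) AS total
FROM Trasmitter AS t
LEFT JOIN Daily AS w
       ON  w.TransmitterID = t.TransmitterID
       AND w.Timestamp >= '2016-06-01'
       AND w.Timestamp <  '2016-07-01'
WHERE t.SiteID = 1907
  AND t.UtilityID = 1
GROUP BY t.TransmitterID ;
```

The date conditions go into the ON clause, not the WHERE clause, so transmitters with no rows in Daily still appear with a total of 0.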