EDIT 1: The query below is quite like the original query. It very much depends on how strict are the other conditions, i.e. how many rows can be ignored with those conditions. I think the index should contain the group-max-triple (l_a, l_b, timestamp)
, but if l_act='q'
is a strong reduction and the condition is always there then l_act
should come before l_a.
Origin:
I would write it as
select tmp.*
from tmp inner join (
select l_a, l_b, max(timestamp) as max_time
from tmp where timestamp < cast('2013-01-01T00:00:00Z' as binary)
group by l_a, l_b
) as selmax
on selmax.l_a = tmp.l_a
and selmax.l_b = tmp.l_b
and selmax.max_time = tmp.timestamp
;
and give tmp an index on (l_a, l_b, timestamp)
.
And also: what says explain
?
EDIT 2:
Seeing the explain and looking at the indices clears the case. l_project starts the primary key but you give no restriction on that in your inner query.
The second index needs name
as a second field.
If l_act is a strong restriction (All of the queries give a l_act
- condition which meets less than 1/3 of the table), you should make an index
l_act, l_a, l_b, timestamp
otherwise
l_a, l_b, timestamp
If the 4-part-index is used, it should be very fast.
I wish much fun adding the index. ;-)
If you use the 4-part-index, then you must add the l_act-condition in the inner AND the outer join.
JOINs
together with aggregates (eg, SUM()
) often give you bigger values than you expect. To see this, remove the SUMs
and GROUP BY
and stare at the output. You may see, for example, the same mghours.hourstime
showing up multiple times. Hence, SUM(mghours.hourstime)
is inflated.
This is happening because JOIN mgtraining
is coming up with multiple rows in mgtraining
for each row in mghours
.
The solution is to do the query in steps. Notice how the inner query avoids inflating "vol":
SELECT x.*,
SUM(...) AS 'at
FROM (
SELECT ...,
SUM(mghours.hourstime) AS vol
FROM ...
GROUP BY ...
) x
LEFT JOIN mgtraining ON ...
GROUP BY ...
ORDER BY ...;
Best Answer
If I've understood you (at last) correctly, I think you want something like this (see fiddle here):
Create your table:
Populate it:
And use the following premliminary SQL (included to demonstrate the thought processes that went into formulating the answer. I find that people learn better if they can see a "breadcrumb trail" (cf. Hansel and Gretel)):
The
CAST
is necessary because of the inclusion of theh
in the string for the duration of usage (much better to use a plainINTEGER
). If theCAST
isn't present,8h
will be "greater than"20h
for Donald Duck when sorted as a character string -CAST
ing toUNSIGNED
removes that problem.Result:
Having obtained the
first
andlast
names and theuse_duration
, we are now in a position toJOIN
back to our original table this way:Result:
Now, I would recommend that you keep the duration as just a pure
INTEGER
rather than use 25h. I hope this responds to your needs, - if not, then please expand on your explanation and I'll endeavour to revise my answer to your satisfaction. p.s. welcome to the forum! :-)