This small data sample is not enough to demonstrate the behavior you are trying to identify. Indeed, I tested it on a larger data set and it did use the index (using MySQL 5.5.30).
The problem is that when the optimizer estimates that using an index would match an inordinately large number of rows, compared to the total number of rows in the table, it won't use the index, because that could actually perform worse than simply scanning the whole table. That is exactly the behavior this example exhibits: the optimizer knows the index is a candidate, but it chooses not to use it.
But I would suggest the problem lies in the fact that you're using a subquery in a place where a subquery isn't really necessary or called for.
I rewrote this as a join, because, from what I can tell, this is what you're asking the database to do: join each row in Person to the matching row(s) in CofeeBreaks where that person took their break during that window, and average the ages of the attendees.
I also built this on SQL Fiddle. I removed TEMPORARY from the table definitions because the Fiddle doesn't seem to support temporary tables properly (probably because it uses a connection pool).
SELECT cb.id,
       cb.cofeeBreakStart,
       cb.cofeeBreakEnd,
       avg(p.age)
  FROM CofeeBreaks cb
  JOIN Person p ON p.lastCofee BETWEEN cb.cofeeBreakStart AND cb.cofeeBreakEnd
 GROUP BY cb.id;
EXPLAIN SELECT on this query shows that the index is being used, even on this small data set. If you change that to a LEFT JOIN (which would show all coffee breaks even if nobody took that particular break, while the JOIN only includes breaks where at least one person did), the index shows up as a candidate but doesn't get used. Again, that is likely because of the cost, and this behavior would likely be different with a larger data set.
The LEFT JOIN version of the query would produce identical results to your subquery regardless of the table data, while the JOIN version only produces identical results if every row in CofeeBreaks has at least one person taking that break, which in your sample data, it does.
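To make the difference between the two join types concrete, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The sample rows are made up, the column types are simplified, and the correlated subquery is only my guess at the shape of the original query, so treat it as an illustration rather than a reproduction of your setup.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, age INTEGER, lastCofee TEXT);
CREATE TABLE CofeeBreaks (id INTEGER PRIMARY KEY, cofeeBreakStart TEXT, cofeeBreakEnd TEXT);
-- Hypothetical data: two people attend break 1, nobody attends break 2.
INSERT INTO Person (age, lastCofee) VALUES
    (30, '2013-01-01 10:05:00'),
    (40, '2013-01-01 10:10:00');
INSERT INTO CofeeBreaks VALUES
    (1, '2013-01-01 10:00:00', '2013-01-01 10:15:00'),
    (2, '2013-01-01 15:00:00', '2013-01-01 15:15:00');
""")

QUERY = """
SELECT cb.id, avg(p.age)
  FROM CofeeBreaks cb
  {join} Person p ON p.lastCofee BETWEEN cb.cofeeBreakStart AND cb.cofeeBreakEnd
 GROUP BY cb.id
 ORDER BY cb.id
"""

inner_rows = conn.execute(QUERY.format(join="JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

# Assumed shape of a correlated-subquery version, for comparison only.
subquery_rows = conn.execute("""
SELECT cb.id,
       (SELECT avg(p.age)
          FROM Person p
         WHERE p.lastCofee BETWEEN cb.cofeeBreakStart AND cb.cofeeBreakEnd)
  FROM CofeeBreaks cb
 ORDER BY cb.id
""").fetchall()

print("JOIN:     ", inner_rows)     # break 2 is missing
print("LEFT JOIN:", left_rows)      # break 2 appears with a NULL average
print("subquery: ", subquery_rows)  # same rows as the LEFT JOIN
```

With this data the inner join drops the unattended break, while the LEFT JOIN and the subquery both keep it with a NULL average.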
But whether or not the indexes are used, a correlated subquery will not usually scale as well as a join.
http://dev.mysql.com/doc/refman/5.5/en/rewriting-subqueries.html
I'm going to suggest working toward the solution in two phases. The first part is to make sure all the data you need is being returned. The second part is to combine the rows the way that you want.
I'm having trouble getting SqlFiddle to work with table types as well, so I'm going to work with the query directly. Here is the base query from above:
;WITH f AS
(
SELECT FacilityID FROM @facilities
UNION ALL SELECT @facilitydivisionID
)
SELECT
di.DocumentNumber, di.DocumentVersion, di.DocumentTitle,tsl.TransmittalSiteLeadFacility,
ts.TransmittalNumber, ts.TransmittalVersion,
FacilityImpDate = COALESCE(tsl.TransmittalSiteLeadImpDate, ts.TransmittalImpDate)
FROM
TransmittalSiteLead tsl
INNER JOIN TransmittalSheet as ts
ON ts.TransmittalID = tsl.TransmittalSiteLeadTSID
INNER JOIN documentInfo as di
ON ts.transmittalID = di.documentTransmittalImp
INNER JOIN f
ON tsl.TransmittalSiteLeadFacility = f.FacilityId
WHERE
transmittalSiteLeadImpDate
BETWEEN @startdate AND @enddate
Even if there are duplicates or extra information, all of the data you want to return should be in there. If it isn't, then look at the JOINs or WHERE clause to see how to include that info.
The second part of the problem is to condense or limit the rows that are returned. There are a lot of ways to do this and the best way depends on a lot of factors. Here's what I'm sure is a partial list:
- SELECT DISTINCT to limit results
- GROUP BY with MIN to include information about a related table (see below)
- EXISTS to limit the rows returned
- ROW_NUMBER() to pick a row based on an ordering (see below)
- APPLY with TOP to get some information from a joined table
All this is to say that there are many ways to do this, so do what makes sense for your situation. I've given two ways of doing this below. They aren't the best for performance-critical situations, but they are easier to understand.
This query selects the document info and the staff id of someone who is implementing it.
;WITH f AS
(
SELECT FacilityID FROM @facilities
UNION ALL SELECT @facilitydivisionID
)
SELECT
di.DocumentNumber
,di.DocumentVersion
,di.DocumentTitle
,MIN(tsl.TransmittalSiteLeadStaffId) AS TransmittalSiteLeadStaffId
FROM
TransmittalSiteLead tsl
INNER JOIN TransmittalSheet as ts
ON ts.TransmittalID = tsl.TransmittalSiteLeadTSID
INNER JOIN documentInfo as di
ON ts.transmittalID = di.documentTransmittalImp
INNER JOIN f
ON tsl.TransmittalSiteLeadFacility = f.FacilityId
WHERE
transmittalSiteLeadImpDate
BETWEEN @startdate AND @enddate
GROUP BY
di.DocumentNumber
,di.DocumentVersion
,di.DocumentTitle
ORDER BY
di.DocumentNumber
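To see what GROUP BY with MIN does to duplicate rows, here is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server. The Implementation table, its columns, and its rows are made up for illustration and are far simpler than the real schema with its joins.

```python
import sqlite3

# In-memory SQLite database; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Implementation (
    DocumentNumber TEXT,
    Facility       INTEGER,
    StaffId        INTEGER,
    ImpDate        TEXT
);
-- DOC-1 is implemented at two facilities, so it appears twice.
INSERT INTO Implementation VALUES
    ('DOC-1', 10, 7, '2014-02-01'),
    ('DOC-1', 20, 3, '2014-03-01'),
    ('DOC-2', 10, 5, '2014-01-15');
""")

# One row per document; MIN picks the smallest StaffId within each group.
grouped = conn.execute("""
SELECT DocumentNumber, MIN(StaffId) AS StaffId
  FROM Implementation
 GROUP BY DocumentNumber
 ORDER BY DocumentNumber
""").fetchall()

print(grouped)
```

Note that MIN is evaluated per column, so the staff id it returns is simply the smallest one in the group, not necessarily the one from the earliest implementation.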
Another way of doing it is using ROW_NUMBER. This way is very flexible, but can be more difficult to construct and understand.
This query selects matching documents and the facility and contact of the first to implement it. PARTITION BY sets how you want to divide the rows and ORDER BY sets how you want to select which matching row you want to use.
;WITH f AS
(
SELECT FacilityID FROM @facilities
UNION ALL SELECT @facilitydivisionID
),
FullData AS (
SELECT
di.DocumentNumber
,di.DocumentVersion
,di.DocumentTitle
,tsl.TransmittalSiteLeadFacility
,tsl.TransmittalSiteLeadStaffId
,ROW_NUMBER() OVER (PARTITION BY di.DocumentNumber ORDER BY tsl.TransmittalSiteLeadImpDate) AS RowNum
FROM
TransmittalSiteLead tsl
INNER JOIN TransmittalSheet as ts
ON ts.TransmittalID = tsl.TransmittalSiteLeadTSID
INNER JOIN documentInfo as di
ON ts.transmittalID = di.documentTransmittalImp
INNER JOIN f
ON tsl.TransmittalSiteLeadFacility = f.FacilityId
WHERE
transmittalSiteLeadImpDate
BETWEEN @startdate AND @enddate
)
SELECT
DocumentNumber
,DocumentVersion
,DocumentTitle
,TransmittalSiteLeadFacility
,TransmittalSiteLeadStaffId
FROM
FullData
WHERE
RowNum = 1
ORDER BY
DocumentNumber
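The ROW_NUMBER pattern can be tried on a toy table. This is a minimal sketch using Python's sqlite3 module as a stand-in for SQL Server (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added). The Implementation table and its rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Implementation (
    DocumentNumber TEXT,
    Facility       INTEGER,
    StaffId        INTEGER,
    ImpDate        TEXT
);
INSERT INTO Implementation VALUES
    ('DOC-1', 10, 7, '2014-02-01'),
    ('DOC-1', 20, 3, '2014-03-01'),
    ('DOC-2', 10, 5, '2014-01-15');
""")

# Number the rows within each document by implementation date,
# then keep only the first row of each partition.
first_rows = conn.execute("""
WITH FullData AS (
    SELECT DocumentNumber, Facility, StaffId,
           ROW_NUMBER() OVER (PARTITION BY DocumentNumber
                              ORDER BY ImpDate) AS RowNum
      FROM Implementation
)
SELECT DocumentNumber, Facility, StaffId
  FROM FullData
 WHERE RowNum = 1
 ORDER BY DocumentNumber
""").fetchall()

print(first_rows)
```

Because the whole row is kept, the facility and staff id always come from the same (earliest) implementation, which is the main advantage over aggregating each column separately.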
You can use the GROUP_CONCAT aggregate function to produce a comma-delimited list of values:
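As a rough illustration of both the aggregate and the limitation discussed in this answer, here is a minimal sketch using Python's sqlite3 module, whose group_concat function behaves similarly to MySQL's GROUP_CONCAT for this purpose. The Facility table and its rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Facility (FacilityID INTEGER PRIMARY KEY, Name TEXT);
INSERT INTO Facility VALUES (1, 'North'), (2, 'South'), (3, 'East');
""")

# The aggregate collapses all IDs into ONE comma-delimited string.
facilities = conn.execute(
    "SELECT group_concat(FacilityID) FROM Facility"
).fetchone()[0]
print(repr(facilities))

# Passing that string to IN supplies a single item, not a list of three.
as_single_value = conn.execute(
    "SELECT FacilityID FROM Facility WHERE FacilityID IN (?)",
    (facilities,),
).fetchall()
print(as_single_value)
```

In SQLite the second query matches nothing, because the whole string is compared against each integer ID. MySQL's implicit string-to-number conversion behaves differently in detail (as far as I know it would typically match only the first ID), but either way you do not get a match on every ID in the list.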
The result will be just a string value, however. If you use it like this:
you cannot expect to get a match on every ID in the @facilities list. This is because @facilities is not expanded into a list and instead is treated as a single item of the IN list. The commas inside it will be seen as just characters inside a string value, not as syntactic delimiters.
In order to use @facilities the way you want, you will have to build and execute a dynamic query around the value of @facilities. It could be something like this:
This way the query will not contain a reference to @facilities. Instead, it will have the contents of @facilities as its integral part, and the commas will thus be treated as part of the syntax.
Each query that you want to use the @facilities list in would have to be executed in the above manner. I think you will agree it is not very convenient. There are other considerations to keep in mind as well. When you use a dynamic query and concatenate the contents of a variable into the dynamic query as described, you may be open to SQL injection attacks. Also, with a large number of items in an IN list the performance may degrade.
I would suggest that instead of a variable you use a temporary table to store the IDs:
A table is much more flexible in that you are not tied to just one syntactic structure. Although you could still use the temporary table in an IN predicate, as in your example:
you could also use it in an EXISTS subquery:
or in a filtering join:
Having several ways to write your queries can be beneficial when you need to tune them for performance.
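The three forms can be compared side by side. This is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL. The Visit table, the tmp_facilities temporary table, and all rows are hypothetical names and data chosen for illustration.

```python
import sqlite3

# In-memory SQLite database; all table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Visit (id INTEGER PRIMARY KEY, FacilityID INTEGER);
INSERT INTO Visit VALUES (1, 1), (2, 2), (3, 3), (4, 2);

-- The temporary table holds the IDs instead of a comma-delimited variable.
CREATE TEMPORARY TABLE tmp_facilities (FacilityID INTEGER PRIMARY KEY);
INSERT INTO tmp_facilities VALUES (1), (2);
""")

# 1. IN predicate with a subquery on the temporary table.
in_rows = conn.execute("""
SELECT v.id FROM Visit v
 WHERE v.FacilityID IN (SELECT FacilityID FROM tmp_facilities)
 ORDER BY v.id
""").fetchall()

# 2. EXISTS with a correlated subquery.
exists_rows = conn.execute("""
SELECT v.id FROM Visit v
 WHERE EXISTS (SELECT 1 FROM tmp_facilities f
                WHERE f.FacilityID = v.FacilityID)
 ORDER BY v.id
""").fetchall()

# 3. Filtering join (safe from duplicates because FacilityID is unique in tmp_facilities).
join_rows = conn.execute("""
SELECT v.id FROM Visit v
  JOIN tmp_facilities f ON f.FacilityID = v.FacilityID
 ORDER BY v.id
""").fetchall()

print(in_rows, exists_rows, join_rows)
```

All three return the same rows here, and no dynamic SQL or string concatenation is involved, so the injection concern mentioned above does not arise.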