MYSQL: used variable on IN condition

MySQLstored-procedures

EDIT] The query below is not the actual query which I am having issues with performance. It is only a simplified query. As I only would like to know if it is possible to store multiple results into a variable and used that variable into the IN condition.

I had this query below inside a procedure which I would like to change for optimization.

SELECT * FROM request WHERE facility_id IN (SELECT facility_id FROM facility);

What I would like to do was to store the subquery into a variable and used that variable on the main query.

SET @facilities = (SELECT facility_id FROM facility);

SELECT * FROM request WHERE facility_id IN (@facilities);

But I got an error Subquery returns more than 1 row.
How should I do it?
Any help is appreciated.

I have found some link that suggest that way http://www.mysqlperformancetuning.com/how-to-speed-up-mysql-by-optimization?nocache=1#comment-961. But I cannot figure it out.

Best Answer

You can use the GROUP_CONCAT aggregate function to produce a comma-delimited list of values:

SET @facilities = (SELECT GROUP_CONCAT(facility_id) FROM facility);

The result will be just a string value, however. If you use it like this:

SELECT * FROM request WHERE facility_id IN (@facilities);

you cannot expect to get a match on every ID in the @facilities list. This is because @facilities is not expanded into a list and instead is treated as a single item of the IN list. The commas inside it will be seen as just characters inside a string value, not as syntactic delimiters.

In order to use @facilities the way you want, you will have to build and execute a dynamic query around the value of @facilities. It could be something like this:

@stmt = CONCAT('SELECT * FROM request WHERE facility_id IN (', @facilities, ')');
PREPARE stmt FROM @stmt;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

This way the query will not contain a reference to @facilities – instead, it will have the contents of @facilities as its integral part, and the commas will thus be treated as part of the syntax.

Each query that you want to use the @facilities list in would have to be executed in the above manner. I think you will agree it is not very convenient. There are other considerations to keep in mind as well. When you use a dynamic query and concatenate the contents of a variable into the dynamic query as described, you may be open to SQL injection attacks. Also, with a large number of items in an IN list the performance may degrade.

I would suggest that instead of a variable you use a temporary table to store the IDs:

CREATE TEMPORARY TABLE tmp_facilities AS SELECT facility_id FROM facility;

A table is much more flexible in that you are not tied to just one syntactic structure. Although you could still use the temporary table in an IN predicate, as in your example:

SELECT
  *
FROM
  request
WHERE
  facility_id IN (
    SELECT facility_id FROM tmp_facility
  )
;

you could also use it in an EXISTS subquery:

SELECT
  *
FROM
  request AS r
WHERE
  EXISTS (
    SELECT * FROM tmp_facility AS f WHERE f.facility_id = r.facility_id
  )

;

or in a filtering join:

SELECT
  r.*
FROM
  request AS r
  INNER JOIN tmp_facilities AS f ON f.facility_id = r.facility_id
;

Having many options how to write your queries may be beneficial when you need to tune them for performance.