Mysql – Adding a subquery to the SELECT clause returns an error

aggregatearrayMySQLquery

I have a query which was originally written to retrieve work-event data based on whether the work-event was updated during a specified date range. The query ends up being used to populate a reporting tool to monitor task activity.

Here's the original query with its example output:

SELECT tasks.id, tasks.name, users.name, DATE_FORMAT(MAX(kits.updated_at), '%m/%d/%Y'),
COUNT(DISTINCT(DATE(kits.updated_at))), COUNT(DISTINCT(kits.id))
FROM kits
JOIN tasks ON tasks.id = kits.task_id JOIN users ON users.id = kits.user_id
WHERE kits.updated_at >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH)
GROUP BY tasks.id, users.name

returned result:

=> [[1574, "Test Task", "username", "11/21/2019", 3, 12]]

  • Result is an array of arrays with column fields occurring in left-to-right order:
  • Task ID, Task Name, Username, Date Last Active, Count: Dates Active, Count: Kits Active On

Here's an example of how the query's results are fed into a table:

| Task ID | Task Name | Username | Date Last Active | Count: Dates Active | Count: Kits Active On | 
|---------|-----------|----------|------------------|---------------------|-----------------------| 
| 1203    | Test Task | user1    | 11/20/2019       | 6                   | 15                    | 
| 1203    | Test Task | user2    | 11/20/2019       | 3                   | 11                    | 
| 1203    | Test Task | user3    | 11/17/2019       | 12                  | 181                   | 
| 1205    | Test Task | user4    | 11/18/2019       | 9                   | 41                    | 
| 1205    | Test Task | user5    | 11/21/2019       | 8                   | 21                    | 

To explain: in our application, task activity is made up of distinct work-events (i.e., "kits") related to the task. To directly spell out this relation, tasks are made up of or have many "kits", and each kit has a state property with values such as "assigned", "broken" or "done".

To the current report, I've been asked to add an additional "Completed Kits" column which needs to be populated with unique id's (kits.uid) for any kits that have had their status updated to "done" during the specified time range.

My question: is it possible to add something like a conditional WHERE clause or filter which is only applied to a single field in my SELECT clause? In this case, this new filter would limit only those kits which had their state value updated to "done" within the constraints of the more general filter working to include kits updated over the specified date_range.

So far, based on my review of mysql search results, I've tried the following which includes a subquery inside the SELECT clause:

SELECT tasks.id, tasks.name, users.name, DATE_FORMAT(MAX(kits.updated_at), '%m/%d/%Y'),
COUNT(DISTINCT(DATE(kits.updated_at))), COUNT(DISTINCT(kits.id)),
(SELECT kits.uid FROM kits WHERE kits.updated_at #{date_range} AND kits.state = "done")
FROM kits
JOIN tasks ON tasks.id = kits.task_id JOIN users ON users.id = kits.user_id
WHERE kits.updated_at #{date_range}
GROUP BY tasks.id, users.name

The problem here is that MySQL throws an error reporting that my subquery is returning more than 1 row.

The results I'm looking for should in fact populate a final "Kits Completed" column on the table with multiple "done" kit.uid values. Returning to the error after Jacob's comment below, I'm starting to realize that the problem is that the "more than 1 row" results returned by my subquery needs to be transformed into a string or nested collection so it represents a single piece of data, like so:

Desired results:

=> [[1574, "Test Task", "username", "11/21/2019", 3, 12, ["done-kit-uid-1", "done-kit-uid-2", "done-kit-uid-3"]]]

  • One new column for the collection:
  • Task ID, Task Name, Username, Date Last Active, Count: Dates Active, Count: Kits Active On, Kits Completed (collection)

With the expected table output looking similar to this:

| Task ID | Task Name | Username | Date Last Active | Count: Dates Active | Count: Kits Active On |    Completed Kits    |
|---------|-----------|----------|------------------|---------------------|-----------------------|----------------------|
| 1203    | Test Task | user1    | 11/20/2019       | 6                   | 15                    | kit-uid-collection-1 |
| 1203    | Test Task | user2    | 11/20/2019       | 3                   | 11                    | kit-uid-collection-2 |
| 1203    | Test Task | user3    | 11/17/2019       | 12                  | 181                   | kit-uid-collection-3 |
| 1205    | Test Task | user4    | 11/18/2019       | 9                   | 41                    | kit-uid-collection-4 |
| 1205    | Test Task | user5    | 11/21/2019       | 8                   | 21                    | kit-uid-collection-5 |

Best Answer

Ok, one possible way to resolve this is through the use of GROUP_CONCAT to implode the multiple values returned by the subquery into a single string:

Here's how my subquery has changed within the top-level SELECT:

SELECT
  ...
  ...
  (SELECT
    GROUP_CONCAT(kits.uid ORDER BY kits.updated_at ASC SEPARATOR ', ')
   FROM
     kits
   WHERE
     kits.updated_at >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH) AND kits.state = "done")
FROM kits...

Since the number of characters in each individual uid for a kit varies around 20 chars and there are multiple uids being concatenated for each result, the resulting table output looks like garbage. In spite of the unsettling ugliness the Completed Kits data has on the rendered table, it's currently correct based on the requirement.

All in a day's work!

Thanks to everyone who helped and was patient reading through my ridiculously long post. Also, trust me, the redundancy of my query has me thinking I should continue searching for a better way to do this with a join as suggested by mustaccio.

Read more about GROUP_CONCAT on S.O.


? Follow-up Post (November 27, 2019):

I did end up re-writing this MySQL query using a JOIN which allowed me to eliminate the somewhat awkward syntax of nesting two subqueries in a single SELECT clause. (Also unverified, but I believe the general consensus is that JOINs are more performant):

Here's how that ended up looking:

SELECT base_query.task_id, base_query.task_name, base_query.username, base_query.date_last_active, base_query.count_dates_active, base_query.count_kits_active_on, kits_query.count_completed_kits, kits_query.completed_kits
FROM
  (SELECT tasks.id AS task_id, tasks.name AS task_name, users.name AS username, DATE_FORMAT(MAX(kits.updated_at), '%m/%d/%Y') AS date_last_active,
  COUNT(DISTINCT(DATE(kits.updated_at))) AS count_dates_active, COUNT(DISTINCT(kits.id)) AS count_kits_active_on
  FROM kits
  JOIN tasks ON tasks.id = kits.task_id JOIN users ON users.id = kits.user_id
  WHERE kits.updated_at >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH)
  GROUP BY tasks.id, users.name) AS base_query
LEFT JOIN
  (SELECT tasks.id AS task_id, users.name AS username, IFNULL(COUNT(kits.id), '0') AS count_completed_kits, IFNULL(GROUP_CONCAT(kits.uid ORDER BY kits.updated_at DESC SEPARATOR ', '), 'none') AS completed_kits
  FROM kits
  JOIN tasks ON tasks.id = kits.task_id JOIN users ON users.id = kits.user_id
  WHERE kits.updated_at >= DATE_SUB(CURDATE(), INTERVAL 1 MONTH) AND kits.state = "done"
  GROUP BY tasks.id, users.name) AS kits_query
ON base_query.task_id = kits_query.task_id AND base_query.username = kits_query.username