Nested MySQL FROM Query with Multiple Databases

MySQLmysql-5.5

I need to display user-specific tables for users logged into a CMS. These tables are stored in another database with tables for the different user groups. For example User 1 to 4 should get data from Table1, User 5-12 from Table2 etc. Unfortunately I can only run one query because of the CMS plugin creating charts only supports one. I can pass the UserID into that query though.

Tables:

  • UserTable matches User IDs to Group IDs (User1 = Group99).
  • GroupTable matches Group IDs to table names (Group 99 = Table3).
  • These User specific tables (Table3) contain Datetimes and Values.

This query returns the values I need from Table3 for example:

SELECT DATE(timestamp) As Date, AVG(Value) AS Value
FROM *DB.TableX*
WHERE timestamp BETWEEN CURDATE()-INTERVAL 4 WEEK AND CURDATE()
GROUP BY DATE(timestamp)

DB.TableX should now be replaced with the result from the following query, which basically returns "Table3":

SELECT grouptable.tablename FROM usertable
LEFT JOIN grouptable ON grouptable.ID = usertable.groupID
WHERE User = %USERID%

When I try to nest the second query into the FROM statement of the first, I get an error "Unknown column timestamp in field list":

SELECT DATE(timestamp) As Date, AVG(Value) AS Value
FROM
  (SELECT grouptable.tablename FROM cmsdb.usertable
  LEFT JOIN cmsdb.grouptable ON cmsdb.grouptable.ID = cmsdb.usertable.groupID
  WHERE User = 2) AS SUBQUERY
WHERE timestamp BETWEEN CURDATE()-INTERVAL 4 WEEK AND CURDATE()
GROUP BY DATE(timestamp)

I thought that was possible but I can't figure out what's wrong. I don't want to build a seperate page for each user :-/

Best Answer

There are basically two options with your scenario, as I understand it.

  1. Dynamic SQL (see a_vlad's answer)
  2. A (potentially) insanely massive JOIN.
  3. Return "canned" data.

A massive JOIN

For option two, you have to know the names of your tables when you write the query - and, those tables have to exist.

So, step one would be determine any future tables that will be needed, and pre-create them. If you can't do this, you can still build the query - but it will have to be modified whenever a new table is added.

So, assuming that your criteria allow us to create as many tables as you'll need for the foreseeable future in advance, next you have to build the new query:

SELECT DATE(COALESCE(t1.timestamp,t2.timestamp, t3.timestamp, t4.timestamp)) As Date
      ,AVG(COALESCE(t1.Value,t2.Value,t3.Value,t4.Value)) AS Value
FROM
  cmsdb.usertable ut
  LEFT JOIN cmsdb.grouptable gt ON ut.groupID = gt.ID
  LEFT JOIN Table1 t1 ON (gt.tablename = 'Table1')
  LEFT JOIN Table2 t2 ON (gt.tablename = 'Table2')
  LEFT JOIN Table3 t3 ON (gt.tablename = 'Table3')
  LEFT JOIN Table4 t4 ON (gt.tablename = 'Table4')
WHERE u.User = 2
  AND COALESCE(t1.timestamp,t2.timestamp, t3.timestamp, t4.timestamp) BETWEEN CURDATE()-INTERVAL 4 WEEK AND CURDATE()
GROUP BY DATE(COALESCE(t1.timestamp,t2.timestamp, t3.timestamp, t4.timestamp))

As you can see, this is already pretty unwieldy with 4 tables - with 200, it'd be hideous.

NOTE: code is untested, and I don't mainly work w/ MySQL, but this is simple enough that it should work; you might need to play with exactly where the COALESCE function should be.

The key here, if it isn't obvious, is that we only actually pay attention to data from the table with the right name.

There are a couple of assumptions here:

  1. Only one table can be valid for any given user. If a user can belong to multiple groups and thus see multiple tables, the results may not be correct.
  2. This assumes that you really want a summary of all the table's contents in the date range for any user in the correct group; if there's anything user-specific, you'd need to add that check, using COALESCE again to pick up the column form the table we actually joined with.

In many ways, a dynamic SQL solution would be easier on the programmer. With that, you can avoid some of the possible problems with this (it can't handle new tables, renamed tables, etc.)

Canned data

The third option only works if the data returned does not have to be live.

For this option, you create another table - let's call it GroupSummary. This table has three key values: grouptable (can be used as key), SummaryDate, AvgValue

GroupSummary would be populated by a dynamic SQL statement, like in a_vlad's answer. You could loop through the possible tables with a cursor, and have the whole thing set up as a stored program. Run the stored program as often as needed to pick up new data (noting that running it too frequently is likely to cause performance issues with your SELECT query). If reasonably fresh data is required, I'd be thinking refreshing the data every 5-15 minutes; at a minimum, it should be refreshed after midnight (as this would, of course, change the base date it's working from).

Then, the query in your webpage would be:

SELECT SummaryDate as Date
      ,AvgValue as Value
  FROM GroupSummary gs
         INNER JOIN grouptable gt ON (gs.grouptable = gt.grouptable)
           INNER JOIN usertable ut ON (gt.ID = ut.groupID)
 WHERE User = %USERID%

(Note: WHERE clause based on your sample query, adjust as needed).