MySQL derived query with random order

MySQLmysql-5.6

Here I have two tables in MySQL 5.6…

CREATE TABLE Zones (
  ZoneID int,
  ZoneName varchar(10),
  AccountID int,
  ProxyType int
);

CREATE TABLE ProxyServers (
  ProxyID int,
  ProxyType int,
  ProxyIP varchar(10),
  ProxyPort int,
  Enabled tinyint
);

My structure is a lot more complex than the one outlined about but I am trying to keep this as simple as possible to explain.

Basically I want to return 1 row of data consisting of
Zones.ID, Zones.ZoneName, and a ProxyID selected at random joined by ProxyType

SELECT * FROM Zones z    
LEFT JOIN ProxyServers ps ON ps.ProxyID = (
      SELECT ProxyID FROM ProxyServers WHERE ProxyType = z.ProxyType ORDER BY RAND() LIMIT 1
    )
ORDER BY ZoneID DESC
LIMIT 1

This is simple enough to do in T-SQL, but can't figure it out in MySQL. Plus what is rather strange, even thought the data set does not change, the results are different ever time the query is run. Sometimes no rows, sometimes 4, sometimes even duplicate rows. I don't get it :/ I think it's the ORDER BY RAND() causing this because when I remove that it's fine.

Best Answer

If you want the query to return just 1 row, a zone with a random proxy (chosen from the associated ones), then it's easy to fix the query:

SELECT z.*, ps.* 
FROM Zones AS z    
  LEFT JOIN ProxyServers AS ps 
  ON ps.ProxyType = z.ProxyType 
-- WHERE <some condition to choose from some zones or just one>
ORDER BY RAND() 
LIMIT 1 ;

or even better:

SELECT z.*, ps.* 
FROM 
    ( SELECT *
      FROM Zones
      ORDER BY ZoneID DESC
      LIMIT 1
    ) AS z    
  LEFT JOIN ProxyServers AS ps 
  ON ps.ProxyType = z.ProxyType 
ORDER BY RAND() 
LIMIT 1 ;

If you want multiple or all the zones, each one with a randomly chosen proxy server, it's really not trivial to solve, due to the way that MySQL executes subqueries (in ON and WHERE clauses).

In short, the subqueries (the ON condition in your case) is executed for each and every combination of z and ps. Sometimes the randomly chosen ProxyID matches the ps.ProxyID, sometimes it doesn't. Therefore, for each zone, you may get 0 matches, 1, 2 or more, which is exactly what you see in the results.
See previous answers in similar issues:


Writing a query that behaves as expected - and return 1 row for each zone - it really hard because the values of RAND() are not found previously stored in a table but provided during run time.

A work around would be to abuse two other non-standard MySQL "features". First, that subqueries allow ORDER BY without LIMIT, which is normally redundant. Second, that in GROUP BY queries, it is allowed to have expressions in the SELECT list which are not aggregates and are not in the GROUP BY list. This is also against the standard and often leads to unexpected results for poorly written queries.

Combining these two ill-behaving features, we can have a monster:

SELECT g.*
FROM
  ( SELECT z.*, 
           ps.ProxyID, ps.ProxyIP, zs.ProxyPort, ps.Enabled 
    FROM Zones AS z    
      LEFT JOIN ProxyServers AS ps 
      ON ps.ProxyType = z.ProxyType 
    ORDER BY z.ZoneID, RAND() 
  ) AS g
GROUP BY ZoneID ;

It's certainly not guaranteed to work, as recent or future versions of MySQL may or do detect the ill behaviour and may either remove the redundant ORDER BY killing the wanted functionality or may throw an error due to the non-standard GROUP BY clause.