Correlated Subqueries: resolving ambiguous names

subquery

Suppose I have 2 tables which share some column names. In this case the primary key is both is called id.

CREATE TABLE artists(
    id int primary key,
    name text,
--  …,

);
CREATE TABLE paintings(
    id int primary key,
    artistid references artists(id),
    title text,
--  …,
);

Note: I know there are arguments against calling your primary key a generic name like id, but let’s suppose, for argument’s sake that it’s out of my control. In any case, it could have been any other column for the purpose of this question.

Suppose I now have a carelessly written SELECT statement which seeks to extract data from the referenced table using a correlated subquery:

SELECT
    id, title,
    (SELECT name FROM artists WHERE artistid=id) as artist
FROM paintings;

Clearly the inner WHERE clause could have been better written as WHERE paintings.artistid=artists.id. However I have got away with it and it works. It even works if I write the WHERE clause as id=artistid.

I know it’s not the best way to go about it, but I’m more surprised that it has understood my intention, which is not what we have come to expect from SQL.

The question is: How does SQL interpret ambiguous columns in a Correlated Subquery?

Best Answer

It resolves the column names in scope order. Inside the sub query columns will be resolved against those from the artists table if there is such ambiguity.

This can sometimes be quite useful though it can also be quite dangerous.

I usually use table aliases and two part names to avoid any unexpected issues and make it clearer to the reader which table each column is from.

SELECT p.id,
       p.title,
       (SELECT a.name
        FROM   artists a
        WHERE  p.artistid = a.id) AS artist
FROM   paintings p;