Correlated Subqueries: resolving ambiguous names


Suppose I have two tables which share some column names. In this case the primary key in both is called id.

CREATE TABLE artists(
    id int primary key,
    name text,
--  …,
);
CREATE TABLE paintings(
    id int primary key,
    artistid int references artists(id),
    title text,
--  …,
);

Note: I know there are arguments against calling your primary key a generic name like id, but let’s suppose, for argument’s sake, that it’s out of my control. In any case, it could have been any other column for the purpose of this question.

Suppose I now have a carelessly written SELECT statement which seeks to extract data from the referenced table using a correlated subquery:

SELECT
    id, title,
    (SELECT name FROM artists WHERE artistid=id) as artist
FROM paintings;

Clearly the inner WHERE clause could have been better written as WHERE paintings.artistid=artists.id. However, I got away with it and it works. It even works if I write the WHERE clause as id=artistid.

I know it’s not the best way to go about it, but I’m more surprised that it has understood my intention, which is not what we have come to expect from SQL.

The question is: How does SQL interpret ambiguous columns in a Correlated Subquery?

Best Answer

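It resolves the column names in scope order, from the innermost query outward. Inside the subquery, an unqualified column is first looked up in the subquery's own FROM tables (here, artists); only if none of them has such a column is it resolved against the outer query (paintings). So in WHERE artistid=id, id binds to artists.id, while artistid, which artists lacks, binds to paintings.artistid. The order of the operands does not matter, which is why id=artistid works too.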
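The resolution order can be checked with a small sketch. It uses SQLite through Python's built-in sqlite3 module as a stand-in for the engine in the question (an assumption; SQLite applies the same innermost-first rule), leaves out the elided "…" columns, and uses hypothetical sample rows.

```python
import sqlite3

# In-memory database with the question's two tables (elided columns omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists(
        id   INTEGER PRIMARY KEY,
        name TEXT
    );
    CREATE TABLE paintings(
        id       INTEGER PRIMARY KEY,
        artistid INTEGER REFERENCES artists(id),
        title    TEXT
    );
    -- hypothetical sample rows
    INSERT INTO artists VALUES (1, 'Monet'), (2, 'Vermeer');
    INSERT INTO paintings VALUES
        (10, 2, 'Girl with a Pearl Earring'),
        (11, 1, 'Water Lilies');
""")

# Careless query: inside the subquery "id" exists in artists, so it binds to
# artists.id; "artistid" does not, so it binds to the outer paintings.artistid.
careless = conn.execute("""
    SELECT id, title,
           (SELECT name FROM artists WHERE artistid = id) AS artist
    FROM paintings ORDER BY id
""").fetchall()

# Swapping the operands changes nothing: each name is resolved on its own.
swapped = conn.execute("""
    SELECT id, title,
           (SELECT name FROM artists WHERE id = artistid) AS artist
    FROM paintings ORDER BY id
""").fetchall()

print(careless)
# [(10, 'Girl with a Pearl Earring', 'Vermeer'), (11, 'Water Lilies', 'Monet')]
print(careless == swapped)  # True
```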

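This can sometimes be quite useful, though it can also be quite dangerous: a misspelled or missing column in the subquery silently binds to a column of the outer query instead of raising an error.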
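To illustrate the danger, here is a minimal sketch, again assuming SQLite via Python's sqlite3 and hypothetical rows. The query is meant to keep only paintings whose artist exists, but it names a column that artists does not have, so that name binds to the outer paintings.artistid and the filter lets every painting through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE paintings(id INTEGER PRIMARY KEY, artistid INTEGER, title TEXT);
    -- hypothetical rows: painting 11 points at an artist that does not exist
    INSERT INTO artists VALUES (1, 'Monet');
    INSERT INTO paintings VALUES (10, 1, 'Water Lilies'), (11, 99, 'Orphan');
""")

# Mistake: artists has no artistid column, so the subquery returns the outer
# row's own artistid and the IN test is true for every non-NULL artistid.
mistake = conn.execute("""
    SELECT id FROM paintings
    WHERE artistid IN (SELECT artistid FROM artists)
    ORDER BY id
""").fetchall()

# Intended: compare against the artists' own ids.
intended = conn.execute("""
    SELECT id FROM paintings
    WHERE artistid IN (SELECT id FROM artists)
    ORDER BY id
""").fetchall()

print(mistake)   # [(10,), (11,)]
print(intended)  # [(10,)]
```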

I usually use table aliases and two-part names to avoid any unexpected issues and make it clearer to the reader which table each column is from.

SELECT p.id,
       p.title,
       (SELECT a.name
        FROM   artists a
        WHERE  p.artistid = a.id) AS artist
FROM   paintings p;
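A short sketch (SQLite via Python's sqlite3, hypothetical rows) shows the payoff of two-part names: the aliased query returns the expected artist, and the same slip as before, written as a qualified a.artistid, is now rejected with an error instead of silently binding to the outer table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists(id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE paintings(id INTEGER PRIMARY KEY, artistid INTEGER, title TEXT);
    INSERT INTO artists VALUES (1, 'Monet');
    INSERT INTO paintings VALUES (10, 1, 'Water Lilies');
""")

# The aliased, two-part-name query from the answer.
rows = conn.execute("""
    SELECT p.id, p.title,
           (SELECT a.name FROM artists a WHERE p.artistid = a.id) AS artist
    FROM paintings p
""").fetchall()
print(rows)  # [(10, 'Water Lilies', 'Monet')]

# A qualified name is looked up only in the named table, so the mistake
# surfaces when the statement is prepared.
try:
    conn.execute("""
        SELECT p.id,
               (SELECT a.name FROM artists a WHERE a.artistid = a.id) AS artist
        FROM paintings p
    """)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error is not None)  # True
```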

Related Solutions

Mysql – Query performance with subquery and IN clause

UPDATE 2012-01-12 14:03 EDT

I refactored it again to make sure the readings keys and boards keys are combined correctly before retrieving the data from the readings table:

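SELECT 
    readings.* 
FROM 
    ( 
        -- DISTINCT: one key per (box, time) pair, so the join back
        -- below does not multiply rows
        SELECT DISTINCT A.boxsn, A.`time` FROM
        (
            SELECT boxsn, `time` FROM readings 
            WHERE (`time` >= 1325404800)  
            AND (`time` < 1326317400)  
        ) A
        LEFT JOIN
        (
            SELECT id AS boxsn
            FROM boards
            WHERE siteId = '1'
        ) B
        USING (boxsn)
        WHERE B.boxsn IS NOT NULL
    ) readings_keys 
    -- join back on the full key: boxsn alone would also return
    -- readings outside the time window
    LEFT JOIN readings 
    USING (boxsn, `time`) 
ORDER BY readings.`time` ASC
;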
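The inner LEFT JOIN followed by WHERE B.boxsn IS NOT NULL keeps only rows that found a matching board, which is what an INNER JOIN does. A minimal sketch, assuming SQLite via Python's sqlite3 and hypothetical minimal schemas (only boxsn, time, id and siteId come from the query; value is a made-up stand-in for the rest of a reading), compares the two forms.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE boards(id INTEGER PRIMARY KEY, siteId TEXT);
    CREATE TABLE readings(boxsn INTEGER, time INTEGER, value REAL);
    INSERT INTO boards VALUES (1, '1'), (2, '2');
    INSERT INTO readings VALUES
        (1, 1325404800, 0.5),  -- site 1, inside the window
        (2, 1325404900, 0.7),  -- site 2, inside the window
        (1, 1326317400, 0.9);  -- site 1, at the exclusive upper bound
""")

# LEFT JOIN, then discard rows that found no board on site 1.
left_filtered = conn.execute("""
    SELECT A.boxsn, A.time FROM
        (SELECT boxsn, time FROM readings
         WHERE time >= 1325404800 AND time < 1326317400) A
    LEFT JOIN (SELECT id AS boxsn FROM boards WHERE siteId = '1') B
    USING (boxsn)
    WHERE B.boxsn IS NOT NULL
    ORDER BY A.boxsn, A.time
""").fetchall()

# The same filter written as a plain inner join.
inner = conn.execute("""
    SELECT r.boxsn, r.time
    FROM readings r
    JOIN boards b ON b.id = r.boxsn
    WHERE b.siteId = '1'
      AND r.time >= 1325404800 AND r.time < 1326317400
    ORDER BY r.boxsn, r.time
""").fetchall()

print(left_filtered)           # [(1, 1325404800)]
print(left_filtered == inner)  # True
```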

PostgreSQL – Using Aliases with Correlated Subqueries

As @ypercube already explained, the subquery has no reference to columns in the outer query, so it can be processed independently. It is therefore not a "correlated subquery"; some call it a "derived table", or just a "subquery".

SELECT *
FROM   orderinfo o
    , (SELECT * FROM customer c1 WHERE town = 'Bingham') c2
WHERE  c2.customer_id = o.customer_id;

As to your questions:

c1 is a table alias for customer in the subquery, short for customer AS c1. The key word AS has been omitted, which is fine since it would be just noise for a table alias. I quote the manual on "Omitting the AS Key Word":

In FROM items, both the standard and PostgreSQL allow AS to be omitted before an alias that is an unreserved keyword. But this is impractical for output column names, because of syntactic ambiguities.

The example is probably meant to demonstrate visibility: only the outer table alias is visible in the outer WHERE clause, so there would be no naming conflict even if the same alias c were used for both.
Otherwise, c1 is useless here since nothing refers to it. You can just drop it.
c2 is another table alias, for the "derived table". This one is mandatory since Postgres requires a name for every subquery in FROM, and a subquery has none until you name it. (Since PostgreSQL 16 this alias is optional.)
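The visibility point can be checked with a minimal sketch, assuming SQLite via Python's sqlite3 in place of PostgreSQL and hypothetical rows (orderinfo_id is a made-up column; customer_id and town come from the example). The same alias c is used inside the subquery and for the derived table, and the query still runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer(customer_id INTEGER PRIMARY KEY, town TEXT);
    CREATE TABLE orderinfo(orderinfo_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Bingham'), (2, 'Nicetown');
    INSERT INTO orderinfo VALUES (100, 1), (101, 2), (102, 1);
""")

# The inner c exists only inside the subquery; the outer WHERE sees only
# the derived table's alias, which happens to be c as well.
rows = conn.execute("""
    SELECT o.orderinfo_id
    FROM   orderinfo o
         , (SELECT * FROM customer c WHERE c.town = 'Bingham') c
    WHERE  c.customer_id = o.customer_id
    ORDER BY o.orderinfo_id
""").fetchall()
print(rows)  # [(100,), (102,)]
```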

The example is not very good overall. Such a query should rather use an explicit JOIN clause, and the subquery is simply unnecessary. This would be better, shorter and at least as fast:

SELECT *
FROM   orderinfo
JOIN   customer  c USING (customer_id)
WHERE  c.town = 'Bingham';

The only difference: customer_id is listed once instead of twice in the result (due to the USING clause), which would be preferable since it is completely redundant in this case.
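A sketch comparing the two queries, assuming SQLite via Python's sqlite3 and the same hypothetical rows as above: both return the same orders, and the USING version yields one column fewer because customer_id is not repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer(customer_id INTEGER PRIMARY KEY, town TEXT);
    CREATE TABLE orderinfo(orderinfo_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customer VALUES (1, 'Bingham'), (2, 'Nicetown');
    INSERT INTO orderinfo VALUES (100, 1), (101, 2), (102, 1);
""")

# Original form: comma join against a derived table.
derived = conn.execute("""
    SELECT *
    FROM   orderinfo o
         , (SELECT * FROM customer c1 WHERE town = 'Bingham') c2
    WHERE  c2.customer_id = o.customer_id
    ORDER BY o.orderinfo_id
""")
derived_cols = [d[0] for d in derived.description]
derived_rows = derived.fetchall()

# Suggested form: explicit JOIN ... USING.
joined = conn.execute("""
    SELECT *
    FROM   orderinfo
    JOIN   customer c USING (customer_id)
    WHERE  c.town = 'Bingham'
    ORDER BY orderinfo_id
""")
joined_cols = [d[0] for d in joined.description]
joined_rows = joined.fetchall()

print(derived_rows)  # [(100, 1, 1, 'Bingham'), (102, 1, 1, 'Bingham')]
print(joined_rows)   # [(100, 1, 'Bingham'), (102, 1, 'Bingham')]
print(len(derived_cols), len(joined_cols))  # 4 3
```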

See the PostgreSQL manual on SELECT for details.
