PostgreSQL Alias – Why GROUP BY is Required?

aliaspostgresql

Here's something strange: there is a large table named "products" which is partitioned with inheritance such that there are two children: products_active and products_inactive, and the constraint is that in products_active, status=1, and products_inactive gets all the other statuses.

There is a large query which joins a bunch of tables, of which this is the first part:

SELECT
  products.id, products.status, products.brand_id, products.name, products.description, products.data, products.website,
  products.packaging, products.container, products.country_of_origin, products.category_id, products.product_type_id, products.tsv_keywords,
  COUNT(prices.id) as prices_count,
  ROUND(AVG(currency_convert(prices.amount,currencies.currency_code,'USD')),2) as avg_price,
  ROUND(MAX(currency_convert(prices.amount,currencies.currency_code,'USD')),2) as high_price,
  ROUND(MIN(currency_convert(prices.amount,currencies.currency_code,'USD')),2) as low_price,
  ts_rank(tsv_keywords, plainto_tsquery('merlot')) as rank,
  ROUND(AVG(ST_Distance(ST_GeographyFromText('SRID=4326;POINT(0.001 0.001)'),ST_GeographyFromText('SRID=4326;POINT(' || stores.longitude || ' ' || stores.latitude || ')')))) AS distance
FROM
  products
  JOIN product_types ON products.product_type_id = product_types.id
  JOIN categories ON products.category_id = categories.id
  JOIN prices ON prices.product_id = products.id 
  JOIN currencies ON prices.currency_id = currencies.id
  JOIN stores ON prices.store_id = stores.id
  JOIN brands ON products.brand_id = brands.id
  JOIN merchants ON stores.merchant_id = merchants.id
  JOIN manufacturers ON brands.manufacturer_id = manufacturers.id
, delivery_zones
WHERE ...

(full query here: http://pastebin.com/VjJPTQWj)

The problem is: note the FROM products... part, if I replace this with FROM products_active AS products, the query errors out with:

ERROR:  column "products.status" must appear in the GROUP BY clause or be used in an aggregate function
LINE 2:   products.id, products.status, products.brand_id, products....

The two tables (products and products_active) have identical structures! They are inherited from one another!

Am I missing a subtlety in aliasing table names?

Best Answer

The problem is that the 2 tables are not identical.

The first has (id) as the PRIMARY KEY while the second has (id) a UNIQUE constraint (or index, doesn't matter). This might seem like a minor detail but it isn't.

Postgres has added in version 9.1, a feature were functionally dependent columns need not be mentioned in a GROUP BY clause (if the column they depend on is mentioned in the GROUP BY) and still used in the HAVING, SELECT and ORDER BY clauses. However the implementation is not 100% complete. It doesn't identify all possible functional dependencies but only those coming from PRIMARY KEY constraints. While a UNIQUE constraint with all columns NOT NULL does not differ in any meaningful way from a PRIMARY KEY constraint, the implementation does not consider UNIQUE constraints.

So, for example, with these two very similar tables (test in SQLfiddle):

CREATE TABLE products
  ( id INT NOT NULL,
    name TEXT NOT NULL,
    status TEXT NOT NULL,
    PRIMARY KEY (id)
  ) ;

CREATE TABLE products_active
  ( id INT NOT NULL,
    name TEXT NOT NULL,
    status TEXT NOT NULL,
    UNIQUE (id)
  ) ;

The query will succeed for the first table:

SELECT p.id, p.status, p.name
FROM products AS p
GROUP BY p.id ;

while it will fail for the second:

SELECT p.id, p.status, p.name
FROM products_active AS p
GROUP BY p.id ;

with:

ERROR: column "p.status" must appear in the GROUP BY clause or be used in an aggregate function