Postgresql – Column alias in Postgres FROM clause

alias, postgresql

I'm reading through the Postgres documentation, the page on SELECT statements, and I ran across an aspect of aliases that I have never encountered.

In the section on FROM clauses, subheading alias, there is a sentence stating:

If an alias is written, a column alias list can also be written to provide substitute names for one or more columns of the table.

There are no examples given in the documentation that I could find.

I know how to set up output names as aliases, but that doesn't appear to be the same thing.

The synopsis for SELECT includes the lines:

...
SELECT [ ALL | DISTINCT [ ON ( expression [, ...] ) ] ]
    * | expression [ [ AS ] output_name ] [, ...]
    [ FROM from_item [, ...] ]
...

And defines from_item as:

where from_item can be one of:

    [ ONLY ] table_name [ * ] [ [ AS ] alias [ ( column_alias [, ...] ) ] ]
    ( select ) [ AS ] alias [ ( column_alias [, ...] ) ]
    ...(other forms omitted)...

Note that from_item actually includes column_alias.

It makes sense to me that the ( select ) form can be given column aliases in the FROM clause (rather than only output_names, in the "expressions" of the SELECT statement), since the "columns" of the subquery will have been explicitly chosen in most cases and thus the sequence will be known. So I would imagine in that case the column_alias values could simply be a list of names, and they would be matched up in sequence against the columns returned by the subquery. (Though an example would be nice.)

However, how can column aliases be used for a table_name? Do you have to know the exact sequence of columns defined in the table, or can you set an alias just for one or two of these in the FROM clause?

What if you only want to set a column_alias for one column with a very long name (and leave the other columns not aliased); is this possible? (If so, is this Postgres specific?)

Best Answer

The column aliases there override the column names/aliases produced by the inner select (the derived table). In the same way, they can override the column names of a table (whether it's a base table, a view, a derived table or a CTE does not matter at all).

So this simple example will give an error:

select 
    a, count_a              -- invalid here (have been overridden)
from 
    ( select t.a, count(*) as count_a
      from t
      group by t.a
      order by count_a desc           -- count_a is valid here
      limit 8
    ) 
      as d (b, count_b) ;

but this will work:

select 
    b, count_b              -- valid column aliases
from     
    -- identical as above

The names a and count_a are valid inside the subquery (derived table) but not outside, because there they have been overridden by b and count_b.

Do you have to know the exact sequence of columns defined in the table, or can you set an alias just for one or two of these in the FROM clause?

Yes, you do have to know the sequence of columns.

But you don't have to change all columns. Say the table has 5 columns. If you use:

select t.*
from table_name as t (a,b,c) ;

only the first 3 columns will appear with the new names (a, b, c). The 4th and 5th will show with their real names. You'll get an error if you provide more aliases than the table has columns (e.g. 6 aliases for a 5-column table).
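
As a minimal sketch of the behaviour just described, assuming Postgres and a hypothetical five-column table person_demo (the table and its columns are invented for illustration, they are not from the question):

```sql
-- Hypothetical 5-column table, used only for this illustration.
create temporary table person_demo (
    id         int,
    first_name text,
    last_name  text,
    city       text,
    zip        text
);
insert into person_demo values (1, 'Ada', 'Lovelace', 'London', 'W1');

-- Output columns are named: a, b, c, city, zip
select t.*
from person_demo as t (a, b, c);

-- The new names are the ones to use in this query; unaliased columns keep theirs.
select t.a, t.city
from person_demo as t (a, b, c);

-- These two should fail:
-- select t.id from person_demo as t (a, b, c);           -- id is now only visible as a
-- select * from person_demo as t (a, b, c, d, e, f);     -- 6 aliases, 5 columns
```

The aliases are matched purely by position, which is why the column order of the table has to be known.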

What if you only want to set a column_alias for one column with a very long name (and leave the other columns not aliased); is this possible? (If so, is this Postgres specific?)

Only if it's the first column, or if you also list the names of all the columns before the one you want to rename.

I suppose you can't just give an alias for the third column only, or something like that?

I don't know of any syntax to allow you to alias only the 3rd column, without providing the names of the 1st and 2nd column.
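
A sketch of that workaround, assuming Postgres and a hypothetical table customer_demo whose third column has an unwieldy name (all names here are made up for illustration): the first two columns are simply restated under their existing names so that the third position can receive the short alias.

```sql
-- Hypothetical table, used only for this illustration.
create temporary table customer_demo (
    id                                   int,
    name                                 text,
    a_very_long_descriptive_column_name  text,
    city                                 text
);
insert into customer_demo values (1, 'Acme', 'some description', 'Berlin');

-- Positions 1 and 2 keep their real names; position 3 becomes descr;
-- position 4 (city) is not listed and keeps its real name.
select c.id, c.name, c.descr, c.city
from customer_demo as c (id, name, descr);
```

If the table definition later changes column order, the alias list silently attaches to different columns, which is part of why this is fragile for base tables.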

Overall, the usefulness of the feature is at least debatable when used for base tables. And the above query, which overrides the names of just 3 of the possibly many columns, reeks of obfuscation and could very well be considered bad practice.

But the feature is provided because it's standard SQL and for completeness. It wouldn't make sense to have this only for subqueries and CTEs and not for other kinds of tables.

One case where it can be useful is not with a base table but with the VALUES construct, where the columns get default names of column1, column2, etc. The alias list can be used to give them more meaningful names:

select 
    a, b
from 
    ( values
         (1, 2),
         (2, 3), 
         (3, 5)
    ) 
      as d (a, b) ;
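
For comparison, a small sketch (assuming Postgres) of the same VALUES list without an alias list, and with only a partial one:

```sql
-- Without a column alias list, the default names have to be used.
select column1, column2
from ( values (1, 2), (2, 3), (3, 5) ) as d;

-- With a partial list, only the first column is renamed;
-- the second keeps its default name column2.
select a, column2
from ( values (1, 2), (2, 3), (3, 5) ) as d (a);
```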

Related Solutions

Sql-server – Why are queries parsed in such a way that disallows the use of column aliases in most clauses

Summary

There's no logical reason it couldn't be done, but the benefit is small and there are some pitfalls that may not be immediately apparent.

Research Results

I did some research and found some good information. The following is a direct quote from a reliable primary source (that wishes to remain anonymous) at 2012-08-09 17:49 GMT:

When SQL was first invented, it had no aliases in the SELECT clause. This was a serious shortcoming that was corrected when the language was standardized by ANSI in about 1986.

The language was intended to be "non-procedural"--in other words, to describe the data that you want without specifying how to find it. So, as far as I know, there's no reason why an SQL implementation couldn't parse the whole query before processing it, and allow aliases to be defined anywhere and used everywhere. For example, I don't see any reason why the following query shouldn't be valid:

select name, salary + bonus as pay
from employee
where pay > 100000

Although I think this is a reasonable query, some SQL-based systems may introduce restrictions on the use of aliases for some implementation-related reason. I'm not surprised to hear that SQL Server does this.

I am interested in further research into the SQL-86 standard and why modern DBMSes don't support alias reuse, but haven't had the time to get very far with it yet. For starters, I don't know where to get the documentation or how to find out who exactly made up the committee. Can anyone help out? I also would like to know more about the original Sybase product that SQL Server came from.

From this research and some further thought, I have come to suspect that using aliases in other clauses, while quite possible, simply has never been that high a priority for DBMS manufacturers compared to other language features. Since it is not that much of an obstacle, being easily worked around by the query writer, putting effort into it over other advancements is not optimal. Additionally, it would be proprietary as it is obviously not part of the SQL standard (though I'm waiting to find out more on that for sure) and thus would be a minor improvement, breaking SQL compatibility between DBMSes. By comparison, CROSS APPLY (which is really nothing more than a derived table allowing outer references) is a huge change, that while proprietary offers incredible expressive power not easily performed in other ways.

Problems With Using Aliases Everywhere

If you allow SELECT items to be used in the WHERE clause, you not only explode the complexity of the query (and thus the complexity of finding a good execution plan), you also make it possible to come up with completely illogical stuff. Try:

SELECT X + 5 Y FROM MyTable WHERE Y = X

What if MyTable already has a column Y? Which one is the WHERE clause referring to? The solution is to use a CTE or a derived table, which in most cases should cost nothing extra and achieves the same end result. CTEs and derived tables at least enforce the resolution of ambiguity by allowing an alias to be used only once.
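
As a sketch of that workaround applied to the salary query quoted earlier: the employee rows below are invented sample data (supplied through a VALUES list so the statement is self-contained), and the CTE name paid is likewise just an illustrative choice. The syntax used should be accepted by both SQL Server and Postgres.

```sql
WITH employee (name, salary, bonus) AS (
    -- Invented sample rows standing in for a real employee table.
    SELECT *
    FROM (VALUES ('Ann', 90000, 20000),
                 ('Bob', 60000,  5000)) AS v (name, salary, bonus)
),
paid AS (
    -- The alias pay is defined once, here...
    SELECT name, salary + bonus AS pay
    FROM employee
)
-- ...and can then be referenced unambiguously by the outer query.
SELECT name, pay
FROM paid
WHERE pay > 100000;
```

Because pay only exists as a column of paid, there is no question about whether WHERE refers to the alias or to some underlying column of the same name.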

Also, not using aliases in the FROM clause makes eminent sense. You can't do this:

SELECT
   T3.ID + (SELECT Min(Interval) FROM Intervals WHERE IntName = 'T') CalcID
FROM
   Table1 T
   INNER JOIN Table2 T2
      ON T2.ID = CalcID
   INNER JOIN Table3 T3
      ON T2.ID = T3.ID

That's a circular reference (in the sense that T2 is secretly referring to a value from T3, before that table has been presented in the JOIN list), and darn hard to see. How about this one:

INSERT dbo.FinalTransaction
SELECT
   newid() FinalTransactionGUID,
   'GUID is: ' + Convert(varchar(50), FinalTransactionGUID) TextGUID,
   T.*
FROM
   dbo.MyTable T

How much do you want to bet that the newid() function is going to be put into the execution plan twice, completely unexpectedly making the two columns show different values? What about when the above query is used N levels deep in CTEs or derived tables? I guarantee that the problem is worse than you can imagine. There are already serious inconsistency problems about when things are evaluated only once or at what point in a query plan, and Microsoft has said it will not fix some of them because they are expressing query algebra properly; if one gets unexpected results, break the query up into parts. Allowing chained references and detecting circular references through potentially very long such chains are quite tricky problems. Introduce parallelism and you've got a nightmare in the making.

Note: Using the alias in WHERE or GROUP BY isn't going to make a difference to the problems with functions like newid() or rand().

A SQL Server way to create reusable expressions

CROSS APPLY/OUTER APPLY is one way in SQL Server to create expressions that can be used anywhere else in the query (just not earlier in the FROM clause):

SELECT
   X.CalcID
FROM
   Table1 T
   INNER JOIN Table3 T3
      ON T.ID = T3.ID
   CROSS APPLY (
      SELECT
         T3.ID + (SELECT Min(Interval) FROM Intervals WHERE IntName = 'T') CalcID
   ) X
   INNER JOIN Table2 T2
      ON T2.ID = X.CalcID

This does two things:

1. Makes all expressions in the CROSS APPLY get a "namespace" (a table alias, here X) and be unique within that namespace.
2. Makes it obvious everywhere not only that CalcID is coming from X, but also why you can't use anything from X when joining tables T and T3: X hasn't been introduced yet.

I'm actually quite fond of CROSS APPLY. It has become my faithful friend, and I use it all the time. Need a partial UNPIVOT (which would require a PIVOT/UNPIVOT or UNPIVOT/PIVOT using native syntax)? Done with CROSS APPLY. Need a calculated value that will be reused many times? Done. Need to rigidly enforce execution order for calls over a linked server? Done, with a screaming improvement in speed. Need only one type of row split to 2 rows or with extra conditions? Done.

So at the very least, in SQL Server 2005 and up, you have no further cause for complaint: CROSS APPLY is how you DRY in the way you are wanting.
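
A minimal sketch of the "calculated value reused many times" case in SQL Server syntax. The employee rows are invented sample data supplied through a VALUES list, and the aliases e and x are arbitrary; the point is only that pay is computed once in the CROSS APPLY and then referenced in SELECT, WHERE and ORDER BY.

```sql
WITH employee (name, salary, bonus) AS (
    -- Invented sample rows standing in for a real employee table.
    SELECT *
    FROM (VALUES ('Ann', 90000, 20000),
                 ('Bob', 60000,  5000)) AS v (name, salary, bonus)
)
SELECT e.name, x.pay
FROM employee AS e
CROSS APPLY (SELECT e.salary + e.bonus AS pay) AS x   -- expression named once
WHERE x.pay > 100000                                  -- reused here
ORDER BY x.pay DESC;                                  -- and here
```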

Postgresql – How to define alias in an ARRAY_AGG expression

You are forming an ad-hoc row type (effectively an anonymous record) with this expression:

(media_files.position, media_files.token, media_files.title)

in your aggregate function call:

ARRAY_AGG((media_files.position, media_files.token, media_files.title)
          ORDER BY media_files.position) AS media_files

Array types can only be built upon well-known types. Your option is to announce such a type to the system and cast the record to it before forming the array. Create a well-known composite type:

CREATE TYPE my_type AS (
  position int    -- data type?
 ,token    text
 ,title    text
 );

I am guessing data types for lack of information here. Fill in your actual types.

Creating a table has the same effect: it also announces a well-known composite type to the system, indirectly. For this reason, you can (ab-)use a temporary table to register a composite type for the duration of the session:

-- No AS here: CREATE TABLE ... AS expects a query, not a column list
CREATE TEMP TABLE my_type (
  position int    -- data type?
 ,token    text
 ,title    text
 );

Either way, you can then cast your record:

ARRAY_AGG((media_files.position, media_files.token, media_files.title)::my_type
          ORDER BY media_files.position) AS media_files

Then you can reference elements of the (now well-known) type by name:

SELECT media_files[1].position, media_files[1].token
FROM  (
     ...
   ,ARRAY_AGG((media_files.position, media_files.token, media_files.title)::my_type
          ORDER BY media_files.position) AS media_files
   ...
   FROM ....
   GROUP BY ...
  ) sub;

Now, Postgres can use these names for building a JSON value. Voilà.
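
A self-contained sketch of that last step, assuming Postgres 9.2 or later (for to_json). The type name media_item_demo, the guessed data types and the sample rows are all invented for illustration; the sample rows come from a VALUES list with a column alias list rather than from the real media_files table.

```sql
-- Hypothetical composite type; the field types are guesses, as above.
create type media_item_demo as (
    position int,
    token    text,
    title    text
);

-- Cast each ad-hoc row to the named type before aggregating, so the
-- array elements carry field names that to_json can use as keys.
select to_json(
         array_agg((v.position, v.token, v.title)::media_item_demo
                   order by v.position)
       ) as media_files
from ( values (2, 'tok-b', 'Second'),
              (1, 'tok-a', 'First')
     ) as v (position, token, title);

-- Clean up the illustration type.
drop type media_item_demo;
```

The result should be a JSON array of objects keyed by position, token and title, ordered by position.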
