Mysql – Hide columns that are constant across a small number of rows

MySQL

I have a query that returns a small number of rows (usually 2-5) but a huge number of columns (several hundred). The rows are mostly identical, save for a few places. I would like to quickly identify them by hiding columns which are constant (e.g., I have 5 products, all of which cost $3, so the cost column should be suppressed). Is there an elegant way to do this using MySQL? If it helps, most of the columns are numerical.

Here's a tiny example: Suppose my initial query returns

col1|col2|col3|col4
 1  | x  | a  | 3.14
 1  | x  | b  | 3.14

Columns 1, 2 and 4 are "constant", while column 3 is not. Therefore I would like a secondary query to return only column 3:

|col3|
| a  |
| b  |

The problem is that manually finding a small number of non-constant columns is quite difficult with a large number of constant columns.

Best Answer

Columns are not dynamic. They can't be suppressed on the result because at that point it's too late. You have two options

  1. Unpivot

    id|ident|x|y
    1 | foo |1|2
    

    Becomes

    id|ident|key|val
    1 | foo | x |2
    1 | foo | y |2
    

    The schema is fixed from that point and you can return just what you need.

  2. Use dynamic SQL. The server returns sql that you issue against the server.

You could also fix the schema, if most of those columns are redundant modify the schema.

Update

I think perhaps you misunderstood my goal. I'm working with a query that returns something like: [...] I want to turn hide/supress the columns which are "constant," as in they don't change from row to row. That would turn the above table into...

I'm very sure I don't misunderstand your goal. We have at least one question a week that is along the same lines we have to turn around for the very same reason. In fact, a year ago yesterday I asked a similar question out of desperation thinking there may be a better way, and was turned around for the same reason.. I can still quote Erwin's relevant parts here,

Column names and types cannot be dynamic. SQL demands to know number, names and types of resulting columns at call time. Either by explicit declaration or from information in the system catalogs (That's what happens with SELECT * FROM tbl: Postgres looks up the registered table definition.) ... You want Postgres to derive resulting columns from data in a user table. Not going to happen. [...] One way or the other, you need two round trips to the server. Either you create a cursor and then walk through it. Or you create a temp table and then select from it. Or you register a type and use it in the call.

Your use case here is slightly different, but you should still read the above bolded parts at least 15 times, because it has to sink through.

Then you've got to either,

  1. Change the format you're will to accept from the server.
  2. Process the result set on the server that generates the right query, and then issue that query back to the server.

For your example

CREATE TABLE tinyexample(id,ident,key,val)
AS VALUES
  ( 1, 'foo', 'x', 2),
  ( 1, 'foo', 'y', 2);

What you want is something like this. Note this is not the best way to do this, you could save a round trip by running this in a function.

SELECT FORMAT (
  $$
    SELECT FORMAT(
      'SELECT %%s FROM %%I.%%I.%%I;',
      array_to_string(array_remove(ARRAY[%s]::text[],null), ', '),
      %s,
      %s,
      %s
    ) AS inner
    FROM %I.%I.%I;
  $$,
  colexpr,
  quote_literal(table_catalog),
  quote_literal(table_schema),
  quote_literal(table_name),
  table_catalog,
  table_schema,
  table_name
) AS outer
FROM (
  SELECT
    table_catalog,
    table_schema,
    table_name,
    string_agg(  format($$CASE WHEN count(DISTINCT %I) > 1 THEN %s END$$, column_name, quote_literal(column_name)), ', '  ) AS colexpr
  FROM information_schema.columns
  WHERE table_catalog = 'test'
    AND table_schema = 'public'
    AND table_name = 'tinyexample'
  GROUP BY table_catalog, table_schema, table_name
) AS t;

This seems like a mess, most of that's not me though. We have

  1. Issue our first query to the information_schema to get the column listing. Without this you'd have to statically type the columns. That's no fun. This returns query "outer".
  2. Issue our second query ("outer") to find out which columns are not distinct based on their values. This returns query "inner".
  3. Issue our third query ("inner") which returns the resultset we want.