Mysql – Hide columns that are constant across a small number of rows

MySQL

I have a query that returns a small number of rows (usually 2-5) but a huge number of columns (several hundred). The rows are mostly identical, save for a few places. I would like to quickly identify them by hiding columns which are constant (e.g., I have 5 products, all of which cost $3, so the cost column should be suppressed). Is there an elegant way to do this using MySQL? If it helps, most of the columns are numerical.

Here's a tiny example: Suppose my initial query returns

col1|col2|col3|col4
 1  | x  | a  | 3.14
 1  | x  | b  | 3.14

Columns 1, 2 and 4 are "constant", while column 3 is not. Therefore I would like a secondary query to return only column 3:

|col3|
| a  |
| b  |

The problem is that manually finding a small number of non-constant columns is quite difficult with a large number of constant columns.

Best Answer

Columns are not dynamic. They can't be suppressed on the result because at that point it's too late. You have two options

Unpivot
```
id|ident|x|y
1 | foo |1|2
```
Becomes
```
id|ident|key|val
1 | foo | x |2
1 | foo | y |2
```
The schema is fixed from that point and you can return just what you need.
Use dynamic SQL. The server returns sql that you issue against the server.

You could also fix the schema, if most of those columns are redundant modify the schema.

Update

I think perhaps you misunderstood my goal. I'm working with a query that returns something like: [...] I want to turn hide/supress the columns which are "constant," as in they don't change from row to row. That would turn the above table into...

I'm very sure I don't misunderstand your goal. We have at least one question a week that is along the same lines we have to turn around for the very same reason. In fact, a year ago yesterday I asked a similar question out of desperation thinking there may be a better way, and was turned around for the same reason.. I can still quote Erwin's relevant parts here,

Column names and types cannot be dynamic. SQL demands to know number, names and types of resulting columns at call time. Either by explicit declaration or from information in the system catalogs (That's what happens with SELECT * FROM tbl: Postgres looks up the registered table definition.) ... You want Postgres to derive resulting columns from data in a user table. Not going to happen. [...] One way or the other, you need two round trips to the server. Either you create a cursor and then walk through it. Or you create a temp table and then select from it. Or you register a type and use it in the call.

Your use case here is slightly different, but you should still read the above bolded parts at least 15 times, because it has to sink through.

Then you've got to either,

Change the format you're will to accept from the server.
Process the result set on the server that generates the right query, and then issue that query back to the server.

For your example

CREATE TABLE tinyexample(id,ident,key,val)
AS VALUES
  ( 1, 'foo', 'x', 2),
  ( 1, 'foo', 'y', 2);

What you want is something like this. Note this is not the best way to do this, you could save a round trip by running this in a function.

SELECT FORMAT (
  $$
    SELECT FORMAT(
      'SELECT %%s FROM %%I.%%I.%%I;',
      array_to_string(array_remove(ARRAY[%s]::text[],null), ', '),
      %s,
      %s,
      %s
    ) AS inner
    FROM %I.%I.%I;
  $$,
  colexpr,
  quote_literal(table_catalog),
  quote_literal(table_schema),
  quote_literal(table_name),
  table_catalog,
  table_schema,
  table_name
) AS outer
FROM (
  SELECT
    table_catalog,
    table_schema,
    table_name,
    string_agg(  format($$CASE WHEN count(DISTINCT %I) > 1 THEN %s END$$, column_name, quote_literal(column_name)), ', '  ) AS colexpr
  FROM information_schema.columns
  WHERE table_catalog = 'test'
    AND table_schema = 'public'
    AND table_name = 'tinyexample'
  GROUP BY table_catalog, table_schema, table_name
) AS t;

This seems like a mess, most of that's not me though. We have

Issue our first query to the information_schema to get the column listing. Without this you'd have to statically type the columns. That's no fun. This returns query "outer".
Issue our second query ("outer") to find out which columns are not distinct based on their values. This returns query "inner".
Issue our third query ("inner") which returns the resultset we want.

Related Solutions

Mysql – Rows with a variable number of columns

Edited after a couple of comments by the question asker!:

Better idea for your design: new sample schema

(If you're not familar with these diagrams, that's seven tables, with foreign key links as pictured [the three _Link tables all have two foreign keys referencing the data tables]. It probably looks like I'm overcomplicating things, but trust me, "tokenise once and store in the database" is so much more efficient than "tokenise data each time you retrieve it".)

Sample data, using a simple tweet (ignoring Users for now, it's the same concept though): "test tweet! #howdoesthiswork #newbie"

      Tweet
TweetID   WordCount
   1         2

Tweet_Word_Link
TweetID   WordID
   1         1
   1         2

      Word
WordID    Word
   1      test
   2      tweet

Tweet_Hashtag_Link
TweetID  HashtagID
   1         1
   1         2

      Hashtag
HashtagID  Hashtag
    1      #howdoesthiswork
    2      #newbie

This allows you to easily get all the users linked in a tweet:

SELECT U.UserName
FROM  User U
        INNER JOIN
      Tweet_User_Link TUL ON U.UserID = TUL.UserID
        INNER JOIN
      Tweet T ON TUL.TweetID = T.TweetID
WHERE T.TweetID = <your ID here>

(change table names as appropriate for Hashtags and Words)

Or, slightly more complicated, get the top 10 words for all tweets mentioning a specific hashtag:

SELECT TOP 10 W.Word, COUNT(*) AS WordCount
FROM  Word W
        INNER JOIN
      Tweet_Word_Link TWL ON TWL.WordID = W.WordID
        INNER JOIN
      Tweet T ON TWL.TweetID = T.TweetID
        INNER JOIN
      Tweet_Hashtag_Link THL ON THL.TweetID = T.TweetID
        INNER JOIN
      Hashtag H ON H.HashtagID = THL.HashtagID
WHERE H.Hashtag = <your hashtag here>
GROUP By W.Word

(I've used ANSI SQL syntax, I'm not 100% sure if MySQL supports all of ANSI SQL - but it should be close enough.)

Mysql – Limit WHERE to MAX() & COUNT()

Here is your original query from the question

SELECT e.*, MAX(m.datetime) AS unread_last, COUNT(m.id) AS unread 
FROM TAB_EVENT e 
LEFT JOIN TAB_MESSAGE m ON e.id=m.event_id 
WHERE ( m.`read` IS NULL OR m.`read` = 0) 
GROUP BY e.id 
ORDER BY m.datetime DESC, e.id ASC 
LIMIT 10;

Maybe try refactoring the query in such a way that in executes in this sequence

only collect necessary columns from TAB_MESSAGE
apply LIMIT 10 against the collected rows from TAB_MESSAGE
run the JOIN
apply the MAX() and COUNT() last

Here is what I am proposing

SELECT e.*, MAX(m.datetime) AS unread_last, COUNT(m.id) AS unread 
FROM
(
    SELECT * FROM
    (SELECT id,event_id,datetime FROM TAB_MESSAGE
    WHERE read IS NULL OR read = 0
    ORDER BY datetime DESC) mm
    LIMIT 10
) m
LEFT JOIN TAB_EVENT e 
ON e.id=m.event_id
ORDER BY m.datetime DESC, e.id ASC;

Give it a Try !!!

UPDATE 2012-02-21 17:06 EDT

SELECT e.*, MAX(m.datetime) AS unread_last, COUNT(m.id) AS unread 
FROM
TAB_EVENT e LEFT JOIN
(
    SELECT * FROM
    (SELECT id,event_id,datetime FROM TAB_MESSAGE
    WHERE read IS NULL OR read = 0
    ORDER BY datetime DESC) mm
    LIMIT 10
) m
ON e.id=m.event_id
ORDER BY m.datetime DESC, e.id ASC;

@Sebastian, I put the query back in the original join order. Please try this as well !!!

UPDATE 2012-02-21 17:11 EDT

Make sure the datetime field is indexed

ALTER TABLE TAB_MESSAGE ADD INDEX read_datetime_ndx (read,datetime);