Postgresql – find if the column of the PostgreSQL table is empty with minimum cost

postgresql

I need to find if there are any values written to the column but with minimum cost.

There is a table with ~1B rows and about 100 columns with data. I need to find columns that have no data at all (all nulls) and delete them. If I query like "SELECT * FROM my_table WHERE my_column IS NOT NULL LIMIT 1" then it costs me 1-2 minutes and I'm looking for the faster solution.

As far as I know (but not sure), if there are no data at all for the column, there is some property in the database saying that the chunk or the table has no data for this column, at least this happens when adding a new column to the existing table, so the database doesn't update all existing rows if the default value of a new column is NULL. I wonder if there is a fast way to get a result based on this info.

P.S. I use timescaleDB extension and the PK is a timestamp, if this changes anything

Best Answer

As far as I know (but not sure), if there are no data at all for the column, there is some property in the database saying that the chunk or the table has no data for this column, at least this happens when adding a new column to the existing table, so the database doesn't update all existing rows if the default value of a new column is NULL

If the row data ends early, it assumes all column after that for that particular row are NULL, or in newer versions, are whatever DEFAULT was at the time the column was added. But this says nothing at all about what might be in other rows.

You can't avoid scanning the table unless you a had a very strange constraint which forces the values to all be NULL. But you should be able to scan the table just once, not once for every column.

select count(col1), count(co2), count(col3)...count(col100) from the_table;