Postgresql – Find tables with columns with empty and NULL values in Postgresql

database-designnullplpgsqlpostgresqlschema

After some research I found an example where I can find tables and its columns with NULL values. But the function also returns true when the column is empty. It gives me errors when I try to add an or statement. How can I modify the function so it returns also true when the column contains blank values? This is the function I found:

create function has_nulls(p_schema in text, p_table in text, p_column in text)
                returns boolean language plpgsql as $$
declare 
  b boolean;
begin
  execute 'select exists(select * from '||
          p_table||' where '||p_column||' is null)' into b;
  return b;
end;$$;

Best Answer

Assuming "empty" and "blank values" means empty string ('').

This function checks whether the passed table has any NULL or empty values ('') in the passed column (which must be a string type or some other type where the empty string is valid; not the case for numeric types for instance):

CREATE FUNCTION f_has_missing(_tbl regclass, _col text, OUT has_missing boolean)
  LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE
   format($$SELECT EXISTS (SELECT FROM %s WHERE %2$I = '' OR %2$I IS NULL)$$, _tbl, _col)
   INTO has_missing;
END
$func$;

Call:

SELECT f_has_missing('tablename', 'column')

Or, optionally, schema-qualified:

SELECT f_has_missing('schema.tablename', 'column')

db<>fiddle here

Probably most important: Never concatenate parameter values into SQL code blindly. That's begging for SQL injection. I sanitized the code with format(). See:

There are a couple of smart expressions to check for both:

(string_col = '') IS NOT FALSE
(string_col <> '') IS NOT TRUE
coalesce(string_col, '') = ''

See:

But I chose this plain and more verbose expression for two reason:

string_col = '' OR string_col IS NULL

I have grown fond of simple, obvious code, and none of the above is as clear as this. But more importantly, this expression can use an index on (string_col), while the above cannot - which makes a big difference for big tables. See the added demo in the fiddle!

db<>fiddle here

Obviously, we want a different name for the function than "has_nulls", now.

I use an OUT parameter for convenience and short code. Now we can assign to it and be done.

And I use regclass as IN parameter for the table name. This way I can provide a schema explicitly or not. Again, see:

Aside, one might loop through all columns of a table or a whole db to find any such column at once. Related:

Related Question