Postgresql – Find tables with columns with empty and NULL values in Postgresql

database-design null plpgsql postgresql schema

After some research I found an example that finds tables and their columns with NULL values. But the function should also return true when the column is empty. I get errors when I try to add an OR condition. How can I modify the function so that it also returns true when the column contains blank values? This is the function I found:

create function has_nulls(p_schema in text, p_table in text, p_column in text)
                returns boolean language plpgsql as $$
declare 
  b boolean;
begin
  execute 'select exists(select * from '||
          p_table||' where '||p_column||' is null)' into b;
  return b;
end;$$;

Best Answer

Assuming that "empty" and "blank values" mean the empty string ('').

This function checks whether the passed table has any NULL or empty values ('') in the passed column (which must be a string type or some other type where the empty string is valid; not the case for numeric types for instance):

CREATE FUNCTION f_has_missing(_tbl regclass, _col text, OUT has_missing boolean)
  LANGUAGE plpgsql AS
$func$
BEGIN
   EXECUTE
   format($$SELECT EXISTS (SELECT FROM %s WHERE %2$I = '' OR %2$I IS NULL)$$, _tbl, _col)
   INTO has_missing;
END
$func$;

Call:

SELECT f_has_missing('tablename', 'column')

Or, optionally, schema-qualified:

SELECT f_has_missing('schema.tablename', 'column')
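
To try the function, a throwaway table can help. The following is a minimal sketch that assumes f_has_missing() from above has already been created; the table demo_missing and its columns are made up purely for illustration:

```sql
-- Hypothetical demo table; all names here are illustrative only.
CREATE TEMP TABLE demo_missing (id int, txt text, note text);

INSERT INTO demo_missing VALUES
  (1, 'foo', 'a')
, (2, '',    'b')   -- empty string in txt
, (3, 'bar', 'c');

SELECT f_has_missing('demo_missing', 'txt');   -- true: row 2 holds ''
SELECT f_has_missing('demo_missing', 'note');  -- false: neither NULL nor ''
```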

db<>fiddle here

Probably most important: Never concatenate parameter values into SQL code blindly. That's begging for SQL injection. I sanitized the code with format(). See:

Table name as a PostgreSQL function parameter
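
As a rough illustration of what format() does with its arguments, the query below feeds it harmless and hostile input. The column names and the literal are invented for the demo:

```sql
-- %I quotes a value as an identifier where needed, %L quotes it as a literal.
SELECT format('%I', 'my_col')               AS plain_ident     -- my_col
     , format('%I', 'my col; DROP TABLE t') AS quoted_ident    -- "my col; DROP TABLE t"
     , format('%L', 'O''Reilly')            AS quoted_literal; -- 'O''Reilly'

-- A regclass parameter is validated on input. The cast below raises an
-- error if no such table is visible (left commented out for that reason):
-- SELECT 'no_such_table'::regclass;
```

The hostile string ends up as one double-quoted identifier, so it cannot terminate the statement and start a new one.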

There are a couple of smart expressions to check for both:

(string_col = '') IS NOT FALSE
(string_col <> '') IS NOT TRUE
coalesce(string_col, '') = ''

See:

Best way to check for “empty or null value”

But I chose this plain and more verbose expression for two reasons:

string_col = '' OR string_col IS NULL

I have grown fond of simple, obvious code, and none of the above is as clear as this. But more importantly, this expression can use an index on (string_col), while the above cannot - which makes a big difference for big tables. See the added demo in the fiddle!
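
To see that all four expressions agree, one can evaluate them side by side for NULL, an empty string and a non-empty string. This is only a sketch; the alias v stands in for string_col:

```sql
SELECT v
     , (v = '')  IS NOT FALSE  AS expr_1
     , (v <> '') IS NOT TRUE   AS expr_2
     , coalesce(v, '') = ''    AS expr_3
     , (v = '' OR v IS NULL)   AS plain_expr
FROM  (VALUES (NULL::text), (''), ('abc')) AS t(v);
-- NULL and '' yield true in every column, 'abc' yields false in every column.
```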


Obviously, we want a different name for the function than "has_nulls", now.

I use an OUT parameter for convenience and short code. Now we can assign to it and be done.

And I use regclass as IN parameter for the table name. This way I can provide a schema explicitly or not. Again, see:

Table name as a PostgreSQL function parameter

Aside: one could loop through all columns of a table, or of a whole database, to find all such columns at once.
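
Such a loop might look roughly like the following. It is a sketch under assumptions, not a tested tool: it only looks at the schema public (an arbitrary choice), only at plain tables, only at string-typed columns, and it relies on f_has_missing() from above:

```sql
DO
$do$
DECLARE
   _r record;
BEGIN
   FOR _r IN
      SELECT format('%I.%I', c.table_schema, c.table_name)::regclass AS tbl
           , c.column_name AS col
      FROM   information_schema.columns c
      JOIN   information_schema.tables  t USING (table_schema, table_name)
      WHERE  c.table_schema = 'public'          -- assumption: adapt as needed
      AND    t.table_type   = 'BASE TABLE'      -- skip views and foreign tables
      AND    c.data_type IN ('text', 'character varying', 'character')
      ORDER  BY c.table_name, c.ordinal_position
   LOOP
      -- Only string columns reach this point, so comparing to '' is valid.
      IF f_has_missing(_r.tbl, _r.col) THEN
         RAISE NOTICE 'NULL or empty values in table %, column %', _r.tbl, _r.col;
      END IF;
   END LOOP;
END
$do$;
```

Each check runs one EXISTS query per column, which can be slow on large tables without a matching index.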

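Related Solutions

Postgresql – Basic trigger in PL/pgSQL, to flip a boolean field if another field has text

Major points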

Do not use a separate code block without need. Removed the spurious BEGIN .. END.
NEW.baking_instructions_ <> '' is a 100% identical drop-in replacement for length( NEW.baking_instructions_ ) > 0. Just shorter and faster.
The plpgsql assignment operator is :=, not =.
Removed multiple parentheses to demonstrate they are just noise.
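
The equivalence claimed in the second point can be checked quickly for a text value. A small sketch, with v as a stand-in for the column:

```sql
SELECT v
     , v <> ''       AS not_empty
     , length(v) > 0 AS has_length
FROM  (VALUES (NULL::text), (''), ('abc')) AS t(v);
-- Both expressions give NULL for NULL, false for '' and true for 'abc'.
```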

Simplify

Given the trigger definition in your answer (which should be in your question), you can further simplify the trigger function.

If your trigger function is not intended to be generic but serves a certain trigger on a certain table (which is the normal case), I suggest reflecting that in the function name instead of adding code and error messages to the function body. Much cheaper and cleaner (IMHO), but that's a matter of taste and style:

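CREATE OR REPLACE FUNCTION trg_insup_bef_recipe()
  RETURNS TRIGGER AS
$func$
BEGIN
   -- Derive each flag from its instruction column (NULL instructions yield a NULL flag).
   NEW.is_baked_   := NEW.baking_instructions_ <> '';
   NEW.is_roasted_ := NEW.roasting_instructions_ <> '';
   RETURN NEW;  -- a BEFORE row trigger must return the row to proceed
END
$func$ LANGUAGE plpgsql;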

And since this trigger only has work to do when any of the involved columns have been changed, you only need to trigger it in those cases:

CREATE TRIGGER insup_bef_recipe
BEFORE INSERT
OR UPDATE OF is_baked_, baking_instructions_, is_roasted_, roasting_instructions_
ON recipe_
FOR EACH ROW EXECUTE PROCEDURE trg_insup_bef_recipe();

Simplify further

Do you really need those flags materialized as additional columns? Could be completely (and more reliably) replaced with simple expressions in your queries:

baking_instructions_ <> '' AS is_baked_

Then you don't need a trigger at all. You could create a view with that expression. Or even "generated field". Detailed instructions:
Store common query as column? Computed / calculated columns in PostgreSQL
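
Both alternatives might look roughly like this. The table and view names below are hypothetical and kept separate from recipe_ so the sketch runs on its own; generated columns require PostgreSQL 12 or later:

```sql
-- Variant 1: plain table plus a view that computes the flag on the fly.
CREATE TABLE recipe_plain (
   recipe_id            int PRIMARY KEY
 , baking_instructions_ text
);

CREATE VIEW recipe_plain_v AS
SELECT recipe_id
     , baking_instructions_
     , baking_instructions_ <> '' AS is_baked_
FROM   recipe_plain;

-- Variant 2: stored generated column (PostgreSQL 12+), kept in sync automatically.
CREATE TABLE recipe_gen (
   recipe_id            int PRIMARY KEY
 , baking_instructions_ text
 , is_baked_            boolean GENERATED ALWAYS AS (baking_instructions_ <> '') STORED
);

INSERT INTO recipe_gen (recipe_id, baking_instructions_)
VALUES (1, 'Bake at 200 C'), (2, '');

SELECT recipe_id, is_baked_ FROM recipe_gen ORDER BY recipe_id;  -- true, then false
```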

PostgreSQL – PL/pgSQL Function to Replace Spaces with Null After Insert Using Trigger

Just modify your SET clause and use CASE ... WHEN ... END:

CREATE OR REPLACE FUNCTION delete_space() RETURNS TRIGGER AS $_$
BEGIN
   -- One SET keyword, followed by one comma-separated assignment per column.
   UPDATE proj.general
   SET    deliv1     = CASE WHEN deliv1     = '' THEN NULL ELSE deliv1     END,
          deliv2     = CASE WHEN deliv2     = '' THEN NULL ELSE deliv2     END,
          fmt        = CASE WHEN fmt        = '' THEN NULL ELSE fmt        END,
          spec_instr = CASE WHEN spec_instr = '' THEN NULL ELSE spec_instr END
   WHERE  general.deliv1 = ''
   OR     general.deliv2 = ''
   OR     general.fmt = ''
   OR     general.spec_instr = '';
   RETURN NULL;  -- the return value is ignored for AFTER triggers
END $_$ LANGUAGE plpgsql;
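
As a side note, the standard function NULLIF() expresses the same CASE more compactly. This is a swapped-in shorthand, not what the answer above uses. A quick comparison with made-up values:

```sql
SELECT v
     , CASE WHEN v = '' THEN NULL ELSE v END AS via_case
     , NULLIF(v, '')                         AS via_nullif
FROM  (VALUES (''), ('abc'), (NULL)) AS t(v);
-- Both columns return NULL for '' and for NULL, and 'abc' for 'abc'.
```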
