Mysql – Select only rows with at least 2 non-missing values

MySQLselect

I'm pretty new to MySQL (used to programming in R/Stata), and getting stuck on this – I'm probably going about it the wrong way. Any pointers would be appreciated!

I have a large database (a couple million rows) and want to select records where at least two fields from a selection of variables are non-missing (not NULL). I can find lots on counting missing values by column, but not by row.

As an example, I would like to write a select statement for the table below, based only on var1-var5, that would select ID 1 and 3 as they have at least 2 non-NULL values.

ID | var1 | var2 | var3 | var4  | var5 | var6 | var7
1  | NULL | 4    | X    | NULL  | NULL | NULL | R69
2  | NULL | NULL | NULL | NULL  | 1    | X    | J10
3  | 4    | 5    | 23   | 1jan03| 2    | 22   | Z38

I was thinking creating a user-defined variable to count the number of non-missing values and delete rows where this is <2, but it doesn't seem to work.

SET @include = 0;
SELECT var1, var2, var3, var4, var5 FROM db1;
@include := @include + 1 if var1 is not null;
DELETE FROM db1 WHERE @include<2;

I'm guessing this is something I can do with a WHERE statement in the SELECT line, but I can only think of writing it so it only selects rows where everything is non-missing, not at least 2 values.

Best Answer

To get only the rows that have at least 2 non-null values in the 5 columns:

SELECT *
FROM db1
WHERE (var1 IS NOT NULL) + (var2 IS NOT NULL) 
      + (var3 IS NOT NULL) + (var4 IS NOT NULL) 
      + (var5 IS NOT NULL) >= 2 ;

To delete the rows that have less than 2 non-null values, we can just use the reverse condition:

DELETE FROM db1
WHERE (var1 IS NOT NULL) + (var2 IS NOT NULL) 
      + (var3 IS NOT NULL) + (var4 IS NOT NULL) 
      + (var5 IS NOT NULL) < 2 ;

The (varX IS NOT NULL) is just a shorter form of the standard but more complicated:

 CASE WHEN varX IS NOT NULL THEN 1 ELSE 0 END 

Works because MySQL translates TRUE to 1 and FALSE to 0.