I'm pretty new to MySQL (used to programming in R/Stata), and getting stuck on this – I'm probably going about it the wrong way. Any pointers would be appreciated!
I have a large database (a couple million rows) and want to select records where at least two fields from a selection of variables are non-missing (not NULL). I can find lots on counting missing values by column, but not by row.
As an example, I would like to write a select statement for the table below, based only on var1-var5, that would select ID 1 and 3 as they have at least 2 non-NULL values.
ID | var1 | var2 | var3 | var4 | var5 | var6 | var7
1 | NULL | 4 | X | NULL | NULL | NULL | R69
2 | NULL | NULL | NULL | NULL | 1 | X | J10
3 | 4 | 5 | 23 | 1jan03| 2 | 22 | Z38
I was thinking creating a user-defined variable to count the number of non-missing values and delete rows where this is <2, but it doesn't seem to work.
SET @include = 0;
SELECT var1, var2, var3, var4, var5 FROM db1;
@include := @include + 1 if var1 is not null;
DELETE FROM db1 WHERE @include<2;
I'm guessing this is something I can do with a WHERE statement in the SELECT line, but I can only think of writing it so it only selects rows where everything is non-missing, not at least 2 values.
Best Answer
To get only the rows that have at least 2 non-null values in the 5 columns:
To delete the rows that have less than 2 non-null values, we can just use the reverse condition:
The
(varX IS NOT NULL)
is just a shorter form of the standard but more complicated:Works because MySQL translates
TRUE
to1
andFALSE
to0
.