Check for spaces or zeros in all columns in all tables

data integrityoracleoracle-18c

I have multiple tables in an Oracle schema that have columns with incorrect values:

  • Number columns where the value is zero, instead of null.
  • Text columns where the value is a space, instead of null.

I’m in the process of cleaning up the data. I will eventually create constraints on the columns to prevent bad data being entered in the first place. But before I do that, I need to find the columns that have existing problems and investigate them.

To start, I would like to produce a list of columns that have incorrect values.

  • For each table in a schema, provide a list of columns that have zeros or spaces as values.
table_name    column_name    datatype    value    count
—————————————————————————————————-————————————-———————-
roads         width          number      0        500
sidewalks     description    varchar2    [space]  10000
sidewalks     const_year     number      0        2000

Question:

What would it take to produce a list like this in Oracle?

(Note: I’m a public works technician by day, not a database administrator. Layman’s terms would be appreciated.)

Best Answer

You should look into querying the information schema based tables (such as ALL_TAB_COLUMNS or DBA_TAB_COLUMNS) to get a list of all tables and all columns: https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/ALL_TAB_COLUMNS.html#GUID-F218205C-7D76-4A83-8691-BFD2AD372B63

https://docs.oracle.com/en/database/oracle/oracle-database/18/refrn/DBA_TAB_COLUMNS.html#GUID-91F945AE-5778-45A6-A07E-775A006A9AA1

Then look into how to write dynamic SQL to easily generate the code you need to query every column of every table: https://docs.oracle.com/en/database/oracle/oracle-database/18/lnpls/dynamic-sql.html#GUID-7E2F596F-9CA3-4DC8-8333-0C117962DB73

Here is some psuedo-code on what your thought process should be when creating the dynamic SQL (*note this is not Oracle PL/SQL syntax, so just take it as a general example on how to build dynamic SQL, and focus mostly on step 3):

-- 1) Create a temp table to hold the results
DROP TABLE IF EXISTS #Results
CREATE TABLE #Results (TableName VARCHAR(100), ColumnName VARCHAR(100), [Value] VARCHAR(100) NULL)

-- 2) Dynamic SQL variable used to build the dynamic SQL string
DECLARE @DynamicSQL AS NVARCHAR(MAX) = ''

-- 3) Build the dynamic SQL string using the DBA_TAB_COLUMNS information schema view. First query gets any row whose column has 0 as a value, second query gets any row whose column has a NULL value
SELECT @DynamicSQL = @DynamicSQL + '
    INSERT INTO #Results 
    SELECT ''' + TABLE_NAME+ ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue 
    FROM [' + TABLE_NAME+ '] 
    WHERE [' + COLUMN_NAME + '] = ''0'';' 
    + '
    INSERT INTO #Results 
    SELECT ''' + TABLE_NAME + ''' AS TableName, ''' + COLUMN_NAME + ''' AS ColumnName, [' + COLUMN_NAME + '] AS ColumnValue 
    FROM [' + TABLE_NAME + '] 
    WHERE [' + COLUMN_NAME + '] IS NULL;'
FROM sys.DBA_TAB_COLUMNS

-- 4) Execute the dynamic SQL string
EXEC sp_ExecuteSQL @DynamicSQL

-- 5) Select the final results, grouping up on TableName, ColumnName, and Value to get the counts of 0 and NULL values per column per table
SELECT R.TableName, R.ColumnName, TC.DATA_TYPE AS DataType, R.[Value], COUNT(1) AS [Count]
FROM #Results AS R
INNER JOIN sys.DBA_TAB_COLUMNS AS TC
    ON R.TableName = TC.TableName
    AND R.ColumnName = R.ColumnName
GROUP BY R.TableName, R.ColumnName, R.[Value]