How are these NULL values stored in a NOT NULL column?

oracle oracle-10g sap-hana

We are replicating tables from SAP ECC 6.0 on HANA into an Oracle 10g warehouse, using SAP SLT.
Since starting this, we have noticed that the NOT NULL column definitions from HANA are retained in the Oracle copies of the tables, but HANA stores many values as empty strings. Oracle stores empty VARCHAR2 strings as NULLs, and somehow this does not conflict with the NOT NULL column definition (i.e. we have NULL in a column defined as NOT NULL).

Querying these tables is producing strange results:

SELECT COUNT(*) FROM warehouse.table WHERE col IS NULL;
0
SELECT COUNT(*) FROM warehouse.table WHERE col = '';
0
SELECT NVL(col,'N'), COUNT(*) FROM warehouse.table GROUP BY NVL(col,'N');
X 503206
N 2377222

So we can tell that there are NULL values in these columns by using NVL or DECODE functions, but querying them is returning odd results.

We do get proper results once we alter the column:

ALTER TABLE warehouse.table MODIFY (col NULL);
Table altered.
SELECT COUNT(*) FROM warehouse.table WHERE col IS NULL;
390986

But of course we can't alter the column back:

ALTER TABLE warehouse.table MODIFY (col NOT NULL);
ERROR at line 1:
ORA-02296: cannot enable (warehouse.) - null values found

I can't tell if this is a problem with Oracle's implementation of empty string storage, or just a quirk of interacting with SAP's SLT replication. It seems Oracle should reject these rows with '' or NULL values when replication tries to insert them, but we have not seen any errors indicating this.


Edit to add query requested by hypercube:

SELECT LENGTH(col), COUNT(*) FROM warehouse.table GROUP BY LENGTH(col);
 2377222
1 503206

Best Answer

What you described is not normal.

As already mentioned in comments, in Oracle databases, empty/zero-length strings are treated as NULL.

https://docs.oracle.com/cd/B19306_01/server.102/b14200/sql_elements005.htm#i59110

Note:

Oracle Database currently treats a character value with a length of zero as null. However, this may not continue to be true in future releases, and Oracle recommends that you do not treat empty strings the same as nulls.

Also, WHERE col = '' will never return rows, because it is effectively WHERE col = NULL, and a comparison with NULL never evaluates to TRUE.
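This is easy to verify against DUAL (a generic sketch, not specific to your tables):

```sql
-- In Oracle, the literal '' is NULL:
SELECT CASE WHEN '' IS NULL THEN 'empty string IS NULL' END AS test FROM dual;
-- returns: 'empty string IS NULL'

-- NULL = NULL evaluates to UNKNOWN, not TRUE, so no row qualifies:
SELECT COUNT(*) FROM dual WHERE '' = '';
-- returns: 0
```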

It is fairly simple to reproduce these wrong results by corrupting the data dictionary manually, but from this information alone I cannot tell what caused it in your environment.

SQL> create table t1 (c1 varchar2(20) not null disable);

Table created.

SQL> select constraint_name from user_constraints where table_name = 'T1';

CONSTRAINT_NAME
------------------------------
SYS_C005148

SQL> insert into t1(c1) values ('');

1 row created.

SQL> commit;

Commit complete.

SQL> select count(*) from t1 where c1 is null;

  COUNT(*)
----------
         1

So far everything is normal.

The database can skip entire steps while executing a SQL statement based on a constraint. If you have an enabled, validated NOT NULL constraint on col and your predicate is col is null, the database knows that the column cannot contain NULL, so it returns 0 rows without actually executing the related plan steps.

When such a constraint is enabled and validated, the NULL$ column in the COL$ dictionary table is set to 1 for that column. But even with a disabled, non-validated constraint, corrupting NULL$ is enough for the database to return wrong results.
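To see whether the dictionary metadata and the constraint state agree for a suspect column, a query along these lines may help (the table name is a placeholder; if a table has several check constraints, filter further on SEARCH_CONDITION):

```sql
-- NULLABLE in *_TAB_COLUMNS reflects COL$.NULL$. If it says 'N' while the
-- matching constraint is DISABLED or NOT VALIDATED, the two are inconsistent.
SELECT c.column_name,
       c.nullable,
       con.constraint_name,
       con.status,
       con.validated
  FROM user_tab_columns c
  LEFT JOIN user_cons_columns cc
    ON cc.table_name = c.table_name AND cc.column_name = c.column_name
  LEFT JOIN user_constraints con
    ON con.constraint_name = cc.constraint_name
   AND con.constraint_type = 'C'
 WHERE c.table_name = 'T1';
```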

Enabling and validating that constraint the proper way obviously fails:

SQL> alter table t1 modify constraint SYS_C005148 enable validate;
alter table t1 modify constraint SYS_C005148 enable validate
                                 *
ERROR at line 1:
ORA-02293: cannot validate (SYS.SYS_C005148) - check constraint violated

Now I set NULL$ manually:

SQL> update col$ set null$ = 1 where obj# = (select object_id from user_objects where object_name = 'T1') and name = 'C1';

1 row updated.

SQL> commit;

Commit complete.

Then run the query again:

SQL> select /*+ gather_plan_statistics */ count(*) from t1 where c1 is null;

  COUNT(*)
----------
         0

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  fjf8bcs2hhb7b, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where c1 is null

Plan hash value: 4294799605

----------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |
|*  2 |   FILTER            |      |      1 |        |      0 |00:00:00.01 |
|   3 |    TABLE ACCESS FULL| T1   |      0 |      1 |      0 |00:00:00.01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(NULL IS NOT NULL)

Note
-----
   - dynamic sampling used for this statement

Returns 0 rows, and the table was not accessed at all (Starts = 0 on Operation Id 3), because of the FILTER on Operation Id 2: NULL IS NOT NULL, which obviously evaluates to FALSE.

If you use NVL(col, 'N') instead of col, the database cannot apply this kind of optimization, so it accesses the table and returns the correct result:

SQL> select /*+ gather_plan_statistics */ nvl(c1, 'N'), count(*) from t1 group by nvl(c1, 'N');

NVL(C1,'N')            COUNT(*)
-------------------- ----------
N                             1

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------
SQL_ID  0zfyk18knxtfk, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ nvl(c1, 'N'), count(*) from t1 group by nvl(c1, 'N')

Plan hash value: 136660032

----------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |        |      1 |00:00:00.01 |       3 |       |       |          |
|   1 |  HASH GROUP BY     |      |      1 |      1 |      1 |00:00:00.01 |       3 |  1156K|  1156K|  323K (0)|
|   2 |   TABLE ACCESS FULL| T1   |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
----------------------------------------------------------------------------------------------------------------

Note
-----
   - dynamic sampling used for this statement

No FILTER this time, and the table was accessed (Starts = 1 on Operation Id 2 - TABLE ACCESS FULL).

There was a bug in 10g where this optimization for CHECK constraints caused wrong results with NULL on transitive predicates:

Bug 5462687 - CHECK constraint can cause wrong results (Doc ID 5462687.8)

This will not help me, because I corrupted the dictionary myself. But if your actual query is more complex than what you posted (otherwise do not bother with this) and has transitive predicates, you could try the workaround described there and disable this behaviour by setting event 10195:

$ oerr ora 10195
10195, 00000, "CBO don't use check constraints for transitive predicates"
// *Cause:
// *Action:

Example:

alter session set events '10195 trace name context forever, level 1';

I do not think this is caused by how the data is stored; it is more likely something wrong with the constraint metadata.
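If you want to find out which replicated columns are affected before deciding on a fix, you could generate one count query per suspect column; the NVL forces a real table scan instead of the constraint shortcut shown above. This is only a sketch (schema name is a placeholder, and CHR(0) as the sentinel assumes no column legitimately stores a NUL character):

```sql
-- Generate a NULL-count query for every VARCHAR column the dictionary
-- claims is NOT NULL. NVL(col, CHR(0)) bypasses the NULL$ optimization.
SELECT 'SELECT ''' || table_name || '.' || column_name || ''', COUNT(*)'
    || ' FROM warehouse.' || table_name
    || ' WHERE NVL(' || column_name || ', CHR(0)) = CHR(0);' AS check_sql
  FROM all_tab_columns
 WHERE owner = 'WAREHOUSE'
   AND nullable = 'N'
   AND data_type LIKE 'VARCHAR%';
```

Run the generated statements and any non-zero count identifies a column whose data contradicts its recorded nullability.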