How to Help SQL Server Recognize Indexed View Column as NOT NULL

constraintnullsql-server-2008view

I have the following indexed view defined in SQL Server 2008 (you can download a working schema from gist for testing purposes):

CREATE VIEW dbo.balances
WITH SCHEMABINDING
AS
SELECT
      user_id
    , currency_id

    , SUM(transaction_amount)   AS balance_amount
    , COUNT_BIG(*)              AS transaction_count
FROM dbo.transactions
GROUP BY
      user_id
    , currency_id
;
GO

CREATE UNIQUE CLUSTERED INDEX UQ_balances_user_id_currency_id
ON dbo.balances (
      user_id
    , currency_id
);
GO

user_id, currency_id, and transaction_amount are all defined as NOT NULL columns in dbo.transactions. However, when I look at the view definition in Management Studio's Object Explorer, it marks both balance_amount and transaction_count as NULL-able columns in the view.

I've taken a look at several discussions, this one being the most relevant of them, that suggest some shuffling of functions may help SQL Server recognize that a view column is always NOT NULL. No such shuffling is possible in my case, though, since expressions on aggregate functions (e.g. an ISNULL() over the SUM()) are not allowed in indexed views.

Is there any way I can help SQL Server recognize that balance_amount and transaction_countare NOT NULL-able?
If not, should I have any concerns about these columns being mistakenly identified as NULL-able?

The two concerns I could think of are:
- Any application objects mapped to the balances view are getting an incorrect definition of a balance.
- In very limited cases, certain optimizations are not available to the Query Optimizer since it does not have a guarantee from the view that these two columns are NOT NULL.
Is either of these concerns a big deal? Are there any other concerns I should keep in mind?

Best Answer

user_id, currency_id, and transaction_amount are all defined as NOT NULL columns in dbo.transactions

It looks to me that SQL Server has a blanket assumption that an aggregate can produce a null even if the field(s) it operates on are not null. This is obviously true in certain cases:

create table foo(bar integer not null);
select sum(bar) from foo
-- returns 1 row with `null` field

And is also true in the generalized versions of group by like cube

This simpler test case illustrates the point that any aggregate is interpreted as being nullable:

CREATE VIEW dbo.balances
with schemabinding
AS
SELECT
      user_id
    , sum(1)   AS balance_amount
FROM dbo.transactions
GROUP BY
      user_id
;
GO

IMO this is a limitation (albeit a minor one) of SQL Server - some other RDBMSs allow the creation of certain constraints on views that are not enforced and exist only to give clues to the optimizer, though I think 'uniqueness' is more likely to help in generating a good query plan than 'nullability'

If the nullability of the column is important, perhaps for use with an ORM, consider wrapping the indexed view in another view that simply guarantees the non-nullability using ISNULL:

CREATE VIEW dbo.balancesORM
WITH SCHEMABINDING
AS
SELECT 
    B.[user_id],
    B.currency_id,
    balance_amount = ISNULL(B.balance_amount, 0),
    transaction_count = ISNULL(B.transaction_count, 0)
FROM dbo.balances AS B;

Related Solutions

Sql-server – Search on part of a column in a SQL Server cross-database view

Are you allowed to replicate that table in another database? Using transactional replication, for latest data access. If yes, then I'd test a more cumbersome process like replicating the needed table, building another table or materialized view with a computed column as you need.

If replication is not an option, I'd say that you need to establish a scheduled process to take that table in your database and continue locally to process the data. Exactly like the first suggestion, but using your own scripts and schedule.

That, or I think you're stick to the scalar functions that aren't really the best option from performance point of view.

Sql-server – Writing a simple bank schema: How should I keep the balances in sync with their transaction history

I am not familiar with accounting, but I solved some similar problems in inventory-type environments. I store running totals in the same row with the transaction. I am using constraints, so that my data is never wrong even under high concurrency. I have written the following solution back then in 2009::

Calculating running totals is notoriously slow, whether you do it with a cursor or with a triangular join. It is very tempting to denormalize, to store running totals in a column, especially if you select it frequently. However, as usual when you denormalize, you need to guarantee the integrity of your denormalized data. Fortunately, you can guarantee the integrity of running totals with constraints – as long as all your constraints are trusted, all your running totals are correct. Also this way you can easily ensure that the current balance (running totals) is never negative - enforcing by other methods can also be very slow. The following script demonstrates the technique.

CREATE TABLE Data.Inventory(InventoryID INT NOT NULL IDENTITY,
  ItemID INT NOT NULL,
  ChangeDate DATETIME NOT NULL,
  ChangeQty INT NOT NULL,
  TotalQty INT NOT NULL,
  PreviousChangeDate DATETIME NULL,
  PreviousTotalQty INT NULL,
  CONSTRAINT PK_Inventory PRIMARY KEY(ItemID, ChangeDate),
  CONSTRAINT UNQ_Inventory UNIQUE(ItemID, ChangeDate, TotalQty),
  CONSTRAINT UNQ_Inventory_Previous_Columns 
     UNIQUE(ItemID, PreviousChangeDate, PreviousTotalQty),
  CONSTRAINT FK_Inventory_Self FOREIGN KEY(ItemID, PreviousChangeDate, PreviousTotalQty)
    REFERENCES Data.Inventory(ItemID, ChangeDate, TotalQty),
  CONSTRAINT CHK_Inventory_Valid_TotalQty CHECK(
         TotalQty >= 0 
     AND (TotalQty = COALESCE(PreviousTotalQty, 0) + ChangeQty)
  ),
  CONSTRAINT CHK_Inventory_Valid_Dates_Sequence CHECK(PreviousChangeDate < ChangeDate),
  CONSTRAINT CHK_Inventory_Valid_Previous_Columns CHECK(
        (PreviousChangeDate IS NULL AND PreviousTotalQty IS NULL)
     OR (PreviousChangeDate IS NOT NULL AND PreviousTotalQty IS NOT NULL)
  )
);

-- beginning of inventory for item 1
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
VALUES(1, '20090101', 10, 10, NULL, NULL);

-- cannot begin the inventory for the second time for the same item 1
INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)
VALUES(1, '20090102', 10, 10, NULL, NULL);


Msg 2627, Level 14, State 1, Line 10

Violation of UNIQUE KEY constraint 'UNQ_Inventory_Previous_Columns'. 
Cannot insert duplicate key in object 'Data.Inventory'.

The statement has been terminated.


-- add more
DECLARE @ChangeQty INT;
SET @ChangeQty = 5;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20090103', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

SET @ChangeQty = 3;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20090104', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

SET @ChangeQty = -4;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20090105', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

-- try to violate chronological order
SET @ChangeQty = 5;

INSERT INTO Data.Inventory(ItemID,
  ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty)

SELECT TOP 1 ItemID, '20081231', @ChangeQty, TotalQty + @ChangeQty, ChangeDate, TotalQty
  FROM Data.Inventory
  WHERE ItemID = 1
  ORDER BY ChangeDate DESC;

Msg 547, Level 16, State 0, Line 4

The INSERT statement conflicted with the CHECK constraint 
"CHK_Inventory_Valid_Dates_Sequence". 
The conflict occurred in database "Test", table "Data.Inventory".

The statement has been terminated.

SELECT ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;

ChangeDate              ChangeQty   TotalQty    PreviousChangeDate      PreviousTotalQty
----------------------- ----------- ----------- ----------------------- -----
2009-01-01 00:00:00.000 10          10          NULL                    NULL
2009-01-03 00:00:00.000 5           15          2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3           18          2009-01-03 00:00:00.000 15
2009-01-05 00:00:00.000 -4          14          2009-01-04 00:00:00.000 18


-- try to change a single row, all updates must fail
UPDATE Data.Inventory SET ChangeQty = ChangeQty + 2 WHERE InventoryID = 3;
UPDATE Data.Inventory SET TotalQty = TotalQty + 2 WHERE InventoryID = 3;

-- try to delete not the last row, all deletes must fail
DELETE FROM Data.Inventory WHERE InventoryID = 1;
DELETE FROM Data.Inventory WHERE InventoryID = 3;

-- the right way to update
DECLARE @IncreaseQty INT;

SET @IncreaseQty = 2;

UPDATE Data.Inventory 
SET 
     ChangeQty = ChangeQty 
   + CASE 
        WHEN ItemID = 1 AND ChangeDate = '20090103' 
        THEN @IncreaseQty 
        ELSE 0 
     END,
  TotalQty = TotalQty + @IncreaseQty,
  PreviousTotalQty = PreviousTotalQty + 
     CASE 
        WHEN ItemID = 1 AND ChangeDate = '20090103' 
        THEN 0 
        ELSE @IncreaseQty 
     END
WHERE ItemID = 1 AND ChangeDate >= '20090103';

SELECT ChangeDate,
  ChangeQty,
  TotalQty,
  PreviousChangeDate,
  PreviousTotalQty
FROM Data.Inventory ORDER BY ChangeDate;

ChangeDate              ChangeQty   TotalQty    PreviousChangeDate      PreviousTotalQty
----------------------- ----------- ----------- ----------------------- ----------------
2009-01-01 00:00:00.000 10          10          NULL                    NULL
2009-01-03 00:00:00.000 7           17          2009-01-01 00:00:00.000 10
2009-01-04 00:00:00.000 3           20          2009-01-03 00:00:00.000 17
2009-01-05 00:00:00.000 -4          16          2009-01-04 00:00:00.000 20

Best Answer

Related Solutions

Sql-server – Search on part of a column in a SQL Server cross-database view

Sql-server – Writing a simple bank schema: How should I keep the balances in sync with their transaction history

Related Question