SQL Server – Why Am I Getting ANSI Warning About NULL Values and Aggregate Functions?

aggregatesql serversql-server-2008-r2

This is the exact warning message:

Warning: Null value is eliminated by an aggregate or other SET operation.

I understand what it means in general, i.e. that there's a null value in one or more rows for a column to which an aggregate function is applied, but I can't quite figure out why it's happening for my particular query or how to rewrite it such that I don't get the warning.

This is the query. The table tracks unit responses to an incident, and the purpose of the query is to find the last unit to arrive that was among the first units to be notified of the incident. Notification times may vary, and not all units will arrive.

SELECT inci_no
    ,unit
    ,notif_dttm
    ,arv_dttm
    ,resp_code
    ,alm_date
FROM (
    SELECT inci_no
        ,alm_date
        ,unit
        ,notif_dttm
        ,arv_dttm
        ,resp_code
        ,row_num = row_number() OVER (PARTITION BY inci_no ORDER BY arv_dttm DESC, unit_id ASC)
    FROM inc_unit AS u
    WHERE resp_code = 'E'
        AND arv_dttm IS NOT NULL
        AND notif_dttm <= (
            SELECT min(notif_dttm)
            FROM inc_unit AS notif
            WHERE notif.inci_no = u.inci_no
                --AND notif.arv_dttm IS NOT NULL
            GROUP BY notif.inci_no
            )
    ) AS erf_partition
WHERE row_num = 1
  and alm_date >= '2016-01-01'

The warning started appearing when I corrected a logical error that was returning incorrect results. That's the commented line above. When it's removed, I get the warning, but when it's included I do not get the warning.

At first I thought it was tied to the window function, but now I think it's more likely to be the min(notif_dttm) in the innermost sub-query. I think the line I removed from the sub-query just happened to be filtering out rows w/ NULL in the notif_dttm column. Logically, I shouldn't get any such rows because of the date filter on the last line.

The worst part is that I can't reproduce the problem on a smaller data set. I scripted out a simplified schema and some sample data to post with this question, but I can't reproduce the error, which makes me think I don't really understand the root of the problem.

I'm familiar with logical processing order for queries (https://msdn.microsoft.com/en-us/library/ms189499.aspx) but I can't quite parse the order with the sub-query and window function. And I'm just barely literate when it comes to reading execution plans, but what it looks like to me is that the sub-query is grouping and calculating the min() on the whole table and then joining it back to the outer query (which has the predicate that would eliminate the rows with null values causing the warning in the sub-query).

So my specific question is: is that what's really happening? And if so, will adding a notif_dttm is not null clause to the sub-query fix the warning associated with the min(notif_dttm) function, regardless of whatever plan the query optimizer happens to come up with in the future?

(the reason the warning itself is an issue is because this query is part of as SSIS package and the warning causes the package to fail. I know I could just set ansi_warnings off and be done with it, but I'd rather understand the root of the problem.)

I pulled the error from the progress/output window of BIDS (this is an old 2008 R2 instance); there's no SQL Agent history because I haven't deployed the update.

I tried to post the execution plan XML, but it makes the post too long, so here's a screen cap of (most of) it.

enter image description here

UPDATE

Kenneth's answer helped me identify the rows that were actually causing the problem, which turned out to be only a small subset of the rows that I thought were causing the problem. It's definitely NULLs being included in the min(notif_dttm) function, but I want to make sure I understand why.

In terms of the Logical Processing Order for the two innermost sub-queries, is it accurate to say that inc_unit as u and inc_unit as notif are joined, then each of their WHERE clauses are applied, and then each of their SELECT clauses are applied?

That would explain why eliminating rows w/ NULLs in either of the WHERE clauses (as in my original query, albeit only as a side-effect) prevents NULL values from being included in the min() function of the innermost one. I think I was assuming that select min() of the innermost sub-query would be evaluated before it was joined with it's containing sub-query, i.e. this whole sub-query

SELECT min(notif_dttm)
FROM inc_unit AS notif
WHERE notif.inci_no = u.inci_no
GROUP BY notif.inci_no

would be evaluated, and the WHERE part would effectively be deferred until the results could be joined with the containing query. And that's why I couldn't reproduce the error with simplified sample data – I had NULLs in my data, just not the right rows with the right NULLs. 🙂

Best Answer

From what I can see you understand the problem perfectly. It sounds like min(notif_dttm) is causing you the grief. You can check this pretty easily by running code like this:

SELECT *
FROM inc_unit AS notif
WHERE notif_dttm IS NULL
  AND EXISTS (SELECT 1 FROM inc_unit u
                WHERE notif.inci_no = u.inci_no
                  AND resp_code = 'E'
                  AND arv_dttm IS NOT NULL)

If you get any rows then that's probably your problem.

Here is some easy sample code that will reproduce the warning you are getting.

CREATE TABLE #test (nullableCol1 int)
INSERT INTO #test VALUES (NULL),(1),(2)
SELECT MIN(nullableCol1) FROM #test

As far as your problem goes there is a property in SSIS that tells it how many errors are acceptable. It's called MaximumErrorCount. Increase that for that object and each of it's containers. That way you can have the warning (which isn't really a big deal) and still have your code move on correctly.

Edit Additional order by information you wanted can be found here.

But basically SQL processes a query in this order:

  1. FROM
  2. ON
  3. OUTER
  4. WHERE
  5. GROUP BY
  6. CUBE | ROLLUP
  7. HAVING
  8. SELECT
  9. DISTINCT
  10. ORDER BY
  11. TOP

So if I take your subquery and place it in the order it's processed it would look like this

        FROM inc_unit AS notif                  -- Find the table
        WHERE notif.inci_no = u.inci_no         -- Restrict the rows used
            --AND notif.arv_dttm IS NOT NULL    -- 
        GROUP BY notif.inci_no                  -- Group up what's left
        SELECT min(notif_dttm)                  -- Process any aggregates

You can see it's a pretty logical progression. We can't GROUP BY until we remove the appropriate rows using the WHERE clause. And then we can't check the MIN until we have finished grouping everything together.