Okay, well, since nobody else chimed in and I needed to get it done, I took one last shot at it. Not quite sure on the why of it all, but I used basically the entire query as a subquery in the FROM clause, and then ran the report against that, and that seemed to do it...
SELECT
[VendorCusts].[VendorPaid],
[VendorCusts].[DE?],
[VendorCusts].[IC?],
[VendorCusts].[AG?],
[VendorCusts].[GB?],
COUNT(VendorCustID) AS [CustCount]
FROM
(
SELECT DISTINCT
[Plan Revenue Expense].[Check To] AS [VendorPaid],
[Support Provider].[DE?],
[Support Provider].[IC?],
[Support Provider].[Agency?] AS [AG?],
[Support Provider].[GeneralBus?] AS [GB?],
Customer.CustID AS [VendorCustID],
Customer.LName AS [VendorCustLName],
Customer.FName AS [VendorCustFName]
FROM
(((Customer
INNER JOIN Plan ON Customer.CustID=Plan.CustID)
INNER JOIN [Plan Revenue] ON [Plan].[Plan ID]=[Plan Revenue].[PlanID])
INNER JOIN [Plan Revenue Expense] ON [Plan Revenue].[Rev ID]=[Plan Revenue Expense].[RevID])
LEFT JOIN [Support Provider] ON [Plan Revenue Expense].[SP]=[Support Provider].[ID]
WHERE
(
(
([Plan Revenue Expense].[First Day]>=[Expense Start "MM/DD/YYYY"])
AND
([Plan Revenue Expense].[First Day]<=[Expense End "MM/DD/YYYY"])
)
OR
(
([Plan Revenue Expense].[Last Day]>=[Expense Start "MM/DD/YYYY"])
AND
([Plan Revenue Expense].[Last Day]<=[Expense End "MM/DD/YYYY"])
)
)
AND NOT
(
[Plan Revenue].[Service]='111' OR
[Plan Revenue].[Service]='222' OR
[Plan Revenue].[Service] LIKE '333*'
)
AND NOT Customer.[Inactive?]=TRUE
ORDER BY
[Plan Revenue Expense].[Check To],
Customer.LName,
Customer.FName
) AS [VendorCusts]
GROUP BY
[VendorCusts].[VendorPaid],
[VendorCusts].[DE?],
[VendorCusts].[IC?],
[VendorCusts].[AG?],
[VendorCusts].[GB?]
HAVING
NOT ([VendorCusts].[DE?]=TRUE OR [VendorCusts].[IC?]=TRUE)
ORDER BY
[VendorCusts].[DE?] ASC,
[VendorCusts].[IC?] ASC,
[VendorCusts].[AG?] ASC,
[VendorCusts].[GB?] ASC,
[VendorCusts].[VendorPaid] ASC;
This is more or less what I finally came up with and it appears to get the job done. Checked a few data points and they seemed to agree with the slightly more verbose version w/o a Count() function. So, I think the results are good.
Then I abstracted out the start and end of the date range so I could run a few different versions. And added a few other parameters I needed...
Seems like it's working as intended, if a bit kludge-y. Sometimes it's better to be 'right' than 'pretty.' ;)
Well, after lots of experimentation, we have found that, yes, if you have a fat table with not enough rows to fill much of a block at the slice level using DISTSTYLE KEY, it is much more efficient, space-wise, to use DISTSTYLE ALL, even though this replicates more data.
For example, our fat tables have, say, 600000 rows and 1000 columns. Then, on a 5-node 32-slice setup, they were using up 1Mb per column per node per slice, or 160 Gb, and we were only filling less than 1% of that space.
DISTSTYLE ALL appears to not use all the slices.
So using DISTSTYLE ALL, we multiplied the data by 5, but this still takes up less than 1/20th of the space that the DISTSTYLE KEY tables did.
It slows down our queries a little (due to the need for IPC between the slices, and because the DISTSTYLE KEY means we have only the data on the node that matches our other data) but, since these tables are loaded once and never change after, we don't suffer from the other problems of DISTSTYLE ALL.
Best Answer
Given the following table:
The SELECT COUNT from the original poster returns 0, the second one returns 2, and the last one returns 1.
Let's see why:
If you take out the counts and retrieve the rows, we get:
(1)
(that is: nothing. The count, in this case, is clearly 0)
(2)
(there are 3 rows, but the count is 2, because you don't count NULL)
And last,
(3)
(There are 2 rows, but Count is 1, because, again, you don't count NULL).
You have to take into account that, according to SQL standard:
If
a IS NULL
, you getNOT (NULL = x OR NULL = y OR NULL = z)
which isNULL
, regardless of the values ofx
,y
orz
.