Sql-server – How to exclude non-distinct rows in a query

sql-server-2012

I have a table that contains references to clients impacted in a problem. The parent table contains the problem info, specifically the ProblemID (PbMID). Since one problem can affect multiple clients, we store the client impacted data in a child table. The child table contains an ID field for housekeeping, a PbMID field which foreign keys back to the parent table, and a Company field containing the text name of the client.
I have a requirement to pull all the problems where a SINGLE client was impacted. If I use DISTINCT, I get all the single-client rows, but I also get the FIRST row of each multi-client problem, which is not what I'm being asked for.

Here's an example of the client-impacted table:

ID  | PbMID | Company    | Status
1   | 1     | Company 1  | Valid
2   | 4     | Company 2  | Valid
3   | 6     | Company 3  | Valid
4   | 22    | Company 1  | Invalid
5   | 22    | Company 4  | Invalid
6   | 23    | Company 5  | Valid
7   | 24    | Company 6  | Valid
8   | 25    | Company 1  | Invalid
9   | 25    | Company 8  | Invalid
10  | 25    | Company 10 | Invalid
11  | 26    | Company 2  | Valid
12  | 27    | Company 4  | Valid

The rows marked INVALID would not be included, since they reflect multi-client problems.

So, ideally, the return would be:

ID  | PbMID | Company    | Status
1   | 1     | Company 1  | Valid
2   | 4     | Company 2  | Valid
3   | 6     | Company 3  | Valid
6   | 23    | Company 5  | Valid
7   | 24    | Company 6  | Valid
11  | 26    | Company 2  | Valid
12  | 27    | Company 4  | Valid

Any help would be greatly appreciated. SQL isn't my forte, so I've been trying to wrap my head around this with no luck.

Best Answer

You could use GROUP BY, HAVING and a Common Table Expression (CTE) to obtain the data.

The GROUP BY and HAVING clauses return only those PbMIDs that impacted a single company. If you need the PbMIDs that impacted n companies, change the HAVING clause to HAVING COUNT(Company) = n, where n is the required number of companies.

SELECT
    PbMID
FROM
    ChildTable
GROUP BY PbMID
HAVING COUNT(Company) = 1;

This can then be combined with a CTE to produce the final query below.

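-- PbMIDs that appear exactly once, i.e. problems with a single impacted client
WITH CTE_SingleInstance (PbMIDSingleInstance)
AS
(
    SELECT
        PbMID
    FROM
        ChildTable
    GROUP BY PbMID
    HAVING COUNT(Company) = 1
)
-- Return the full child rows for those problems only.
-- Columns must be referenced through the alias CT once the table is aliased.
SELECT
    CT.ID,
    CT.PbMID,
    CT.Company
FROM
    ChildTable CT
JOIN
    CTE_SingleInstance CTES
ON
    CT.PbMID = CTES.PbMIDSingleInstance;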
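To check the approach against the sample data in the question, here is a minimal sketch that runs the same GROUP BY / HAVING / CTE pattern from Python. It uses an in-memory SQLite database purely as a stand-in for SQL Server (an assumption, made so the example is self-contained). The table name ChildTable follows the answer; the CTE name SingleClient and the function name are hypothetical.

```python
import sqlite3

# Sample rows from the question: (ID, PbMID, Company).
ROWS = [
    (1, 1, "Company 1"),
    (2, 4, "Company 2"),
    (3, 6, "Company 3"),
    (4, 22, "Company 1"),
    (5, 22, "Company 4"),
    (6, 23, "Company 5"),
    (7, 24, "Company 6"),
    (8, 25, "Company 1"),
    (9, 25, "Company 8"),
    (10, 25, "Company 10"),
    (11, 26, "Company 2"),
    (12, 27, "Company 4"),
]

# Same shape as the answer's query: a CTE of single-client PbMIDs,
# joined back to the child table to fetch the full rows.
QUERY = """
WITH SingleClient AS (
    SELECT PbMID
    FROM ChildTable
    GROUP BY PbMID
    HAVING COUNT(Company) = 1
)
SELECT CT.ID, CT.PbMID, CT.Company
FROM ChildTable AS CT
JOIN SingleClient AS SC ON CT.PbMID = SC.PbMID
ORDER BY CT.ID;
"""


def single_client_rows(rows):
    """Load rows into a throwaway SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE ChildTable "
            "(ID INTEGER PRIMARY KEY, PbMID INTEGER, Company TEXT)"
        )
        conn.executemany("INSERT INTO ChildTable VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in single_client_rows(ROWS):
        print(row)
```

On the sample data this returns only the rows for PbMIDs 1, 4, 6, 23, 24, 26 and 27; the rows for problems 22 and 25 are dropped because each of those PbMIDs appears more than once.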

Related Solutions

Sql-server – Column Name in separate table SQL Server

What you're looking for is a dynamic SQL pivot. A PIVOT turns row values into columns based on an aggregate, but you have to define the column names as part of the PIVOT. Luckily, we can build that list on the fly with dynamic SQL. The following should generate the result set you want for a particular company (replace RowValues with your table name):

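DECLARE @SQL NVARCHAR(MAX);
DECLARE @Columns NVARCHAR(MAX);
DECLARE @Company NVARCHAR(5) = N'01';

-- Build a comma-separated list of quoted column names from this company's headers
SET @Columns = STUFF( (SELECT ',' + QUOTENAME(H.description) AS [data()]
                        FROM dbo.Headers H
                        WHERE H.company = @Company
                        ORDER BY H.fieldNumber
                        FOR XML PATH('')),1,1,'');

-- Only the column list is concatenated into the statement; the company is a parameter
SET @SQL = N'
SELECT company,jobNumber,' + @Columns + N'
FROM
(
    SELECT h.company,RV.jobNumber,RV.information,h.description
    FROM dbo.Headers h
            INNER JOIN dbo.RowValues RV
                ON RV.fieldNumber = h.fieldNumber
                   AND RV.company = h.company
    WHERE h.company = @Company
) as Data
PIVOT
(
    MAX(information) FOR [description] IN (' + @Columns + N')
) as p
ORDER BY jobNumber ASC';

-- Passing @Company as a parameter keeps '01' a string instead of the bare literal 01
EXEC sp_executesql @SQL, N'@Company NVARCHAR(5)', @Company = @Company;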

Unfortunately, because different companies can have different columns, there is no reliable way to combine all the possible result sets the above query could generate. Depending on how you want to use this query, the easy option is to loop through all the companies and call a stored procedure that contains the above, so that each company is output as a separate SELECT. Alternatively, you can have your SSIS package output each one into a file.

If you want to insert the data into a table matching the calculated schema, you can use an INSERT ... SELECT (or SELECT ... INTO) and a bit more dynamic SQL to get the data where it needs to go.
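The "build the column list first, then loop per company" idea can be sketched outside SQL Server as well. The example below is a rough illustration in Python using an in-memory SQLite database as a stand-in. SQLite has no PIVOT operator, so conditional aggregation (MAX(CASE ...)) is swapped in for it. The Headers and RowValues columns follow the query above, while the sample rows, function names, and data types are made up for illustration.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with made-up Headers and RowValues rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Headers (company TEXT, fieldNumber INTEGER, description TEXT);
        CREATE TABLE RowValues (company TEXT, jobNumber TEXT,
                                fieldNumber INTEGER, information TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO Headers VALUES (?, ?, ?)",
        [("01", 1, "Customer"), ("01", 2, "Site"), ("02", 1, "Region")],
    )
    conn.executemany(
        "INSERT INTO RowValues VALUES (?, ?, ?, ?)",
        [
            ("01", "J100", 1, "Acme"),
            ("01", "J100", 2, "Leeds"),
            ("01", "J101", 1, "Birch"),
            ("02", "J200", 1, "North"),
        ],
    )
    return conn


def quote_ident(name):
    """Quote an identifier for SQLite, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot_company(conn, company):
    """Return (column_names, rows) with one column per header of `company`."""
    headers = conn.execute(
        "SELECT fieldNumber, description FROM Headers "
        "WHERE company = ? ORDER BY fieldNumber",
        (company,),
    ).fetchall()

    # Only quoted identifiers are concatenated; all values are bound parameters.
    select_list = ["RV.company AS company", "RV.jobNumber AS jobNumber"]
    params = []
    for field_number, description in headers:
        select_list.append(
            "MAX(CASE WHEN RV.fieldNumber = ? THEN RV.information END) AS "
            + quote_ident(description)
        )
        params.append(field_number)

    sql = (
        "SELECT " + ", ".join(select_list) + " FROM RowValues AS RV "
        "WHERE RV.company = ? GROUP BY RV.company, RV.jobNumber "
        "ORDER BY RV.jobNumber"
    )
    cursor = conn.execute(sql, params + [company])
    names = [column[0] for column in cursor.description]
    return names, cursor.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    # One result set per company, since each company may have different columns.
    for company_code in ("01", "02"):
        print(pivot_company(db, company_code))
```

Each call returns its own column list, which is why the per-company results cannot simply be stacked into one result set.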

Sql-server – Database design considerations for unused columns with every table has same schema

1) Yes. It's a maintenance/documentation nightmare but technically there's no reason it wouldn't work.

2) In general each NULL costs about one bit of storage in the row's NULL bitmap, so 80 null fields might be 10 bytes per row. The full answer is that it varies by data type (fixed-length types such as int or char still reserve their full width when NULL), but for varchar columns it's a good rule of thumb.
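The 10-byte figure is simply 80 bits rounded up to whole bytes. A small sketch of that arithmetic follows; the function name is hypothetical, and it models only the one-bit-per-column rule of thumb, not SQL Server's full row format.

```python
def null_bitmap_bytes(column_count):
    """Bytes needed to hold one bit per column, rounded up to whole bytes."""
    if column_count < 0:
        raise ValueError("column_count must be non-negative")
    return (column_count + 7) // 8


if __name__ == "__main__":
    print(null_bitmap_bytes(80))
```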

One alternative, where you expect most fields to be null, is sparse columns with a column set (see https://msdn.microsoft.com/en-AU/library/cc280521.aspx), where the engine combines those rarely used columns into a single XML column. I don't think it's a very good fit for your purposes because it messes up indexing and introduces a bunch of limitations, but it's something to be aware of.

I don't think performance is going to be affected though. It will mean that you'll have additional lookups if queries are doing select * instead of just the columns they need (and so it would be hard to create proper covering indexes). But that's a generic problem and not specific to this design.

3) I think you should probably try to forget refactoring it and just go ahead building your application as best as you can with what you have.

A database developer would probably start by importing one of the existing databases into SSDT (a Visual Studio database project) which will create the schema, then they'd do any minor fixes required to get a build going, and see that it can create a usable empty database for use with the software.

After that they'd go over the source code and start using SSDT's right-click refactor functionality to give the tables and columns proper names, which also updates the procedures, functions, etc. that reference them.

This still leaves any dynamic SQL broken, external reports broken, and anything that lives outside of the database broken (the application, web services, queries from customers with direct access!). That makes it a pretty big, collaborative effort to fix.

I'm almost certain they will run across fields which have multiple meanings - because that's what happens when people build software this way. Luckily if they were doing all of this then they can use that opportunity to split dual-usage fields into separate fields.

But can you do that? I imagine it's outside of your scope, and businesses generally don't like paying for such things when their reaction is "why would you need to do that, the software works as-is!"
