SQL Server Pivot – How to Merge Multiple Rows into Fewest Rows of Distinct Values

pivot, sql-server, sql-server-2012

In SQL Server, does anyone know of a nice way to merge/flatten multiple rows of data into the fewest possible rows containing only the distinct non-null values?

I.e. turning a dataset like this:

[image: before]

into this:

[image: after]

If it helps, the before dataset is a pivoted line listing but without the aggregate. I can't aggregate it during the pivot as I want to keep each of the distinct values and not take the MAX or MIN.

The only way I can think of doing it involves splitting the data up and joining it all back together, which won't be very efficient.

Best Answer

Your data appears to lack any relationship between the various column values. If you can define this relationship, you can PIVOT the data appropriately.

For example, if you simply want to align the values by their sort order (under your default collation), you could use:

with rawdata as (
select * from (values
    ('00000000-0000-0000-0000-000000037850','Col2','Yes_02')
    ,('00000000-0000-0000-0000-000000037850','Col3','Full marketing schedule')
    ,('00000000-0000-0000-0000-000000037850','Col3','Negotiations started, fell through')
    ,('00000000-0000-0000-0000-000000037850','Col3','No budget')
    ,('00000000-0000-0000-0000-000000037850','Col3','Not interest')
    ,('00000000-0000-0000-0000-000000037850','Col3','Passed to Summerhouse')
    ,('00000000-0000-0000-0000-000000037850','Col4','Darren Waters_01')
    ,('00000000-0000-0000-0000-000000037850','Col4','David Edwards_01')
    ,('00000000-0000-0000-0000-000000037850','Col4','David Simons_01')
    ,('00000000-0000-0000-0000-000000037850','Col4','Jason Gould_01')
    ,('00000000-0000-0000-0000-000000037850','Col4','Martin Thorpe_01')
    ,('00000000-0000-0000-0000-000000037850','Col5','BETT New Exhibitor')
    ,('00000000-0000-0000-0000-000000037850','Col5','BETT Standard Exhibitor')
    ,('00000000-0000-0000-0000-000000037850','Col5','Exhibitor Verified')
    ) x ([ID],[Col],[Value])
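    ), ordered as (
-- number the values within each ID/Col pair in sort order
select
    ID
    ,Col
    ,[Value]
    ,rn = row_number() over (partition by ID, Col order by [Value])
    from rawdata
    )
-- rn is not named in the pivot clause, so it acts as an implicit grouping
-- column: the n-th value of every Col lands on the same output row
select
    ID
    ,[Col1],[Col2],[Col3],[Col4],[Col5]
    from ordered o
    pivot(max([Value]) for Col in ([Col1],[Col2],[Col3],[Col4],[Col5])) pvt
    ;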
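
PIVOT groups by every column of its input that is not named in the PIVOT clause, here ID and rn. If that implicit grouping feels too hidden, the same alignment can probably be written as conditional aggregation with an explicit GROUP BY. The following is only a sketch of that equivalent form, using a subset of the sample rows above; it is not part of the original answer.

```sql
-- Sketch: same row alignment as the PIVOT query, with the grouping made explicit.
with rawdata as (
    -- a subset of the sample rows from the answer above
    select * from (values
         ('00000000-0000-0000-0000-000000037850','Col2','Yes_02')
        ,('00000000-0000-0000-0000-000000037850','Col3','Full marketing schedule')
        ,('00000000-0000-0000-0000-000000037850','Col3','No budget')
        ,('00000000-0000-0000-0000-000000037850','Col4','Darren Waters_01')
        ,('00000000-0000-0000-0000-000000037850','Col4','David Edwards_01')
    ) x ([ID],[Col],[Value])
), ordered as (
    select
         ID
        ,Col
        ,[Value]
        ,rn = row_number() over (partition by ID, Col order by [Value])
    from rawdata
)
select
     ID
    ,Col1 = max(case when Col = 'Col1' then [Value] end)
    ,Col2 = max(case when Col = 'Col2' then [Value] end)
    ,Col3 = max(case when Col = 'Col3' then [Value] end)
    ,Col4 = max(case when Col = 'Col4' then [Value] end)
    ,Col5 = max(case when Col = 'Col5' then [Value] end)
from ordered
group by ID, rn   -- one output row per (ID, rn), exactly what PIVOT does implicitly
order by ID, rn;
```

With this subset the query returns two rows for the ID. Each (ID, rn, Col) combination holds at most one value, so MAX never discards data; it only collapses the NULLs.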

Related Solutions

Sql-server – Efficiently query MAX over multiple ranges

Here's a solution using CROSS APPLY, which does the same TOP query for each customer_id:

SELECT MAX(b.MaxQuantity) AS quantity
  FROM
  (
    SELECT 1 AS customer_id UNION ALL
    SELECT 2
  ) a
  CROSS APPLY
  (
    SELECT TOP 1
      quantity AS MaxQuantity
      FROM orders o
      WHERE o.customer_id = a.customer_id
      ORDER BY quantity DESC
  ) b;

This does the same work as the UNION ALL-based query you wrote in the Fiddle; the difference is that the customer_id input is abstracted from the meat of the query, so it can easily be converted to use a table variable or table-valued parameter, which will result in a static query plan, which is important. This approach will work well for a small number of customer_id values, and simply removing the outer MAX will return the maximum for each customer. I don't believe there's a way to further optimize this query for a small number of customer_ids using these data structures (assuming the customer_ids are random, and not a range).
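
As a rough illustration of that conversion, the inline UNION ALL list could be swapped for a table variable. The name @customers and the int column type are assumptions made for this sketch; orders, customer_id and quantity come from the query above.

```sql
-- Hypothetical table variable holding the customer_ids of interest.
DECLARE @customers TABLE (customer_id int PRIMARY KEY);
INSERT INTO @customers (customer_id) VALUES (1), (2);

-- Outer MAX removed: one row per customer with that customer's maximum.
SELECT a.customer_id, b.MaxQuantity
  FROM @customers AS a
  CROSS APPLY
  (
    SELECT TOP (1)
      o.quantity AS MaxQuantity
      FROM orders AS o
      WHERE o.customer_id = a.customer_id
      ORDER BY o.quantity DESC
  ) AS b;
```

Because CROSS APPLY only keeps rows where the inner query returns something, customers with no orders drop out of the result; OUTER APPLY would keep them with a NULL MaxQuantity.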

For a large number of customer_ids, it probably is cheaper to do the index scan and stream aggregate than many seeks. To get this going faster, you'd have to move to some kind of denormalized data structure. MAX isn't supported in an indexed view, so rolling your own mechanism is the only way to go, either in application logic or triggers. Depending on the read/write ratio on this table, that may or may not be faster than the above approach: you'd have to test it in your exact scenario.
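
One way to roll your own mechanism with triggers might look like the sketch below. Everything named here (dbo.customer_max_quantity, the trigger name, the dbo schema and the int column types) is a hypothetical assumption, and the sketch ignores concurrency tuning and the initial population of the summary table.

```sql
-- Hypothetical summary table: one row per customer with its current maximum.
CREATE TABLE dbo.customer_max_quantity
(
    customer_id  int NOT NULL PRIMARY KEY,
    max_quantity int NULL
);
GO

-- Assumes dbo.orders(customer_id int, quantity int), as in the query above.
CREATE TRIGGER dbo.trg_orders_maintain_max
ON dbo.orders
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Customers touched by the triggering statement.
    DECLARE @touched TABLE (customer_id int NOT NULL);
    INSERT INTO @touched (customer_id)
    SELECT customer_id FROM inserted WHERE customer_id IS NOT NULL
    UNION
    SELECT customer_id FROM deleted WHERE customer_id IS NOT NULL;

    -- Recompute from scratch for those customers only. This also covers
    -- deletes and quantity decreases, which a "keep the larger value"
    -- update would get wrong.
    DELETE m
    FROM dbo.customer_max_quantity AS m
    JOIN @touched AS t ON t.customer_id = m.customer_id;

    INSERT INTO dbo.customer_max_quantity (customer_id, max_quantity)
    SELECT o.customer_id, MAX(o.quantity)
    FROM dbo.orders AS o
    JOIN @touched AS t ON t.customer_id = o.customer_id
    GROUP BY o.customer_id;
END;
GO
```

Reads then become a primary-key lookup, for example SELECT MAX(max_quantity) FROM dbo.customer_max_quantity WHERE customer_id IN (1, 2), at the cost of extra work on every write to orders.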

SQL Server – How to Query Transfers for Single Source to Single Destination

I don't know all of your source data (or why there isn't any type of unique constraint that would prevent full-on duplicates or a source with multiple destinations), but given only the sample data supplied:

;WITH s AS 
(
  -- first let's eliminate duplicates
  SELECT DISTINCT Source, Destination 
    FROM dbo.MyTable
)
SELECT Source, Destination
FROM s
WHERE NOT EXISTS
(
  SELECT 1 FROM s AS d WHERE 

  -- eliminate chains in either direction:
    d.Destination = s.Source OR d.Source = s.Destination

  -- eliminate any source with multiple destinations:
    OR (d.Source = s.Source AND d.Destination <> s.Destination)

  -- eliminate any destination with more than one source
    OR (d.Destination = s.Destination AND d.Source <> s.Source)
);

SQL fiddle demo
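
To see which rows each condition removes, here is the same query run against a small made-up data set. The rows and the table variable @MyTable are hypothetical, chosen only to exercise every branch; they are not the asker's sample data.

```sql
-- Hypothetical sample rows, one case per elimination rule.
DECLARE @MyTable TABLE (Source char(1) NOT NULL, Destination char(1) NOT NULL);
INSERT INTO @MyTable (Source, Destination) VALUES
     ('A','B'), ('A','B')   -- exact duplicate of a clean transfer
    ,('C','D'), ('D','E')   -- chain: D is both a destination and a source
    ,('F','G'), ('F','H')   -- one source, two destinations
    ,('I','K'), ('J','K');  -- one destination, two sources

;WITH s AS
(
  SELECT DISTINCT Source, Destination
    FROM @MyTable
)
SELECT Source, Destination
FROM s
WHERE NOT EXISTS
(
  SELECT 1 FROM s AS d WHERE
    d.Destination = s.Source OR d.Source = s.Destination
    OR (d.Source = s.Source AND d.Destination <> s.Destination)
    OR (d.Destination = s.Destination AND d.Source <> s.Source)
);
-- returns a single row: A, B
```

Note that a row whose Source equals its own Destination would also be eliminated, because it matches the first condition against itself.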
