Sql-server – Transposing repeated rows of datetime into columns

pivotsql serversql-server-2017

I am working with a dataset of inpatient data in SQL Server 2017. The fields contain anonymized ID, admission date, discharge date and other relevant fields. Since some of the patients are readmitted, the ID field is not unique. The table looks like this:

ID       AdmDate1      DischDate    
10001   2012-10-16    2012-10-26 
10001   2014-06-15    2014-06-18 
10001   2014-12-21    2014-12-29 
10002   2013-02-14    2013-02-20 
10003   2013-01-23    2013-01-31 
10004   2012-11-15    2012-11-19 
10004   2014-09-26    2014-10-06 
10005   2014-12-12    2014-12-23 
10006   2013-10-23    2013-10-28

Since I want to calculate readmission rate and intervals between readmissions, I want to create a table like this:

ID       AdmDate1     AdmDate2    AdmDate3   DischDate1 DischDate2 DischDate3   
10001   2012-10-16   2014-06-15  2014-12-21
10002   2013-02-14   None        None

Best Answer

I have counted the rows in the columns of AdmDate1 and found the maximum admission frequency is 35. Since readmissions more than 4 are not many, I can remove the rows with rn>4 for AdmDate1 and DischDate. by doing this, the number of columns to be created will be reduced to manageable numbers

Based on this reply i started looking for a solution which uses 4 of admdates and 4 of Dischdates.

So this works, but I don't dare look at the query plan

Create test data

create table AdmissionDates(id int, AdmDate1 date,  DischDate date)
insert into AdmissionDates(id,AdmDate1,DischDate)
VALUES(  10001 ,  '2012-10-16' ,  '2012-10-26'),
(10001 ,  '2014-06-15'   , '2014-06-18')  ,
(10001 ,  '2014-12-21'   , '2014-12-29')  ,
(10002 ,  '2013-02-14'   , '2013-02-20')  ,
(10003 ,  '2013-01-23'   , '2013-01-31')  ,
(10004 ,  '2012-11-15'   , '2012-11-19')  ,
(10004 ,  '2014-09-26'   , '2014-10-06')  ,
(10005 ,  '2014-12-12'   , '2014-12-23')  ,
(10006 ,  '2013-10-23'   , '2013-10-28')  

select id,row_number() over( partition by id order by admDate1 asc) as rownum, AdmDate1,row_number() over( partition by id order by DischDate asc) as rownum2, DischDate
into #temp
from AdmissionDates;

The Query

select DISTINCT id, 
                t2.admDate1 as admDate1,
                t3.admDate1 as admDate2,
                t4.admDate1 as admDate3,
                t5.admDate1 as admDate4, 
                T6.DischDate as DischDate1,
                T7.DischDate as DischDate2,
                T8.DischDate as DischDate3,
                T9.DischDate as DischDate4
FROM 
#temp t1
OUTER APPLY 
(select admDate1 from #temp T2 where rownum = 1 and T2.id = T1.id) as T2
OUTER APPLY 
(select admDate1 from #temp T2 where rownum = 2 and T2.id = T1.id) as T3
OUTER APPLY 
(select admDate1 from #temp T2 where rownum = 3 and T2.id = T1.id) as T4
OUTER APPLY 
(select admDate1 from #temp T2 where rownum = 4 and T2.id = T1.id) as T5
OUTER APPLY 
(select DischDate from #temp T2 where rownum2 = 1 and T2.id = T1.id) as T6
OUTER APPLY 
(select DischDate from #temp T2 where rownum2 = 2 and T2.id = T1.id) as T7
OUTER APPLY 
(select DischDate from #temp T2 where rownum2 = 3 and T2.id = T1.id) as T8
OUTER APPLY 
(select DischDate from #temp T2 where rownum2 = 4 and T2.id = T1.id) as T9;


DROP TABLE #temp;

Related Solutions

Sql-server – Transposing data

In sql-server-2005 or later, the the UNPIVOT and PIVOT operators can be used:

Sample data:

DECLARE @Source AS TABLE
(
    ID      char(1) PRIMARY KEY,
    PD1     integer NOT NULL,
    PD2     integer NOT NULL,
    PD3     integer NOT NULL,
    PD4     integer NOT NULL,
    PD5     integer NOT NULL
);

INSERT @Source
    (ID, PD1, PD2, PD3, PD4, PD5)
VALUES
    ('A', 1, 2, 3, 4, 5),
    ('B', 6, 7, 8, 9, 0);

Query:

SELECT 
    Pvt.Price, 
    Pvt.A, 
    Pvt.B
FROM @Source AS s
UNPIVOT
(
    Val 
    FOR Price IN (PD1, PD2, PD3, PD4, PD5)
) AS Unpvt
PIVOT 
(
    MAX(Val) 
    FOR ID IN (A, B)
) AS Pvt;

Output:

╔═══════╦═══╦═══╗
║ Price ║ A ║ B ║
╠═══════╬═══╬═══╣
║ PD1   ║ 1 ║ 6 ║
║ PD2   ║ 2 ║ 7 ║
║ PD3   ║ 3 ║ 8 ║
║ PD4   ║ 4 ║ 9 ║
║ PD5   ║ 5 ║ 0 ║
╚═══════╩═══╩═══╝

Execution plan: Unpivot/pivot plan

The query can also be written without PIVOT and UNPIVOT:

SELECT
    f1.Price,
    A = MAX(CASE WHEN s.ID = 'A' THEN f1.Val END),
    B = MAX(CASE WHEN s.ID = 'B' THEN f1.Val END)
FROM @Source AS s
CROSS APPLY
(
    SELECT 'PD1', s.PD1 UNION ALL
    SELECT 'PD2', s.PD2 UNION ALL
    SELECT 'PD3', s.PD3 UNION ALL
    SELECT 'PD4', s.PD4 UNION ALL
    SELECT 'PD5', s.PD5
) AS f1 (Price, Val)
GROUP BY
    f1.Price;

Without PIVOT/UNPIVOT

Try the SQLFiddle here

Sql-server – Optimizing queries for 25+ million rows

The question is mainly about how to optimize the select statement:

SELECT [TABLE], [FIELD], [AFTER], [DATE]
FROM mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB' AND
[FIELD] = 'STATUS'

Removing the redundant projections and adding the presumed dbo schema:

SELECT [AFTER], [DATE] 
FROM dbo.mytbl WITH (NOLOCK)
WHERE [TABLE] = 'OTB'
AND FIELD = 'STATUS';

Without an index like ([TABLE],[FIELD]) INCLUDE ([AFTER],[DATE]) SQL Server has two main options:

Scan the heap entirely (3GB+); or
Locate rows matching [TABLE] = 'OTB' and [FIELD] = 'STATUS' (using IDX6), then perform a heap (RID) lookup per row to retrieve the [AFTER] and [DATE] columns.

Whether the optimizer chooses a heap scan or index seek with RID lookup depends on the estimated selectivity of the [TABLE] = 'OTB' and [FIELD] = 'STATUS'predicates. Check to see if the estimated number of rows from the seek matches reality. If not, update your statistics. Test the query with an table hint forcing the use of the index, if that condition is reasonably selective. If the optimizer is currently choosing the index seek, test performance with an INDEX(0) or FORCESCAN hint to scan the heap.

Beyond that, you could look to improve the scan of the heap a little by removing some of the unused space (370MB). In SQL Server 2008 this can be done by rebuilding the heap. Unused space in heaps often results from deletes performed without a table lock being taken (without a table lock, empty pages are not deallocated from a heap). Tables that experience frequent deletions are often better stored as a clustered table for this reason.

The performance of the heap scan depends on how much of the table is stored in memory, how much must be read from disk, how full the pages are, the speed of the persistent storage, whether the scan is I/O or CPU bound (parallelism can help).

If performance is still unacceptable after you have investigated all of the above, try to make the case for a new index. If available on your version of SQL Server, a possible filtered index for the given query would be:

CREATE INDEX index_name
ON dbo.mytbl ([DATE],[AFTER])
WHERE [TABLE] = 'OTB'
AND [FIELD] = 'STATUS';

Also consider index compression, if that is available and beneficial. Without a new index of some kind, there's relatively little you can do to improve the performance of the given query.

Best Answer

Related Solutions

Sql-server – Transposing data

Sql-server – Optimizing queries for 25+ million rows

Related Question