SQL Server – How to Concatenate an Entire Column with a Group By

sql server

Today, I saw that our residents have put a bajillion hours into an excel spreadsheet, tracking everything they need to know about our patients.

The excel sheet is really good for humans to read from, but absolutely horrendous for machines. I'd like to help them out, but I'm stuck on one part: creating a list of every diagnosis per patient – and presenting it in a single field.

I've got this, which will concatenate the entire table:

    select stuff( (
    select top 12  ', ' + icd10 + ' ' + icd10.description   from 
    AdmVisitDiagnoses dx 
    left join ehs_icd10 icd10 on replace(dx.Code,'.','') = icd10.icd10
    where icd10 is not null
    for xml path ('')),1,2,'')

which will produce a result like this:

I10 Essential (primary) hypertension, E780 Pure hypercholesterolemia, E780 Pure hypercholesterolemia, I10 Essential (primary) hypertension, I8291 Chronic embolism and thrombosis of unspecified vein, I8291 Chronic embolism and thrombosis of unspecified vein, F205 Residual schizophrenia, F209 Schizophrenia unspecified, F259 Schizoaffective disorder unspecified, F259 Schizoaffective disorder unspecified, F259 Schizoaffective disorder unspecified, F259 Schizoaffective disorder unspecified

but I think I'm barking up entirely the wrong tree. Also, the performance is horrendous (not a deal breaker, since it will be automated, but still something to be mindful of).

Question: How would you break this up by a foreign key?

Some sort of pivot? The number of diagnoses is quite variable…

Ideally, it would be something I can easily smush into a larger query, but there's obviously ways to work around that if needed.

Here's my current solution, but I'm pretty sure it's not a good one:

    create function dbo.stuffDx(@VisitID varchar(55))
    returns varchar(max)

    as
    begin
    declare @string varchar(max)

    set @string = (select stuff( (
    select top 500  ', ' + isnull(icd10,'') + ' ' + isnull(icd10.description,'')   from 
    AdmVisitDiagnoses dx 
    left join ehs_icd10 icd10 on replace(dx.Code,'.','') = icd10.icd10
    where icd10 is not null
    and VisitID = @VisitID
    for xml path ('')),1,2,''))

    return @string

    end

Here's the data:

AdmVisitDiagnoses:

    VisitID             Code  
    OL0-20100812131135  250.00  
    OL0-20100812131135  295.90  
    OL0-20100812134136  250.00  
    OL0-20100812134136  401.9  
    OL0-20100812134506  250.00  
    OL0-20100812134506  401.9  
    OL0-20100812135224  295.30  
    OL0-20100812135224  250.00  
    OL0-20100817085834  295.90  
    OL0-20100817085834  272.0  

icd10:

    icd10   description  
    A000    Cholera due to Vibrio cholerae 01 biovar cholerae  
    A001    Cholera due to Vibrio cholerae 01 biovar eltor  
    A009    Cholera unspecified  
    A0100   Typhoid fever unspecified  
    A0101   Typhoid meningitis  
    A0102   Typhoid fever with heart involvement  
    A0103   Typhoid pneumonia  
    A0104   Typhoid arthritis  
    A0105   Typhoid osteomyelitis  
    A0109   Typhoid fever with other complications  

Best Answer

You are "barking up entirely the wrong tree"!

What you want is something like:

CREATE TABLE patient_diagnosis
(
  patient_id INTEGER   NOT NULL, -- FK to patient table
  admission_time TIMESTAMP NOT NULL,
  diagnosis_id INTEGER NOT NULL, -- FK to diagnnosis table. Under **no** circumstances
                                 -- allow this to be free text! You might want to allow
                                 -- the text in here but from a dropdown list!
  initial_diagnosis_time TIMESTAMP NOT NULL,
  condition_cleared_time TIMESTAMP -- nullable because a patient might have an incurable 
                                   -- condition like Cystic Fibrosis or Alzheimer's?
);

PRIMARY KEY something like (patient_id, diagnosis_id, initial_diagnosis_time)
or (patient_id, admission_time, diagnosis_id) depending on your needs.

So, for a patient with multiple diagnoses, you will have multiple records - you'll be able to SELECT current diagnoses and history. This is ideal for spreadsheets and much easier than a long comma-separated list.

SQL is not good at manipulating such lists - it's brilliant at "slicing and dicing" small, simple atomic records. For example, "How many people with a diagnosis of bubonic plague were admitted in the first quarter of last year?".

From here (Codd's rules):

Rule 2: The guaranteed access rule:

Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.

Comma separated lists violate this basic rule derived from relational algebra which is the only mathematically sound theory underlying data storage and manipulation!