Excel – How to Remove Duplicate Rows and Count Results

microsoft excel

For a research project, I am trying to find out how often patients have been admitted to various hospital departments. For each department I have an Excel file with loads of data, but have stripped it down to two columns: patient ID and date admitted. Each patient (around 300 of them) has been admitted loads of times, so each sheet has over 20,000 rows, something like this:

ID1   23/07/15
ID1   25/08/15
ID1   09/01/16
ID2   14/06/14
ID2   12/08/15

Except there are way more dates per patient than just two or three.

Now I obviously don't want to go through 20,000 rows (per file, of which I have six) and count everything by hand. What I need is a way to:

a) remove entire rows containing duplicate dates following each other (for some reason the same patient often has two identical admission dates in two adjacent rows)

and

b) have Excel count the total number of dates (i.e. rows) per patient, based on identical ID in the first column.

After playing around with it for a bit I have found ways to highlight duplicates following each other (with conditional formatting), but I can't get Excel to delete the entire row for me, or do the counting.

Is there a way to achieve this in Excel, or will I have to do it all by hand?

Best Answer

To clean up data of duplicates of combination patient ID and date

  1. Select both columns
  2. Navigate to Data tab
  3. Click on Remove Duplicates

Learn more on source - Remove Duplicates

For the fastest result to get all you want, use Group and Subtotal feature.

  1. Sort patient ID.
  2. Select Data tab, then click Subtotal.
  3. At Each Change In select patient ID column.
  4. At Use function select Count.
  5. At Add subtotal to select patient ID column.
  6. Click OK.

Learn more from source - Group and Subtotal

Related Question