Excel – Remove duplicate entries, keeping latest only

microsoft-excel-2010

I have several spreadsheets with information I need to consolidate, each with different (and sometimes overlapping) portions of the final data set I'm hoping to create. Typically, combining spreadsheets with similar data is easy to do by dumping all the data into one sheet and using Remove Duplicates. However, I want to make sure that the line items which are left represent the latest version of the duplicates.

I will be importing the following data from each spreadsheet:

  • A unique identifier field, (UID) which identifies the subject of the line item and will be used to spot duplicates between the input sheets.
  • A status field, (STATUS) which will have information about the item called out in the unique identifier field.
  • A date field, (DATE) which will contain the date the imported data was originally written.

What I need is for my output to contain only one line item for each UID, and have data from the STATUS field which matches the most recent DATE for that UID from the input sheets.

What is the easiest way to do this in Excel?

Best Answer

I don’t know whether this is guaranteed to work, but it seems to work for me (in very small-scale tests in Excel 2007): take the combined data sheet, and sort it in reverse order by DATE, so the newest rows are above the older ones.  Then Remove Duplicates.

This site confirms this behavior: "When Excel scans the table, it removes any subsequent record that has the same Product ID as an earlier record, even if the rest of the data is different."

Related Question