Microsoft Excel – Formula for Separating Multi-Part Last Names

microsoft excelworksheet-function

I have a list of 20,000 + names. The first and last name of each person are all in one column "Full Name". I need to create two columns for the First Name & Last Name.

Doing so is easy for majority. But for some people, its quite tricky.

Here are few examples of the spectrum of names I'm dealing with. Going through one by one would take me weeks.

Any suggestions of how to handle this would be helpful.

EX:

David Batte
Guy-Mael Jacobe de Naurois
Jean-Marc Lioutier
Thibaud Le Seguillon
Mrs. Manuela Junghaehnel
Hussain Mohammed Dipu Kabir

Best Answer

There is not a single formula that will handle this because there is too much variety. But there is an approach that will greatly reduce the task.

You can write a formula to handle any specific name pattern. Where you have many names fitting a common pattern, that's a good investment of time. When you have one or two names fitting a pattern, it will take you longer to find those examples and create the formula than to simply parse those by hand.

The practical solution is to do this with multiple passes. Use helper columns, each with a single pattern formula or a few that can be easily combined. Each successive set of helper columns can ignore rows that have already been parsed. When you're done, you can concatenate helper columns into single results columns using & because they will contain either a parsed result or a blank.

Automate the easy ones:

  • single first and last names (based on the single blank space in the middle)
  • hyphenated names (a hyphenated name is still one string so you can split on the space, also)
  • common name prefixes and suffixes (like le, de, etc.; look for the patterns in a form like _de_ , where the underscore represents a space, and link that with the associated name)
  • titles (Mr., Ms., Mrs., Dr., Sr., Jr., etc.; look for them and link them with the associated name or just use them to locate the name and ignore the title)
  • multiple names if you will be splitting off only the first (look for the first blank)
  • etc.

The first-round formula gets propagated to every row. Sort the results so all blank (unresolved) rows are at the top, or use a filter to show only those rows. Propagate the next formula to the unresolved rows in the next helper columns. Repeat for each successive formula. If you start with the most common patterns, the list will rapidly shrink, and the unresolved examples will be aggregated to simplify scanning for the next pattern.

The formula for any given pattern will be straightforward. It's beyond the scope of an answer to create a comprehensive collection of formulas, but you can always ask about a specific pattern you can't figure out, and a search of the site or the web in general will turn up existing answers that will cover common ones.

This process will leave a successively smaller percentage unresolved. Scan those for any commonalities that can be easily handled in other automated rounds. Eventually, you'll be left with single-case patterns that are faster to do manually than to code, and those will be a much more manageable number.

You can even help automate those using a formula or VBA that lets you specify where to split (like entering 2 for the second blank), or having a collection of formulas to split at different delimiters and you pick the formula). The formula then parses the name.

Related Question