SED XML – Adding Numerical Suffixes to Tag-Names to Distinguish XML Elements

sed xml

I have an XML file with multiple child elements that share the same tag name, e.g. <Name>Luigi</Name>, <Name>Mario</Name>, <Name>Peach</Name>. Here is a mock-up of what my input file looks like:

<!-- names.xml -->
<Names>
    <Name>Luigi</Name>
    <Name>Mario</Name>
    <Name>Peach</Name>
</Names>

When I import this file into Excel for analysis, it creates a new record for each Name element. That is great for readability, but it makes it hard to tell whether I have a lot of duplicate data outside the name fields.

What I want to do is rename the tags to Name1, Name2, Name3 so that they all appear on the same row when I import them into Excel. That way I'll be able to find records that are useless to me or that contain duplicates – without having to constantly look at the raw data.

In other words, I want a script or command which produces the following output:

<!-- names.xml -->
<Names>
    <Name1>Luigi</Name1>
    <Name2>Mario</Name2>
    <Name3>Peach</Name3>
</Names>

Is it possible to do this with a sed command or other Unix script?

Best Answer

Since you specifically asked for sed, here is a sed/bash script that should do what you want, provided that each <Name> element is opened and closed on the same line. It relies on GNU grep (the -P option) and GNU sed (the \w and \+ extensions):

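(n=0
# IFS= and -r keep leading whitespace and backslashes intact;
# the || test also handles a last line with no trailing newline.
while IFS= read -r line || [ -n "${line}" ]; do
    if printf '%s\n' "${line}" | grep -Pq "<Name>\w+</Name>"; then
        # Bump the counter once per matching line, then rename both tags.
        n=$((n + 1))
        printf '%s\n' "${line}" | sed "s/<Name>\(\w\+\)<\/Name>/<Name${n}>\1<\/Name${n}>/"
    else
        printf '%s\n' "${line}"
    fi
done) < names.xml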

I tested it with this input file:

<!-- names.xml -->
<Names>
    <Name>Luigi</Name>
    <Name>Mario</Name>
    <Name>Peach</Name>
</Names>

And it produced the following output:

<!-- names.xml -->
<Names>
    <Name1>Luigi</Name1>
    <Name2>Mario</Name2>
    <Name3>Peach</Name3>
</Names>
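The loop above starts a grep and a sed process for every input line, which can get slow on large files. As a rough sketch of the same line-by-line idea in a single process, the substitution could be done with Python's re module. The function name number_names and the sample list are made up for illustration, and the same caveat applies: each <Name> element must open and close on one line. Unlike the shell loop, this version numbers every match on a line, not just the first.

```python
import itertools
import re

# Matches an element opened and closed on one line, e.g. <Name>Luigi</Name>
NAME_RE = re.compile(r"<Name>(\w+)</Name>")


def number_names(lines):
    """Yield each line with <Name>X</Name> renamed to <NameN>X</NameN>.

    Hypothetical helper: N counts matches across the whole input.
    """
    counter = itertools.count(1)

    def rename(match):
        n = next(counter)
        return "<Name{0}>{1}</Name{0}>".format(n, match.group(1))

    for line in lines:
        # Lines without a match pass through unchanged.
        yield NAME_RE.sub(rename, line)


sample = ["<Names>", "    <Name>Luigi</Name>", "    <Name>Mario</Name>", "</Names>"]
numbered = list(number_names(sample))
print("\n".join(numbered))
```

To process a real file, the open file object can be passed in place of the sample list, since iterating over it yields lines.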

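That said, this seems like a good candidate for a language with an XML parsing library, which also copes with elements that span several lines. Here is a Python script that does what you want (note that ElementTree discards comments such as <!-- names.xml --> by default when it parses the file):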

#!/usr/bin/env python3
# -*- encoding: ascii -*-

# add_suffix.py

import sys
import xml.etree.ElementTree

# Load the data
tree = xml.etree.ElementTree.parse(sys.argv[1])
root = tree.getroot()

# Update the XML tree
suffix = 0
for name in root.iter("Name"):
    suffix += 1
    name.tag += str(suffix)

# Write out the updated data
tree.write(sys.argv[2])

Run it like this:

python add_suffix.py names.xml new_names.xml
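The script above uses one counter for the whole document. If the real file holds several records, each with its own Name children, the numbering would keep climbing from one record to the next. A minimal sketch of a variant that restarts the count inside each parent element follows. The function name add_suffixes_per_parent and the <Records> sample are assumptions for illustration only, not part of the original data, and the snippet targets Python 3.

```python
import xml.etree.ElementTree as ET


def add_suffixes_per_parent(root, tag="Name"):
    """Rename direct children called `tag` to tag1, tag2, ... per parent.

    Hypothetical helper: the counter restarts for every parent element.
    """
    # list() snapshots the elements first, since tags are renamed while walking.
    for parent in list(root.iter()):
        suffix = 0
        for child in parent:
            if child.tag == tag:
                suffix += 1
                child.tag = "{}{}".format(tag, suffix)
    return root


# Made-up sample with two records; the real file layout may differ.
sample = """<Records>
    <Names><Name>Luigi</Name><Name>Mario</Name></Names>
    <Names><Name>Peach</Name></Names>
</Records>"""

root = add_suffixes_per_parent(ET.fromstring(sample))
result = ET.tostring(root, encoding="unicode")
print(result)
```

With this layout, each record gets its own Name1, Name2, ... columns, which is probably closer to the one-row-per-record view wanted in Excel.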