I have an XML file with multiple child elements that share the same tag name, e.g. <Name>Luigi</Name>, <Name>Mario</Name>, <Name>Peach</Name>. Here is a mock-up of what my input file looks like:
<!-- names.xml -->
<Names>
<Name>Luigi</Name>
<Name>Mario</Name>
<Name>Peach</Name>
</Names>
When I throw this file into Excel for analysis, it creates a new record for each Name element. This is awesome from a readability perspective, but it makes it difficult to discern whether I have lots of duplicate data outside of the name fields.
What I want to do is rename the tags to Name1, Name2, Name3 so that they all appear on the same row when I import them into Excel. That way I'll be able to find records that are useless to me or that contain duplicates, without having to constantly look at the raw data.
In other words, I want a script or command which produces the following output:
<!-- names.xml -->
<Names>
<Name1>Luigi</Name1>
<Name2>Mario</Name2>
<Name3>Peach</Name3>
</Names>
Is it possible to do this with a sed command or other Unix script?
Best Answer
Since you specifically asked for sed, here is a sed/bash script that should do what you want, provided that each <Name> element is opened and closed on the same line:
I tested it with this input file:
And it produced the following output:
That said, this seems like a good candidate for a language with an XML parsing library. Here is a Python script that does what you want:
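A minimal sketch of the parser-based approach, assuming Python 3 and the standard-library xml.etree.ElementTree module. The function name number_repeated_children and the inline sample are illustrative, not taken from the original script. Numbering restarts inside each parent element, and ElementTree drops XML comments such as <!-- names.xml --> by default.

```python
import xml.etree.ElementTree as ET


def number_repeated_children(root, tag="Name"):
    """Rename <tag> children to <tag1>, <tag2>, ... within each parent.

    Hypothetical helper: the counter restarts for every parent element,
    so each record gets its own Name1, Name2, ... sequence.
    """
    for parent in root.iter():
        count = 0
        for child in parent:
            if child.tag == tag:
                count += 1
                child.tag = f"{tag}{count}"
    return root


# Sample input mirroring the mock-up in the question.
sample = """<Names>
<Name>Luigi</Name>
<Name>Mario</Name>
<Name>Peach</Name>
</Names>"""

root = number_repeated_children(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

To work on a file instead of a string, ET.parse(path) returns a tree whose getroot() can be passed to the same function, and tree.write(path) saves the result.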
Run it like this: