Using sed to extract text between 2 tags

sed

I have a .xml file (close to 16,000 lines) that lists several hundred libraries, so I'm trying to install them with "groupinstall" on a RHEL6 machine.

I'm therefore trying to extract the group names contained in the .xml file that has this structure:

<b>
<group>
<id> group name </id>
   <packages>
   ...
   </packages>
<id> group name 2 </id>
   <packages>
   ...
   </packages>
<id> etc... </id>
</group>
</b>

Basically, this is what I've tried:

sed -n '/<id>/,/<\/id>/p' test1.txt > test2.txt

I copied the .xml file to test1.txt.
I'm trying to extract the group names from the test1.txt to a second file called test2.txt.
However, the command above extracts everything from the first <id> tag to the last </id> tag in my file.
How can I change it so that it extracts each group name separately?

My second question: does yum's downloadonly plugin (--downloadonly) also work with groups?

Best Answer

It sounds like what you need is something more along the lines of

sed -n 's:.*<id>\(.*\)</id>.*:\1:p'

(assuming, as in your sample, that <id> and </id> are on the same line and that there's only one <id>...</id> pair per line).
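To make the substitution approach concrete, here is a minimal sketch run against a small file that mirrors the structure in the question. The file name sample.xml and the function name extract_group_ids are made up for illustration. The pattern is a variation on the command above: it additionally trims the blanks around each name, on the assumption that the real file pads the names the way the sample does.

```shell
# Hypothetical sample mirroring the structure shown in the question.
cat > sample.xml <<'EOF'
<b>
<group>
<id> group name </id>
   <packages>
   ...
   </packages>
<id> group name 2 </id>
   <packages>
   ...
   </packages>
</group>
</b>
EOF

# Print only the text between <id> and </id>, one name per line.
# -n suppresses default output; the trailing p prints only lines where
# the substitution succeeded. The [[:space:]]* parts drop the padding
# around the name (an addition to the original command).
extract_group_ids() {
    sed -n 's:.*<id>[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*</id>.*:\1:p' "$1"
}

extract_group_ids sample.xml
```

Unlike the /<id>/,/<\/id>/ range, which selects whole lines between two matches, the s command here rewrites each matching line in isolation, so every <id> line yields exactly one name.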

Or use an XML-aware tool:

xmlstarlet sel -t -v '//id' -n