Bash – alter a line and remove a tag from an XML file using perl

bash, linux, perl, shell, xml

I have an XML file (client_23.xml) in which I need to alter one line and remove one whole element. I tried to do this with the perl one-liner shown further below.

In my xml file, I have a block like this. There will be only one instance of <hello>collect_model = 1</hello> in my xml file:

<world>
    <hello>collect_model = 1</hello>
    <hello>enable_data = 0</hello>
    <hello>session_ms = 2*60*1000</hello>
    <hello>max_collect = string_integer($extract("max_collect"))</hello>
    <hello>max_collect = parenting(max_collect, max_collect, 1.0e99)</hello>
    <hello>output('{')</hello>
</world>

I need to change that line to <hello>collect_model = 0</hello>, so the whole block should look like this after the change:

<world>
    <hello>collect_model = 0</hello>
    <hello>enable_data = 0</hello>
    <hello>session_ms = 2*60*1000</hello>
    <hello>max_collect = string_integer($extract("max_collect"))</hello>
    <hello>max_collect = parenting(max_collect, max_collect, 1.0e99)</hello>
    <hello>output('{')</hello>
</world>

Second, I need to remove this whole element from the same XML file:

<derta-config>
    <data-users>2000</data-users>
    <test-users>2000</test-users>
    <attributes>hello world</attributes>
    <client-types>Client1</model-types>
    <target>price.world</target>
</derta-config>

I have the shell script line below. It uses perl to make the two changes above and also to replace some other content in the file (which I need for a separate purpose). The part I added for the two changes above doesn't work; it prints a bunch of errors:

perl -0pe "s#<eval>collect_model = 0</eval>#<eval>collect_model = 1</eval> s#<derta-config>.* </derta-config>##sm;   s#<function>\s*<name>DUMMY_FUNCTION.+?</function>#$file#sm" client_"$client_id".xml > "$word"_new_file.xml

So I am wondering whether we can do this in the shell instead: make the two changes above with a shell command, then pass its output to my perl command, which already works for the third step. Is it possible to pass the output of that shell command to the perl command below?

perl -0pe "s#<function>\s*<name>DUMMY_FUNCTION.+?</function>#$file#sm" client_"$client_id".xml > "$word"_dyn_model.xml

Here $client_id is 23 and $word is abc.

I am just trying to make this work, and the easiest way will do. There will be only one instance of each of the two things I mentioned.

Best Answer

With this as the sample input file:

$ cat client_23.xml 
<world>
    <hello>collect_model = 1</hello>
    <hello>enable_data = 0</hello>
    <hello>session_ms = 2*60*1000</hello>
    <hello>max_collect = string_integer($extract("max_collect"))</hello>
    <hello>max_collect = parenting(max_collect, max_collect, 1.0e99)</hello>
    <hello>output('{')</hello>
</world>
<derta-config>
    <data-users>2000</data-users>
    <test-users>2000</test-users>
    <attributes>hello world</attributes>
    <client-types>Client1</model-types>
    <target>price.world</target>
</derta-config>

We can make both changes using:

$ sed 's|<hello>collect_model = 1</hello>|<hello>collect_model = 0</hello>|; \|<derta-config>|,\|</derta-config>|d' client_23.xml 
<world>
    <hello>collect_model = 0</hello>
    <hello>enable_data = 0</hello>
    <hello>session_ms = 2*60*1000</hello>
    <hello>max_collect = string_integer($extract("max_collect"))</hello>
    <hello>max_collect = parenting(max_collect, max_collect, 1.0e99)</hello>
    <hello>output('{')</hello>
</world>

How it works

We have two sed commands. The first is a substitute, the second is a delete:

  • s|<hello>collect_model = 1</hello>|<hello>collect_model = 0</hello>|

    Substitute commands have the form s|old|new|. So, here old is the original <hello>collect_model = 1</hello> and new is the replacement <hello>collect_model = 0</hello>.

  • \|<derta-config>|,\|</derta-config>|d

    This defines a range of lines. The starting line contains <derta-config> and the ending line contains </derta-config>. All lines within this range, including the two tag lines themselves, are deleted by the delete command d.
