Python – Convert XML subtitle file to SRT format

I have an XML subtitle file that looks like this:

<?xml version="1.0" encoding="utf-8" ?><transcript>
  <Item from="1.16" duration="4.68"><![CDATA[(Dong-hyuk is coming
to see you now.)
]]></Item>
  <Item from="5.92" duration="1"><![CDATA[It's cold.
]]></Item>
  <Item from="9.04" duration="2.88"><![CDATA[- Hello.
- Hello.
]]></Item>
  <Item from="12.2" duration="1.76"><![CDATA[You're busy as always.
]]></Item>
  <Item from="14.04" duration="3.48"><![CDATA[Look what I have here. Ta-da!
]]></Item>
  <Item from="18.88" duration="1.5599999999999998"><![CDATA[Let me give it to you.
]]></Item>
  <Item from="20.919999999999998" duration="2.8"><![CDATA[I'll give you
the most valuable present...
]]></Item>
</transcript>

It seems that "]]></Item>" always ends up on a new line.
Is there a way to convert this to SRT format?

Best Answer

I'd use an XML handling tool like xsh:

open subtitles.xml ;
for /transcript/Item {
    echo position() ;
    echo @from '-->' (@from + @duration) ;
    echo text() ;
}

Output:

1
1.16 --> 5.84
(Dong-hyuk is coming
to see you now.)

2
5.92 --> 6.92
It's cold.

3
9.04 --> 11.92
- Hello.
- Hello.

4
12.2 --> 13.96
You're busy as always.

5
14.04 --> 17.52
Look what I have here. Ta-da!

6
18.88 --> 20.44
Let me give it to you.

7
20.919999999999998 --> 23.72
I'll give you
the most valuable present...
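
If you'd rather stay in Python, as the question title asks, here is a rough equivalent of the xsh loop. It uses the standard library's xml.etree.ElementTree in place of xsh, and `dump_items` and `SAMPLE` are illustrative names, not part of the answer. This is a sketch that assumes every Item carries numeric from and duration attributes. ElementTree returns CDATA content as ordinary element text, including the trailing newline that pushes "]]></Item>" onto its own line, so the sketch strips that newline.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of the question's transcript (first two items only).
SAMPLE = """<?xml version="1.0" encoding="utf-8" ?><transcript>
  <Item from="1.16" duration="4.68"><![CDATA[(Dong-hyuk is coming
to see you now.)
]]></Item>
  <Item from="5.92" duration="1"><![CDATA[It's cold.
]]></Item>
</transcript>"""


def dump_items(xml_text):
    """Hypothetical helper mirroring the xsh loop: position, times, text.

    Assumes every <Item> has numeric 'from' and 'duration' attributes.
    """
    lines = []
    root = ET.fromstring(xml_text)
    for pos, item in enumerate(root.iter("Item"), start=1):
        start = float(item.get("from"))
        end = start + float(item.get("duration"))
        lines.append(str(pos))
        # Round the computed end time to milliseconds to hide float noise.
        lines.append(f"{item.get('from')} --> {round(end, 3)}")
        # CDATA arrives as plain text; strip the newline before "]]>".
        lines.append((item.text or "").strip())
        lines.append("")  # blank line between entries, as in the xsh output
    return "\n".join(lines)


if __name__ == "__main__":
    print(dump_items(SAMPLE))
```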

Converting the times to SRT's HH:MM:SS,mmm timestamp format is left as an exercise for the reader.
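
Here is a minimal Python sketch of that exercise. It again uses xml.etree.ElementTree in place of xsh, and the helper names `srt_timestamp` and `xml_to_srt` are hypothetical. It assumes from and duration are in seconds, which matches the output above, and rounds to the nearest millisecond. That rounding also cleans up float artefacts such as 20.919999999999998. Each SRT cue is a sequence number, a "HH:MM:SS,mmm --> HH:MM:SS,mmm" line, the text, and a blank line.

```python
import xml.etree.ElementTree as ET


def srt_timestamp(seconds):
    """Hypothetical helper: format seconds as an SRT timestamp HH:MM:SS,mmm."""
    # Work in whole milliseconds so 20.919999999999998 becomes 20920 ms.
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def xml_to_srt(xml_text):
    """Hypothetical converter from <transcript>/<Item from=.. duration=..> to SRT.

    Assumes 'from' and 'duration' are numeric attributes given in seconds.
    """
    root = ET.fromstring(xml_text)
    cues = []
    for index, item in enumerate(root.iter("Item"), start=1):
        start = float(item.get("from"))
        end = start + float(item.get("duration"))
        text = (item.text or "").strip()  # drop the newline before "]]>"
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Joining with "\n" leaves one blank line between consecutive cues.
    return "\n".join(cues)
```

For the file in the question, read subtitles.xml in binary mode and pass the bytes to xml_to_srt. ET.fromstring accepts bytes and then honours the encoding in the XML declaration. Write the result to a .srt file opened with encoding="utf-8".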