Convert image based subtitle to text based subtitle inside MKV file


How do I convert an hdmv_pgs_subtitle stream (which is image-based) to a text-based subtitle in an MKV file?

I have tried ffmpeg -i in.mkv -c:v copy -c:a copy -c:s mov_text out.mkv, but that results in the following error:

Stream mapping:
  Stream #0:0 -> #0:0 (copy)  
  Stream #0:1 -> #0:1 (copy)  
  Stream #0:2 -> #0:2 (hdmv_pgs_subtitle (pgssub) -> mov_text (native))  

Error while opening encoder for output stream #0:2 - maybe incorrect parameters such as bit_rate, rate, width or height

Best Answer

Converting image-based subtitles to text is a nontrivial process: you need some kind of OCR system to interpret the bitmaps and work out the corresponding text. ffmpeg alone will not do that for you.
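To see which subtitle streams in a file are affected, something like the following sketch may help. It assumes ffprobe is installed; the helper names sub_kind and list_subs are made up for illustration, and the codec lists are not exhaustive.

```shell
# Classify an ffmpeg subtitle codec name as bitmap (needs OCR) or text.
# The lists below cover common codecs only; anything else is "unknown".
sub_kind() {
    case "$1" in
        hdmv_pgs_subtitle|dvd_subtitle|dvb_subtitle|xsub) echo bitmap ;;
        subrip|ass|ssa|webvtt|mov_text|text) echo text ;;
        *) echo unknown ;;
    esac
}

# Print "index codec_name kind" for every subtitle stream in a file.
list_subs() {
    ffprobe -v error -select_streams s \
        -show_entries stream=index,codec_name -of csv=p=0 "$1" |
    while IFS=, read -r idx codec; do
        printf '%s %s %s\n' "$idx" "$codec" "$(sub_kind "$codec")"
    done
}
```

Calling list_subs in.mkv on the file from the question should report stream 2 as hdmv_pgs_subtitle and bitmap, which is why -c:s mov_text cannot work: ffmpeg only converts text subtitles to other text formats.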

I am not aware of any Linux/UNIX app that does the whole process in one go. However, this process should work:

  • Extract the subtitles with mkvextract or ffmpeg
  • Convert the PGS subtitles to VobSub (DVD .idx/.sub) format with BDSup2Sub
  • OCR the subtitles into SRT format with VobSub2SRT
  • Mux the subtitles back into an mkv file with mkvmerge or ffmpeg
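
The four steps above could be chained roughly as in the sketch below. Several things here are assumptions rather than part of the answer: the file names, track ID 2 (taken from stream #0:2 in the question), the location of BDSup2Sub.jar, and the BDSup2Sub -o option, which follows the 5.x command line as I understand it and may differ in other versions. The function prints the commands by default and only executes them when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the pipeline: extract -> convert -> OCR -> mux.

# Print a command; execute it only when DRY_RUN=0.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Usage: pgs_to_srt INPUT.mkv TRACK_ID OUTPUT.mkv
# The body runs in a subshell so its variables do not leak.
pgs_to_srt() (
    in=$1; track=$2; out=$3
    jar=${BDSUP2SUB_JAR:-BDSup2Sub.jar}   # assumed path to the jar
    base=${SUB_BASE:-subs}                # base name for temporary files

    # 1. Extract the PGS track to a .sup file.
    run mkvextract tracks "$in" "$track:$base.sup" &&
    # 2. Convert PGS (.sup) to VobSub (.idx/.sub); -o is assumed (5.x style).
    run java -jar "$jar" -o "$base.sub" "$base.sup" &&
    # 3. OCR: vobsub2srt takes the base name and writes $base.srt.
    run vobsub2srt "$base" &&
    # 4. Mux: -S drops ALL subtitle tracks of the source, then adds the SRT.
    run mkvmerge -o "$out" -S "$in" "$base.srt"
)
```

Running pgs_to_srt in.mkv 2 out.mkv first shows the four commands for review; DRY_RUN=0 pgs_to_srt in.mkv 2 out.mkv then runs them. Remove -S in the last step to keep the original PGS track alongside the new SRT one. OCR output usually needs proofreading before muxing.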