Ubuntu – Speech-recognition app to convert MP3 to text

software-recommendation, speech-recognition

Does anyone know of an application that can convert audio to text? I'm running Ubuntu 12.04 LTS.

Best Answer

You can use Vosk-api, a modern speech recognition toolkit based on neural networks. It supports 7+ languages and works on a variety of platforms, including the Raspberry Pi and mobile devices.

First convert the file to the format Vosk expects (16 kHz mono WAV), then run recognition on it:

ffmpeg -i file.mp3 -ar 16000 -ac 1 file.wav
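Before running recognition it can help to confirm that the conversion produced what the ffmpeg flags asked for. The following is a minimal sketch using only Python's standard `wave` module; the function names `describe_wav` and `is_mono_16bit_pcm` are hypothetical helpers, not part of Vosk, and the 16-bit check assumes ffmpeg wrote its usual 16-bit PCM for `.wav` output.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_bytes, sample_rate_hz) for a WAV file."""
    with wave.open(path, "rb") as wf:
        return wf.getnchannels(), wf.getsampwidth(), wf.getframerate()


def is_mono_16bit_pcm(path, expected_rate=16000):
    """True if the file is mono, 16-bit, and at the expected sample rate.

    expected_rate defaults to 16000 to match `-ar 16000` in the ffmpeg call.
    """
    channels, width, rate = describe_wav(path)
    return channels == 1 and width == 2 and rate == expected_rate
```

Calling `is_mono_16bit_pcm("file.wav")` after the ffmpeg step should return `True`; if it does not, `describe_wav("file.wav")` shows which property differs.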

Then install vosk-api with pip:

pip3 install vosk

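Then fetch the examples and a model, and run the recognizer (test.wav is the sample file from the repository; pass your own converted WAV file instead):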

git clone https://github.com/alphacep/vosk-api
cd vosk-api/python/example
wget https://alphacephei.com/kaldi/models/vosk-model-small-en-us-0.3.zip
unzip vosk-model-small-en-us-0.3.zip
mv vosk-model-small-en-us-0.3 model
python3 ./test_simple.py test.wav > result.json

The result will be stored in JSON format.

The same directory also contains an SRT subtitle output example, which is easier to evaluate and can be directly useful to some users:
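To get plain text out of that file, one option is to read the JSON back and keep only the recognized text. The sketch below assumes that result.json contains several JSON objects written one after another, with interim results under a "partial" key and finalized results under a "text" key; that layout is an assumption about the example script's output and may differ between versions. The helper names are hypothetical.

```python
import json


def iter_json_objects(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def final_text(text):
    """Join the non-empty "text" fields, ignoring "partial" results."""
    parts = []
    for obj in iter_json_objects(text):
        if isinstance(obj, dict) and obj.get("text"):
            parts.append(obj["text"])
    return " ".join(parts)
```

For example, `print(final_text(open("result.json").read()))` would print the transcript on one line, provided the file matches the assumed layout.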

python3 -m pip install srt
python3 ./test_srt.py test.wav
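If you only want the words and not the timings, the SRT output can be reduced to plain text. This is a minimal sketch that relies on the standard SRT layout (index line, timing line, then caption text, with cues separated by blank lines); `srt_to_lines` is a hypothetical helper written for this answer, not something shipped with Vosk or the srt package.

```python
import re


def srt_to_lines(srt_text):
    """Return the caption text of each SRT cue, one string per cue."""
    captions = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip()):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        # rows[0] is the cue index, rows[1] the "start --> end" timing line
        if len(rows) >= 3 and "-->" in rows[1]:
            captions.append(" ".join(rows[2:]))
    return captions
```

Assuming the script's output was redirected to a file such as result.srt, `"\n".join(srt_to_lines(open("result.srt").read()))` gives one caption per line.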

The example file in the repository contains three sentences, spoken in a clear American English accent with excellent sound quality, which I transcribe as:

one zero zero zero one
nine oh two one oh
zero one eight zero three

The "nine oh two one oh" is said very fast, but is still clear. The "z" of the second-to-last "zero" sounds a bit like an "s".

The SRT generated above reads:

1
00:00:00,870 --> 00:00:02,610
what zero zero zero one

2
00:00:03,930 --> 00:00:04,950
no no to uno

3
00:00:06,240 --> 00:00:08,010
cyril one eight zero three

So several mistakes were made. A human listener presumably does better in part because we know in advance that every word is a number.

Next I tried vosk-model-en-us-aspire-0.2, listed at https://alphacephei.com/vosk/models, which is a 1.4 GB download compared to 36 MB for vosk-model-small-en-us-0.3:

mv model model.vosk-model-small-en-us-0.3
wget https://alphacephei.com/vosk/models/vosk-model-en-us-aspire-0.2.zip
unzip vosk-model-en-us-aspire-0.2.zip
mv vosk-model-en-us-aspire-0.2 model

and the result was:

1
00:00:00,840 --> 00:00:02,610
one zero zero zero one

2
00:00:04,026 --> 00:00:04,980
i know what you window

3
00:00:06,270 --> 00:00:07,980
serial one eight zero three

which got one more word correct.
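To put a number on the comparison, you can count word errors against the reference transcription. The sketch below computes a word-level edit distance (substitutions, insertions and deletions) per sentence and sums it; the reference is my own transcription from above, and `word_errors` and `total_errors` are hypothetical helper names.

```python
def word_errors(reference, hypothesis):
    """Word-level Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def total_errors(references, hypotheses):
    """Sum the per-sentence word errors over paired sentences."""
    return sum(word_errors(r.split(), h.split())
               for r, h in zip(references, hypotheses))


reference = ["one zero zero zero one",
             "nine oh two one oh",
             "zero one eight zero three"]
small = ["what zero zero zero one",
         "no no to uno",
         "cyril one eight zero three"]
aspire = ["one zero zero zero one",
          "i know what you window",
          "serial one eight zero three"]

total_words = sum(len(r.split()) for r in reference)
for name, hyps in (("small", small), ("aspire", aspire)):
    print(f"{name}: {total_errors(reference, hyps)}/{total_words} word errors")
```

On these transcripts this counts 7 errors out of 15 words for the small model and 6 out of 15 for the aspire model. With only three short sentences, that is an illustration and not a meaningful benchmark.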

Tested on vosk-api 7af3e9a334fbb9557f2a41b97ba77b9745e120b3.

Related Solutions

Ubuntu – What program can I use to convert text into binary numbers

After further consideration, and having removed my first answer, I now see that you don't want to extract the readable text from a binary (e.g. with the strings utility), but to see text in binary form.

So I think the KDE utility Okteta is just what you want. It is available in the repositories and can be installed with:

sudo apt-get install okteta

It allows you to view text files in binary form (see the second screenshot below), and you can click the tab at the bottom of the page to switch between binary form, hexadecimal, decimal, and octal. You can even create a new file and start entering text and, if you have the binary mode selected, the characters typed will be shown in binary, just like in the online converter you linked to. The screenshot directly below shows an example of this:

[Screenshot: Okteta showing typed text in binary mode]

This application does exactly what you specified in the question, and it is a GUI as well, so it seems to tick all the boxes.

Okteta seems to be the most fully featured editor available, and there are also modules and plugins that give additional functionality.

[Screenshot: Okteta displaying a text file in binary form]

Ubuntu – What program should I use to convert a CDA file to MP3

I think Sound Juicer is a good tool for conversions.

I use Banshee for my conversions from CD to FLAC or MP3. In Banshee, first go to Edit > Preferences and change the folder and the output type; then, from the main window, click the button for conversion.

Of course, you can also use the GNOME Sound Converter.
