Any decent speech recognition software for Linux

software-rec, speech recognition

The short version of the question: I am looking for speech recognition software that runs on Linux and has decent accuracy and usability. Any license and any price are fine. It should not be restricted to voice commands, as I want to be able to dictate text.


More details:

I have tried the following, without satisfactory results:

All the above-mentioned native Linux solutions have both poor accuracy and poor usability (and some allow only voice commands, not free-text dictation). By poor accuracy, I mean accuracy significantly below that of the speech recognition software I use on other platforms, listed below. As for Wine + Dragon NaturallySpeaking, in my experience it keeps crashing, and unfortunately I don't seem to be the only one with such issues.

On Microsoft Windows I use Dragon NaturallySpeaking, on Apple Mac OS X I use Apple Dictation and DragonDictate, on Android I use Google speech recognition, and on iOS I use the built-in Apple speech recognition.

Baidu Research yesterday released the code for its speech recognition library, which uses Connectionist Temporal Classification (CTC) and is implemented with Torch. The benchmarks reported by Gigaom are encouraging, as shown in the table below, but I am not aware of any good wrapper that makes it usable without a fair amount of coding (and a large training data set):

System            Clean (94)   Noisy (82)   Combined (176)
Apple Dictation   14.24        43.76        26.73
Bing Speech       11.73        36.12        22.05
Google API         6.64        30.47        16.72
wit.ai             7.94        35.06        19.41
Deep Speech        6.56        19.06        11.85

Table 4: Results (%WER) for 5 systems evaluated on the original audio. All systems are scored only on the utterances with predictions given by all systems. The number in parentheses next to each dataset, e.g. Clean (94), is the number of utterances scored.
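
For anyone unfamiliar with the metric: %WER is the word error rate, i.e. the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference (lower is better). Here is a minimal Python sketch of that calculation. It assumes simple lowercasing and whitespace tokenization, which is cruder than the text normalization real benchmarks apply, and the function name word_error_rate is my own, not taken from the Deep Speech paper or any library.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the word error rate in percent.

    WER = (substitutions + deletions + insertions) / words in reference,
    computed as a word-level Levenshtein distance.
    Assumption: lowercasing + whitespace splitting is the only normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word ("the") and one substitution ("down" -> "town")
    # against a 4-word reference.
    print(word_error_rate("turn the volume down", "turn volume town"))
```

Note that WER can exceed 100% when the output contains many inserted words, which is one reason it is usually reported over a whole test set rather than per utterance.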

There exist some very alpha open-source projects:

I am also aware of this attempt at tracking the state of the art and recent results (bibliography) in speech recognition, as well as this benchmark of existing speech recognition APIs.


I am aware of Aenea, which allows speech recognition via Dragonfly on one computer to send events to another, but it has some latency cost.

I am also aware of these two talks exploring Linux options for speech recognition:

Best Answer

Right now I'm experimenting with KDE Connect in combination with Google speech recognition on my Android smartphone.

KDE Connect lets you use your Android device as an input device for your Linux computer (it has some other features as well). You need to install the KDE Connect app from the Google Play Store on your smartphone or tablet, and install both kdeconnect and indicator-kdeconnect on your Linux computer. On Ubuntu, the installation goes as follows:

sudo add-apt-repository ppa:vikoadi/ppa
sudo apt update
sudo apt install kdeconnect indicator-kdeconnect

The downside of this installation is that it installs a bunch of KDE packages that you don't need if you don't use the KDE desktop environment.

Once you have paired your Android device with your computer (they have to be on the same network), you can open the Android keyboard and tap the mic to use Google speech recognition. As you talk, text will appear wherever your cursor is active on your Linux computer.

As for the results, they are a bit mixed for me: I'm currently writing a technical astrophysics document, and Google speech recognition struggles with jargon that rarely shows up in everyday text. Also, forget about it figuring out punctuation or proper capitalization.
