Any decent speech recognition software for Linux?

The short version of the question: I am looking for speech recognition software that runs on Linux and has decent accuracy and usability. Any license and price are fine. It should not be restricted to voice commands, as I want to be able to dictate text.

More details:

I have tried the following, none of them satisfactorily:

CMU Sphinx
CVoiceControl
Ears
Julius
Kaldi (e.g., Kaldi GStreamer server)
IBM ViaVoice (used to run on Linux but was discontinued years ago)
NICO ANN Toolkit
OpenMindSpeech
RWTH ASR
shout
silvius (built on the Kaldi speech recognition toolkit)
Simon Listens
ViaVoice / Xvoice
Wine + Dragon NaturallySpeaking + NatLink + dragonfly + damselfly
https://github.com/DragonComputer/Dragonfire: only accepts voice commands

All the above-mentioned native Linux solutions have both poor accuracy and poor usability (and some only accept voice commands rather than free-text dictation). By poor accuracy, I mean accuracy significantly below that of the speech recognition software I use on other platforms, listed below. As for Wine + Dragon NaturallySpeaking, in my experience it keeps crashing, and unfortunately I don't seem to be the only one with such issues.

On Microsoft Windows I use Dragon NaturallySpeaking, on Apple Mac OS X I use Apple Dictation and DragonDictate, on Android I use Google speech recognition, and on iOS I use the built-in Apple speech recognition.

Baidu Research yesterday released the code for its speech recognition library, which uses Connectionist Temporal Classification implemented with Torch. Benchmarks from Gigaom are encouraging, as shown in the table below, but I am not aware of any good wrapper that makes it usable without quite some coding (and a large training data set):

Table 4: Results (%WER) for 5 systems evaluated on the original audio. All systems are scored only on the utterances with predictions given by all systems. The number in parentheses next to each dataset, e.g. Clean (94), is the number of utterances scored.

System	Clean (94)	Noisy (82)	Combined (176)
Apple Dictation	14.24	43.76	26.73
Bing Speech	11.73	36.12	22.05
Google API	6.64	30.47	16.72
wit.ai	7.94	35.06	19.41
Deep Speech	6.56	19.06	11.85
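
For readers unfamiliar with the metric, here is a minimal sketch of how a word error rate (WER) like the percentages above is usually computed: the word-level edit distance (substitutions + deletions + insertions) between the reference transcript and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are my own assumptions for illustration; the benchmark may well normalize text (case, punctuation) and aggregate over utterances differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return the WER in percent between a reference and a hypothesis.

    Assumption: words are split on whitespace with no other normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Lower is better, and because insertions count as errors, WER can exceed 100% on very bad output.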

There exist some very alpha open-source projects:

https://github.com/mozilla/DeepSpeech (part of Mozilla's Vaani project: http://vaani.io (mirror))
https://github.com/pannous/tensorflow-speech-recognition
Vox, a system to control a Linux system using Dragon NaturallySpeaking: https://github.com/Franck-Dernoncourt/vox_linux + https://github.com/Franck-Dernoncourt/vox_windows
https://github.com/facebookresearch/wav2letter
https://github.com/espnet/espnet
http://github.com/tensorflow/lingvo (to be released by Google, mentioned at Interspeech 2018)

I am also aware of this attempt at tracking the state of the art and recent results (bibliography) on speech recognition, as well as this benchmark of existing speech recognition APIs.

I am aware of Aenea, which allows speech recognition via Dragonfly on one computer to send events to another, but it has some latency cost.

I am also aware of these two talks exploring Linux options for speech recognition:

2016 – The Eleventh HOPE: Coding by Voice with Open Source Speech Recognition (David Williams-King)
2014 – Pycon: Using Python to Code by Voice (Tavis Rudd)

Best Answer

Right now I'm experimenting with using KDE Connect in combination with Google speech recognition on my Android smartphone.

KDE Connect allows you to use your Android device as an input device for your Linux computer (there are also some other features). You need to install the KDE Connect app from the Google Play Store on your smartphone/tablet and install both kdeconnect and indicator-kdeconnect on your Linux computer. For Ubuntu systems the install goes as follows:

sudo add-apt-repository ppa:vikoadi/ppa
sudo apt update
sudo apt install kdeconnect indicator-kdeconnect

The downside of this installation is that it installs a bunch of KDE packages that you don't need if you don't use the KDE desktop environment.

Once you pair your Android device with your computer (they have to be on the same network), you can open the Android keyboard and tap the mic to use Google speech recognition. As you talk, text will start to appear wherever your cursor is active on your Linux computer.

As for the results, they are a bit mixed for me: I'm currently writing a technical astrophysics document, and Google speech recognition struggles with jargon that doesn't show up in everyday text. Also, forget about it figuring out punctuation or proper capitalization.
