After further consideration and removing my first answer, I now note that you don't want to see the readable text in a binary (e.g. with the strings
utility), but see text in binary form.
So, I think the KDE utility, okteta is just what you want, and it is available in the repositories and can be installed with
sudo apt-get install okteta
It allows you to view text files in binary form (see the second screenshot below), and you can click the tab at the bottom of the page to switch between binary form, hexadecimal, decimal, and octal. You can even create a new file and start entering text and, if you have the binary mode selected, the characters typed will be shown in binary, just like in the online converter you linked to. The screenshot directly below shows an example of this:
This application does exactly what you have specified in the question, and it is a gui as well, so that seems to tick all the boxes.
Okteta
seems to be the most fully featured editor available, and there are also modules and plugins that give additional functionality.
I think Sound Juicer(Click To Install) is a good tool for conversions.
I use Banshee for my conversions from cd to Flac or Mp3.
For Banshee go first to Edit-->preferences and change the folder and the output type and then from the main GUI you can push the button for conversion.
Of course you can use Gnome sound converter.
Best Answer
The software you can use is Vosk-api, a modern speech recognition toolkit based on neural networks. It supports 7+ languages and works on variety of platforms including RPi and mobile.
First you convert the file to the required format and then you recognize it:
Then install vosk-api with pip:
Then use these steps:
The result will be stored in json format.
The same directory also contains an srt subtitle output example, which is easier to evaluate and can be directly useful to some users:
The example given in the repository says in perfect American English accent and perfect sound quality three sentences which I transcribe as:
The "nine oh two one oh" is said very fast, but still clear. The "z" of the before last "zero" sounds a bit like an "s".
The SRT generated above reads:
so we can see that several mistakes were made, presumably in part because we have the understanding that all words are numbers to help us.
Next I also tried with the
vosk-model-en-us-aspire-0.2
which was a 1.4GB download compared to 36MB ofvosk-model-small-en-us-0.3
and is listed at https://alphacephei.com/vosk/models:and the result was:
which got one more word correct.
Tested on vosk-api 7af3e9a334fbb9557f2a41b97ba77b9745e120b3.