Ubuntu – Improve sound (not voice) quality of Pico2Wave text-to-speech

sound, text-to-speech

I use Ubuntu 12.04.

I want to make extensive use of the text-to-speech capabilities of Linux to create audio files from text.

I've tried Festival, but finding and installing good voices is overly complex, so I use it with its default voices.

I also tried Pico2Wave.

Festival text-to-speech is totally robotic and unnatural, and it's not suitable for long-term listening. It has a "whirring" sound in the background, although you can still hear the words crisply. The speech itself, though, is robotic and of poor quality.

Festival sample here

Pico2Wave is very natural and comparable to Apple's text-to-speech in terms of diction and human-like speech, but the quality of the sound itself is awful. It sounds as if it was recorded in a very empty room with a lot of echo. It sounds "stuffy", muddy and tubby, with too much bass, so much that it makes the speakers rattle, and it is sometimes very difficult to understand unless you are using earphones. The sound is not crisp at all. I also suspect the sound "clips", but I'm no audio expert.

Pico2Wave sample here

My question is:

How can I improve the sound quality of the generated audio file? I'm no audio expert, so I don't know what I have to fiddle with (gain? bass? noise reduction? to what extent? etc.). Note that I'm not asking for recommended tools, but for an explanation of what exactly is wrong with that audio and which qualities I should adjust in my audio editing/improving app of choice.

NOTE: The sample text is the first paragraph of "The Last of the Mohicans":

It was a feature peculiar to the colonial wars of North America, that
the toils and dangers of the wilderness were to be encountered before
the adverse hosts could meet. A wide and apparently an impervious
boundary of forests severed the possessions of the hostile provinces
of France and England. The hardy colonist, and the trained European
who fought at his side, frequently expended months in struggling
against the rapids of the streams, or in effecting the rugged passes
of the mountains, in quest of an opportunity to exhibit their courage
in a more martial conflict. But, emulating the patience and
self-denial of the practiced native warriors, they learned to overcome
every difficulty; and it would seem that, in time, there was no recess
of the woods so dark, nor any secret place so lovely, that it might
claim exemption from the inroads of those who had pledged their blood
to satiate their vengeance, or to uphold the cold and selfish policy
of the distant monarchs of Europe.

Best Answer

I just ran into the same issue, and at the moment I've ended up with something like

pico2wave -l "$LANGUAGE" -w "$WAV" "$*" && play -qV0 "$WAV" treble 24 gain -l 6

which sounds much more "crisp".
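
Since the question is about creating audio files rather than playing them, the same effect chain can probably be written to disk with `sox` instead of `play`. The following is only a minimal sketch under that assumption: the function name `tts_crisp` and the intermediate `.raw.wav` file name are made up for illustration, and the `treble 24 gain -l 6` values are simply the ones from the command above, not tuned settings.

```shell
#!/bin/sh
# Hypothetical helper (name and temp-file naming are illustrative only):
# synthesize text with pico2wave, then apply the treble boost and
# limited gain with SoX, saving the result instead of playing it.
tts_crisp() {
    lang=$1                      # language code, e.g. en-US
    out=$2                       # final output file, e.g. speech.wav
    shift 2
    raw="${out%.wav}.raw.wav"    # unprocessed pico2wave output

    # "$*" joins the remaining arguments into one string of text to speak.
    pico2wave -l "$lang" -w "$raw" "$*" &&
        sox "$raw" "$out" treble 24 gain -l 6
    status=$?

    rm -f "$raw"                 # drop the intermediate file either way
    return $status
}
```

It could then be called as `tts_crisp en-US mohicans.wav "It was a feature peculiar to the colonial wars of North America"`. Lowering the `treble` value is the obvious thing to try if the result sounds too harsh.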
