macOS – How to use cloud-based text-to-speech services as native voices in macOS

accessibility · macos · text-to-speech

I want to be able to use cloud-based TTS services (like AWS Polly or Google Cloud Text-to-Speech) as if they were locally-installed voices. I've found the native voices to be inadequate for some accessibility needs, and the cloud-based TTS services appear to offer the most pleasing and affordable way of addressing these. However, I can't find any tools that can help with this (e.g. a tool that creates a system voice backed by a cloud service). How might I be able to do this?

Best Answer

The quickstart at https://cloud.google.com/text-to-speech/docs/quickstart-protocol lists all the required steps.

  • Enable the Text-to-Speech API (and its free trial, if applicable) in your Google Cloud account.
  • Install the Google Cloud SDK (the gcloud command-line tool).
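
A minimal sketch of this one-time setup in Terminal, assuming the gcloud CLI is installed and you already have a billing-enabled project (the project ID here is a placeholder):

    # Sign in and select the project (my-tts-project is a placeholder)
    gcloud auth login
    gcloud config set project my-tts-project

    # Enable the Text-to-Speech API for the project
    gcloud services enable texttospeech.googleapis.com

    # Create the application-default credentials the later curl calls use
    gcloud auth application-default login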

The steps above are one-time setup. The following steps are repeated for every synthesis request.

  • Build the curl command that a shell can execute. The request body has three parts, two of which are fixed: the voice and the audio config. The third part, the input text, changes each time. *

  • Get the JSON response, extract the audioContent field from it, and save that to a text file.

  • Decode the text file to MP3. On macOS, base64 takes -i/-o flags for input and output:

    base64 --decode -i synthesize-output-base64.txt -o synthesized-audio.mp3
    
  • Play it (e.g. with afplay).
  • Bind all of this to a keyboard shortcut for the preferred app as a Quick Action (Service). A combined sketch of these repeatable steps follows below.
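
Putting the repeatable steps together, the sketch below is one way to script them end to end. The work folder (~/tts-work), the script name speak.sh, the voice en-US-Wavenet-D, and the use of python3 for JSON handling are all assumptions, not part of the quickstart:

    #!/bin/bash
    # speak.sh (hypothetical name) – pass the text to speak as $1.
    # Assumes the one-time gcloud setup above has been completed.
    WORKDIR="$HOME/tts-work"   # dedicated folder for text and audio files
    mkdir -p "$WORKDIR"
    TEXT="$1"                  # the only part of the request that changes

    # Build the JSON request; python3 handles escaping of the input text.
    REQUEST=$(python3 -c 'import json, sys; print(json.dumps({
        "input": {"text": sys.argv[1]},
        "voice": {"languageCode": "en-US", "name": "en-US-Wavenet-D"},
        "audioConfig": {"audioEncoding": "MP3"}}))' "$TEXT")

    # Fixed curl call: voice and audioConfig never change, only the input.
    curl -s -X POST \
      -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
      -H "Content-Type: application/json" \
      --data "$REQUEST" \
      "https://texttospeech.googleapis.com/v1/text:synthesize" \
      > "$WORKDIR/response.json"

    # Extract audioContent, decode it (macOS base64 syntax), and play it.
    python3 -c 'import json, sys; print(json.load(sys.stdin)["audioContent"])' \
      < "$WORKDIR/response.json" > "$WORKDIR/synthesize-output-base64.txt"
    base64 --decode -i "$WORKDIR/synthesize-output-base64.txt" -o "$WORKDIR/synthesized-audio.mp3"
    afplay "$WORKDIR/synthesized-audio.mp3"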

Most of the shell work can be done via a "Run Shell Script" action in Automator.

* For changing the text part, you can find several existing questions (or ask a new one) about getting the selected text into an Automator variable. Another option is to copy and paste the text into an Automator app's popup.

Next, pass that text variable to the shell script.
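
For example, a Quick Action that receives text can use a "Run Shell Script" action with "Pass input" set to "to stdin"; the selection then arrives on standard input (the script path below is hypothetical):

    # Body of the "Run Shell Script" action
    TEXT=$(cat)          # the selected text arrives on stdin
    ~/bin/speak.sh "$TEXT"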

Then make a dedicated folder for all the text and audio files the action will create, and save the received JSON response there. The decode command can then stay fixed, since the file location and name are always the same.


All of this can be put in an Automator app that displays a popup with a text field and a Submit/Play button.
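
A minimal sketch of such a popup, using osascript's display dialog to collect the text (the button labels and script path are assumptions):

    # Ask for the text in a dialog, then hand it to the script above
    TEXT=$(osascript -e 'text returned of (display dialog "Text to speak:" default answer "" buttons {"Cancel", "Play"} default button "Play")') || exit 0
    ~/bin/speak.sh "$TEXT"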