How to Use Google Cloud Text-to-Speech

A practical guide to get you up and running with Google Cloud Text-to-Speech. Written by Delv Editorial, Delv Team.

Getting started with Google Cloud Text-to-Speech

In this guide, you'll learn how to set up Google Cloud Text-to-Speech and convert text into lifelike speech within minutes. By the end, you'll be able to create audio outputs using various voices and settings.

Step 1: Sign up and set up

  1. Go to the Google Cloud Text-to-Speech website.
  2. Click on the "Get started for free" button. This will direct you to the Google Cloud Console.
  3. If you don’t have a Google account, create one. If you do, sign in.
  4. Once logged in, you'll be prompted to create a new project. Click on "Select a project" and then "New Project". Name your project and click "Create".
  5. After creating your project, navigate to the left menu and select "APIs & Services" > "Library".
  6. Search for "Text-to-Speech API" and click on it. Then, click the "Enable" button to activate the API for your project.

Step 2: Your first text-to-speech conversion

  1. In the left menu, go to "APIs & Services" > "Credentials".
  2. Click on the "Create credentials" button and select "API key". Copy this key for later use.
  3. Now, go to the Text-to-Speech API documentation.
  4. Scroll down to the "Try this API" section. Paste your API key in the relevant field.
  5. In the request body, replace the sample text with your own text. Adjust fields such as "voice" and "audioConfig" as needed.
  6. Click the "Execute" button. The response contains an "audioContent" field holding the generated audio as a base64-encoded string, not a link to a file. Decode that string and save it to a file (for example, an .mp3 file) to listen to your generated speech.
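
The same request can also be sent from a script instead of the "Try this API" panel. The following is a minimal sketch, assuming the v1 REST endpoint and an API key passed as the "key" query parameter. The function names and the "output.mp3" path are illustrative choices, not part of the API.

```python
import base64
import json
import urllib.request

# Public REST endpoint for the v1 text:synthesize method.
ENDPOINT = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_request(text, language_code="en-US"):
    """Build the JSON body: the text, a voice selection, and an audio format."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def decode_audio(response):
    """Turn the base64 "audioContent" field of a response into raw bytes."""
    return base64.b64decode(response["audioContent"])


def synthesize(text, api_key, out_path="output.mp3"):
    """Send the request and save the decoded audio to out_path (hypothetical default)."""
    request = urllib.request.Request(
        f"{ENDPOINT}?key={api_key}",
        data=json.dumps(build_request(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        audio = decode_audio(json.load(reply))
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path
```

Calling synthesize("Hello from Text-to-Speech", "YOUR_API_KEY") with the key you copied earlier should write a playable MP3 file, provided the API is enabled for the project that owns the key.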

Step 3: Get better results

  1. Experiment with different voices by changing the "voice" parameter. You can choose from various languages and accents.
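  2. Adjust the "speakingRate" and "pitch" fields inside "audioConfig" to customise how the speech sounds. A "speakingRate" of 1.0 is normal speed, so set it to 1.2 for faster speech. "pitch" is measured in semitones, with 0 as the voice's default.
  3. Use the "audioEncoding" field to choose the format of the audio output based on your needs, e.g., MP3 for compact files or LINEAR16 for uncompressed audio that can be saved as WAV.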
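
As a rough illustration of the three tips above, the sketch below builds a request body with a named voice, a faster speaking rate, a lower pitch, and uncompressed output. The helper name is hypothetical, and "en-GB-Standard-A" is only an example voice name; check the list of available voices before relying on it.

```python
import json


def build_tuned_request(text, language_code, voice_name,
                        speaking_rate=1.0, pitch=0.0, encoding="MP3"):
    """Build a synthesize request body with explicit voice and audio settings."""
    return {
        "input": {"text": text},
        # "name" picks a specific voice within the chosen language.
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {
            "audioEncoding": encoding,      # e.g. "MP3" or "LINEAR16"
            "speakingRate": speaking_rate,  # 1.0 is normal speed
            "pitch": pitch,                 # semitones; 0.0 is the default
        },
    }


body = build_tuned_request(
    "Welcome back.",
    language_code="en-GB",
    voice_name="en-GB-Standard-A",  # example voice name, assumed available
    speaking_rate=1.2,
    pitch=-2.0,
    encoding="LINEAR16",
)
print(json.dumps(body, indent=2))
```

Changing one setting at a time makes it easier to hear what each field contributes.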

Pro tip

Beginners often overlook SSML (Speech Synthesis Markup Language). To use it, send your content in the "ssml" field of "input" instead of the "text" field. SSML lets you add pauses, emphasis, and other vocal effects, which makes the speech output sound more natural.
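
A small sketch of what an SSML request might look like, assuming the standard "speak", "break", and "emphasis" elements. The helper names are hypothetical. Text is XML-escaped first so characters such as "&" do not break the markup.

```python
from xml.sax.saxutils import escape


def emphasize(text):
    """Wrap text in a strong-emphasis element, escaping XML special characters."""
    return f'<emphasis level="strong">{escape(text)}</emphasis>'


def build_ssml(sentences, pause_ms=500):
    """Join sentences into one <speak> document with a pause between each."""
    pause = f'<break time="{pause_ms}ms"/>'
    return "<speak>" + pause.join(escape(s) for s in sentences) + "</speak>"


def build_ssml_request(ssml, language_code="en-US"):
    """Build a request body that sends SSML instead of plain text."""
    return {
        "input": {"ssml": ssml},  # "ssml" replaces the "text" field
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


ssml = build_ssml(["Terms & conditions apply.", "Thank you for listening."])
request_body = build_ssml_request(ssml)
```

The resulting body is sent to the same endpoint as a plain-text request. Only the "input" field differs.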

Common mistake to avoid

A common mistake is forgetting to enable billing for your Google Cloud project. Although the Text-to-Speech API offers a free tier, you still need to set up billing on the project before you can use it. Once billing is on, monitor your usage to avoid unexpected charges.