Delv
Amazon Polly
Getting Started Guide

How to Use Amazon Polly

A practical guide to get you up and running with Amazon Polly. Written by Delv Editorial, Delv Team.

Getting started with Amazon Polly

In this guide, you'll learn how to create lifelike audio from text using Amazon Polly. Within a few minutes, you'll be able to convert written content into speech using a range of voices.

Step 1: Sign up and set up

  1. Go to the Amazon Polly website.
  2. Click on the "Get started with Amazon Polly" button.
  3. If you do not have an AWS account, click "Create a Free Account" and follow the prompts to set up your account.
  4. Once logged in, navigate to the AWS Management Console.
  5. In the search bar, type "Polly" and select "Amazon Polly" from the dropdown menu.

Step 2: Your first audio output

  1. In the Amazon Polly dashboard, click on "Text-to-Speech" in the left sidebar.
  2. Enter your text in the provided text box.
  3. Choose your desired voice from the "Voice" dropdown menu. You can select from various languages and accents.
  4. Select the speech output format (MP3 or OGG) from the "Output format" dropdown.
  5. Click the "Listen" button to preview the audio. If you're satisfied, click "Download" to save the audio file to your device.
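
The same steps can also be scripted. Below is a minimal sketch, assuming you use boto3 (the AWS SDK for Python) and have AWS credentials and a default region configured. The helper names, the "Joanna" voice, and the output file name are illustrative choices, not part of the console workflow above. Note that the API value for OGG output is "ogg_vorbis".

```python
from pathlib import Path

# Maps the console's format labels to the values the API expects.
FORMATS = {"MP3": "mp3", "OGG": "ogg_vorbis"}


def build_request(text, voice_id="Joanna", output_format="MP3"):
    """Return keyword arguments for Polly's SynthesizeSpeech operation."""
    return {
        "Text": text,
        "VoiceId": voice_id,  # "Joanna" is only an example voice
        "OutputFormat": FORMATS[output_format.upper()],
    }


def synthesize_to_file(text, out_path, **options):
    """Synthesize `text` and write the audio bytes to `out_path`."""
    import boto3  # imported here so build_request works without the SDK

    polly = boto3.client("polly")  # assumes credentials and region are configured
    response = polly.synthesize_speech(**build_request(text, **options))
    path = Path(out_path)
    path.write_bytes(response["AudioStream"].read())
    return path

# Example call (makes a real, billable request):
#   synthesize_to_file("Hello from Amazon Polly.", "hello.mp3")
```

Keeping the request parameters in a separate function makes it easy to check them before sending anything to AWS.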

Step 3: Get better results

  1. Use SSML (Speech Synthesis Markup Language) to enhance your text input. This allows you to control aspects like pitch, volume, and pauses. Support for individual tags depends on the voice engine you select.
  2. Experiment with different voices and languages to find the best fit for your application.
  3. If you need to create a large volume of audio, consider using the Amazon Polly API to process your text in batches. Refer to the AWS documentation for detailed API integration instructions.
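
As a rough illustration of batch processing through the API, the sketch below loops over several named texts and writes one MP3 file per entry. It assumes a boto3 Polly client is passed in; the function name, the file naming scheme, and the "Joanna" voice are hypothetical choices made for this example.

```python
from pathlib import Path


def synthesize_batch(client, texts, out_dir, voice_id="Joanna"):
    """Synthesize each named text to <out_dir>/<name>.mp3 and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, text in texts.items():
        # One SynthesizeSpeech request per text; each request is billed separately.
        response = client.synthesize_speech(
            Text=text, VoiceId=voice_id, OutputFormat="mp3"
        )
        path = out_dir / f"{name}.mp3"
        path.write_bytes(response["AudioStream"].read())
        written.append(path)
    return written

# Example call (requires boto3 and configured AWS credentials):
#   import boto3
#   synthesize_batch(boto3.client("polly"),
#                    {"intro": "Welcome.", "outro": "Goodbye."}, "audio")
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before running it against your AWS account.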

Pro tip

Familiarise yourself with the SSML tags. They can significantly improve the quality of your audio by adding pauses or changing pronunciation, making your speech sound more natural.
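
To make this concrete, here is a small sketch that builds an SSML document from plain sentences, inserting a pause between them and setting the speaking rate. The helper name and its default values are assumptions chosen for illustration; the speak, break, and prosody tags are standard SSML.

```python
from xml.sax.saxutils import escape


def to_ssml(sentences, pause_ms=400, rate="medium"):
    """Wrap sentences in <speak>, separated by <break> pauses of pause_ms."""
    pause = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so the text cannot break the markup.
    body = pause.join(escape(s) for s in sentences)
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


print(to_ssml(["Hello.", "Tom & Jerry."], pause_ms=300))
# <speak><prosody rate="medium">Hello.<break time="300ms"/>Tom &amp; Jerry.</prosody></speak>
```

In the console, switch the input to SSML before pasting the result. When calling the SynthesizeSpeech API, pass the string as Text together with TextType="ssml".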

Common mistake to avoid

Avoid entering too much text at once. Amazon Polly limits how much text you can submit in a single request (up to 3,000 characters). Break longer texts into smaller segments so that each one is processed correctly.
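
One way to break up long text is sketched below. It splits on sentence boundaries where possible and falls back to a hard cut for any single sentence that exceeds the limit. The function name is hypothetical, and the sketch assumes that Python's character count is a close enough approximation of how Amazon Polly counts characters.

```python
import re


def split_text(text, limit=3000):
    """Split text into chunks of at most `limit` characters."""
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut into fixed-size pieces.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_text("One. Two. Three.", limit=10))
# ['One. Two.', 'Three.']
```

Each chunk can then be synthesized separately and the resulting audio files played or joined in order.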