
How to Use Microsoft Azure Speech

A practical guide to get you up and running with Microsoft Azure Speech. Written by Delv Editorial, Delv Team.

Getting started with Microsoft Azure Speech

In this guide, you will learn how to set up Microsoft Azure Speech and start using its text-to-speech and speech-to-text features. By the end, you’ll be able to convert text into natural-sounding speech and transcribe audio files.

Step 1: Sign up and set up

  1. Go to the Microsoft Azure Speech website.
  2. Click the "Get started" button.
  3. If you don’t have a Microsoft account, click "Create one!" and follow the prompts to sign up.
  4. Once signed in, open the Azure portal by clicking "Portal" in the top right corner.
  5. In the Azure portal, select "Create a resource" from the left-hand menu.
  6. Search for "Speech" and select "Speech" from the list.
  7. Click "Create" and fill in the required fields (Subscription, Resource group, Region, Name, and Pricing tier).
  8. Click "Review + create" and then "Create" to provision your Speech resource.

Step 2: Your first text-to-speech task

  1. In the Azure portal, navigate to your Speech resource.
  2. Click "Keys and Endpoint" in the left menu to find your keys, region, and endpoint URL.
  3. Open a new browser tab and go to the Azure Speech Studio.
  4. Sign in with your Microsoft account.
  5. Click "Try Speech" in the top menu and select "Text to Speech."
  6. In the text box, enter the text you want to convert to speech.
  7. Choose a voice from the dropdown menu and, if desired, adjust settings such as pitch and speed.
  8. Click the "Play" button to listen to the generated speech.
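
The same task can also be done from code. As a minimal sketch, assuming the key and region shown under "Keys and Endpoint" and the Speech service's REST text-to-speech endpoint, the Python below builds the request in one function and sends it in another. The function names, the voice en-US-JennyNeural, the output format, and the User-Agent value are illustrative choices and not part of this guide's Speech Studio steps.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_tts_request(text, key, region, voice="en-US-JennyNeural"):
    """Build (but do not send) a text-to-speech REST request."""
    # Minimal SSML wrapper. The text is escaped so that characters
    # such as & or < do not break the XML.
    ssml = (
        "<speak version='1.0' xml:lang='en-US'>"
        f"<voice name='{voice}'>{escape(text)}</voice>"
        "</speak>"
    )
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # key from "Keys and Endpoint"
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": "riff-24khz-16bit-mono-pcm",  # WAV output
        "User-Agent": "getting-started-sketch",  # hypothetical client name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


def synthesize_to_file(request, path):
    """Send the request and write the returned audio bytes to path."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

To try it, call build_tts_request("Hello", key, region) with the values from your own resource and pass the result to synthesize_to_file(request, "hello.wav"). Keeping request construction separate from the network call lets you inspect the SSML and headers before anything is sent.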

Step 3: Get better results

  • Try different voices and languages from the dropdown menu to find the one that best fits your content.
  • Use the "SSML" option for finer control over pronunciation, pitch, and pauses by adding Speech Synthesis Markup Language (SSML) tags to your text.
  • For speech-to-text, upload an audio file in the "Speech to Text" section, and make sure it’s in a supported format (such as WAV or MP3).
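
To make the SSML tip concrete, here is a small sketch of what such markup can look like, generated from Python so that special characters are escaped correctly. The helper name build_ssml, the default voice, and the rate, pitch, and pause values are assumptions chosen for illustration; the speak, voice, prosody, and break elements are standard SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(sentences, voice="en-US-JennyNeural",
               rate="-10%", pitch="+5%", pause_ms=400):
    """Return an SSML document that reads the sentences with pauses between them."""
    # A <break> element is placed between sentences, not after the last one.
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(escape(sentence) for sentence in sentences)
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>"
        # <prosody> adjusts speaking rate and pitch for everything inside it.
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{body}"
        "</prosody></voice></speak>"
    )


if __name__ == "__main__":
    print(build_ssml(["Welcome to the demo.", "Questions & answers follow."]))
```

The printed markup can be pasted into the "SSML" option in place of plain text. Adjust one attribute at a time and listen again, since rate and pitch changes interact.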

Pro tip

Use the "Save as Audio" feature in the Text to Speech section to download the generated speech as an audio file, so you don’t have to regenerate it later.

Common mistake to avoid

Don’t upload audio in an unsupported format when transcribing; stick to WAV or MP3 so the upload succeeds and the file can be transcribed.
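
One way to catch a format problem before uploading is to inspect the file locally. The sketch below uses Python's standard wave module to read a WAV file's properties and compare them with a target format. The target of 16 kHz, 16-bit, mono PCM is an assumption (a commonly used speech-recognition input format), so check the formats your tool actually accepts. The helper names are hypothetical, and the check covers WAV only, not MP3.

```python
import io
import wave


def describe_wav(source):
    """Return basic properties of a WAV file given a path or a binary file object.

    Raises wave.Error if the data cannot be read as WAV.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / rate if rate else 0.0,
        }


def mismatches(info, sample_rate_hz=16000, channels=1, bits_per_sample=16):
    """List the properties that differ from the assumed target format."""
    expected = {
        "channels": channels,
        "sample_rate_hz": sample_rate_hz,
        "bits_per_sample": bits_per_sample,
    }
    return [
        f"{name}: expected {want}, got {info[name]}"
        for name, want in expected.items()
        if info[name] != want
    ]


def make_silent_wav(seconds=1, sample_rate_hz=16000, channels=1, bits_per_sample=16):
    """Create an in-memory WAV of silence, used here only to demonstrate the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(bits_per_sample // 8)
        wav.setframerate(sample_rate_hz)
        frame = b"\x00" * (bits_per_sample // 8) * channels
        wav.writeframes(frame * int(seconds * sample_rate_hz))
    return buffer.getvalue()


if __name__ == "__main__":
    info = describe_wav(io.BytesIO(make_silent_wav()))
    print(info, mismatches(info))
```

For a real recording, pass its path instead, for example describe_wav("meeting.wav"), and convert the file with an audio tool if mismatches returns anything.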