Text to Speech API

Our Text to Speech API allows you to automatically generate audio in 60+ languages, with 380+ voices. You can batch-produce audio files from external content, integrate our realistic text to speech voices into your software, and a lot more.

This page explains how to use our text to speech API to create audio files.

NOTE: The easiest way to batch-convert text is to use our command-line utility. This page contains information for people who want to build their own integration.

Choose between the Streaming and Polling APIs

Narakeet offers two ways to integrate with the text to speech API:

  1. Short content (streaming) API is simpler and faster, but restricted to relatively short content.
  2. Long content (JSON polling) API is more complex but allows significantly larger and longer conversions.

If you want to build audio on the fly for short sentences, such as synthesising individual paragraphs or labels for user interface elements, use the short content (streaming) API. To convert large documents, build audiobooks, or produce uncompressed output for professional videos, use the long content (polling) API.

Here is a quick summary of the limitations and differences between the APIs.

Feature                  Short content (streaming) API   Long content (polling) API
Maximum content length   1 KB                            100 KB
Supported formats        M4A, MP3                        M4A, MP3, WAV
Process duration         30 seconds                      45 minutes

When executing the requests, you select the API with the accept header. If you provide application/octet-stream as the accept header, the short content (streaming) API will be used, and you will get the result back as a binary stream. If you do not provide the accept header, the long content (polling) API will be used, and you will get back a status URL that you can poll for results.
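
As a rough illustration of the limits above, the sketch below picks an API from the size of the script in bytes. The function name choose_api is hypothetical, and it assumes that 1 KB means 1024 bytes and that the limit is measured on the UTF-8 encoded request body; the service may count differently.

```shell
# Hypothetical helper: choose an API from the script size in bytes.
# Assumes 1 KB = 1024 bytes; the limits enforced by the service may differ.
choose_api() {
  local size="$1"
  if [ "$size" -le 1024 ]; then
    echo streaming   # send "accept: application/octet-stream"
  elif [ "$size" -le 102400 ]; then
    echo polling     # send no accept header
  else
    echo "content too long for a single request" >&2
    return 1
  fi
}
```

For example, choose_api "$(wc -c < my-script.txt)" prints the suggested API for a script file. Remember that WAV output needs the polling API regardless of size.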

API Endpoints

There are three endpoints for audio project build requests, which produce different output formats:

  • https://api.narakeet.com/text-to-speech/wav creates uncompressed 16-bit PCM wav files (highest quality, largest file)
  • https://api.narakeet.com/text-to-speech/mp3 creates compressed MP3 files (smaller file, good quality)
  • https://api.narakeet.com/text-to-speech/m4a creates compressed MPEG-4 files (best combination of file size and quality)

Note that the WAV endpoint only works for the long content (polling) API. M4A and MP3 endpoints support both short content API (streaming) and long content API (JSON polling).
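
The format rules above could be checked before sending a request, roughly as follows. This is a minimal sketch: endpoint_for is a hypothetical name, and the "streaming" and "polling" labels are just local conventions for the two APIs.

```shell
# Hypothetical helper: print the endpoint URL for a format and API,
# rejecting the one combination the documentation rules out.
endpoint_for() {
  local format="$1" api="$2"
  case "$format" in
    m4a|mp3) ;;
    wav)
      if [ "$api" = "streaming" ]; then
        echo "wav is only available through the polling API" >&2
        return 1
      fi ;;
    *)
      echo "unknown format: $format" >&2
      return 1 ;;
  esac
  echo "https://api.narakeet.com/text-to-speech/$format"
}
```

For example, endpoint_for mp3 streaming prints the MP3 endpoint, while endpoint_for wav streaming fails with a message on standard error.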

Authenticating requests

To use the API, you will need a Narakeet API key. Send an email to contact@narakeet.com to request your key.

You should provide the API key as a header to all requests to the public REST endpoints, using the x-api-key header.

Short content API (Streaming)

The short content API requires just one request, and returns the audio as a binary stream.

To request an audio file build, use one of the endpoints, and:

  • Use the POST HTTP method
  • Set the Content Type to text/plain
  • Provide your API key in the x-api-key header
  • Specify an accept header with the value application/octet-stream
  • In the request body, provide a UTF-8 encoded script text

The snippet below will generate an M4A file using the text “Hi there, this is your API speaking”, and save it to result.m4a.

curl -d "Hi there, this is your API speaking" -H 'Content-Type: text/plain' -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.m4a https://api.narakeet.com/text-to-speech/m4a 

See Configuring Audio Tasks for information on selecting the voice and adjusting the reading speed.

Error handling

If there is an error during audio conversion, the short content API will return the error in the immediate response. The response will have status code 400 (for user errors) or 500 (for server errors). The response type will be application/json, and the body of the response will be a JSON object containing more information about the error.
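
One way to act on these status codes with cURL is sketched below. The names classify_status and synthesize_short are hypothetical. The sketch treats any 2xx code as success and any 4xx or 5xx code as a user or server error, which is an assumption; the documentation above only names 400 and 500.

```shell
# Classify an HTTP status code. Matching whole 2xx/4xx/5xx ranges is an
# assumption; the documentation only names 400 and 500 explicitly.
classify_status() {
  case "$1" in
    2??) echo ok ;;
    4??) echo user-error ;;
    5??) echo server-error ;;
    *)   echo unexpected ;;
  esac
}

# Hypothetical wrapper: synthesize_short "some text" result.m4a
synthesize_short() {
  local text="$1" out="$2" status outcome
  # -o saves the response body, -w prints only the status code to stdout
  status=$(curl -sS -o "$out" -w '%{http_code}' \
    -d "$text" \
    -H 'Content-Type: text/plain' \
    -H "x-api-key: $APIKEY" \
    -H 'accept: application/octet-stream' \
    https://api.narakeet.com/text-to-speech/m4a) || return 1
  outcome=$(classify_status "$status")
  if [ "$outcome" != ok ]; then
    # On failure the saved body is a JSON error object, not audio
    echo "$outcome ($status): $(cat "$out")" >&2
    rm -f "$out"
    return 1
  fi
}
```

Checking the status code before using the output file avoids saving a JSON error body under an audio file name.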

Long content API (JSON Polling)

The long content API allows running longer and larger jobs. To be fault-tolerant, it does not require you to keep a single HTTPS connection open for a long period of time. Instead, you make several short requests. This integration is more complicated than the short content API, but it allows for better resilience and longer processing.

To create an audio file using the long content API, execute the following steps:

  1. Request an audio build, which will provide you with a status URL
  2. Poll the status URL periodically until the build finishes. This will provide you with a URL of the audio file, valid for 24 hours
  3. Download the audio from the URL, or consume the result in some other way (for example, send the URL to another service).

NOTE: Requests to storage endpoints (steps 2 and 3) do not require authentication. The storage URLs provided to you by the REST API will already be pre-signed with authentication tokens. Do not include your API key as a separate header when performing those requests.

Step 1: Request an audio build

To request an audio file build, use one of the endpoints, and:

  • Use the POST HTTP method
  • Set the Content Type to text/plain
  • Provide your API key in the x-api-key header
  • Do not set the accept header
  • In the request body, provide UTF-8 encoded script text

The response will be a JSON structure containing the field statusUrl. This is the URL where you can periodically poll for results.

The snippet below will trigger the build using cURL and extract the status URL with jq:

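BODY="Hi there, this is your API speaking"
# Quote "$BODY" so the shell passes the whole script as a single argument
API_RESPONSE=$(curl -d "$BODY" -H 'Content-Type: text/plain' -H "x-api-key: $APIKEY" https://api.narakeet.com/text-to-speech/wav)
# Extract the statusUrl field from the JSON response (requires jq)
STATUS_URL=$(echo "$API_RESPONSE" | jq -r .statusUrl)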

Step 2: Poll for results

To get the status of your build job, poll the status URL returned by the previous step periodically. We recommend polling every 5-10 seconds.

  • Use the GET HTTP method
  • Do not provide the API key in the headers. The URL already has all appropriate authorisations

The status URL will contain the build job status as a JSON object, with the following properties:

  • finished: boolean value (true/false) signalling if the audio build completed. The value true means you should stop polling.
  • percent: numerical value between 0 and 100, signalling the progress of the audio build.
  • succeeded: once the task is finished, a boolean value (true/false) signalling if the audio was built, or if there was an error. The value true means that you can download the resulting audio.
  • result: if the task succeeded, a string value with a secure URL, valid for 24 hours, where you can download the audio file.
  • message: if the task failed, a string value detailing the error
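
The polling rules above might be scripted roughly as follows, using jq to read the status properties. The names interpret_status and poll_until_finished are hypothetical, and the sketch assumes the status URL always answers with a successful HTTP response while the job is running.

```shell
# Interpret one status document read from stdin (requires jq).
# Prints "pending", "ready <url>" or "failed <message>".
interpret_status() {
  jq -r '
    if .finished != true then "pending"
    elif .succeeded == true then "ready \(.result)"
    else "failed \(.message)"
    end'
}

# Hypothetical helper: poll_until_finished "$STATUS_URL" [interval-seconds]
# Prints the result URL on success, the error message on stderr otherwise.
poll_until_finished() {
  local status_url="$1" interval="${2:-5}" body outcome
  while true; do
    # No x-api-key header: the status URL is already pre-signed
    body=$(curl -sS --fail "$status_url") || return 1
    outcome=$(printf '%s' "$body" | interpret_status) || return 1
    case "$outcome" in
      pending)   sleep "$interval" ;;
      "ready "*) echo "${outcome#ready }"; return 0 ;;
      *)         echo "${outcome#failed }" >&2; return 1 ;;
    esac
  done
}
```

For example, RESULT_URL=$(poll_until_finished "$STATUS_URL" 5) waits for the job and stores the download URL. Keeping the JSON interpretation in its own function makes it possible to check it against sample documents without calling the service.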

Step 3: Download the result

Once the status response has both finished and succeeded set to true, the result field contains the URL of the resulting audio file. This is a secure, temporary URL that expires in 24 hours, so download the audio file or pass the URL on for further processing before it expires.
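
A download step could look like the sketch below. The name download_result is hypothetical, and the sketch assumes the result URL can be fetched with a plain GET request.

```shell
# Hypothetical helper: download_result "$RESULT_URL" output.wav
download_result() {
  local result_url="$1" out="$2"
  # No x-api-key header: the result URL is pre-signed.
  # --fail makes curl exit non-zero on HTTP error responses.
  curl -sS --fail --output "$out" "$result_url" || return 1
  # Treat an empty file as a failed download
  [ -s "$out" ]
}
```

Using --fail prevents an error page from being saved under the audio file name.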

Error handling

If there is an error with starting the task, the request endpoint will return status code 400 for user errors, and 500 for server errors. The response type will be application/json, and the body of the response will be a JSON object containing more information about the error.

Once the task starts, the status URL will contain more information on processing. In case of an error, the status URL will respond with a JSON object. You can detect an error by the following properties:

  • finished: boolean value true (the job is over)
  • succeeded: boolean value false (the job failed)
  • message: error message

Configuring audio tasks

You can use the full power of Narakeet audio scripting through the API. Here is how to configure your audio conversion job.

Selecting the default voice

You can select the default voice either by appending a voice query string parameter to the endpoint URL, or by supplying the voice header in your script. All our Text to Speech voices are supported through the REST interface.

curl --data-binary '@my-script.txt' -H 'Content-Type: text/plain' -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.mp3 'https://api.narakeet.com/text-to-speech/mp3?voice=mickey'

Configuring other options (voice speed/volume…)

Narakeet scripts support setting default options in a header section (enclosed in --- above and below, at the start of the script file). You can use the header section to set the default voice speed, volume, choose a voice and a lot more. For more information, check out the Script header formatting reference. For example, the following script sets the default voice and volume.

---
voice: Brian
voice-volume: loud
---

This script will be read by Brian, loudly
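
If your script text has no header section yet, you could add one just before sending it. The sketch below is one way to do that; with_header is a hypothetical name, and it only covers the two options shown in the example above.

```shell
# Hypothetical helper: prepend a header section to script text on stdin.
# Usage: with_header VOICE VOLUME < my-script.txt
with_header() {
  local voice="$1" volume="$2"
  # Header block, followed by an empty line before the script text
  printf '%s\n' '---' "voice: $voice" "voice-volume: $volume" '---' ''
  cat
}
```

The output can be piped straight into cURL, for example with_header Brian loud < my-script.txt | curl --data-binary @- followed by the usual headers and endpoint, where @- tells cURL to read the request body from standard input.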

Tips and tricks

Sending files using cURL

If you use cURL, instead of pasting larger scripts into a command line, save them into a text file and then use the --data-binary option to load a file.

curl --data-binary '@my-script.txt' -H 'Content-Type: text/plain' -H "x-api-key: $APIKEY" https://api.narakeet.com/text-to-speech/wav

Do not use the --data cURL option for sending files. When reading from a file, --data strips carriage returns and newlines, which breaks multi-line scripts. The --data-binary option sends the file exactly as it is, preserving newlines and whitespace.

More information