Text to Speech API
Our Text to Speech API allows you to automatically generate audio in 100 languages, with 700 voices. You can batch-produce audio files from external content, integrate our realistic text to speech voices into your software, and a lot more.
This page explains how to use our text to speech API to create audio files.
NOTE: The easiest way to run simple batch conversion jobs is to use our command-line utility. This page contains information for people who want to build their own integration.
- Choose between Streaming or Polling API
- API Endpoints
- Authenticating requests
- Short content API (Streaming)
- Long Content API (JSON Polling)
- Configuring audio tasks
- Converting subtitle files (SRT and VTT)
- Tips and tricks
- More information
Choose between Streaming or Polling API
Narakeet has two ways of integrating with the Text to speech API:
- Short content (streaming) API is simpler, faster, but restricted to relatively short content.
- Long content (JSON polling) API is more complex but allows significantly larger and longer conversions.
If you want to build audio on the fly for short sentences, such as synthesising individual paragraphs or labels for user interface elements, use the short content (streaming) API. To convert large documents, build audiobooks, or produce uncompressed output for professional videos, use the long content (polling) API.
Here is a quick summary of the limitations and differences between the APIs.
Feature | Short content (streaming) API | Long content (polling) API |
---|---|---|
Maximum content length | 1 KB | 1024 KB |
Supported formats | M4A, MP3 | M4A, MP3, WAV |
Process duration | 30 seconds | 45 minutes |
When executing the requests, you select the API with the accept header. If you provide application/octet-stream as the accept header, the short content (streaming) API will be used, and you will get the result back as a binary stream. If you do not provide the accept header, the long content (polling) API will be used, and you will get back a status URL that you can poll for results.
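As a minimal sketch of the rule above, the helper below builds the request headers for either API. The function name build_headers and its parameters are illustrative assumptions, not part of any Narakeet client library; only the header names and values come from this page.

```python
def build_headers(api_key, streaming):
    """Return request headers for the streaming or the polling API."""
    headers = {
        "Content-Type": "text/plain",
        "x-api-key": api_key,
    }
    if streaming:
        # Providing this accept value selects the short content (streaming) API.
        headers["accept"] = "application/octet-stream"
    # Leaving accept out selects the long content (polling) API.
    return headers
```

Calling build_headers(key, streaming=False) simply leaves the accept header out, which is what triggers the polling behaviour.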
API Endpoints
There are three endpoints for audio project build requests, which produce different output formats:
- https://api.narakeet.com/text-to-speech/wav creates uncompressed 16-bit PCM WAV files (highest quality, largest file)
- https://api.narakeet.com/text-to-speech/mp3 creates compressed MP3 files (smaller file, good quality)
- https://api.narakeet.com/text-to-speech/m4a creates compressed MPEG-4 files (best combination of file size and quality)
Note that the WAV endpoint only works for the long content (polling) API. M4A and MP3 endpoints support both short content API (streaming) and long content API (JSON polling).
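The format restriction can be checked before sending anything. The following is a small sketch, assuming a hypothetical helper called endpoint_url; the URLs and the format rules are taken from this page.

```python
BASE_URL = "https://api.narakeet.com/text-to-speech"
STREAMING_FORMATS = {"m4a", "mp3"}
POLLING_FORMATS = {"m4a", "mp3", "wav"}


def endpoint_url(audio_format, streaming):
    """Return the endpoint for a format, rejecting WAV for the streaming API."""
    fmt = audio_format.lower()
    allowed = STREAMING_FORMATS if streaming else POLLING_FORMATS
    if fmt not in allowed:
        api_name = "streaming" if streaming else "polling"
        raise ValueError(f"{fmt} is not supported by the {api_name} API")
    return f"{BASE_URL}/{fmt}"
```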
Authenticating requests
To use the API, you will need a Narakeet API key. For information on how to get a key, check out our guide on Managing API Keys.
You should provide the API key as a header to all requests to the public REST endpoints, using the x-api-key header.
Short content API (Streaming)
The short content API requires just one request, and returns the audio as a binary stream.
To request an audio file build, use one of the endpoints, and:
- Use the POST HTTP method
- Set the Content Type to text/plain (see Converting Subtitle files for additional values)
- Provide your API key in the x-api-key header
- Specify an accept header with the value application/octet-stream
- In the request body, provide a UTF-8 encoded script text
The snippet below will generate an M4A file using the text “Hi there, this is your API speaking”, and save it to result.m4a.
curl -d "Hi there, this is your API speaking" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.m4a https://api.narakeet.com/text-to-speech/m4a
Note that on Windows, if you use CURL from the terminal, you may need to URL encode the content before sending it. See the tips section for an example.
You can read the audio duration of the generated file, rounded up to the nearest second, from the x-duration-seconds response header.
See Configuring Audio Tasks for information on selecting the voice and adjusting the reading speed.
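For readers who prefer Python to cURL, here is a rough equivalent of the request described above, using only the standard library. The function names are illustrative assumptions; the linked GitHub examples remain the reference implementations.

```python
import urllib.request

M4A_ENDPOINT = "https://api.narakeet.com/text-to-speech/m4a"


def build_streaming_request(text, api_key, url=M4A_ENDPOINT):
    """Build (but do not send) a short content API request."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),  # the script text must be UTF-8 encoded
        method="POST",
        headers={
            "Content-Type": "text/plain",
            "x-api-key": api_key,
            "accept": "application/octet-stream",
        },
    )


def synthesize_to_file(text, api_key, path, url=M4A_ENDPOINT):
    """Send the request, save the audio, return the duration in seconds or None."""
    request = build_streaming_request(text, api_key, url)
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
        duration = response.headers.get("x-duration-seconds")
    return int(duration) if duration is not None else None
```

A call such as synthesize_to_file("Hi there, this is your API speaking", api_key, "result.m4a") would mirror the cURL snippet.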
Narakeet API NodeJS/JavaScript example
For a simple example of how to access the short content (streaming) API from JavaScript/NodeJS, check out https://github.com/narakeet/text-to-speech-api-nodejs-example.
Narakeet API Python example
For a simple example of how to access the short content (streaming) API from Python, check out https://github.com/narakeet/text-to-speech-api-python-example.
Narakeet API CSharp/.NET Core example
For a simple example of how to access the short content (streaming) API from CSharp/.NET Core, check out https://github.com/narakeet/text-to-speech-api-csharp-example.
Narakeet API PHP example
For a simple example of how to access the short content (streaming) API from PHP, check out https://github.com/narakeet/text-to-speech-api-php-example
Narakeet API Java example
For a simple example of how to access the short content (streaming) API from Java, check out https://github.com/narakeet/text-to-speech-api-java-example
Narakeet API Dart example
For a simple example of how to access the short content (streaming) API from Dart, check out https://github.com/narakeet/text-to-speech-api-dart-example
Error handling
If there is an error during audio conversion, the short content API will return the error in the immediate response. The response will have status code 400 (for user errors) or 500 (for server errors). The response type will be application/json, and the body of the response will be a JSON object containing more information about the error.
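A sketch of how a client might classify such a response is shown below. The function name describe_error is hypothetical, and because this page does not list the fields of the error object, the code returns the parsed JSON as-is rather than assuming any field names.

```python
import json


def describe_error(status_code, body):
    """Classify an error response and parse its JSON body, if it has one."""
    if status_code == 400:
        kind = "user error"
    elif status_code == 500:
        kind = "server error"
    else:
        kind = "unexpected status"
    try:
        details = json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        details = None  # the body was not valid JSON
    return kind, details
```

With urllib, error statuses surface as urllib.error.HTTPError, so a caller could catch that exception and pass its code attribute and the result of its read() method to this function.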
Long Content API (JSON Polling)
The long content API allows running longer and larger jobs. To be fault-tolerant, it does not require you to keep a single HTTPS connection open for a long period of time. Instead, you make several short requests. This integration is more complicated than the short content API, but it allows for better resilience and longer processing.
To create an audio file using the long content API, execute the following steps:
- Request an audio build, which will provide you with a status URL
- Poll the status URL periodically until the build finishes. This will provide you with a URL of the audio file, valid for 24 hours
- Download the audio from the URL, or consume the result in some other way (for example, send the URL to another service).
NOTE: Requests to storage endpoints (steps 2 and 3) do not require authentication. The storage URLs provided to you by the REST API will already be pre-signed with authentication tokens. Do not include your API key as a separate header when performing those requests.
Long Content API Python example
For a simple example of how to access the long content (polling) API from Python, check out https://github.com/narakeet/text-to-speech-polling-api-python-example.
Long Content API PHP example
For a simple example of how to access the long content (polling) API from PHP, check out https://github.com/narakeet/text-to-speech-polling-api-php-example.
Long Content API Java example
For a simple example of how to access the long content (polling) API from Java, check out https://github.com/narakeet/text-to-speech-polling-api-java-example.
Step 1: Request an audio build
To request an audio file build, use one of the endpoints, and:
- Use the POST HTTP method
- Set the Content Type to text/plain (see Converting Subtitle files for additional values)
- Provide your API key in the x-api-key header
- Do not set the accept header
- In the request body, provide UTF-8 encoded script text
The response will be a JSON structure containing the field statusUrl. This is the URL where you can periodically poll for results.
The snippet below will trigger the build using CURL and extract the status URL:
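BODY="Hi there, this is your API speaking"
# Quote "$BODY" so the shell passes the whole text as a single argument
API_RESPONSE=$(curl -d "$BODY" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" https://api.narakeet.com/text-to-speech/wav)
# Extract the statusUrl field from the JSON response
STATUS_URL=$(echo "$API_RESPONSE" | jq -r .statusUrl)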
Note that on Windows, if you use CURL from the terminal, you may need to URL encode the content before sending it. See the tips section for an example.
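The same step can be sketched in Python with the standard library. The helper names below are assumptions made for illustration; the header names, the endpoint and the statusUrl field come from this page.

```python
import json
import urllib.request

WAV_ENDPOINT = "https://api.narakeet.com/text-to-speech/wav"


def build_polling_request(text, api_key, url=WAV_ENDPOINT):
    """Build (but do not send) a long content API request."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        method="POST",
        # No accept header: that is what selects the polling API.
        headers={"Content-Type": "text/plain", "x-api-key": api_key},
    )


def extract_status_url(response_body):
    """Read the statusUrl field from the JSON response body."""
    return json.loads(response_body)["statusUrl"]


def request_audio_build(text, api_key, url=WAV_ENDPOINT):
    """Send the build request and return the status URL to poll."""
    request = build_polling_request(text, api_key, url)
    with urllib.request.urlopen(request) as response:
        return extract_status_url(response.read())
```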
Step 2: Poll for results
To get the status of your build job, poll the status URL returned by the previous step periodically. We recommend polling every 5-10 seconds.
- Use the GET HTTP method
- Do not provide the API key in the headers. The URL already has all appropriate authorisations
The status URL will contain the build job status as a JSON object, with the following properties:
- finished: boolean value (true/false) signalling if the audio build completed. The value true means you should stop polling.
- percent: numerical value between 0 and 100, signalling the progress of the audio build.
- succeeded: once the task is finished, a boolean value (true/false) signalling if the audio was built, or if there was an error. The value true means that you can download the resulting audio.
- result: if the task succeeded, a string value with a secure URL, valid for 24 hours, where you can download the audio file.
- message: if the task failed, a string value detailing the error.
- durationInSeconds: if the task succeeded, an integer value with the generated audio duration in seconds, rounded up to the nearest second.
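A polling loop over these properties might look like the sketch below. The function names, the injectable fetch and sleep parameters, and the max_attempts limit are illustrative assumptions (the limit roughly matches 45 minutes at one poll every 5 seconds); they are not part of the API.

```python
import json
import time
import urllib.request


def fetch_status(status_url):
    """GET the status URL without an API key; the URL is already pre-signed."""
    with urllib.request.urlopen(status_url) as response:
        return json.loads(response.read())


def poll_until_finished(status_url, fetch=fetch_status, interval_seconds=5,
                        sleep=time.sleep, max_attempts=540):
    """Poll until the finished property is true, then return the final status."""
    for _ in range(max_attempts):
        status = fetch(status_url)
        if status.get("finished"):
            return status
        sleep(interval_seconds)
    raise TimeoutError("the audio build did not finish in time")
```

Passing fetch and sleep as parameters keeps the loop easy to exercise in tests without network access or real delays.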
Step 3: Download the result
Once the status URL contains finished value true, and succeeded value true, you will find the URL to the resulting audio file in the result field. This is a secure, temporary URL that expires in 24 hours, so you should download the audio file, or process it in some other way, before it expires.
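Downloading the file is a plain GET request with no API key. The sketch below assumes a hypothetical helper named download_audio; the open_url parameter exists only so that the function can be tried out without network access.

```python
import shutil
import urllib.request


def download_audio(result_url, path, open_url=urllib.request.urlopen):
    """Stream the audio from the pre-signed result URL into a local file."""
    # Do not add the x-api-key header here; the URL carries its own authorisation.
    with open_url(result_url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
    return path
```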
Error handling
If there is an error with starting the task, the request endpoint will return status code 400 for user errors, and 500 for server errors. The response type will be application/json, and the body of the response will be a JSON object containing more information about the error.
Once the task starts, the status URL will contain more information on processing. In case of an error, the status URL will respond with a JSON object. You can detect an error by the following properties:
- finished: boolean value true (the job is over)
- succeeded: boolean value false (the job failed)
- message: error message
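The check described above can be expressed as a small function over the final status object. The exception class and function name below are hypothetical; only the property names come from this page.

```python
class AudioBuildError(Exception):
    """Raised when the status object reports a failed build."""


def result_url_or_raise(status):
    """Return the result URL from a finished status, or raise on failure."""
    if not status.get("finished"):
        raise ValueError("the build is still running; keep polling")
    if not status.get("succeeded"):
        # The message property carries the error description.
        raise AudioBuildError(status.get("message", "unknown error"))
    return status["result"]
```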
Configuring audio tasks
You can use the full power of Narakeet audio scripting through the API. Here is how to configure your audio conversion job.
Selecting the default voice
You can select the default voice either by appending a voice query string parameter to the endpoint URL, or by supplying the voice header in your script. All our Text to Speech voices are supported through the REST interface.
curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.mp3 "https://api.narakeet.com/text-to-speech/mp3?voice=mickey"
Controlling the reading speed
You can select the default voice speed either by appending a voice-speed query string parameter, or by supplying the voice-speed header in your script. Quote the URL so that the shell does not interpret the & character.
curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.mp3 "https://api.narakeet.com/text-to-speech/mp3?voice=mickey&voice-speed=1.1"
Controlling voice volume
You can select the default voice volume either by appending a voice-volume query string parameter, or by supplying the voice-volume header in your script.
curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" -H "accept: application/octet-stream" --output result.mp3 "https://api.narakeet.com/text-to-speech/mp3?voice=mickey&voice-volume=soft"
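When building requests in code, it is usually safer to let a library assemble the query string than to concatenate it by hand. The sketch below assumes a hypothetical helper named with_voice_options; the parameter names voice, voice-speed and voice-volume are the ones described above.

```python
from urllib.parse import urlencode


def with_voice_options(endpoint, voice=None, speed=None, volume=None):
    """Append the voice, voice-speed and voice-volume query parameters."""
    params = {"voice": voice, "voice-speed": speed, "voice-volume": volume}
    # Keep only the options that were actually provided.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{endpoint}?{query}" if query else endpoint
```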
Configuring other options
Narakeet scripts support setting default options in a header section (enclosed in --- above and below, at the start of the script file). You can use the header section to set the default voice speed, volume, choose a voice and a lot more. For more information, check out the Script header formatting reference. For example, the following script sets the default voice and pitch.
---
voice: Victoria
voice-pitch: high
---
This script will be read by Victoria, in high pitch
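If you generate scripts programmatically, the header section can be prepended in code. This is a minimal sketch with a hypothetical helper name; it only joins lines and does not validate option names against the Script header formatting reference.

```python
def script_with_header(options, text):
    """Prepend a header section, enclosed in --- lines, to the script text."""
    if not options:
        return text
    header_lines = [f"{name}: {value}" for name, value in options.items()]
    return "\n".join(["---", *header_lines, "---", text])
```

For instance, script_with_header({"voice": "Victoria", "voice-pitch": "high"}, "This script will be read by Victoria, in high pitch") reproduces the script shown above.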
Converting subtitle files (SRT and VTT)
You can use the Content-Type header to control how Narakeet interprets your input. By default, the text/plain content type will read out the entire body of the request as a Narakeet script. You can also automatically convert popular subtitle and closed caption file formats (SubRip SRT and WebVTT) by supplying a different content type.
- for SubRip (.srt) files, use application/x-subrip or text/srt
- for WebVTT (.vtt) files, use text/vtt
Provide the subtitle file contents in the request body, and make sure that the content is UTF-8 encoded.
Note that Narakeet aligns entire sentences when processing subtitle and closed caption files, and does not automatically compress the audio if the chosen voice speaks slower than the subtitle timings dictate. If the voice you choose reads content slower than your subtitles, you may need to increase the voice speed.
curl --data-binary "@subtitles.vtt" -H "Content-Type: text/vtt" -H "x-api-key: $APIKEY" "https://api.narakeet.com/text-to-speech/mp3?voice=marion&voice-speed=1.1"
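A client that accepts several input file types could pick the content type from the file extension. The mapping below uses the content types listed above; the helper name and the choice of application/x-subrip over text/srt are illustrative assumptions.

```python
import os

SUBTITLE_CONTENT_TYPES = {
    ".srt": "application/x-subrip",
    ".vtt": "text/vtt",
}


def content_type_for(filename):
    """Return the Content-Type for a script or subtitle file name."""
    extension = os.path.splitext(filename)[1].lower()
    # Anything that is not a known subtitle format is sent as a plain script.
    return SUBTITLE_CONTENT_TYPES.get(extension, "text/plain")
```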
For more information on how converting subtitle files to audio works, and the limitations of Narakeet when turning subtitles to speech, see our guide on how to make closed captions and subtitles for text to speech audio.
Tips and tricks
Using international characters on Windows
This trick is not necessary for UTF8 Linux or MacOS terminals.
The Windows terminal and CURL do not work nicely with Unicode characters. To pass Unicode characters outside the basic ASCII range, you can use one of the following options:
- Save the content as UTF-8 encoded into a file, then use the --data-binary option (see the next tip for an example)
- URL-encode the content, and then post it with content type application/x-www-form-urlencoded.
Do not use the --data-urlencode option of CURL, as it has the same problem as posting --data; you will need to URL encode the content yourself, for example using the encodeURIComponent JavaScript method in NodeJS or your browser.
Here is an example:
curl --data "Rad%C5%A1ej%20by%20som%20i%C5%A1iel%20do%20da%C5%BE%C4%8Fa." -H "Content-Type: application/x-www-form-urlencoded" -H "x-api-key: %APIKEY%" https://api.narakeet.com/text-to-speech/wav?voice=juraj
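If you prefer Python to JavaScript for the encoding step, the standard library offers urllib.parse.quote. The sketch below uses a hypothetical helper name and reproduces the encoded text used in the command above.

```python
from urllib.parse import quote


def url_encode_script(text):
    """Percent-encode script text as UTF-8, escaping every reserved character."""
    # safe="" also escapes "/", which quote() would otherwise leave as-is.
    return quote(text, safe="")


print(url_encode_script("Radšej by som išiel do dažďa."))
# prints Rad%C5%A1ej%20by%20som%20i%C5%A1iel%20do%20da%C5%BE%C4%8Fa.
```

Note that quote escapes a few punctuation characters that encodeURIComponent leaves alone, such as ! and *, but the result is still valid URL encoding.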
Sending files using cURL
If you use cURL, instead of pasting larger scripts into a command line, save them into a text file and then use the --data-binary option to load a file.
curl --data-binary "@my-script.txt" -H "Content-Type: text/plain" -H "x-api-key: $APIKEY" https://api.narakeet.com/text-to-speech/wav
Do not use the --data cURL option for sending files. It removes newlines and whitespace in some cases, which leads to problems with multi-line scripts. The --data-binary option preserves newlines and whitespace.
Getting the generated audio duration
You can get the audio file duration rounded up to the nearest second.
If you use the streaming API, retrieve it using the x-duration-seconds response header. See https://github.com/narakeet/text-to-speech-api-php-example/blob/master/tts-extract-duration.php for an example.
In the long content polling API, the final status JSON will contain a field called durationInSeconds, containing the audio duration.
More information
- See this flow in action, implemented using Node.js, in the narakeet/api-client GitHub project.
- For general API limitations and pricing, see the main Developer API page.