Normalize text to speech voices

Automatically adjust text to voice generator volume for professional-sounding AI text to speech

Narakeet can help you automatically normalize the perceived loudness of natural text to speech voices, to make the audio sound great, using the same normalization standard that professional broadcasters use. This page explains how to normalize voice volume with text to speech voices.

What is loudness normalization?
Why is loudness normalization important?
How to normalize text to speech audio?
How to use voice normalization in video projects?
How to normalize voices in scripted video?
More information

What is loudness normalization?

Loudness normalization adjusts the volume of the individual parts of an audio recording, so that the whole recording sounds equally loud to a listener. The most popular standard for loudness normalization is EBU R 128. It is used by professional broadcasters all over the world, and it’s part of the European Union audiovisual legislation.

Narakeet implements EBU R 128 with the normalized voice volume option, so you can achieve professional-sounding results without expensive equipment and without spending time on editing and post-production.
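To make the idea more concrete, here is a minimal sketch of the arithmetic behind loudness normalization. It is not Narakeet's implementation. It assumes the loudness of each clip has already been measured (EBU R 128 relies on the ITU-R BS.1770 measurement method for that), and the measured values below are invented purely for illustration. The function names are hypothetical. The only figure taken from the standard is the target level of -23 LUFS.

```python
# Target programme loudness recommended by EBU R 128, in LUFS.
TARGET_LUFS = -23.0


def gain_db(measured_lufs: float, target_lufs: float = TARGET_LUFS) -> float:
    """Gain in decibels that moves a clip from its measured loudness to the target."""
    return target_lufs - measured_lufs


def linear_factor(db: float) -> float:
    """Convert a decibel gain into the factor applied to each sample amplitude."""
    return 10 ** (db / 20)


# Hypothetical loudness measurements, invented for this example only.
measured = {"Amy": -19.0, "Beatrice": -26.0, "Benedict": -23.0}

for voice, lufs in measured.items():
    db = gain_db(lufs)
    print(f"{voice}: {db:+.1f} dB (samples x {linear_factor(db):.3f})")
```

In this sketch, a voice that is louder than the target gets a negative gain, a quieter voice gets a positive gain, and a voice already at the target is left unchanged, so all three end up at the same perceived level.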

Why is loudness normalization important?

Like human voice actors, different text to voice converters speak differently, and that includes how loud they sound. This is not noticeable in audio clips recorded with a single voice, but if you combine several voice generators in the same audio, the results can sound inconsistent.

By normalizing the loudness of an audio clip or a video file, you can make the results more enjoyable for your audience.

To hear the effects of normalization, listen to the following two samples. They were both created from the same script, combining three voices: Amy, Beatrice and Benedict.

(voice: Amy)

Welcome to "Many Voices". I'm Amy, your host.

Today we're joined by Bea, the famous mathematician. Welcome to the programme.

(voice: Beatrice)

Thank you! It's lovely to be back. 

(voice: Amy)

Let's start immediately with our first caller. Benedict, you're in the programme. What would you like to know?

(voice: Benedict)

Yes, well, my question for Beatrice is... Why is it so difficult to divide with zero?

The first recording uses the voices at their standard volume.

The second uses the normalized volume to make all the voices perceptually the same in terms of loudness. The conversation between different text to speech voices sounds more natural, and switching from one text to speech generator to another sounds less abrupt.

How to normalize text to speech audio?

To normalize the volume of AI voice text to speech generators in our Text to Audio tool, first click the plus button to open all the voice options.

You will see several new settings. In the “Volume” section, choose the “Normalized” option. This will activate EBU R 128 audio normalization for text to voice generators.

How to use voice normalization in video projects?

To normalize the volume of our natural text to speech voices in the PowerPoint to Video tool, click the “Edit Settings” button after the presentation is uploaded, so you can see all the voice settings.

On the next screen, find the “Volume” section. Choose the “Normalized” option to use EBU R 128 audio normalization for text to speech voices.

How to normalize voices in scripted video?

To use volume normalization in Markdown to Video projects, just set the voice-volume header to normalized.

---
voice-volume: normalized
---

This is the first narration part...
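If you generate scripts programmatically, the header can be added in code. The following is a small sketch, assuming a narration script that does not have a header yet. The function name `with_normalized_volume` is hypothetical and not part of any Narakeet tool. It only builds the text shown in the example above.

```python
def with_normalized_volume(narration: str) -> str:
    """Return the narration with a header that sets voice-volume to normalized.

    Assumes the narration has no header yet. A script that already starts
    with a '---' line is rejected, so an existing header is never duplicated.
    """
    if narration.lstrip().startswith("---"):
        raise ValueError("script already has a header; add voice-volume there")
    return f"---\nvoice-volume: normalized\n---\n\n{narration}"


script = with_normalized_volume("This is the first narration part...")
print(script)
```

For scripts that already contain a header, add the `voice-volume: normalized` line to the existing header instead.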

More information

For information on how to create interviews and dialog using natural text to speech voices, check out How to use multiple voices in text to speech narration.
For more information on the voice-volume stage direction and header property, check out our Format reference.
For more information on the EBU R 128 standard, check out the EBU tech specifications.