Normalize text to speech voices

Automatically adjust text to voice generator volume for professional-sounding AI text to speech

Narakeet can help you automatically normalize the perceived loudness of natural text to speech voices, to make the audio sound great, using the same normalization standard that professional broadcasters use. This page explains how to normalize voice volume with text to speech voices.

What is loudness normalization?

Loudness normalization adjusts the volume of the individual parts of a recording, so that the whole recording sounds equally loud to the listener. The most popular standard for loudness normalization is EBU R 128. It is used by professional broadcasters all over the world, and it’s part of the European Union audiovisual legislation.

Narakeet implements EBU R 128 with the normalized voice volume option, giving you the power to achieve professional-sounding results without expensive equipment or time spent on editing and post-production.
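
To make the idea more concrete, here is a minimal Python sketch of loudness normalization. It is not Narakeet's implementation. It assumes the EBU R 128 programme target of -23 LUFS (this page does not state which target Narakeet applies), uses synthetic tones as stand-ins for three voices, and skips the K-weighting filter and gating that a real R 128 meter performs. The function names are made up for illustration.

```python
import math

TARGET_LUFS = -23.0  # EBU R 128 programme loudness target (assumed here)

def approx_loudness(samples):
    """Rough loudness estimate in dB from the mean square of the samples.

    Simplification: a real EBU R 128 / ITU-R BS.1770 meter applies a
    K-weighting filter and gating before this step; both are omitted.
    Silent input (all zeros) is not handled.
    """
    mean_square = sum(s * s for s in samples) / len(samples)
    return -0.691 + 10 * math.log10(mean_square)

def normalize(samples, target=TARGET_LUFS):
    """Scale samples so their approximate loudness matches the target."""
    gain_db = target - approx_loudness(samples)   # how far off the clip is
    factor = 10 ** (gain_db / 20)                 # dB gain -> linear multiplier
    return [s * factor for s in samples]

def tone(amplitude, n=4800, freq=440.0, rate=48000):
    """Synthetic sine tone used as a stand-in for a voice clip."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Three hypothetical "voices" that speak at different volumes.
voices = {"Amy": tone(0.5), "Beatrice": tone(0.2), "Charles": tone(0.8)}

for name, clip in voices.items():
    before = approx_loudness(clip)
    after = approx_loudness(normalize(clip))
    print(f"{name}: {before:.1f} -> {after:.1f}")
```

Each clip starts at a different level and ends at the same target, which is the effect the normalized voice volume option aims for when several voices share one recording.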

Why is loudness normalization important?

Similar to human voice actors, different text to voice converters speak differently, and that includes how loud they sound. This is not noticeable in audio clips recorded with a single voice, but if you combine several voice generators in the same audio, the results can sound inconsistent.

By normalizing the loudness of an audio clip or a video file, you can make the results more enjoyable for your audience.

To hear the effects of normalization, listen to the following two samples. They were both created from the same script, combining three voices: Amy, Beatrice and Charles.

(voice: Amy)

Welcome to "Many Voices". I'm Amy, your host.

Today we're joined by Bea, the famous mathematician. Welcome to the programme.

(voice: Beatrice)

Thank you! It's lovely to be back. 

(voice: Amy)

Let's start immediately with our first caller. Charles, you're in the programme. What would you like to know?

(voice: Charles)

Yes, well, my question for Beatrice is... Why is it so difficult to divide by zero?

The first recording uses the voices at their standard volume.

The second uses the normalized volume to make all the voices perceptually the same in terms of loudness. The conversation between different text to speech voices sounds more natural, and switching from one text to speech generator to another sounds less abrupt.

How to normalize text to speech audio?

To normalize the volume of AI voice text to speech generators in our Text to Audio tool, first click the plus button to open all the voice options.

You will see several new settings. In the “Volume” section, choose the “Normalized” option. This will activate EBU R 128 audio normalization for text to voice generators.

How to use voice normalization in video projects?

To normalize the volume of our natural text to speech voices in the PowerPoint to Video tool, click the “Edit Settings” button after the presentation is uploaded, so you can see all the voice settings.

On the next screen, find the “Volume” section. Choose the “Normalized” option to use EBU R 128 audio normalization for text to speech voices.

How to normalize voices in scripted video?

To use volume normalization in Markdown to Video projects, just set the voice-volume header to normalized.

voice-volume: normalized

This is the first narration part...
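
If you generate scripts programmatically, a small helper along these lines could add the header for you. This is only a sketch: `build_script` is a hypothetical name, and the layout simply mirrors the examples on this page (the `voice-volume` header first, then `(voice: ...)` lines before each part). This page does not show whether the header needs any surrounding delimiters, so check the script format reference before relying on the output.

```python
def build_script(parts, normalized=True):
    """Assemble script text from (voice, text) pairs.

    Assumption: the header line goes at the very top of the script,
    exactly as written in the example on this page.
    """
    lines = []
    if normalized:
        lines += ["voice-volume: normalized", ""]
    for voice, text in parts:
        # Switch voice, then add the narration for that voice.
        lines += [f"(voice: {voice})", "", text, ""]
    return "\n".join(lines).rstrip() + "\n"

script = build_script([
    ("Amy", "Welcome to \"Many Voices\". I'm Amy, your host."),
    ("Beatrice", "Thank you! It's lovely to be back."),
])
print(script)
```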

More information