Add pauses to text-to-speech voiceovers

Adding pauses to text-to-speech voiceovers is easy. This lesson shows a few ways to make a speech synthesis generator take a short break in narration.

Add punctuation marks for smaller breaks
Add ellipsis for inline breaks
Split text into paragraphs for longer breaks
Pause for a specified time
Start a sentence at a specific time

Unlike human speakers, text-to-speech voices don’t need to catch their breath. But audible pauses can be useful in many situations. For example:

Pause after an important bit of information, to let the audience absorb it.
Stop after a question, to give the audience a few seconds to think about it.
Take a dramatic pause before an important announcement, to let the audience prepare.

Here are five good ways to create narration breaks.

Add punctuation marks for smaller breaks

Text-to-speech voices understand all the usual punctuation marks. They usually read the whole paragraph to understand the context around each sentence, then decide how to read its individual parts. When writing a script for text-to-speech voices, make sure to use full sentences with punctuation marks. You can use full stops, commas, semicolons and the equivalent punctuation marks in native scripts to make the reader take short breaks.

Add ellipsis for inline breaks

To create a small inline pause, use the horizontal ellipsis symbol (…). You can also use three full stops without any breaks or spaces between them (...).

My name is Bond… James Bond.

Split text into paragraphs for longer breaks

With Narakeet, a block of text separated by a blank line from the rest of the script is considered a paragraph. To make the voice take a slightly longer break than the one after a sentence, split your script into separate paragraphs by adding blank lines. There must be at least one blank line between paragraphs, but adding more does not create longer breaks.

This is the first paragraph. It contains three sentences. The text to speech reading voice will try to understand the whole paragraph, in order to decide how to read the content inside it, and take short breaks after commas and between different sentences.

This is the second paragraph. There was a blank line above it. The voice reader will take a slightly longer break between the paragraphs.
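
Because extra blank lines do not lengthen the break, it can help to tidy a script before uploading it. The following is a minimal Python sketch, not part of Narakeet; the function name normalize_paragraphs is made up for this example. It collapses any run of blank lines into exactly one, so paragraph boundaries stay visible and consistent.

```python
import re


def normalize_paragraphs(script: str) -> str:
    """Return the script with exactly one blank line between paragraphs."""
    # Split wherever one or more blank lines (possibly containing stray
    # spaces or tabs) separate two blocks of text.
    blocks = re.split(r"\n\s*\n", script.strip())
    # Trim each block and drop any that turned out to be empty.
    paragraphs = [block.strip() for block in blocks if block.strip()]
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    messy = "This is the first paragraph.\n\n\n\nThis is the second paragraph.\n"
    print(normalize_paragraphs(messy))
```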

Pause for a specified time

Force the text reader to stop between two paragraphs of text using the pause stage direction. Start a new line with an opening bracket, followed by the word “pause”, a colon and the number of seconds the voice should pause, then close the bracket.

Stage directions only work if they are in a separate paragraph, so make sure to leave at least one blank line between the stage direction and the text.

For example, the following script takes a five-second break between the question and the answer.

Is the correct answer 600 or 100? Think about it for five seconds.

(pause: 5)

It should be 600.
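
If you generate scripts from a program, the blank-line rule is easy to get wrong. The following Python sketch shows one way to assemble a script from text and pauses; the helper name build_script is hypothetical and not a Narakeet function. It only produces text in the format shown above. The example in this lesson uses whole seconds, so treat fractional values as untested.

```python
def build_script(items):
    """Join paragraphs and pauses into one script.

    Each item is either a string (a paragraph of narration) or a number
    (a pause length in seconds). Every item becomes its own paragraph,
    so each stage direction is surrounded by blank lines.
    """
    blocks = []
    for item in items:
        if isinstance(item, (int, float)):
            # Format 5 as "5" and 2.5 as "2.5".
            blocks.append(f"(pause: {item:g})")
        else:
            blocks.append(item.strip())
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_script([
        "Is the correct answer 600 or 100? Think about it for five seconds.",
        5,
        "It should be 600.",
    ]))
```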

Start a sentence at a specific time

To synchronize the audio with an external video, you can also make the text-to-speech synthesizer delay a part of the narration until a specific time. This can be useful, for example, when putting a voiceover on a screencast so that the audio follows the action on the screen. Use the pause-until stage direction, and provide a timecode or a number representing the time in seconds since the start of the scene.

Open the app, and wait for it to load.

(pause-until: 10)

Now click the button in the middle of the page.

(pause-until: 01:20.5)

The application should be installed now.

In the previous example, the second sentence will start exactly 10 seconds after the beginning of the audio, regardless of how long the first sentence takes (as long as it’s not more than 10 seconds). The third sentence will start at one minute and 20.5 seconds.

Note that the pause-until stage direction cannot be used to create pauses longer than 10 minutes, and that it will have no effect if the previous audio segment ends after the requested timestamp (you can only use it to delay the audio, but not to move segments earlier in time so they play in parallel).
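
The two rules above can be checked with a little arithmetic before you submit a script. The following Python sketch is only an illustration: the function names are invented here, the timecode format is assumed to be minutes:seconds as in the example above (with hours:minutes:seconds handled the same way), and previous_end is your own estimate of when the preceding audio finishes. Raising an error above 10 minutes is a choice made for this sketch, not documented Narakeet behavior.

```python
MAX_PAUSE_SECONDS = 600  # the 10-minute limit mentioned above


def timecode_to_seconds(value: str) -> float:
    """Convert "10" or "01:20.5" into a number of seconds."""
    seconds = 0.0
    # Each colon-separated field is worth 60 times the one after it.
    for part in value.strip().split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def silence_before(previous_end: float, pause_until: str) -> float:
    """Estimate how much silence a pause-until direction would add.

    previous_end is the estimated time (in seconds) at which the
    preceding audio segment finishes.
    """
    target = timecode_to_seconds(pause_until)
    # If the previous audio already runs past the target, nothing is added.
    gap = max(0.0, target - previous_end)
    if gap > MAX_PAUSE_SECONDS:
        raise ValueError("pause would be longer than 10 minutes")
    return gap


if __name__ == "__main__":
    print(timecode_to_seconds("01:20.5"))  # 80.5
    print(silence_before(4.0, "10"))       # 6.0
    print(silence_before(12.0, "10"))      # 0.0
```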

There are many other stage directions you can use to control different aspects of text-to-speech synthesis.