Estimate the duration of text to speech

Different AI voices read text aloud at different speeds, much like humans do. When making text to speech videos, converting articles to MP3 or turning books into audiobooks, it’s often important to know roughly how long the result will be. This guide explains how to estimate the duration of your text to speech audio.

General guidelines for text to voice generators

Most text to speech generators mimic human speech, which is roughly 150 to 250 words per minute at normal speed. A simple way to estimate the duration of your text is to divide the word count by 200, which gives a rough prediction of the duration in minutes. For example, a 1,000-word article should take about five minutes to read aloud.
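The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, words are counted by splitting on whitespace, and the 150, 200 and 250 words-per-minute figures are the rough values quoted above, not measurements of any particular voice.

```python
def estimate_minutes(text: str, words_per_minute: float = 200.0) -> float:
    """Rough duration in minutes: word count divided by reading speed."""
    # Assumption: whitespace-separated tokens are close enough to "words".
    word_count = len(text.split())
    return word_count / words_per_minute


def estimate_range_minutes(text: str,
                           slow_wpm: float = 150.0,
                           fast_wpm: float = 250.0) -> tuple[float, float]:
    """Shortest and longest plausible duration for the quoted speed range."""
    word_count = len(text.split())
    # A faster voice gives the shorter duration, a slower voice the longer one.
    return word_count / fast_wpm, word_count / slow_wpm


if __name__ == "__main__":
    script = "word " * 1000  # stand-in for a 1,000-word article
    print(f"Estimate: {estimate_minutes(script):.1f} minutes")
    shortest, longest = estimate_range_minutes(script)
    print(f"Likely range: {shortest:.1f} to {longest:.1f} minutes")
```

Reporting the range alongside the single number is a reminder that the chosen voice and reading speed can shift the result noticeably.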

Use the text to audio tool for a quick estimate

The simplest way to predict the audio duration of a text document is to upload it to our Text to Audio tool. After the upload, wait for a second or two, and Narakeet will display a rough estimate above the top-right corner of the script box.

The estimate depends on the number of words in the document and the chosen voice. It will be relatively accurate for documents containing mostly regular text, such as books, articles and voiceover scripts that use common words for the chosen language. If your script has lots of processing instructions (such as pauses, custom reading speed or pronunciation instructions), the estimate may be less accurate.

Calculate the exact duration using a preview

For situations where you need more precision, we suggest taking a smaller representative segment and generating a preview. Previews are free, so this will not cost you any credits, and you will get an exact duration for the generated segment. You can then scale that duration by the ratio of the full script length to the segment length (for example, by word count) to estimate the total. The preview duration will be shown in the top-right corner, next to the full estimated duration.
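One way to extrapolate from a preview is to scale the measured preview duration by word count. The sketch below assumes the preview segment is representative of the whole script and that duration grows roughly in proportion to the number of words; the function name and the example numbers are hypothetical.

```python
def extrapolate_duration_seconds(preview_seconds: float,
                                 preview_words: int,
                                 total_words: int) -> float:
    """Scale a measured preview duration up to the full script."""
    if preview_words <= 0:
        raise ValueError("preview_words must be positive")
    # Seconds per word in the preview, applied to the whole script.
    return preview_seconds * total_words / preview_words


if __name__ == "__main__":
    # Hypothetical numbers: a 150-word preview that lasted 60 seconds,
    # taken from a 3,000-word script.
    total = extrapolate_duration_seconds(60.0, 150, 3000)
    print(f"Estimated full duration: {total / 60:.1f} minutes")
```

Scripts with long pauses or sections at a custom reading speed will break the proportionality assumption, so pick a preview segment that contains a typical mix of those instructions.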
