Text to Voice in Swift

Generate spoken audio from Swift using the Narakeet REST API. The example on this page uses URLSession from Foundation — no external package or CocoaPod required. It runs in Docker with the official Swift image, on Linux servers, and on macOS.

For endpoint details, authentication, and cross-language features, see the main Text to Speech API reference.

Swift Text to Speech Example

The following script sends text to the Narakeet API and saves the result as an MP3 file. The textToSpeech function accepts a URLSession as its first parameter so you can swap in a custom session configuration or mock it during testing.

import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

final class ResultBox: @unchecked Sendable {
	var error: Error?
}

func textToSpeech(session: URLSession, apiKey: String, voice: String, text: String, outputPath: String) throws {
	let semaphore = DispatchSemaphore(value: 0)
	let result = ResultBox()

	let url = URL(string: "https://api.narakeet.com/text-to-speech/mp3?voice=\(voice)")!
	var request = URLRequest(url: url, timeoutInterval: 30)
	request.httpMethod = "POST"
	request.setValue("application/octet-stream", forHTTPHeaderField: "Accept")
	request.setValue("text/plain", forHTTPHeaderField: "Content-Type")
	request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
	request.httpBody = text.data(using: .utf8)

	let task = session.dataTask(with: request) { data, response, error in
		defer { semaphore.signal() }

		if let error = error {
			result.error = error
			return
		}

		guard let httpResponse = response as? HTTPURLResponse else {
			result.error = NSError(domain: "TTS", code: -1,
				userInfo: [NSLocalizedDescriptionKey: "Invalid response"])
			return
		}

		if httpResponse.statusCode != 200 {
			let body = data.flatMap { String(data: $0, encoding: .utf8) } ?? "no body"
			result.error = NSError(domain: "TTS", code: httpResponse.statusCode,
				userInfo: [NSLocalizedDescriptionKey: "API error \(httpResponse.statusCode): \(body)"])
			return
		}

		guard let data = data else {
			result.error = NSError(domain: "TTS", code: -1,
				userInfo: [NSLocalizedDescriptionKey: "No data received"])
			return
		}

		do {
			try data.write(to: URL(fileURLWithPath: outputPath))
		} catch {
			result.error = error
		}
	}
	task.resume()
	semaphore.wait()

	if let error = result.error {
		throw error
	}
}

guard let apiKey = ProcessInfo.processInfo.environment["NARAKEET_API_KEY"],
	!apiKey.isEmpty else {
	print("Please set the NARAKEET_API_KEY environment variable")
	exit(1)
}

let config = URLSessionConfiguration.default
config.timeoutIntervalForRequest = 30
let session = URLSession(configuration: config)

do {
	try textToSpeech(session: session, apiKey: apiKey, voice: "amy",
		text: "Hi there from Swift", outputPath: "output.mp3")
	print("File saved at: output.mp3")
} catch {
	print("Error: \(error.localizedDescription)")
	exit(1)
}

Save this as tts.swift and run with swift tts.swift. On Linux, the FoundationNetworking import provides URLSession; on macOS and iOS it is part of Foundation.

For the complete project with Docker support, see the Swift streaming API example on GitHub.
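Because textToSpeech takes the URLSession as a parameter, it can also be exercised in tests without touching the network. One approach, sketched below, registers a custom URLProtocol that answers every request with canned bytes. The MockAudioProtocol name is illustrative, not part of the example above, and custom URLProtocol registration is most reliable on Apple platforms — support in swift-corelibs-foundation on Linux has historically been incomplete.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Illustrative URLProtocol that answers every request with fixed bytes,
// so textToSpeech(session:...) can be tested offline.
final class MockAudioProtocol: URLProtocol {
	static var stubBody = Data("fake-mp3-bytes".utf8)

	override class func canInit(with request: URLRequest) -> Bool { true }
	override class func canonicalRequest(for request: URLRequest) -> URLRequest { request }

	override func startLoading() {
		let response = HTTPURLResponse(url: request.url!, statusCode: 200,
			httpVersion: nil, headerFields: ["Content-Type": "application/octet-stream"])!
		client?.urlProtocol(self, didReceive: response, cacheStoragePolicy: .notAllowed)
		client?.urlProtocol(self, didLoad: Self.stubBody)
		client?.urlProtocolDidFinishLoading(self)
	}

	override func stopLoading() {}
}

// Build a session whose requests never leave the process.
let mockConfig = URLSessionConfiguration.ephemeral
mockConfig.protocolClasses = [MockAudioProtocol.self]
let mockSession = URLSession(configuration: mockConfig)
```

Passing mockSession as the first argument of textToSpeech writes the stub bytes to output.mp3, which lets a test assert on the success path without an API key.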

iOS and Server-Side Swift

Swift’s cross-platform Foundation layer makes the same URLSession code work on iOS, macOS, and Linux servers. Where Swift text to voice fits well in practice:

  • iOS or macOS apps that call the API directly — store the API key encrypted rather than in plain text, since app bundles can be decompiled
  • Vapor or Hummingbird backends that return spoken audio from REST endpoints
  • Swift CLI tools distributed via Swift Package Manager for batch audio generation
  • Xcode Cloud or CI scripts that produce audio assets during the build
  • Server-side Swift microservices behind an API gateway that convert text on demand

For input longer than 1 KB or uncompressed WAV output, switch to the Long Content (Polling) API.
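The polling flow can be sketched as follows. This is an assumption-laden outline, not the confirmed wire format: the BuildTask and TaskStatus field names (statusUrl, finished, succeeded, result) are placeholders — verify them against the Long Content API reference before relying on this.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Assumed response shapes; check the exact field names against the
// Long Content API reference.
struct BuildTask: Decodable { let statusUrl: String }
struct TaskStatus: Decodable {
	let finished: Bool
	let succeeded: Bool?
	let result: String? // URL of the generated audio once finished
}

// Poll the status URL until the task reports completion, then return
// the URL of the generated audio file.
func waitForAudio(session: URLSession, statusUrl: URL) async throws -> URL {
	while true {
		let (data, _) = try await session.data(from: statusUrl)
		let status = try JSONDecoder().decode(TaskStatus.self, from: data)
		if status.finished {
			guard status.succeeded == true, let result = status.result,
				let resultUrl = URL(string: result) else {
				throw NSError(domain: "TTS", code: -1,
					userInfo: [NSLocalizedDescriptionKey: "Audio task failed"])
			}
			return resultUrl
		}
		try await Task.sleep(nanoseconds: 5_000_000_000) // wait 5 s between polls
	}
}
```

The Decodable structs keep the JSON handling in one place, so adjusting them to the documented field names is a one-line change each.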

Swift Concurrency and Async/Await

The example above uses a completion handler with DispatchSemaphore for maximum compatibility. If your project uses Swift concurrency (Swift 5.5+), you can wrap the call in withCheckedThrowingContinuation or use URLSession.data(for:) directly in an async context. The API itself is stateless — each POST is independent — so firing multiple requests with a TaskGroup parallelises audio generation across voices or text segments without any shared state to manage.
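As a sketch, here is the same request expressed with async/await, plus a TaskGroup fan-out over text segments. The textToSpeechAsync and generateAll names are illustrative; URLSession.data(for:) requires macOS 12/iOS 15 or a recent swift-corelibs-foundation release on Linux.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Async variant of the example above; same endpoint and headers.
func textToSpeechAsync(session: URLSession, apiKey: String, voice: String,
		text: String, outputPath: String) async throws {
	let url = URL(string: "https://api.narakeet.com/text-to-speech/mp3?voice=\(voice)")!
	var request = URLRequest(url: url)
	request.httpMethod = "POST"
	request.setValue("application/octet-stream", forHTTPHeaderField: "Accept")
	request.setValue("text/plain", forHTTPHeaderField: "Content-Type")
	request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
	request.httpBody = text.data(using: .utf8)

	let (data, response) = try await session.data(for: request)
	guard let http = response as? HTTPURLResponse, http.statusCode == 200 else {
		throw NSError(domain: "TTS",
			code: (response as? HTTPURLResponse)?.statusCode ?? -1,
			userInfo: [NSLocalizedDescriptionKey: "API error"])
	}
	try data.write(to: URL(fileURLWithPath: outputPath))
}

// Generate several segments in parallel; each POST is independent,
// so no shared state needs coordinating.
func generateAll(session: URLSession, apiKey: String, segments: [String]) async throws {
	try await withThrowingTaskGroup(of: Void.self) { group in
		for (index, segment) in segments.enumerated() {
			group.addTask {
				try await textToSpeechAsync(session: session, apiKey: apiKey,
					voice: "amy", text: segment, outputPath: "segment-\(index).mp3")
			}
		}
		try await group.waitForAll()
	}
}
```

The throwing task group propagates the first failure and cancels the remaining requests, which is usually the behaviour you want for batch generation.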

Controlling Voice and Audio Settings

The endpoint URL and query parameters control the output. Combine these options to match your use case:

  • Audio format — change the path: /text-to-speech/mp3 for compressed audio, /text-to-speech/m4a for higher quality at similar size, /text-to-speech/wav for uncompressed PCM (requires polling API)
  • Voice — set ?voice=amy to pick from 900 voices across 100 languages; browse all options at Text to Speech Voices
  • Speed — add &voice-speed=1.2 to read 20% faster, or &voice-speed=0.85 to slow down
  • Volume — add &voice-volume=soft or &voice-volume=loud to shift the output level

For pitch adjustments, sentence pauses, and multi-voice scripts, use the script header format in the request body. Full details at Configuring Audio Tasks.
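The options above can be combined safely with URLComponents instead of string interpolation, which also percent-encodes voice names that contain non-ASCII characters. A minimal sketch:

```swift
import Foundation

// Build the endpoint URL from the options described above.
var components = URLComponents(string: "https://api.narakeet.com/text-to-speech/m4a")!
components.queryItems = [
	URLQueryItem(name: "voice", value: "amy"),
	URLQueryItem(name: "voice-speed", value: "1.2"),   // 20% faster
	URLQueryItem(name: "voice-volume", value: "soft"), // quieter output
]
let requestUrl = components.url!
// requestUrl.absoluteString ==
//   "https://api.narakeet.com/text-to-speech/m4a?voice=amy&voice-speed=1.2&voice-volume=soft"
```

Swapping the path component between mp3, m4a, and wav changes the audio format without touching the query items.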

Swift TTS Without External Dependencies

A common question is whether a dedicated Swift TTS package or CocoaPod exists for cloud-based speech synthesis. With Narakeet, Foundation’s URLSession handles the entire integration — POST text, receive audio bytes, write to disk. There is no SPM package to add, no CocoaPod to install, and no C library to link. The textToSpeech function in the example above drops into any Swift project that can import Foundation.