Text to Voice in Swift
Generate spoken audio from Swift using the Narakeet REST API. The example on this page uses `URLSession` from Foundation — no external package or CocoaPod required. It runs in Docker with the official Swift image, on Linux servers, and on macOS.
For endpoint details, authentication, and cross-language features, see the main Text to Speech API reference.
- Swift Text to Speech Example
- iOS and Server-Side Swift
- Swift Concurrency and Async/Await
- Controlling Voice and Audio Settings
- Swift TTS Without External Dependencies
Swift Text to Speech Example
The following script sends text to the Narakeet API and saves the result as an MP3 file. The `textToSpeech` function accepts a `URLSession` as its first parameter so you can swap in a custom session configuration or mock it during testing.
```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking
#endif

// Carries the completion handler's result back to the calling thread.
final class ResultBox: @unchecked Sendable {
    var error: Error?
}

func textToSpeech(session: URLSession, apiKey: String, voice: String, text: String, outputPath: String) throws {
    let semaphore = DispatchSemaphore(value: 0)
    let result = ResultBox()

    let url = URL(string: "https://api.narakeet.com/text-to-speech/mp3?voice=\(voice)")!
    var request = URLRequest(url: url, timeoutInterval: 30)
    request.httpMethod = "POST"
    request.setValue("application/octet-stream", forHTTPHeaderField: "Accept")
    request.setValue("text/plain", forHTTPHeaderField: "Content-Type")
    request.setValue(apiKey, forHTTPHeaderField: "x-api-key")
    request.httpBody = text.data(using: .utf8)

    let task = session.dataTask(with: request) { data, response, error in
        defer { semaphore.signal() }
        if let error = error {
            result.error = error
            return
        }
        guard let httpResponse = response as? HTTPURLResponse else {
            result.error = NSError(domain: "TTS", code: -1,
                                   userInfo: [NSLocalizedDescriptionKey: "Invalid response"])
            return
        }
        if httpResponse.statusCode != 200 {
            let body = data.flatMap { String(data: $0, encoding: .utf8) } ?? "no body"
            result.error = NSError(domain: "TTS", code: httpResponse.statusCode,
                                   userInfo: [NSLocalizedDescriptionKey: "API error \(httpResponse.statusCode): \(body)"])
            return
        }
        guard let data = data else {
            result.error = NSError(domain: "TTS", code: -1,
                                   userInfo: [NSLocalizedDescriptionKey: "No data received"])
            return
        }
        do {
            try data.write(to: URL(fileURLWithPath: outputPath))
        } catch {
            result.error = error
        }
    }
    task.resume()
    semaphore.wait()

    if let error = result.error {
        throw error
    }
}

guard let apiKey = ProcessInfo.processInfo.environment["NARAKEET_API_KEY"],
      !apiKey.isEmpty else {
    print("Please set NARAKEET_API_KEY environment variable")
    exit(1)
}

let config = URLSessionConfiguration.default
config.timeoutIntervalForRequest = 30
let session = URLSession(configuration: config)

do {
    try textToSpeech(session: session, apiKey: apiKey, voice: "amy",
                     text: "Hi there from Swift", outputPath: "output.mp3")
    print("File saved at: output.mp3")
} catch {
    print("Error: \(error.localizedDescription)")
    exit(1)
}
```
Save this as `tts.swift` and run it with `swift tts.swift`. On Linux, the `FoundationNetworking` import provides `URLSession`; on macOS and iOS it is part of Foundation.
For the complete project with Docker support, see the Swift streaming API example on GitHub.
iOS and Server-Side Swift
Swift’s cross-platform Foundation layer makes the same `URLSession` code work on iOS, macOS, and Linux servers. Where Swift text to voice fits well in practice:
- iOS or macOS apps that call the API with an encrypted key — store keys encrypted rather than in plain text, since app bundles can be decompiled
- Vapor or Hummingbird backends that return spoken audio from REST endpoints
- Swift CLI tools distributed via Swift Package Manager for batch audio generation
- Xcode Cloud or CI scripts that produce audio assets during the build
- Server-side Swift microservices behind an API gateway that convert text on demand
For input longer than 1 KB or uncompressed WAV output, switch to the Long Content (Polling) API.
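A small helper can route input between the two endpoints. This is a sketch under one assumption: that the 1 KB limit applies to the UTF-8 byte count of the text (check the API reference for the exact rule).

```swift
import Foundation

// Sketch: decide between the short-form endpoint and the Long Content
// (Polling) API. Assumption: the 1 KB limit is measured in UTF-8 bytes.
func needsPollingAPI(_ text: String) -> Bool {
    text.utf8.count > 1024
}
```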
Swift Concurrency and Async/Await
The example above uses a completion handler with `DispatchSemaphore` for maximum compatibility. If your project uses Swift concurrency (Swift 5.5+), you can wrap the call in `withCheckedThrowingContinuation` or use `URLSession.data(for:)` directly in an async context. The API itself is stateless — each POST is independent — so firing multiple requests with a `TaskGroup` parallelises audio generation across voices or text segments without any shared state to manage.
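A minimal sketch of the `TaskGroup` approach follows. The `synthesize` closure is a stand-in for the real per-segment network call (for example, an async `URLSession` request), so the fan-out and ordering logic can be seen without network plumbing; the function name is illustrative, not part of any API.

```swift
import Foundation

// Sketch: generate audio for several text segments in parallel with a
// TaskGroup, preserving the original segment order in the results.
// `synthesize` stands in for the actual API call.
func synthesizeAll(segments: [String],
                   synthesize: @escaping @Sendable (String) async throws -> Data) async throws -> [Data] {
    try await withThrowingTaskGroup(of: (Int, Data).self) { group in
        for (index, segment) in segments.enumerated() {
            // Each child task carries its index so results can be
            // reordered as they arrive in completion order.
            group.addTask { (index, try await synthesize(segment)) }
        }
        var results = [Data?](repeating: nil, count: segments.count)
        for try await (index, data) in group {
            results[index] = data
        }
        return results.compactMap { $0 }
    }
}
```

Because the group is throwing, a failed segment propagates its error and cancels the remaining child tasks, which matches the fail-fast behaviour you usually want for batch generation.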
Controlling Voice and Audio Settings
The endpoint URL and query parameters control the output. Combine these options to match your use case:
- Audio format — change the path: `/text-to-speech/mp3` for compressed audio, `/text-to-speech/m4a` for higher quality at similar size, `/text-to-speech/wav` for uncompressed PCM (requires the polling API)
- Voice — set `?voice=amy` to pick from 900 voices across 100 languages; browse all options at Text to Speech Voices
- Speed — add `&voice-speed=1.2` to read 20% faster, or `&voice-speed=0.85` to slow down
- Volume — add `&voice-volume=soft` or `&voice-volume=loud` to shift the output level
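The query string above can be assembled with `URLComponents` instead of string interpolation, which percent-encodes values automatically. This is a small sketch; the parameter names mirror the list above, and the helper name is illustrative.

```swift
import Foundation

// Sketch: build the TTS request URL safely with URLComponents, so voice
// names and parameter values are percent-encoded rather than interpolated.
func ttsURL(format: String, voice: String, speed: Double? = nil, volume: String? = nil) -> URL? {
    var components = URLComponents(string: "https://api.narakeet.com/text-to-speech/\(format)")
    var items = [URLQueryItem(name: "voice", value: voice)]
    if let speed = speed {
        items.append(URLQueryItem(name: "voice-speed", value: String(speed)))
    }
    if let volume = volume {
        items.append(URLQueryItem(name: "voice-volume", value: volume))
    }
    components?.queryItems = items
    return components?.url
}
```

Calling `ttsURL(format: "mp3", voice: "amy", speed: 1.2)` produces `https://api.narakeet.com/text-to-speech/mp3?voice=amy&voice-speed=1.2`, which drops straight into the `URLRequest` from the main example.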
For pitch adjustments, sentence pauses, and multi-voice scripts, use the script header format in the request body. Full details at Configuring Audio Tasks.
Swift TTS Without External Dependencies
A common question is whether a dedicated Swift TTS package or CocoaPod exists for cloud-based speech synthesis. With Narakeet, Foundation’s `URLSession` handles the entire integration — POST text, receive audio bytes, write to disk. There is no SPM package to add, no CocoaPod to install, and no C library to link. The `textToSpeech` function in the example above drops into any Swift project that can import Foundation.