Speech endpoints
/v1/speech/*
TTS, STT, dialogue, dubbing, voice isolation, and related speech helpers.
What this section covers
Endpoint-level reference for the current /v1/speech/* surface.
Speech routes mix JSON and multipart inputs, so the biggest integration mistake is usually the wrong request format rather than the wrong path.
Some speech workflows are async, some are mixed (speech-to-text returns 200 for short jobs and 202 for longer work), and the reserved streaming routes still return 501 Not Implemented. Handle each of these cases explicitly in your client.
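As a minimal sketch of planning for these cases, a client can map the HTTP status code to an explicit outcome before touching the response body. The 200, 202, and 501 codes come from this page; the `SpeechOutcome` enum, the `classify_speech_response` helper, and the choice to treat every other code as an error are illustrative assumptions, not part of the API.

```python
from enum import Enum


class SpeechOutcome(Enum):
    """Hypothetical client-side outcomes; these names are not defined by the API."""
    DONE = "done"                        # 200: result is in this response
    PENDING = "pending"                  # 202: job accepted, result comes later
    NOT_IMPLEMENTED = "not_implemented"  # 501: reserved route, e.g. streaming
    ERROR = "error"                      # anything else (assumption)


def classify_speech_response(status_code: int) -> SpeechOutcome:
    """Map an HTTP status code from a /v1/speech/* call to a client outcome."""
    if status_code == 200:
        return SpeechOutcome.DONE
    if status_code == 202:
        return SpeechOutcome.PENDING
    if status_code == 501:
        return SpeechOutcome.NOT_IMPLEMENTED
    # Assumption: this sketch does not distinguish other codes.
    return SpeechOutcome.ERROR
```

Branching on the outcome first keeps a 501 from a reserved route from being mistaken for a transient failure that is worth retrying.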
Start here
Use POST /v1/speech/text-to-speech for standard async TTS.
Use POST /v1/speech/speech-to-text with multipart audio for transcription.
Use GET /v1/speech/voices before building a picker UI.
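Because the most common mistake is sending the wrong request format, the sketch below builds a JSON request and a multipart request side by side using only the Python standard library. Several details are assumptions for illustration and are not confirmed by this page: the `BASE_URL` host, the Bearer `Authorization` header, the JSON body for text-to-speech, and the `text` and `file` field names. Check the endpoint pages for the real schema.

```python
import json
import uuid
from urllib.request import Request

BASE_URL = "https://api.example.com"  # hypothetical host, replace with the real one


def json_request(path: str, payload: dict, api_key: str) -> Request:
    """Build a POST request with a JSON body."""
    return Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


def multipart_request(path: str, field: str, filename: str,
                      content: bytes, api_key: str,
                      content_type: str = "application/octet-stream") -> Request:
    """Build a POST request with a single-file multipart/form-data body."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field}"; '
        f'filename="{filename}"\r\n'.encode(),
        f"Content-Type: {content_type}\r\n\r\n".encode(),
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Content-Type": f"multipart/form-data; boundary={boundary}",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


# Field names below ("text", "file") are placeholders, not documented names.
tts = json_request("/v1/speech/text-to-speech", {"text": "Hello"}, "YOUR_KEY")
stt = multipart_request("/v1/speech/speech-to-text", "file", "clip.wav",
                        b"<audio bytes>", "YOUR_KEY", "audio/wav")
```

Either request can then be sent with `urllib.request.urlopen`; the point of the sketch is that the `Content-Type` header and body encoding differ per route even though both are POSTs.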
Synthesis
Turn text into voice or multi-speaker audio.
Text to speech
/v1/speech/text-to-speech
Generate speech asynchronously from text.
Streaming text to speech
/v1/speech/text-to-speech/stream
Reserved streaming TTS surface.
List voices
/v1/speech/voices
Fetch the speech voice catalog for picker UIs and integrations.
Text to dialog
/v1/speech/text-to-dialog
Generate multi-speaker dialogue from structured line inputs.
Transcription and cleanup
Transcribe files or isolate clean voice audio.
Speech to text
/v1/speech/speech-to-text
Transcribe uploaded audio. Returns 200 for short jobs or 202 for longer work.
Streaming speech to text
/v1/speech/speech-to-text/stream
Reserved streaming STT surface.
Voice isolator
/v1/speech/voice-isolator
Isolate speech from a mixed audio file.
Speech status
/v1/speech/status/:requestId
Read a speech-specific status payload for a prior speech request.
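For async or 202 responses, a client typically polls the status route until the job finishes. The sketch below shows one way to structure that loop. This page does not document the shape of the status payload, so the caller supplies both the `fetch` function and the `is_finished` check; `poll_speech_status`, `status_path`, the polling interval, and the attempt limit are all hypothetical choices.

```python
import time
from typing import Any, Callable, Mapping
from urllib.parse import quote


def status_path(request_id: str) -> str:
    """Build the status route for a prior speech request."""
    return f"/v1/speech/status/{quote(request_id, safe='')}"


def poll_speech_status(
    request_id: str,
    fetch: Callable[[str], Mapping[str, Any]],
    is_finished: Callable[[Mapping[str, Any]], bool],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Mapping[str, Any]:
    """Call fetch(path) until is_finished(payload) is true or attempts run out.

    fetch is expected to perform the GET and return the decoded payload;
    is_finished encodes whatever "done" means in the real payload (assumption).
    """
    path = status_path(request_id)
    for attempt in range(max_attempts):
        payload = fetch(path)
        if is_finished(payload):
            return payload
        if attempt < max_attempts - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"speech request {request_id!r} did not finish "
                       f"after {max_attempts} status checks")
```

Injecting `fetch` and `sleep` keeps the loop independent of any HTTP library and makes it straightforward to test without network access.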
Localization and advanced flows
Dubbing and future voice transformation surfaces.
Dubbing
/v1/speech/dubbing
Create a dubbing job from uploaded media or a source URL.
Fetch dubbing transcript
/v1/speech/dubbing/:dubbingId/transcript
Fetch the transcript for a dubbing job when available.
Voice changer
/v1/speech/voice-changer
Reserved speech surface that is not available in API v1.