Speech endpoints

/v1/speech/*

TTS, STT, dialogue, dubbing, voice isolation, and related speech helpers.

What this section covers

Endpoint-level reference for the current /v1/speech/* surface.

Speech routes mix JSON and multipart inputs, so the biggest integration mistake is usually the wrong request format rather than the wrong path.
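One way to guard against sending the wrong request format is to keep the body format for each POST route in a single table and encode every request through it. The sketch below uses the formats listed on this page; the helper name `encode_body` and the field names in the usage note (`text`, `file`) are hypothetical, since the real request schemas live in each endpoint's own reference.

```python
import json
import uuid

# Body format per POST route, as listed on this page.
BODY_FORMAT = {
    "/v1/speech/text-to-speech": "json",
    "/v1/speech/text-to-dialog": "json",
    "/v1/speech/speech-to-text": "multipart",
    "/v1/speech/voice-isolator": "multipart",
    "/v1/speech/dubbing": "multipart",
}


def encode_body(path, fields, files=None):
    """Return (headers, body_bytes) in the format the route expects.

    fields: dict of plain form/JSON values (field names are up to the caller)
    files:  dict of field name -> (filename, bytes, mime type); multipart only
    """
    fmt = BODY_FORMAT[path]
    if fmt == "json":
        if files:
            raise ValueError(f"{path} takes JSON and cannot carry file uploads")
        body = json.dumps(fields).encode("utf-8")
        return {"Content-Type": "application/json"}, body

    # Hand-rolled multipart/form-data encoding using only the standard library.
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )
    for name, (filename, data, mime) in (files or {}).items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        ).encode("utf-8")
        parts.append(header + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    content_type = f"multipart/form-data; boundary={boundary}"
    return {"Content-Type": content_type}, b"".join(parts)
```

For example, `encode_body("/v1/speech/speech-to-text", {}, {"file": ("clip.wav", audio_bytes, "audio/wav")})` yields a multipart body, while passing files to a JSON route raises immediately instead of failing at the server.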

Some speech workflows are always async, some return either a sync (200) or an async (202) response, and the reserved streaming routes still return 501. Plan for each case explicitly in your client.
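A client can make these cases explicit with a small route table. The sketch below records the response mode this page lists for each route and refuses reserved routes before any request is sent. The names `ROUTE_MODES`, `mode_for`, `needs_polling`, and `ReservedRouteError` are illustrative, not part of the API.

```python
# Response mode per route, as listed on this page.
ROUTE_MODES = {
    ("POST", "/v1/speech/text-to-speech"): "async",
    ("POST", "/v1/speech/text-to-speech/stream"): "reserved",
    ("GET", "/v1/speech/voices"): "sync",
    ("POST", "/v1/speech/text-to-dialog"): "async",
    ("POST", "/v1/speech/speech-to-text"): "sync_or_async",
    ("POST", "/v1/speech/speech-to-text/stream"): "reserved",
    ("POST", "/v1/speech/voice-isolator"): "async",
    ("GET", "/v1/speech/status/:requestId"): "sync",
    ("POST", "/v1/speech/dubbing"): "async",
    ("GET", "/v1/speech/dubbing/:dubbingId/transcript"): "sync",
    ("POST", "/v1/speech/voice-changer"): "reserved",
}


class ReservedRouteError(RuntimeError):
    """Raised for routes this page marks as Reserved."""


def mode_for(method, path):
    """Look up the response mode for a route template; fail fast on reserved routes."""
    key = (method.upper(), path)
    if key not in ROUTE_MODES:
        raise ValueError(f"unknown speech route: {method} {path}")
    mode = ROUTE_MODES[key]
    if mode == "reserved":
        raise ReservedRouteError(f"{key[0]} {path} is reserved in API v1")
    return mode


def needs_polling(mode, status):
    """Async routes always poll; mixed routes poll only when they answer 202."""
    return mode == "async" or (mode == "sync_or_async" and status == 202)
```

Failing before the request is sent keeps reserved routes out of production traffic; a client should still handle a 501 from the server in case its table falls out of date.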

Start here

Recommended starting points

Use POST /v1/speech/text-to-speech for standard async TTS.

Use POST /v1/speech/speech-to-text with multipart audio for transcription.

Use GET /v1/speech/voices before building a picker UI.

Synthesis

Turn text into voice or multi-speaker audio.

Text to speech

POST /v1/speech/text-to-speech · Async

Generate speech asynchronously from text.

Poll response

After the POST returns request_id, poll the response endpoint: GET /v1/requests/:requestId

audio:generate · json · Consumes 1 API call
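The submit-then-poll flow used by the async routes can be sketched as a loop around GET /v1/requests/:requestId. This page does not document the response payload, so the `status` field and the terminal values `completed` and `failed` below are assumptions for illustration; `fetch` is a caller-supplied function that performs the authenticated GET and returns the decoded JSON.

```python
import time

# ASSUMPTION: the poll payload exposes a "status" field with these terminal values.
TERMINAL_STATUSES = {"completed", "failed"}


def poll_request(request_id, fetch, interval=1.0, timeout=60.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll GET /v1/requests/:requestId until a terminal status or the timeout.

    fetch(path) must return the decoded JSON body as a dict.
    """
    path = f"/v1/requests/{request_id}"
    deadline = clock() + timeout
    while True:
        payload = fetch(path)
        if payload.get("status") in TERMINAL_STATUSES:
            return payload
        if clock() >= deadline:
            raise TimeoutError(f"request {request_id} did not finish in {timeout}s")
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real delays; a production client would likely add backoff and respect the documented rate limits.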

Streaming text to speech

POST /v1/speech/text-to-speech/stream · Reserved

Reserved streaming TTS surface.

audio:generate · No body · Reserved

List voices

GET /v1/speech/voices · Sync

Fetch the speech voice catalog for picker UIs and integrations.

audio:generate · No body · Free

Text to dialog

POST /v1/speech/text-to-dialog · Async

Generate multi-speaker dialogue from structured line inputs.

Poll response

After the POST returns request_id, poll the response endpoint: GET /v1/requests/:requestId

audio:generate · json · Consumes 1 API call

Transcription and cleanup

Transcribe files or isolate clean voice audio.

Speech to text

POST /v1/speech/speech-to-text · Sync or async

Transcribe uploaded audio. Returns 200 for short jobs or 202 for longer work.

Poll response

If this returns 202 with a request_id, poll the response endpoint: GET /v1/requests/:requestId

audio:generate · multipart · Billable when generation starts
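Because this route can answer either way, the caller has to branch on the status code rather than assume one shape. A minimal sketch of that branch follows; the function name is hypothetical, and the contents of the 200 body are treated as opaque because this page does not describe them.

```python
def handle_transcription_response(status, body):
    """Decide what to do with a speech-to-text response.

    Returns ("done", body) for a finished job (200), or
    ("poll", path) with the response endpoint to poll (202).
    """
    if status == 200:
        return ("done", body)
    if status == 202:
        request_id = body.get("request_id")
        if not request_id:
            raise ValueError("202 response did not include request_id")
        return ("poll", f"/v1/requests/{request_id}")
    raise RuntimeError(f"unexpected status {status} from speech-to-text")
```

Raising on any other status keeps errors such as 4xx responses from being mistaken for a finished transcript.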

Streaming speech to text

POST /v1/speech/speech-to-text/stream · Reserved

Reserved streaming STT surface.

audio:generate · No body · Reserved

Voice isolator

POST /v1/speech/voice-isolator · Async

Isolate speech from a mixed audio file.

Poll response

After the POST returns request_id, poll the response endpoint: GET /v1/requests/:requestId

audio:generate · multipart · Consumes 1 API call

Speech status

GET /v1/speech/status/:requestId · Sync

Read a speech-specific status payload for a prior speech request.

analytics:read · No body · Free
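Routes with a `:requestId`-style segment need the identifier substituted into the path. A small sketch of that substitution follows, percent-encoding each value so an unexpected character cannot change the route. The helper name `fill_path` is illustrative, and the format of real request IDs is not specified on this page.

```python
import re
from urllib.parse import quote


def fill_path(template, **params):
    """Replace :name segments in a route template with percent-encoded values."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing path parameter: {name}")
        # safe="" also encodes "/", so a value can never add path segments.
        return quote(str(params[name]), safe="")

    return re.sub(r":([A-Za-z_][A-Za-z0-9_]*)", substitute, template)
```

For example, `fill_path("/v1/speech/status/:requestId", requestId=request_id)` produces the path to append to the API base URL.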

Localization and advanced flows

Dubbing and future voice transformation surfaces.

Dubbing

POST /v1/speech/dubbing · Async

Create a dubbing job from uploaded media or a source URL.

Poll response

After the POST returns request_id, poll the response endpoint: GET /v1/requests/:requestId

audio:generate · multipart · Consumes 1 API call

Fetch dubbing transcript

GET /v1/speech/dubbing/:dubbingId/transcript · Sync

Fetch the transcript for a dubbing job when available.

audio:generate · No body · Free

Voice changer

POST /v1/speech/voice-changer · Reserved

Reserved speech surface that is not available in API v1.

audio:generate · No body · Reserved

