Text to Speech Generation

Convert Text into Speech

The text-to-speech feature converts written text into spoken audio in a selection of voices and languages. To add this audio to a clip, specify the text to be spoken and the voice to use.

{
    "asset": {
        "type": "text-to-speech",
        "text": "Good evening, in Sydney tonight we’re tracking a developing story as unexpected storms roll in across the city, bringing with them flash flooding warnings and major disruptions to the evening commute.",
        "voice": "Amy"
    },
    "start": 0,
    "length": "auto"
}

This clip places the generated speech at the start of the video ("start": 0). Setting "length" to "auto" makes the clip's duration match the length of the generated audio. For more details on optimizing timing, refer to smart clips.
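As a rough illustration, a clip like the one above could be assembled programmatically before it is placed in an edit. The sketch below only builds and prints the JSON. The helper name build_tts_clip is hypothetical and not part of any Shotstack SDK, and the shortened text is just a placeholder.

```python
import json


def build_tts_clip(text, voice, start=0, length="auto"):
    """Return a clip dict holding a text-to-speech asset.

    Hypothetical helper: it mirrors the JSON structure shown above
    and does not call any API.
    """
    if not text:
        raise ValueError("text must not be empty")
    return {
        "asset": {"type": "text-to-speech", "text": text, "voice": voice},
        "start": start,      # seconds from the start of the video
        "length": length,    # "auto" matches the generated audio duration
    }


clip = build_tts_clip("Good evening, in Sydney tonight.", "Amy")
print(json.dumps(clip, indent=4, ensure_ascii=False))
```

Keeping the defaults at start=0 and length="auto" reproduces the behaviour described above; pass other values to position the speech elsewhere on the timeline.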

Translating text

To create an audio file in a different language, use the language option. Ensure that you select a voice compatible with the desired language.

{
    "asset": {
        "type": "text-to-speech",
        "text": "Good evening, in Sydney tonight we’re tracking a developing story as unexpected storms roll in across the city, bringing with them flash flooding warnings and major disruptions to the evening commute.",
        "voice": "Seoyeon",
        "language": "ko-KR"
    },
    "start": 0,
    "length": "auto"
}

The example above creates an audio file in Korean: the English text is translated into Korean and spoken with the Seoyeon voice.

Supported translations

Language	Value
Chinese (Mandarin)	cmn-CN
Danish	da-DK
German	de-DE
English (Australian)	en-AU
English (British)	en-GB
English (Indian)	en-IN
English (US)	en-US
Spanish (European)	es-ES
Spanish (Mexican)	es-MX
Spanish (US)	es-US
French (Canadian)	fr-CA
French	fr-FR
Italian	it-IT
Japanese	ja-JP
Hindi	hi-IN
Korean	ko-KR
Norwegian Bokmål	nb-NO
Dutch	nl-NL
Polish	pl-PL
Portuguese (Brazilian)	pt-BR
Portuguese (European)	pt-PT
Swedish	sv-SE
English (New Zealand)	en-NZ
English (South African)	en-ZA
Catalan	ca-ES
German (Austrian)	de-AT
Chinese (Cantonese)	yue-CN
Arabic (Gulf)	ar-AE
Finnish	fi-FI
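A client-side check against the table above can catch a mistyped language code before a request is sent. This is a minimal sketch: the set is copied from the table, the helper name set_language is hypothetical, and how the service itself responds to an unsupported code is not covered here.

```python
# Language codes copied from the "Supported translations" table.
SUPPORTED_LANGUAGES = {
    "cmn-CN", "da-DK", "de-DE", "en-AU", "en-GB", "en-IN", "en-US",
    "es-ES", "es-MX", "es-US", "fr-CA", "fr-FR", "it-IT", "ja-JP",
    "hi-IN", "ko-KR", "nb-NO", "nl-NL", "pl-PL", "pt-BR", "pt-PT",
    "sv-SE", "en-NZ", "en-ZA", "ca-ES", "de-AT", "yue-CN", "ar-AE",
    "fi-FI",
}


def set_language(asset, language):
    """Return a copy of a text-to-speech asset with a validated language.

    Hypothetical helper; the input asset is left unchanged.
    """
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")
    return {**asset, "language": language}


asset = {"type": "text-to-speech", "text": "Good evening.", "voice": "Seoyeon"}
korean = set_language(asset, "ko-KR")
print(korean)
```

The check covers only the language code. Whether the chosen voice suits that language still has to be confirmed against the voices table.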

Newscaster mode

Shotstack’s text-to-speech service includes a newscaster mode, which produces audio that emulates a newsreader’s delivery. To enable this mode, set the newscaster option to true.

{
    "asset": {
        "type": "text-to-speech",
        "text": "Good evening, in Sydney tonight we’re tracking a developing story as unexpected storms roll in across the city, bringing with them flash flooding warnings and major disruptions to the evening commute.",
        "voice": "Joanna",
        "newscaster": true
    },
    "start": 0,
    "length": "auto"
}

The newscaster style is available with the Matthew and Joanna voices in US English, the Lupe voice in US Spanish, and the Amy voice in British English.
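Because the newscaster style is limited to four voices, it can be worth guarding the option in client code. The following sketch assumes the voice list stated above; the helper name enable_newscaster is hypothetical, and the service's behaviour for other voices is not described here.

```python
# Voices listed above as supporting the newscaster style.
NEWSCASTER_VOICES = {"Matthew", "Joanna", "Lupe", "Amy"}


def enable_newscaster(asset):
    """Return a copy of the asset with newscaster mode switched on.

    Hypothetical helper: raises ValueError if the asset's voice is not
    one of the voices listed as supporting the newscaster style.
    """
    voice = asset.get("voice")
    if voice not in NEWSCASTER_VOICES:
        raise ValueError(f"voice {voice!r} does not support newscaster mode")
    return {**asset, "newscaster": True}


asset = {"type": "text-to-speech", "text": "Good evening.", "voice": "Joanna"}
news_asset = enable_newscaster(asset)
print(news_asset)
```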

Voices

The Shotstack text-to-speech service offers a variety of voices in different languages and genders:

Voice Name	Language	Gender
Hala	Arabic (Gulf)	Female
Lisa	Dutch (Belgian)	Female
Arlet	Catalan	Female
Hiujin	Chinese (Cantonese)	Female
Zhiyu	Chinese (Mandarin)	Female
Sofie	Danish	Female
Laura	Dutch	Female
Olivia	English (Australian)	Female
Amy	English (British)	Female
Emma	English (British)	Female
Brian	English (British)	Male
Arthur	English (British)	Male
Kajal	English (Indian)	Female
Niamh	English (Ireland)	Female
Aria	English (New Zealand)	Female
Ayanda	English (South African)	Female
Ivy	English (US)	Female (child)
Joanna	English (US)	Female
Kendra	English (US)	Female
Kimberly	English (US)	Female
Salli	English (US)	Female
Joey	English (US)	Male
Justin	English (US)	Male (child)
Kevin	English (US)	Male (child)
Matthew	English (US)	Male
Ruth	English (US)	Female
Stephen	English (US)	Male
Suvi	Finnish	Female
Léa	French	Female
Rémi	French	Male
Gabrielle	French (Canadian)	Female
Liam	French (Canadian)	Male
Vicki	German	Female
Daniel	German	Male
Hannah	German (Austrian)	Female
Bianca	Italian	Female
Adriano	Italian	Male
Takumi	Japanese	Male
Kazuha	Japanese	Female
Tomoko	Japanese	Female
Seoyeon	Korean	Female
Ida	Norwegian	Female
Ola	Polish	Female
Camila	Portuguese (Brazilian)	Female
Vitória/Vitoria	Portuguese (Brazilian)	Female
Thiago	Portuguese (Brazilian)	Male
Inês/Ines	Portuguese (European)	Female
Lucia	Spanish (European)	Female
Sergio	Spanish (European)	Male
Mia	Spanish (Mexican)	Female
Andrés	Spanish (Mexican)	Male
Lupe	Spanish (US)	Female
Pedro	Spanish (US)	Male
Elin	Swedish	Female
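To pick a voice that matches a language, the table above can be treated as lookup data. The sketch below uses only an excerpt of the table to stay short; the helper name voices_for is hypothetical, and a real lookup would include every row.

```python
# Excerpt of the voices table: (voice name, language, gender).
VOICES = [
    ("Olivia", "English (Australian)", "Female"),
    ("Amy", "English (British)", "Female"),
    ("Emma", "English (British)", "Female"),
    ("Brian", "English (British)", "Male"),
    ("Arthur", "English (British)", "Male"),
    ("Seoyeon", "Korean", "Female"),
    ("Lupe", "Spanish (US)", "Female"),
    ("Pedro", "Spanish (US)", "Male"),
]


def voices_for(language, gender=None):
    """Return voice names for a language, optionally filtered by gender.

    Hypothetical helper; matches the language and gender labels exactly
    as they appear in the table.
    """
    return [
        name
        for name, lang, sex in VOICES
        if lang == language and (gender is None or sex == gender)
    ]


print(voices_for("English (British)", "Male"))  # ['Brian', 'Arthur']
print(voices_for("Korean"))                     # ['Seoyeon']
```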

ElevenLabs Integration

Our ElevenLabs integration is currently unavailable.

Warning

Generating AI assets in the sandbox environment consumes credits.
