Rich Captions (beta)
The rich-caption asset is currently in beta. Because the asset is subject to change, we do not recommend using it in production workflows yet.
The rich-caption asset provides advanced captioning with word-level animations, active word highlighting, and rich-text styling. It supports karaoke-style effects, bounce, pop, slide, and more — all synchronized to your audio. Use it with SRT/VTT subtitle files or auto-generate captions from audio and video clips using aliases.
Common rich-caption patterns
Before diving into individual properties, here are solutions to common captioning needs:
Automated Styled Word-by-Word Captions
Create custom-styled captions, auto-generated from a video clip, where each word is highlighted as it's spoken. Transcription is triggered through aliases: set the alias property on the clip you want transcribed, then reference that clip in the src property of the rich-caption using the alias:// prefix.
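A minimal sketch of this pattern is shown below. It assumes the usual Shotstack timeline layout of tracks and clips, and that active and wordAnimation sit directly on the rich-caption asset. The video URL, the alias name VIDEO, the "auto" length on the video clip, and the color field inside active.font are illustrative assumptions rather than values confirmed on this page.

```json
{
  "timeline": {
    "tracks": [
      {
        "clips": [
          {
            "asset": {
              "type": "rich-caption",
              "src": "alias://VIDEO",
              "active": {
                "font": {
                  "color": "#efbf04"
                }
              },
              "wordAnimation": {
                "style": "highlight"
              }
            },
            "start": 0,
            "length": "end"
          }
        ]
      },
      {
        "clips": [
          {
            "asset": {
              "type": "video",
              "src": "https://example.com/interview.mp4"
            },
            "start": 0,
            "length": "auto",
            "alias": "VIDEO"
          }
        ]
      }
    ]
  }
}
```

The caption track is listed first on the assumption that earlier tracks render on top of later ones, so the captions appear over the video.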
Simple SRT/VTT Captions
Create captions from an existing SRT or VTT file by setting the src property to the file's URL.
Basic usage
Using an SRT or VTT file
The simplest rich-caption uses an external subtitle file:
{
  "asset": {
    "type": "rich-caption",
    "src": "https://shotstack-assets.s3.amazonaws.com/captions/transcript.srt"
  },
  "start": 0,
  "length": "end"
}
The src property accepts a URL to a publicly accessible SRT or VTT file. The captions are timed and displayed automatically based on the subtitle file's timestamps.
Auto-generated captions
You can automatically generate captions from audio or video clips using aliases. Assign an alias to your media clip, then reference it with the alias:// prefix. Shotstack automatically detects the language and generates word-level captions from the audio. We support 99 languages.
Auto-captioning works with video, audio, and text-to-speech clips. See Aliases for more details on declaring and referencing aliases.
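As a rough illustration of the alias flow with an audio source, the sketch below pairs a voiceover clip with an unstyled rich-caption. The audio URL, the alias name VOICEOVER, and the "auto" length on the audio clip are hypothetical placeholders; only the alias property and the alias:// prefix come from the description above.

```json
{
  "timeline": {
    "tracks": [
      {
        "clips": [
          {
            "asset": {
              "type": "rich-caption",
              "src": "alias://VOICEOVER"
            },
            "start": 0,
            "length": "end"
          }
        ]
      },
      {
        "clips": [
          {
            "asset": {
              "type": "audio",
              "src": "https://example.com/voiceover.mp3"
            },
            "start": 0,
            "length": "auto",
            "alias": "VOICEOVER"
          }
        ]
      }
    ]
  }
}
```

No language setting appears here because the language is detected automatically, as noted above.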
Fonts
Rich captions use the same font system as Rich Text, including the same built-in fonts, custom font loading, and international font support.
Active word styling
What are active words?
The active property isolates the exact word currently being spoken in the audio and applies distinct styling to it (such as a color change, size increase, or background highlight). It temporarily overrides the base font style to create dynamic, real-time text tracking.
The active property supports font, stroke, and shadow overrides. Each works the same way as its base counterpart: only the values you set override the base styling, and everything else is inherited.
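The following sketch shows an active override layered on base styling. The property names inside font and stroke (family, size, weight, color, width) are assumed to mirror the Rich Text asset, and the font family and numeric values are illustrative only; check the asset reference for the exact schema.

```json
{
  "asset": {
    "type": "rich-caption",
    "src": "https://shotstack-assets.s3.amazonaws.com/captions/transcript.srt",
    "font": {
      "family": "Roboto",
      "size": 48,
      "weight": 700,
      "color": "#ffffff"
    },
    "stroke": {
      "width": 2,
      "color": "#000000"
    },
    "active": {
      "font": {
        "color": "#efbf04"
      }
    }
  },
  "start": 0,
  "length": "end"
}
```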
Here the base styling sets white text with a black stroke. The active override changes only the color to gold (#efbf04) — the font family, size, weight, and stroke are all inherited from the base.
Animations
The wordAnimation property controls how words are highlighted or revealed as they're spoken. Choose from 8 animation styles.
karaoke
Word-by-word color fill as spoken. All words are visible from the start; the active word changes to the active color.
"wordAnimation": {
  "style": "karaoke"
}
highlight
Similar to karaoke — the active word changes to the active color when spoken, but with a more immediate color transition.
"wordAnimation": {
  "style": "highlight"
}
pop
Each word scales up when active, creating an energetic, punchy effect.
"wordAnimation": {
  "style": "pop"
}
fade
Gradual opacity transition per word. Words fade in as they become active.
"wordAnimation": {
  "style": "fade"
}
slide
Words slide in from a direction. Use the direction property to control where words slide from.
"wordAnimation": {
  "style": "slide",
  "direction": "up"
}
Direction options: left, right, up (default), down
bounce
Spring animation on word appearance. Words bounce into place with a natural spring effect.
"wordAnimation": {
  "style": "bounce"
}
typewriter
Words appear one by one and stay visible. Each word is revealed in sequence as it's spoken, building up the full caption.
"wordAnimation": {
  "style": "typewriter"
}
none
No animation. All words are visible immediately with no highlighting or transitions.
"wordAnimation": {
  "style": "none"
}
Migrating from legacy captions
The rich-caption asset is the successor to the caption asset. Your existing SRT/VTT files and alias:// references work identically — change "type": "caption" to "type": "rich-caption" and you're ready to use the new styling options.
Related topics
- Rich Text - Static text with rich styling
- Aliases - Auto-generate captions from audio/video clips
- Positioning - Position and scale caption containers