
import CollapsibleCode from '@site/src/components/CollapsibleCode';

# Rich Captions

The rich-caption asset provides advanced captioning with word-level animations, active word highlighting, and rich-text styling. It supports karaoke-style effects, bounce, pop, slide, and more — all synchronized to your audio. Use it with SRT/VTT subtitle files or auto-generate captions from audio and video clips using aliases.

## Common rich-caption patterns

Before diving into individual properties, here are solutions to common captioning needs:

### Automated styled word-by-word captions

Create custom-styled captions in which each word is highlighted as it's spoken, auto-generated from a video clip. Transcription is triggered through the `alias` property: set an `alias` on the clip you want transcribed, then reference that clip in the `src` property of the `rich-caption` asset using the `alias://` prefix.

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/405a1866-2d86-4864-b360-f92546900a1b-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/405a1866-2d86-4864-b360-f92546900a1b.mp4" type="video/mp4" />
</video>

<CollapsibleCode>{`
{
  "timeline": {
    "tracks": [
      {
        "clips": [
          {
            "asset": {
              "type": "rich-caption",
              "src": "alias://scott",
              "font": {
                "family": "_gP_1RrxsjcxVyin9l9n_j2RStR3qDpraA",
                "size": 52,
                "color": "#ffffff",
                "opacity": 1,
                "weight": 700
              },
              "animation": {
                "style": "highlight"
              },
              "border": {
                "width": 0,
                "color": "#000000",
                "opacity": 1,
                "radius": 18
              },
              "style": {
                "textTransform": "uppercase"
              },
              "padding": {
                "top": 0,
                "right": 0,
                "bottom": 0,
                "left": 0
              },
              "stroke": {
                "width": 3,
                "color": "#000000",
                "opacity": 1
              },
              "active": {
                "stroke": {
                  "width": 3,
                  "color": "#000000",
                  "opacity": 1
                },
                "font": {
                  "background": "#690be9"
                }
              }
            },
            "start": 0,
            "length": "end",
            "width": 522,
            "height": 187,
            "offset": {
              "x": 0,
              "y": 0
            },
            "transform": {
              "rotate": {
                "angle": -7.5
              }
            }
          }
        ]
      },
      {
        "clips": [
          {
            "asset": {
              "type": "video",
              "src": "https://shotstack-assets.s3.amazonaws.com/footage/scott-ko.mp4"
            },
            "start": 0,
            "length": "auto",
            "alias": "scott"
          }
        ]
      }
    ],
    "fonts": [
      {
        "src": "https://fonts.gstatic.com/s/luckiestguy/v25/_gP_1RrxsjcxVyin9l9n_j2RStR3qDpraA.ttf"
      }
    ]
  },
  "output": {
    "size": {
      "width": 1024,
      "height": 576
    },
    "format": "mp4"
  }
}
`}</CollapsibleCode>

### Simple SRT/VTT captions

Create captions from an SRT or VTT file by setting `src` to the file's URL:

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/fe54e5c6-d593-4a67-ae71-426d3b46f8f6-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/fe54e5c6-d593-4a67-ae71-426d3b46f8f6.mp4" type="video/mp4" />
</video>

<CollapsibleCode>{`
{
    "timeline": {
        "tracks": [
            {
                "clips": [
                    {
                        "asset": {
                            "type": "rich-caption",
                            "src": "https://shotstack-assets.s3.amazonaws.com/captions/transcript.srt",
                            "font": {
                                "family": "Roboto",
                                "size": 28,
                                "color": "#ffffff"
                            },
                            "align": {
                                "vertical": "bottom"
                            },
                            "stroke": {
                                "width": 2
                            }
                        },
                        "start": 0,
                        "length": "end"
                    }
                ]
            },
            {
                "clips": [
                    {
                        "asset": {
                            "type": "video",
                            "src": "https://shotstack-assets.s3.amazonaws.com/footage/scott-ko.mp4"
                        },
                        "start": 0,
                        "length": "auto"
                    }
                ]
            }
        ]
    },
    "output": {
        "format": "mp4",
        "size": {
            "width": 1280,
            "height": 720
        }
    }
}
`}</CollapsibleCode>

## Basic usage

### Using an SRT or VTT file

The simplest rich-caption uses an external subtitle file:

```json
{
    "asset": {
        "type": "rich-caption",
        "src": "https://shotstack-assets.s3.amazonaws.com/captions/transcript.srt"
    },
    "start": 0,
    "length": "end"
}
```

The `src` property accepts a URL to a publicly accessible SRT or VTT file. The captions are timed and displayed automatically based on the subtitle file's timestamps.

### Auto-generated captions

You can automatically generate captions from audio or video clips using [aliases](/docs/guide/architecting-an-application/aliases). Assign an alias to your media clip, then reference it with the `alias://` prefix. Shotstack automatically detects the language and generates word-level captions from the audio. We support [99 languages](https://docs.google.com/spreadsheets/d/1Fwc8lH5Vyxv_5AblFagM2dgyeAX7j-yvif0t1j41Eu0/edit?usp=sharing).

<CollapsibleCode>{`
{
    "timeline": {
        "tracks": [
            {
                "clips": [
                    {
                        "asset": {
                            "type": "rich-caption",
                            "src": "alias://speech"
                        },
                        "start": 0,
                        "length": "end"
                    }
                ]
            },
            {
                "clips": [
                    {
                        "alias": "speech",
                        "asset": {
                            "type": "video",
                            "src": "https://shotstack-assets.s3.amazonaws.com/footage/scott-ko.mp4"
                        },
                        "start": 0,
                        "length": "auto"
                    }
                ]
            }
        ]
    },
    "output": {
        "format": "mp4",
        "size": {
            "width": 1280,
            "height": 720
        }
    }
}
`}</CollapsibleCode>

:::info
Auto-captioning works with video, audio, and text-to-speech clips. See [Aliases](/docs/guide/architecting-an-application/aliases) for more details on declaring and referencing aliases.
:::
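
As a sketch of the audio case, the fragment below shows only the `tracks` portion of a timeline: an audio clip carrying an alias, and a rich-caption clip that references it. The alias name `narration` is a hypothetical placeholder, and the audio URL is the sample file used elsewhere on this page.

```json
{
    "tracks": [
        {
            "clips": [
                {
                    "asset": {
                        "type": "rich-caption",
                        "src": "alias://narration"
                    },
                    "start": 0,
                    "length": "end"
                }
            ]
        },
        {
            "clips": [
                {
                    "alias": "narration",
                    "asset": {
                        "type": "audio",
                        "src": "https://shotstack-video-hosting.s3.amazonaws.com/documentation/rich-captions/the-great-dictator-trimmed.m4a"
                    },
                    "start": 0,
                    "length": "auto"
                }
            ]
        }
    ]
}
```

The structure mirrors the video example above. Only the asset `type` and `src` of the aliased clip change.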

## Fonts

Rich captions use the same font system as [Rich Text](/docs/guide/architecting-an-application/rich-text#fonts), including the same [built-in fonts](/docs/guide/architecting-an-application/rich-text#available-fonts), [custom font loading](/docs/guide/architecting-an-application/rich-text#custom-fonts), and [international font support](/docs/guide/architecting-an-application/rich-text#international-fonts).

## Active word styling

**What are active words?**
The `active` property isolates the exact word currently being spoken in the audio and applies distinct styling to it (such as a color change, size increase, or background highlight). It temporarily overrides the base font style to create dynamic, real-time text tracking.

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

The `active` property supports `font`, `stroke`, and `shadow`. Each works the same way as its base counterpart: only the values you set override the base styling, and everything else is inherited.

</div>
<div style={{flex: '0 0 50%'}}>
<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-default-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-default.mp4" type="video/mp4" />
</video>
</div>
</div>

<CollapsibleCode>{`
{
    "timeline": {
        "tracks": [
            {
                "clips": [
                    {
                        "asset": {
                            "type": "rich-caption",
                            "src": "alias://source_310e0028",
                            "font": {
                                "family": "V8mDoQDjQSkFtoMM3T6r8E7mDbZyCts0DqQ",
                                "size": 152,
                                "color": "#ffffff",
                                "weight": 700
                            },
                            "align": {
                                "vertical": "middle"
                            },
                            "stroke": {
                                "width": 2,
                                "color": "#000000",
                                "opacity": 1
                            },
                            "animation": {
                                "style": "slide"
                            },
                            "active": {
                                "font": {
                                    "color": "#efbf04",
                                    "opacity": 1
                                }
                            }
                        },
                        "start": 0,
                        "length": "end"
                    }
                ]
            },
            {
                "clips": [
                    {
                        "asset": {
                            "type": "audio",
                            "src": "https://shotstack-video-hosting.s3.amazonaws.com/documentation/rich-captions/the-great-dictator-trimmed.m4a",
                            "effect": "fadeOut"
                        },
                        "start": 0,
                        "length": "auto",
                        "alias": "source_310e0028"
                    }
                ]
            }
        ],
        "fonts": [
            {
                "src": "https://fonts.gstatic.com/s/spacegrotesk/v22/V8mDoQDjQSkFtoMM3T6r8E7mDbZyCts0DqQ.ttf"
            }
        ]
    },
    "output": {
        "format": "mp4",
        "size": {
            "width": 1080,
            "height": 1920
        }
    }
}
`}</CollapsibleCode>

Here the base styling sets white text with a black stroke. The `active` override changes only the color to gold (`#efbf04`) — the font family, size, weight, and stroke are all inherited from the base.
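
The same rule should extend to the other `active` properties. As a minimal sketch (the values are illustrative, and the property names are the ones already used in the examples on this page), the clip below gives the spoken word a purple background and a thicker stroke, while the text color, family, and size stay inherited from the base:

```json
{
    "asset": {
        "type": "rich-caption",
        "src": "alias://source_310e0028",
        "font": {
            "family": "Roboto",
            "size": 152,
            "color": "#ffffff",
            "weight": 700
        },
        "stroke": {
            "width": 2,
            "color": "#000000",
            "opacity": 1
        },
        "active": {
            "font": {
                "background": "#690be9"
            },
            "stroke": {
                "width": 4
            }
        }
    },
    "start": 0,
    "length": "end"
}
```

Because `active.stroke` sets only `width`, the stroke color and opacity are expected to come from the base `stroke` block.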

## Animations

The `animation` property controls how words are highlighted or revealed as they're spoken. Choose from 8 animation styles.
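
Several styles switch the spoken word to the active color, so pairing `animation` with an `active` override lets you choose that color. A minimal sketch, reusing the gold color from the active word example and the `speech` alias from Basic usage as a placeholder source:

```json
{
    "asset": {
        "type": "rich-caption",
        "src": "alias://speech",
        "font": {
            "color": "#ffffff"
        },
        "animation": {
            "style": "karaoke"
        },
        "active": {
            "font": {
                "color": "#efbf04"
            }
        }
    },
    "start": 0,
    "length": "end"
}
```

Swapping `"karaoke"` for any of the style names below changes the effect without touching the rest of the asset.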

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### karaoke

Word-by-word color fill as spoken. All words are visible from the start; the active word changes to the active color.

```json
"animation": {
    "style": "karaoke"
}
```

</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-karaoke-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-karaoke.mp4" type="video/mp4" />
</video>

</div>
</div>

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### highlight

Similar to karaoke — the active word changes to the active color when spoken, but with a more immediate color transition.

```json
"animation": {
    "style": "highlight"
}
```

</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-highlight-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-highlight.mp4" type="video/mp4" />
</video>

</div>
</div>

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### pop

Each word scales up when active, creating an energetic, punchy effect.

```json
"animation": {
    "style": "pop"
}
```

</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-pop-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-pop.mp4" type="video/mp4" />
</video>

</div>
</div>

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### fade

Gradual opacity transition per word. Words fade in as they become active.

```json
"animation": {
    "style": "fade"
}
```

</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-fade-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-fade.mp4" type="video/mp4" />
</video>

</div>
</div>

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### slide

Words slide in from a direction. Use the `direction` property to control where words slide from.

```json
"animation": {
    "style": "slide",
    "direction": "up"
}
```

**Direction options:** `left`, `right`, `up` (default), `down`
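
To change the direction, set one of the other listed values. A sketch using `left`, with the `speech` alias from Basic usage as a placeholder source:

```json
{
    "asset": {
        "type": "rich-caption",
        "src": "alias://speech",
        "animation": {
            "style": "slide",
            "direction": "left"
        }
    },
    "start": 0,
    "length": "end"
}
```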

</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-slide-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-slide.mp4" type="video/mp4" />
</video>

</div>
</div>

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### bounce

Spring animation on word appearance. Words bounce into place with a natural spring effect.

```json
"animation": {
    "style": "bounce"
}
```

</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-bounce-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-bounce.mp4" type="video/mp4" />
</video>

</div>
</div>

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### typewriter

Words appear one by one and stay visible. Each word is revealed in sequence as it's spoken, building up the full caption.

```json
"animation": {
    "style": "typewriter"
}
```

</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-typewriter-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-typewriter.mp4" type="video/mp4" />
</video>

</div>
</div>

<div style={{display: 'flex', alignItems: 'flex-start', gap: '1.5rem', marginBottom: '1.5rem'}}>
<div style={{flex: 1}}>

### none

No animation. All words are visible immediately with no highlighting or transitions.

```json
"animation": {
    "style": "none"
}
```


</div>
<div style={{flex: '0 0 50%'}}>

<video playsinline controls width="100%" poster="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-none-poster.png">
    <source src="https://d1uej6xx5jo4cd.cloudfront.net/documentation/rich-captions/active-word-none.mp4" type="video/mp4" />
</video>

</div>
</div>

## Migrating from legacy captions

The rich-caption asset is the successor to the [caption](/docs/guide/architecting-an-application/captions) asset. Your existing SRT/VTT files and `alias://` references work identically — change `"type": "caption"` to `"type": "rich-caption"` and you're ready to use the new styling options.
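
As a sketch of that change, here is a legacy caption clip followed by its rich-caption equivalent. Both reuse the sample SRT URL from Basic usage, and only the `type` value differs.

```json
{
    "asset": {
        "type": "caption",
        "src": "https://shotstack-assets.s3.amazonaws.com/captions/transcript.srt"
    },
    "start": 0,
    "length": "end"
}
```

```json
{
    "asset": {
        "type": "rich-caption",
        "src": "https://shotstack-assets.s3.amazonaws.com/captions/transcript.srt"
    },
    "start": 0,
    "length": "end"
}
```

From there, properties such as `animation` and `active` can be added to the asset as described in the sections above.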

## Related topics

- [Rich Text](/docs/guide/architecting-an-application/rich-text) - Static text with rich styling
- [Aliases](/docs/guide/architecting-an-application/aliases) - Auto-generate captions from audio/video clips
- [Positioning](/docs/guide/architecting-an-application/positioning) - Position and scale caption containers
