Generate SRT and VTT subtitles using an API

In this post, you will learn how to use the Shotstack Ingest service as a subtitles API to effortlessly extract subtitles and transcribe a video. Why might you want to do this? Well, there are several practical reasons I can think of.

Subtitles make your video accessible to a broader audience, including people who are deaf or hard of hearing. Also, many websites, apps and players mute videos by default, or a user might choose to use their device with the volume turned off.

A newer use case is passing subtitles extracted using an API as an input to generative AI tools for content analysis and repurposing.

About Shotstack and the Ingest API

Shotstack is an API-first platform that lets you process, edit, render and build video applications - all without the need for extensive technical expertise or traditional video editing skills. The Ingest API provides a way to fetch source files from anywhere on the internet, and process them as needed; no need to host your own assets or build a custom uploader.

For this guide, we will use the Ingest API to retrieve a remote file and generate subtitles from it in both SRT and VTT formats. So, if you were looking for an easy-to-use video to SRT or VTT converter API, follow along and you won't need to search elsewhere.

Prerequisites

To get started, sign up on the Shotstack website and get a free API key. This key will allow you to make requests to the Ingest API and generate subtitles. We also expect you to have some familiarity with the cURL utility and running commands in a terminal window.

What is an SRT file?

An SRT (SubRip Subtitle) file is a plain text file format commonly used to store subtitles for videos. It contains timecodes for each subtitle entry, which allow for precise synchronization with corresponding video frames. This subtitles guide explains the format in more detail.
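
To make the structure concrete, here is a minimal Python sketch that parses SRT text into a list of entries. The names `parse_srt` and `SrtCue` are made up for illustration and are not part of any Shotstack library. The parser assumes well-formed input: entries separated by blank lines, each with a number, a timecode line, and one or more lines of text.

```python
import re
from dataclasses import dataclass

# Matches an SRT timecode line such as "00:00:00,409 --> 00:00:01,879".
TIMECODE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class SrtCue:
    index: int
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def _to_seconds(hours, minutes, seconds, millis):
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(content):
    """Parse SRT text into SrtCue objects (assumes well-formed input)."""
    cues = []
    # Entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip incomplete entries
        match = TIMECODE.match(lines[1].strip())
        if not match:
            continue  # skip entries without a valid timecode line
        parts = match.groups()
        cues.append(
            SrtCue(
                index=int(lines[0]),
                start=_to_seconds(*parts[:4]),
                end=_to_seconds(*parts[4:]),
                text="\n".join(lines[2:]),
            )
        )
    return cues
```

Each `SrtCue` carries the entry number, the start and end times in seconds, and the caption text, which is usually all you need to search, filter or re-time subtitles.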

What is a VTT file?

A VTT (WebVTT) file is a text-based file format mainly used for subtitles in web videos. Similar to SRT, it includes timecodes for synchronization with video frames. VTT files also support additional features, like text styling and positioning options.
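
For plain subtitles the two formats are very close: a WebVTT file starts with a `WEBVTT` header line, and its timecodes use a period instead of a comma before the milliseconds. The sketch below converts SRT text to VTT on that basis. The function name `srt_to_vtt` is hypothetical, and the conversion only covers simple entries without styling or positioning.

```python
import re

# SRT timecodes use a comma before the milliseconds; WebVTT uses a period.
SRT_TIMECODE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$",
    re.MULTILINE,
)


def srt_to_vtt(srt_text):
    """Convert plain SRT text to WebVTT text."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    # Only whole timecode lines are rewritten, so commas in captions are untouched.
    body = SRT_TIMECODE.sub(r"\1.\2 --> \3.\4", normalized)
    # A WebVTT file begins with the "WEBVTT" header followed by a blank line.
    return "WEBVTT\n\n" + body + "\n"
```

In this guide the API produces both formats directly, so a conversion like this is mostly useful when you only have one of the two files to hand.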

Use the API to generate an SRT file from source footage

To generate an SRT file from an input video, we will be sending a POST request to the Ingest API with a JSON payload. The payload will specify the URL of the video file to fetch, and the output format of the transcription, i.e. srt.

From your shell or command line, execute the following command. Make sure to replace YOUR_API_KEY with your own API key for the stage/sandbox environment.

curl -X POST \
     -H "Content-Type: application/json" \
     -H "x-api-key: YOUR_API_KEY" \
     https://api.shotstack.io/ingest/stage/sources \
     -d '
	{
		"url": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4",
		"outputs": {
			"transcription": {
				"format": "srt"
			}
		}
	}'

The command sends a POST request with a JSON payload that specifies:

The URL of the file to fetch/ingest, i.e. https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4.
The output format of the transcription, i.e. srt.

If successful the API will respond with the request ID like this:

{
    "data": {
        "type": "source",
        "id": "zzy8xtam-0xha-mg1q-r3cl-4ladmk1upgjq"
    }
}

Copy the id from the response. We will use it to check the status in the next step.
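
If you would rather make this request from a script than from the terminal, the same call can be sketched in Python using only the standard library. The endpoint, headers and payload are taken from the cURL command above. The function names `build_ingest_request` and `submit` are made up for this example, and error handling is left out.

```python
import json
import urllib.request

# Endpoint taken from the cURL command above (stage/sandbox environment).
INGEST_URL = "https://api.shotstack.io/ingest/stage/sources"


def build_ingest_request(api_key, video_url, subtitle_format="srt"):
    """Build (but do not send) the POST request for a transcription job."""
    payload = {
        "url": video_url,
        "outputs": {"transcription": {"format": subtitle_format}},
    }
    return urllib.request.Request(
        INGEST_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def submit(request):
    """Send the request and return the source id from the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)["data"]["id"]
```

Calling `submit(build_ingest_request("YOUR_API_KEY", video_url))` should return the same id that you would otherwise copy from the cURL output.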

Check the status of the subtitle generation

Note that the video file is first fetched and stored on the Shotstack servers, and then the SRT file is generated. Wait for about 30 seconds for the entire process to complete, and then run the following command.

As before, make sure you replace YOUR_API_KEY with your own API key, and the id in the URL with the one you received in the previous API call response:

curl -X GET https://api.shotstack.io/ingest/stage/sources/zzy8xtam-0xha-mg1q-r3cl-4ladmk1upgjq \
  -H 'Accept: application/json' \
  -H 'x-api-key: YOUR_API_KEY'

The response includes all the details about the source file, including the status of the transcription. It should look like this:

{
    "data": {
        "type": "source",
        "id": "zzy8xtam-0xha-mg1q-r3cl-4ladmk1upgjq",
        "attributes": {
            "id": "zzy8xtam-0xha-mg1q-r3cl-4ladmk1upgjq",
            "owner": "c2jsl2d4xd",
            "input": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4",
            "source": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2d4xd/zzy8xtam-0xha-mg1q-r3cl-4ladmk1upgjq/source.mp4",
            "status": "ready",
            "outputs": {
                "transcription": {
                    "status": "ready",
                    "url": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2d4xd/zzy8xtam-0xha-mg1q-r3cl-4ladmk1upgjq/transcript.srt"
                }
            },
            "width": 1920,
            "height": 1080,
            "duration": 25.86,
            "fps": 23.976,
            "created": "2024-01-24T06:07:01.735Z",
            "updated": "2024-01-24T06:07:27.772Z"
        }
    }
}

Check the status parameter of the transcription output in the response to see whether the API has generated the subtitles; if it has, the status will be ready. If processing is still in progress, you might see statuses like queued, saving, or fetching. In that case, simply wait a few seconds and retry the same GET request.
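
The wait-and-retry step can also be automated. Below is a rough Python sketch that repeats the GET request until the transcription status is ready and then returns the subtitle URL. The function names, the 5 second interval and the attempt limit are arbitrary choices for this example, and treating a `failed` status as an unrecoverable error is an assumption on my part.

```python
import json
import time
import urllib.request

SOURCES_URL = "https://api.shotstack.io/ingest/stage/sources"


def fetch_source(api_key, source_id):
    """GET the source details, mirroring the cURL status check above."""
    request = urllib.request.Request(
        f"{SOURCES_URL}/{source_id}",
        headers={"Accept": "application/json", "x-api-key": api_key},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_transcription(fetch, interval=5.0, max_attempts=24, sleep=time.sleep):
    """Call fetch() until the transcription is ready, then return its URL.

    fetch is any zero-argument callable returning the parsed JSON response.
    """
    for attempt in range(max_attempts):
        attributes = fetch()["data"]["attributes"]
        transcription = attributes.get("outputs", {}).get("transcription", {})
        if transcription.get("status") == "ready":
            return transcription["url"]
        # Assumption: a "failed" status marks an unrecoverable error.
        if "failed" in (attributes.get("status"), transcription.get("status")):
            raise RuntimeError("Ingest or transcription failed")
        if attempt < max_attempts - 1:
            sleep(interval)  # wait before retrying the same GET request
    raise TimeoutError("Transcription was not ready in time")
```

A typical call would be `wait_for_transcription(lambda: fetch_source("YOUR_API_KEY", source_id))`. Passing the fetch step in as a callable keeps the polling logic easy to test without touching the network.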

Once you receive the ready status in the response, visit the link at the url parameter in your browser to view the SRT file. It should look like this:

1
00:00:00,409 --> 00:00:01,879
Hi, my name's Scott co as

2
00:00:02,079 --> 00:00:03,150
an entrepreneur.

3
00:00:03,160 --> 00:00:06,050
I cannot overstate how important it is these days to

4
00:00:06,059 --> 00:00:08,630
use video as a tool to reach your audience,

5
00:00:08,640 --> 00:00:10,970
your community and your customers,

6
00:00:11,119 --> 00:00:14,640
people connect with stories and video allows us to be the

7
00:00:14,649 --> 00:00:18,059
most authentic we can be in order to tell those stories.

8
00:00:18,069 --> 00:00:20,430
And so if you can present in front of a camera

9
00:00:20,440 --> 00:00:23,469
and you have the right tools to support and amplify you,

10
00:00:23,649 --> 00:00:25,420
you can be unstoppable.

Use the API to generate a VTT file from source footage

Now we will use the Ingest API to generate VTT subtitles from our video. The only thing we need to change in our POST request is the transcription format. From your shell or command line, run the following command. Make sure to replace YOUR_API_KEY with your own API key.

curl -X POST \
     -H "Content-Type: application/json" \
     -H "x-api-key: YOUR_API_KEY" \
     https://api.shotstack.io/ingest/stage/sources \
     -d '
	{
		"url": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4",
		"outputs": {
			"transcription": {
				"format": "vtt"
			}
		}
	}'

As you can see, we have only changed the value of the format parameter to vtt.

The response should be in the same format as before except with a different id. Follow the same process as before to check the status of the subtitle generation.

Once the status is ready, visit the URL in the url parameter of the response to view the VTT file. It should look like this:

WEBVTT

1
00:00:00.409 --> 00:00:01.879
Hi, my name's Scott co as

2
00:00:02.079 --> 00:00:03.150
an entrepreneur.

3
00:00:03.160 --> 00:00:06.050
I cannot overstate how important it is these days to

4
00:00:06.059 --> 00:00:08.630
use video as a tool to reach your audience,

5
00:00:08.640 --> 00:00:10.970
your community and your customers,

6
00:00:11.119 --> 00:00:14.640
people connect with stories and video allows us to be the

7
00:00:14.649 --> 00:00:18.059
most authentic we can be in order to tell those stories.

8
00:00:18.069 --> 00:00:20.430
And so if you can present in front of a camera

9
00:00:20.440 --> 00:00:23.469
and you have the right tools to support and amplify you,

10
00:00:23.649 --> 00:00:25.420
you can be unstoppable.

In both examples you'll notice that the transcription is not perfect. For example, the narrator's name is displayed as "Scott co" instead of "Scott Ko". You can download, modify and save the SRT and VTT files as needed before using them.
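
If the same mistakes show up repeatedly, you could script the corrections instead of editing each file by hand. Here is a small Python sketch that applies a dictionary of replacements to the caption lines only, leaving entry numbers, timecodes and the WEBVTT header untouched. The function name `apply_corrections` is made up for this example.

```python
def apply_corrections(subtitle_text, corrections):
    """Replace mis-transcribed phrases in caption lines of SRT or VTT text."""
    fixed_lines = []
    for line in subtitle_text.splitlines():
        stripped = line.strip()
        # Timecode lines contain "-->"; entry numbers are digits only.
        # Note: a caption consisting only of digits would also be skipped.
        is_caption = (
            "-->" not in stripped
            and not stripped.isdigit()
            and stripped != "WEBVTT"
        )
        if is_caption:
            for wrong, right in corrections.items():
                line = line.replace(wrong, right)
        fixed_lines.append(line)
    return "\n".join(fixed_lines) + "\n"
```

For the example above, `apply_corrections(srt_text, {"Scott co": "Scott Ko"})` would fix the narrator's name in either file.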

Using the API-generated subtitles

Now that we have our API-generated SRT and VTT files, we can put them to use in various ways.

Add subtitles to an HTML5 video

You can use the VTT file to add subtitles to the browser's built-in HTML5 video player. Here is the HTML that does this:

<video controls preload="metadata" crossorigin="anonymous">
  <source src="https://d1uej6xx5jo4cd.cloudfront.net/scott-ko.mp4" type="video/mp4" />
  <track
      label="English"
      kind="subtitles"
      srclang="en"
      src="https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2d4xd/zzy8xtj7-4rsb-zf3w-2fmg-1ufq4t1qrpdw/transcript.vtt"
      default />
</video>

Here we have the HTML video element with the source element's src pointing to the Scott Ko video and the track element's src pointing to the VTT file generated by our subtitles API.

Burn subtitles into the video

Burning subtitles into a video means permanently writing them into the video itself. Unlike subtitles added through the HTML player, burned-in subtitles are displayed regardless of the player or platform. One way to do this is with a video editing API. A separate guide shows how to use the Shotstack Edit API and PHP to burn subtitles into a video.

Create a video summary using ChatGPT

With the latest advances in Large Language Models (LLMs) like ChatGPT, you can generate a summary of the video using a prompt that includes the SRT or VTT file.

As an example, I gave ChatGPT the following prompt plus the SRT file contents:

Please write a short summary of a video based on its SRT subtitles below:

And here is the response I received:

The video features Scott, an entrepreneur, emphasizing the critical role of video
as a tool for reaching audiences, communities, and customers. He highlights the
power of storytelling through video, asserting that it allows individuals to be
their most authentic selves in conveying their stories. Scott suggests that
mastering presentation in front of a camera, combined with the right tools,
can make someone unstoppable in their outreach and communication efforts.

You could of course recreate this using the ChatGPT API and the SRT file generated by the Shotstack Ingest API to create a completely automated video summary application.
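
As a starting point for that kind of automation, the sketch below builds the prompt text from an SRT file. It deliberately stops short of calling any LLM API, since that part depends on the provider and client library you choose. The function names are hypothetical, and stripping the timecodes before sending is an optional choice that simply shortens the prompt.

```python
PROMPT_INSTRUCTION = (
    "Please write a short summary of a video based on its SRT subtitles below:"
)


def srt_to_plain_text(srt_text):
    """Drop entry numbers and timecodes, joining the captions into one line."""
    captions = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        # Skip blank lines, entry numbers and timecode lines.
        if not stripped or stripped.isdigit() or "-->" in stripped:
            continue
        captions.append(stripped)
    return " ".join(captions)


def build_summary_prompt(srt_text, include_timecodes=True):
    """Return the instruction followed by the subtitles, ready to send to an LLM."""
    body = srt_text.strip() if include_timecodes else srt_to_plain_text(srt_text)
    return f"{PROMPT_INSTRUCTION}\n\n{body}"
```

The returned string can then be passed as the user message to whichever chat API you use.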

I hope this article inspires you to explore more ways to generate subtitles using an API, and shows how SRT and VTT files can be used in various applications to help you build the next big thing.

Get started with Shotstack's video editing API in two steps:

Sign up for free to get your API key.

Send an API request to create your video:

curl --request POST 'https://api.shotstack.io/v1/render' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
  "timeline": {
    "tracks": [
      {
        "clips": [
          {
            "asset": {
              "type": "video",
              "src": "https://shotstack-assets.s3.amazonaws.com/footage/beach-overhead.mp4"
            },
            "start": 0,
            "length": "auto"
          }
        ]
      }
    ]
  },
  "output": {
    "format": "mp4",
    "size": {
      "width": 1280,
      "height": 720
    }
  }
}'