In the last couple of years we have seen the rise of AI Large Language Models (LLMs), such as ChatGPT and GPT-4, which are capable of processing and understanding natural language. These models can generate human-like responses to prompts and questions, and can also summarize text, including video transcripts and subtitles.
In this article I'll show you how you can build a fully automated AI video summarizer application using the Shotstack Ingest API and OpenAI's text generation Chat API. We'll use the Ingest API to extract subtitles from a video as an SRT file and then use OpenAI's Chat API to generate a summary description of the video based on the SRT file content.
Shotstack is a media automation platform that helps developers rapidly build video applications and automated media workflows. The platform offers a number of API products, including the Ingest API, which lets you fetch videos from remote URLs or uploads and apply transformations and enhancements. Among its features, it can act as a subtitles API, generating a transcript in SRT or VTT format from a video.
OpenAI is the AI company that introduced the world to Large Language Models (LLMs) with ChatGPT. Behind the scenes ChatGPT uses the GPT-3.5 and GPT-4 models. OpenAI's API provides programmatic access to these text generation models, allowing you to build what is commonly referred to as a ChatGPT wrapper application.
To follow along with this tutorial you will need to have a few things set up first:
To create an SRT file from the source video, we'll send a POST request to the Ingest API with a JSON payload. The payload includes the video URL to transcribe and the subtitles format, which in our case is SRT. For this tutorial, we will use this video file hosted on GitHub at: https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4:
Run the following command in your terminal. Make sure to substitute SHOTSTACK_API_KEY
with your own API key for the stage/sandbox environment.
curl -X POST \
-H "Content-Type: application/json" \
-H "x-api-key: SHOTSTACK_API_KEY" \
https://api.shotstack.io/ingest/stage/sources \
-d '
{
"url": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4",
"outputs": {
"transcription": {
"format": "srt"
}
}
}'
The command sends the POST request to the API with a JSON payload containing the file URL and subtitles format. If successful, you should receive a response that includes the id
of the video file being ingested and transcribed, similar to this:
{
"data": {
"type": "source",
"id": "zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj"
}
}
Copy the id
from the response. We will use it to check the status in the next step.
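If you are scripting these steps, the id can be pulled out of the response with a little shell. Below is a minimal sketch, assuming the response shape shown above; the sed pattern only handles this simple, single-line case, and jq would be more robust if you have it installed:

```shell
# Sample response from the Ingest API (same shape as shown above)
RESPONSE='{"data":{"type":"source","id":"zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj"}}'

# Extract the value of the "id" field with sed; with jq installed you could
# instead use: echo "$RESPONSE" | jq -r '.data.id'
SOURCE_ID=$(echo "$RESPONSE" | sed -n 's/.*"id":[[:space:]]*"\([^"]*\)".*/\1/p')

echo "$SOURCE_ID"
```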
Note that the Ingest service first fetches and stores the video on the Shotstack servers, and then generates the SRT file. Wait for around 30 seconds for this process to finish, and then execute the command below.
As before, substitute SHOTSTACK_API_KEY
with your own API key, and the id
in the URL with the one received in the last step:
curl -X GET https://api.shotstack.io/ingest/stage/sources/zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj \
-H 'Accept: application/json' \
-H 'x-api-key: SHOTSTACK_API_KEY'
If successful, the response should include all the details of the source file, and the status of the transcription, similar to this:
{
"data": {
"type": "source",
"id": "zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj",
"attributes": {
"id": "zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj",
"owner": "c2jsl2e4xd",
"input": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4",
"source": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2e4xd/zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj/source.mp4",
"status": "ready",
"outputs": {
"transcription": {
"status": "ready",
"url": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2e4xd/zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj/transcript.srt"
}
},
"width": 1920,
"height": 1080,
"duration": 25.86,
"fps": 23.976,
"created": "2024-02-08T07:41:47.408Z",
"updated": "2024-02-08T07:42:12.576Z"
}
}
}
Retry the request until the status
parameter contains ready
for both the top-level source and the outputs.transcription
output.
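This polling step can be scripted as a simple retry loop. Here is a sketch with the curl request replaced by a stub function (fetch_status is a placeholder, not a real API helper) so the loop structure is clear; in a real script the function body would be the GET request above, and the sed line is a simplification that grabs the last status field from a single-line response:

```shell
# Poll until the transcription status is "ready"; fetch_status stands in
# for the real curl request to the Ingest API shown above
fetch_status() {
  # Real version would be something like:
  # curl -s -H "x-api-key: $SHOTSTACK_API_KEY" \
  #   "https://api.shotstack.io/ingest/stage/sources/$SOURCE_ID"
  echo '{"data":{"attributes":{"outputs":{"transcription":{"status":"ready"}}}}}'
}

STATUS=""
ATTEMPTS=0
while [ "$STATUS" != "ready" ] && [ "$ATTEMPTS" -lt 10 ]; do
  # Grab the last "status" field out of the (single-line) JSON response
  STATUS=$(fetch_status | sed -n 's/.*"status":[[:space:]]*"\([^"]*\)".*/\1/p')
  ATTEMPTS=$((ATTEMPTS + 1))
  [ "$STATUS" = "ready" ] || sleep 5
done

echo "Transcription status: $STATUS"
```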
When the status is ready
for the transcript, the response should include a URL to the SRT file. If you visit the URL you should see a text file that looks like this:
1
00:00:00,409 --> 00:00:01,879
Hi, my name's Scott co as
2
00:00:02,079 --> 00:00:03,150
an entrepreneur.
3
00:00:03,160 --> 00:00:06,050
I cannot overstate how important it is these days to
4
00:00:06,059 --> 00:00:08,630
use video as a tool to reach your audience,
5
00:00:08,640 --> 00:00:10,970
your community and your customers,
6
00:00:11,119 --> 00:00:14,640
people connect with stories and video allows us to be the
7
00:00:14,649 --> 00:00:18,059
most authentic we can be in order to tell those stories.
8
00:00:18,069 --> 00:00:20,430
And so if you can present in front of a camera
9
00:00:20,440 --> 00:00:23,469
and you have the right tools to support and amplify you,
10
00:00:23,649 --> 00:00:25,420
you can be unstoppable.
If you want to learn more about generating SRT and VTT captions using the Shotstack Ingest API, you can read more in our Generate SRT and VTT Subtitles using an API guide.
Now that we have the SRT file, we can use OpenAI's text generation API to generate a summary of the video based on the text contained in the file.
Using the subtitles file generated in the previous step, we will create a prompt that asks OpenAI's text generation API to summarize the video transcript. We will then send a POST request to the API with the prompt and the contents of the SRT file included in the JSON payload.
First, copy the contents of the SRT file into a new file. Name it transcript.srt.
Then create another file named summary.sh, copy the following bash script to it and replace OPENAI_API_KEY
with your own OpenAI API key.
#!/bin/bash

# Read the contents of the SRT file into a variable
SRT_CONTENTS=$(<transcript.srt)
# Replace newlines with escaped newlines
SRT_CONTENTS="${SRT_CONTENTS//$'\n'/\\n}"
# Make the curl request with the SRT contents included in the JSON payload
curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer OPENAI_API_KEY" \
-d '
{
"model": "gpt-4",
"messages": [
{
"role": "user",
"content": "Please write a short summary of a video based on the following SRT transcript: '"$SRT_CONTENTS"'"
}
]
}'
This script reads the contents of our SRT file into a variable, and then appends it to the prompt inside the JSON body of the curl POST request.
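The newline replacement uses bash parameter expansion (`${var//pattern/replacement}`), turning each real newline into the two characters backslash and n so the transcript stays valid inside a JSON string. A standalone demonstration of just that step (requires bash, as does the script above):

```shell
# A two-line sample standing in for the SRT contents
SAMPLE=$'line one\nline two'

# Replace each literal newline with a backslash-n escape sequence
ESCAPED="${SAMPLE//$'\n'/\\n}"

echo "$ESCAPED"
```

Note that this does not escape double quotes, so a transcript containing a `"` character would still break the JSON payload; for arbitrary input, a proper JSON encoder such as jq is safer.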
Finally, run the summary.sh bash file using this command:
./summary.sh
Note that on Linux/macOS you will need to make the script executable first (for example with chmod +x summary.sh). On Windows, use the Windows Subsystem for Linux or modify the script slightly.
Expect an output like this:
{
"id": "chatcmpl-8ptjZSkTPw810zJpKCPg8irBaUqd8",
"object": "chat.completion",
"created": 1707379661,
"model": "gpt-4-0613",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "In this video, entrepreneur Scott Co discusses the significance of utilizing video as a tool for reaching out to audiences, communities, and customers. He emphasizes how video allows for authentic storytelling, encouraging viewers to learn how to present in front of a camera and use the right tools to amplify their message."
},
"logprobs": null,
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 307,
"completion_tokens": 58,
"total_tokens": 365
},
"system_fingerprint": null
}
As you can see, the video summary is available in the choices[0].message.content
field of the response.
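To use the summary programmatically you need to extract that field from the JSON. A sketch against a simplified, single-line sample response is below; the sed approach only works for simple strings with no embedded quotes or escapes, and `jq -r '.choices[0].message.content'` is the robust option in real scripts:

```shell
# Abridged sample of a Chat Completions response on a single line
RESPONSE='{"choices":[{"index":0,"message":{"role":"assistant","content":"Scott Ko explains why video matters."},"finish_reason":"stop"}]}'

# Pull out the message content with sed (simple strings only; prefer jq)
SUMMARY=$(echo "$RESPONSE" | sed -n 's/.*"content":[[:space:]]*"\([^"]*\)".*/\1/p')

echo "$SUMMARY"
```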
In this article, we explored the basic steps used to transcribe a video and then generate a summary using the ChatGPT API.
These simple steps could be used as a starting point to build a fully automated video summarizer AI powered application. You could use this application to summarize meetings, video tutorials, YouTube videos or even audio podcasts.
Using the programming language of your choice, or with the help of one of Shotstack's SDKs, you could automate the process of sending requests to the Ingest API and OpenAI's Chat API. You could then parse the responses to extract the video summary, save it to a database, and display it on a website or mobile app.
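Put together, such an automation might look like the skeleton below. Each API call is stubbed out (ingest_video, fetch_transcript, and summarize are placeholder names, not real SDK functions); you would replace each body with the corresponding curl request or SDK call from the steps above:

```shell
# Stubbed pipeline: each function stands in for one of the API calls above
ingest_video() {      # POST the video URL to the Ingest API, return the source id
  echo "zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj"
}
fetch_transcript() {  # poll the source until ready, then download the SRT file
  printf '1\n00:00:00,409 --> 00:00:01,879\nHi, my name is Scott Ko\n'
}
summarize() {         # send the transcript to OpenAI's Chat API, return the summary
  echo "A short talk about using video to reach your audience."
}

SOURCE_ID=$(ingest_video)
TRANSCRIPT=$(fetch_transcript "$SOURCE_ID")
SUMMARY=$(summarize "$TRANSCRIPT")

echo "$SUMMARY"
```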
I hope this article has inspired you to explore the Shotstack platform further and build your own AI application to summarize video.