Build an AI video summarizer app using ChatGPT

In the last couple of years we have seen the rise of AI Large Language Models (LLMs), such as ChatGPT and GPT-4, which are capable of processing and understanding natural language. These models can generate human-like responses to prompts and questions, and can also summarize text, including video transcripts and subtitles.

In this article I'll show you how to build a fully automated AI video summarizer application using the Shotstack Ingest API and OpenAI's text generation Chat API. We'll use the Ingest API to extract subtitles from a video as an SRT file, and then use OpenAI's Chat API to generate a summary description of the video based on the SRT file's contents.

About Shotstack and the Ingest API

Shotstack is a media automation platform that helps developers rapidly build video applications and automated media workflows. The platform offers a number of API products, including the Ingest API, which lets you fetch videos from remote URLs or uploads and perform transformations and enhancements. Among its features, it can act as a subtitles API, generating a transcript in SRT or VTT format from a video.

About OpenAI and the Chat API

OpenAI is the AI company that introduced the world to Large Language Models (LLMs) with ChatGPT. Behind the scenes, ChatGPT uses the GPT-3.5 and GPT-4 models. OpenAI's API provides programmatic access to these text generation models, allowing you to create what is commonly referred to as a ChatGPT wrapper, or ChatGPT wrapper application.

Prerequisites

To follow along with this tutorial you will need to have a few things set up first:

  • A Shotstack API key - sign up and get a free-to-use sandbox account.
  • An OpenAI API key - sign up and get $5 of free credits to use for testing.
  • Basic knowledge of the cURL utility and executing commands in a terminal window/command line.

Extract an SRT transcript from a video using the Ingest API

To create an SRT file from the source video, we'll send a POST request to the Ingest API with a JSON payload. The payload includes the URL of the video to transcribe and the subtitles format, which in our case is SRT. For this tutorial, we will use a video file hosted on GitHub at https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4.

Run the following command in your terminal. Make sure to substitute SHOTSTACK_API_KEY with your own API key for the stage/sandbox environment.

curl -X POST \
  -H "Content-Type: application/json" \
  -H "x-api-key: SHOTSTACK_API_KEY" \
  https://api.shotstack.io/ingest/stage/sources \
  -d '
{
  "url": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4",
  "outputs": {
    "transcription": {
      "format": "srt"
    }
  }
}'

The command sends the POST request to the API with a JSON payload containing the file URL and subtitles format. If successful, you should receive a response that includes the id of the video file being ingested and transcribed, like this:

{
  "data": {
    "type": "source",
    "id": "zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj"
  }
}

Copy the id from the response. We will use it to check the status in the next step.
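If you have the jq utility installed, you could instead capture the id programmatically when sending the request. The snippet below is a minimal sketch of that approach; the SOURCE_ID variable name is introduced here for illustration only:

SOURCE_ID=$(curl -s -X POST https://api.shotstack.io/ingest/stage/sources \
  -H "Content-Type: application/json" \
  -H "x-api-key: SHOTSTACK_API_KEY" \
  -d '{"url": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4", "outputs": {"transcription": {"format": "srt"}}}' \
  | jq -r '.data.id')

# Print the id so it can be reused in the next step
echo "$SOURCE_ID"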

Check the status of the SRT file generation

Note that the Ingest service first fetches and stores the video on the Shotstack servers, and then generates the SRT file. Wait for around 30 seconds for this process to finish, and then execute the command below.

As before, substitute SHOTSTACK_API_KEY with your own API key, and the id in the URL with the one received in the last step:

curl -X GET https://api.shotstack.io/ingest/stage/sources/zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj \
-H 'Accept: application/json' \
-H 'x-api-key: SHOTSTACK_API_KEY'

If successful, the response should include all the details of the source file, and the status of the transcription, similar to this:

{
  "data": {
    "type": "source",
    "id": "zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj",
    "attributes": {
      "id": "zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj",
      "owner": "c2jsl2e4xd",
      "input": "https://github.com/shotstack/test-media/raw/main/captioning/scott-ko.mp4",
      "source": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2e4xd/zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj/source.mp4",
      "status": "ready",
      "outputs": {
        "transcription": {
          "status": "ready",
          "url": "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2e4xd/zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj/transcript.srt"
        }
      },
      "width": 1920,
      "height": 1080,
      "duration": 25.86,
      "fps": 23.976,
      "created": "2024-02-08T07:41:47.408Z",
      "updated": "2024-02-08T07:42:12.576Z"
    }
  }
}

Retry the request until the status parameter in the response is ready for both the top-level source and the outputs.transcription object.
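Rather than retrying by hand, you could poll the endpoint in a small bash loop. Here's a minimal sketch, assuming jq is installed and the id from the previous step is stored in a SOURCE_ID variable:

# Poll the Ingest API every 10 seconds until the transcription is ready
while true; do
  RESPONSE=$(curl -s https://api.shotstack.io/ingest/stage/sources/$SOURCE_ID \
    -H 'Accept: application/json' \
    -H 'x-api-key: SHOTSTACK_API_KEY')

  STATUS=$(echo "$RESPONSE" | jq -r '.data.attributes.outputs.transcription.status')
  [ "$STATUS" = "ready" ] && break
  sleep 10
done

# Print the URL of the generated SRT file
echo "$RESPONSE" | jq -r '.data.attributes.outputs.transcription.url'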

When the status is ready for the transcription, the response should include a URL to the SRT file. If you visit the URL, you should see a text file that looks like this:

1
00:00:00,409 --> 00:00:01,879
Hi, my name's Scott co as

2
00:00:02,079 --> 00:00:03,150
an entrepreneur.

3
00:00:03,160 --> 00:00:06,050
I cannot overstate how important it is these days to

4
00:00:06,059 --> 00:00:08,630
use video as a tool to reach your audience,

5
00:00:08,640 --> 00:00:10,970
your community and your customers,

6
00:00:11,119 --> 00:00:14,640
people connect with stories and video allows us to be the

7
00:00:14,649 --> 00:00:18,059
most authentic we can be in order to tell those stories.

8
00:00:18,069 --> 00:00:20,430
And so if you can present in front of a camera

9
00:00:20,440 --> 00:00:23,469
and you have the right tools to support and amplify you,

10
00:00:23,649 --> 00:00:25,420
you can be unstoppable.

If you want to learn more about generating SRT and VTT captions using the Shotstack Ingest API, you can read more in our Generate SRT and VTT Subtitles using an API guide.

Now that we have the SRT file, we can use OpenAI's text generation API to generate a summary of the video based on the text contained in the file.

Use OpenAI's text generation API to create the video summary

Using the subtitles file generated in the previous step, we will create a prompt that asks OpenAI's text generation API to summarize the video transcript. We will then send a POST request to the API with the prompt and the contents of the SRT file included in the JSON payload.

First, copy the contents of the SRT file into a new file. Name it transcript.srt.
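Alternatively, you can download the file directly using the transcription URL returned by the status request (the URL below is the one from our example response):

curl -o transcript.srt "https://shotstack-ingest-api-stage-sources.s3.ap-southeast-2.amazonaws.com/c2jsl2e4xd/zzy8s0xq-2xog-n93j-fezr-0lswed1cfssj/transcript.srt"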

Then create another file named summary.sh, copy the following bash script into it and replace OPENAI_API_KEY with your own OpenAI API key.

#!/bin/bash

# Read the contents of the SRT file into a variable
SRT_CONTENTS=$(<transcript.srt)

# Escape double quotes and replace newlines with escaped newlines so the
# transcript can be safely embedded in the JSON payload
SRT_CONTENTS="${SRT_CONTENTS//\"/\\\"}"
SRT_CONTENTS="${SRT_CONTENTS//$'\n'/\\n}"

# Make the curl request with the SRT contents included in the JSON payload
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer OPENAI_API_KEY" \
  -d '
{
  "model": "gpt-4",
  "messages": [
    {
      "role": "user",
      "content": "Please write a short summary of a video based on the following SRT transcript: '"$SRT_CONTENTS"'"
    }
  ]
}'

This script reads the contents of our SRT file into a variable, escapes it so it can be embedded in a JSON string, and then appends it to the prompt inside the JSON body of the curl POST request.

Finally, run the summary.sh bash file using this command:

./summary.sh

Note that on Linux/macOS you will need to set permissions to make the script executable; on Windows, use the Windows Subsystem for Linux or modify the script slightly.
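For example, on Linux or macOS you can make the script executable with:

chmod +x summary.sh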

Expect an output like this:

{
  "id": "chatcmpl-8ptjZSkTPw810zJpKCPg8irBaUqd8",
  "object": "chat.completion",
  "created": 1707379661,
  "model": "gpt-4-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "In this video, entrepreneur Scott Co discusses the significance of utilizing video as a tool for reaching out to audiences, communities, and customers. He emphasizes how video allows for authentic storytelling, encouraging viewers to learn how to present in front of a camera and use the right tools to amplify their message."
      },
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 307,
    "completion_tokens": 58,
    "total_tokens": 365
  },
  "system_fingerprint": null
}

As you can see, the video summary is available inside the choices[0].message.content parameter of the response.
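If you have jq installed, you can pipe the script's output to it to print just the summary text:

./summary.sh | jq -r '.choices[0].message.content'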

Building your own video summarization application

In this article, we explored the basic steps used to transcribe a video and then generate a summary using the ChatGPT API.

These simple steps could be used as a starting point to build a fully automated, AI-powered video summarizer application. You could use such an application to summarize meetings, video tutorials, YouTube videos or even audio podcasts.

Using the programming language of your choice, or with the help of one of Shotstack's SDKs, you could automate the process of sending requests to the Ingest API and OpenAI's Chat API, then parse the responses to extract the video summary, save it to a database and display it on a website or mobile app.

I hope this article has inspired you to explore the Shotstack platform further and build your own AI application to summarize video.


BY MAAB SALEEM
27th February, 2024
