Convert MP4 video to MP3 audio using Python

Converting MP4 to MP3 turns a video file into an audio file. Audio is commonly used for podcasts, music, news narration, and similar media use cases. You can easily convert video files using existing conversion apps, but most don't offer a scalable solution. Better still, you can build your own app, bot, or plugin that converts MP4 files to MP3.

CloudConvert, a media conversion web app, ranks number one for the phrase "convert MP4 to MP3" and gets an estimated 454,000 organic monthly users. That should earn them significant revenue from CPC banner ads and premium accounts. So why not leverage the power of programming and build your own media application to solve similar problems?

That is exactly what this tutorial aims to do. Well, not the part about becoming successful. But it will teach you how to programmatically convert media files using Python, which is a great starting point for building the next big media app.

This tutorial has two parts:

converting a single MP4 file to MP3
converting a list of MP4 files to MP3

The Shotstack API and SDK

Shotstack offers a cloud-based video editing API. Editing and generating videos at scale requires a lot of resources and can take hours. Shotstack's rendering infrastructure makes building and scaling media applications a breeze.

This tutorial also uses the Shotstack Python SDK for video editing. Python 3 is required for the SDK.

Install and configure the Shotstack SDK

Let's install the Shotstack Python SDK from the command line:

pip install shotstack_sdk

Set your API key as an environment variable (Linux/Mac):

export SHOTSTACK_KEY=your_key_here

or, if using Windows (make sure SHOTSTACK_KEY is added to your environment variables):

set SHOTSTACK_KEY=your_key_here

Replace your_key_here with your sandbox API key, which is free for testing and development.
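If the variable is missing, os.getenv returns None and the problem only shows up later as a failed API call. A minimal sketch of an early check follows; the get_api_key helper is a hypothetical name introduced here for illustration and is not part of the Shotstack SDK.

```python
import os


def get_api_key(env=os.environ, name="SHOTSTACK_KEY"):
    """Return the API key from the environment, or raise a clear error.

    get_api_key is a hypothetical helper for this tutorial, not an SDK call.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the script")
    return key
```

The value it returns could then be assigned to configuration.api_key['DeveloperKey'] in place of the direct os.getenv call used later in this tutorial.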

Converting a single MP4 file to MP3

Create a Python script to convert mp4 to mp3

Use your favorite IDE or text editor to create a Python script. For this tutorial, we created a file called mp4-to-mp3.py. Select the file and begin editing it.

Import the required modules

Let's import the required modules for the project. We need to import modules from the Shotstack SDK to edit and render our video, plus a couple of built-in modules:

import shotstack_sdk as shotstack
import os
import sys

from shotstack_sdk.api import edit_api
from shotstack_sdk.model.clip import Clip
from shotstack_sdk.model.track import Track
from shotstack_sdk.model.timeline import Timeline
from shotstack_sdk.model.output import Output
from shotstack_sdk.model.edit import Edit
from shotstack_sdk.model.video_asset import VideoAsset

Configuring the API client

Next, set up the client with the API URL and key. It should use the key you added to the environment variables in the previous step:

host = "https://api.shotstack.io/stage"

configuration = shotstack.Configuration(host = host)

configuration.api_key['DeveloperKey'] = os.getenv('SHOTSTACK_KEY')

with shotstack.ApiClient(configuration) as api_client:
    api_instance = edit_api.EditApi(api_client)

Understanding the timeline architecture

The Shotstack API follows many of the principles of desktop editing software, such as the use of a timeline, tracks, and clips. A timeline is a container for multiple tracks, and each track contains multiple clips which play over time.
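To make the nesting concrete, here is a rough sketch of the same hierarchy as a plain Python dictionary. The field names follow the JSON in the curl request at the end of this tutorial; the build_edit helper and the example.com URL are placeholders introduced for illustration, not part of the SDK.

```python
import json


def build_edit(src, length, start=0.0, output_format="mp3"):
    """Build the nested edit structure: timeline -> tracks -> clips -> asset."""
    # A clip places one asset on the timeline at a start time, for a length.
    clip = {"asset": {"type": "video", "src": src}, "start": start, "length": length}
    # A track holds the clips that play one after another on the same layer.
    track = {"clips": [clip]}
    # The timeline holds every track.
    timeline = {"tracks": [track]}
    return {"timeline": timeline, "output": {"format": output_format}}


# Placeholder URL, used only to show the shape of the structure.
edit = build_edit("https://example.com/video.mp4", 25.0)
print(json.dumps(edit, indent=2))
```

The SDK model classes used in the following steps mirror this structure, so each class maps onto one level of the dictionary.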

Setting up the MP4 track

The video should be hosted somewhere accessible via a public or signed URL. We will use the following transcription example video from the AWS Transcription tutorial. You can replace it with a direct URL to your own video.

Next, add the code below to create a VideoAsset using the video URL:

video_asset = VideoAsset(
    src = "https://d1uej6xx5jo4cd.cloudfront.net/scott-ko-w-captions.mp4"
)

In Shotstack, a clip is a container that places an asset on the timeline. We can configure attributes such as its start time and length. The video_clip variable below adds the video_asset to the timeline, starting at 0 seconds and playing for 25 seconds.

video_clip = Clip(
    asset = video_asset,
    start = 0.0,
    length= 25.0
)

Adding the video clip to the timeline

Now, let’s create a timeline, which is like a container for multiple video clips which play over time. Tracks on the timeline allow us to layer clips on top of each other. Let's add the video_clip to a track and then add the track to the timeline.

track = Track(clips=[video_clip])
timeline = Timeline(
    background = "#000000",
    tracks = [track]
)

Configuring the final output

Next, we need to configure the output. To convert to an MP3, let's set the output format to mp3 and the resolution to preview.

output = Output(
    format = "mp3",
    resolution = "preview"
)

edit = Edit(
    timeline = timeline,
    output   = output
)

Sending the edit to render via API

Finally, let's send the edit for processing and rendering using the API. The Shotstack SDK takes care of converting our objects to JSON, including our key to the request header, and sending everything to the API.

try:
    api_response = api_instance.post_render(edit)

    message = api_response['response']['message']
    id = api_response['response']['id']

    print(f"{message}\n")
    print(f">> render id: {id}")
except Exception as e:
    print(f"Unable to resolve API call: {e}")
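For readers curious about what the SDK does on our behalf, the sketch below prepares an equivalent raw request without sending it. The x-api-key header and the /render path are taken from the curl example at the end of this tutorial; that the stage host uses the same path is an assumption, and build_render_request is a hypothetical helper, not an SDK function.

```python
import json
import urllib.request


def build_render_request(edit, api_key, host="https://api.shotstack.io/stage"):
    """Prepare (but do not send) a POST request for the render endpoint.

    Assumption: the stage host exposes /render like the v1 URL in the curl example.
    """
    # Serialize the edit to JSON, which is what the SDK does with its model objects.
    body = json.dumps(edit).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/render",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

In practice the SDK call api_instance.post_render(edit) remains the simpler route; the sketch only illustrates the JSON body and the key header.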

Final script

Below is the completed Python script:

import shotstack_sdk as shotstack
import os
import sys

from shotstack_sdk.api import edit_api
from shotstack_sdk.model.clip import Clip
from shotstack_sdk.model.track import Track
from shotstack_sdk.model.timeline import Timeline
from shotstack_sdk.model.output import Output
from shotstack_sdk.model.edit import Edit
from shotstack_sdk.model.video_asset import VideoAsset

if __name__ == "__main__":
    host = "https://api.shotstack.io/stage"

    configuration = shotstack.Configuration(host = host)

    configuration.api_key['DeveloperKey'] = os.getenv("SHOTSTACK_KEY") 

    with shotstack.ApiClient(configuration) as api_client:
        api_instance = edit_api.EditApi(api_client)

        video_asset = VideoAsset(
            src = "https://d1uej6xx5jo4cd.cloudfront.net/scott-ko-w-captions.mp4"
        )

        video_clip = Clip(
            asset = video_asset,
            start = 0.0,
            length= 25.0
        )

        track = Track(clips=[video_clip])

        timeline = Timeline(
            background = "#000000",
            tracks     = [track]
        )
        output = Output(
            format     = "mp3",
            resolution = "preview"
        )

        edit = Edit(
            timeline = timeline,
            output   = output
        )

        try:
            api_response = api_instance.post_render(edit)

            message = api_response['response']['message']
            id = api_response['response']['id']
        
            print(f"{message}\n")
            print(f">> render id: {id}")
        except Exception as e:
            print(f"Unable to resolve API call: {e}")

Running the script

Run the script using Python:

python mp4-to-mp3.py

You may need to use python3 instead of python depending on your configuration.

The API will return the render id if the render request is successful. We need the render id to retrieve the render status.

Checking the render status and output URL

The render process takes place in the background and may take several seconds. We need another short script that will check the render status endpoint.

Create a file called status.py and paste the following:

import sys
import os
import shotstack_sdk as shotstack

from shotstack_sdk.api import edit_api

if __name__ == "__main__":
    host = "https://api.shotstack.io/stage"
    configuration = shotstack.Configuration(host = host)
    configuration.api_key['DeveloperKey'] = os.getenv("SHOTSTACK_KEY")

    with shotstack.ApiClient(configuration) as api_client:
        api_instance = edit_api.EditApi(api_client)
        api_response = api_instance.get_render(sys.argv[1], data=False, merged=True)
        status = api_response['response']['status']

        print(f"Status: {status}")

        if status == "done":
            url = api_response['response']['url']
            print(f">> Asset URL: {url}")

Then run the script using Python:

python status.py {renderId}

Replace {renderId} with the render id returned from the mp4-to-mp3.py script.

Re-run the status.py script every 4-5 seconds until the status is done and a URL is returned. If something goes wrong the status will return as failed.
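Re-running the script by hand can be replaced with a small polling loop. The sketch below is one possible approach, not part of the SDK: wait_for_render is a hypothetical helper that accepts any zero-argument callable returning a dictionary with a status key and, once done, a url key. The interval and attempt limit are arbitrary defaults.

```python
import time


def wait_for_render(fetch_status, interval=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until it reports 'done' or 'failed'.

    fetch_status is any zero-argument callable returning a dict with a
    'status' key and, once the render is done, a 'url' key.
    """
    for attempt in range(max_attempts):
        result = fetch_status()
        if result["status"] == "done":
            return result["url"]
        if result["status"] == "failed":
            raise RuntimeError("render failed")
        # Pause between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(f"render not finished after {max_attempts} checks")
```

Inside status.py, fetch_status could be a lambda that returns api_instance.get_render(sys.argv[1], data=False, merged=True)['response'], reusing the call shown above.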

If everything ran successfully, you should now have the URL of the final MP3, just like the one in the example below.

Rendered MP3 example

The final rendered MP3 is ready to be hosted or transferred to your application.

Accessing your rendered media using the dashboard

You can also view your rendered media files inside the Shotstack dashboard under Renders. Media files are deleted after 24 hours and need to be transferred to your own storage provider. All files are, however, copied to Shotstack hosting, and you can configure other destinations including AWS S3 and Mux.

Shotstack renders list in the dashboard
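Because rendered files are removed after 24 hours, one simple option is to download the MP3 as soon as the status script prints its URL. The following is a minimal sketch using only the standard library; save_render is a hypothetical helper name, and the destination path is up to you.

```python
import shutil
import urllib.request
from pathlib import Path


def save_render(url, dest):
    """Stream the file at url to dest and return the destination path."""
    dest = Path(dest)
    # Create the target folder if it does not exist yet.
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Copy in chunks so large files are never held fully in memory.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest
```

For example, save_render(url, "output/audio.mp3") would store the file locally, from where it can be uploaded to your own storage provider.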

Converting a list of MP4 to MP3

As you can see, it is easy to generate an audio file from a video. The big advantage of using the Shotstack API is how seamless it is to scale this process without having to worry about the rendering infrastructure.

To demonstrate the scalability, we will convert the following list of MP4 files to MP3. Create a new CSV file called mp4.csv in the current working folder. Then add each video URL under the url column and its length in seconds under the length column. A length is required for each video because the videos differ in duration.

url,length
https://d1uej6xx5jo4cd.cloudfront.net/slideshow-with-audio.mp4,35.0
https://cdn.shotstack.io/au/v1/msgtwx8iw6/d724e03c-1c4f-4ffa-805a-a47aab70a28f.mp4,13.0
https://cdn.shotstack.io/au/v1/msgtwx8iw6/b03c7b50-07f3-4463-992b-f5241ea15c18.mp4,36.0
https://cdn.shotstack.io/au/stage/c9npc4w5c4/d2552fc9-f05a-4e89-9749-a87d9a1ae9aa.mp4,12.0
https://cdn.shotstack.io/au/v1/msgtwx8iw6/c900a02f-e008-4c37-969f-7c9578279100.mp4,29.0

You can also inspect media using the probe endpoint to retrieve the metadata of each video. The response includes width, height, duration, framerate and more. You can write a script to automatically fetch the video length. For the sake of simplicity, we have manually added it to the CSV file.
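As a rough sketch of how such a script might extract the length, the helper below reads the duration from FFprobe-style metadata. It assumes the probe response carries metadata with a format section and a streams list, as FFprobe's JSON output does; check the actual response before relying on this layout. duration_from_metadata is a hypothetical helper, and no API call is made here.

```python
def duration_from_metadata(metadata):
    """Return the duration in seconds from FFprobe-style metadata.

    Assumption: metadata looks like FFprobe JSON, with an optional
    'format' dict and a 'streams' list, each possibly holding 'duration'.
    """
    duration = metadata.get("format", {}).get("duration")
    if duration is not None:
        # FFprobe reports durations as strings, so convert to float.
        return float(duration)
    # Fall back to the longest stream that reports a duration.
    durations = [float(s["duration"]) for s in metadata.get("streams", []) if "duration" in s]
    if not durations:
        raise ValueError("no duration found in metadata")
    return max(durations)
```

The returned value could then be written to the length column instead of being entered by hand.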

The script uses Python's built-in csv module, so it needs to be imported first:

import csv
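The script in this tutorial reads columns by position with csv.reader. An alternative, sketched here, is csv.DictReader, which reads each row by column name and keeps working if the columns are reordered. The read_videos helper and the example.com URLs are placeholders for illustration.

```python
import csv
import io


def read_videos(file):
    """Yield (url, length) pairs from a CSV file with url and length columns."""
    # DictReader uses the header row as keys, so no manual next() call is needed.
    for row in csv.DictReader(file):
        yield row["url"].strip(), float(row["length"])


# In-memory sample with placeholder URLs standing in for mp4.csv.
sample = io.StringIO(
    "url,length\n"
    "https://example.com/a.mp4,35.0\n"
    "https://example.com/b.mp4,13.0\n"
)
videos = list(read_videos(sample))
print(videos)
```

With a real file, the same function could be called as read_videos(open("mp4.csv", newline="")).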

Next, create a new file called mp4-to-mp3-list.py, paste the following script, and save it.

import shotstack_sdk as shotstack
import os
import sys
import csv

from shotstack_sdk.api import edit_api
from shotstack_sdk.model.clip import Clip
from shotstack_sdk.model.track import Track
from shotstack_sdk.model.timeline import Timeline
from shotstack_sdk.model.output import Output
from shotstack_sdk.model.edit import Edit
from shotstack_sdk.model.video_asset import VideoAsset    

if __name__ == "__main__":
    host = "https://api.shotstack.io/stage"
    
    configuration = shotstack.Configuration(host = host)

    configuration.api_key['DeveloperKey'] = os.getenv("SHOTSTACK_KEY") 

    with shotstack.ApiClient(configuration) as api_client:
        with open("mp4.csv", 'r') as file:
            csvreader = csv.reader(file)
            header = next(csvreader)

            for row in csvreader:
                length = float(row[1])

                api_instance = edit_api.EditApi(api_client)

                video_asset = VideoAsset(
                    src = row[0]
                )

                video_clip = Clip(
                    asset = video_asset,
                    start = 0.0,
                    length = length
                )

                track = Track(clips=[video_clip])

                timeline = Timeline(
                    background = "#000000",
                    tracks = [track]
                )

                output = Output(
                    format = "mp3",
                    resolution = "preview"
                )

                edit = Edit(
                    timeline = timeline,
                    output = output
                )

                try:
                    api_response = api_instance.post_render(edit)

                    message = api_response['response']['message']
                    id = api_response['response']['id']

                    print(f"{message}\n")
                    print(f">> render id: {id}")
                except Exception as e:
                    print(f"Unable to resolve API call: {e}")

Then use the python command to run the script.

python mp4-to-mp3-list.py

To check the render status, run the status.py script we created in the first part from the command line:

python status.py {renderId}

Replace {renderId} with one of the render IDs returned by the mp4-to-mp3-list.py script.

Final thoughts

This tutorial should have given you a basic understanding of how to programmatically convert MP4 videos to MP3 using Python and the Shotstack video editing API. As a next step, you could learn how to add other assets like text and images to create a media application.

This is just an introductory tutorial on working with media programmatically, but you can do so much more. Other use cases include:

video automation
video personalization
developing media apps

You can visit our blog and YouTube videos to learn more.

Get started with Shotstack's video editing API in two steps:

Sign up for free to get your API key.

Send an API request to create your video:

curl --request POST 'https://api.shotstack.io/v1/render' \
--header 'x-api-key: YOUR_API_KEY' \
--data-raw '{
  "timeline": {
    "tracks": [
      {
        "clips": [
          {
            "asset": {
              "type": "video",
              "src": "https://shotstack-assets.s3.amazonaws.com/footage/beach-overhead.mp4"
            },
            "start": 0,
            "length": "auto"
          }
        ]
      }
    ]
  },
  "output": {
    "format": "mp4",
    "size": {
      "width": 1280,
      "height": 720
    }
  }
}'
