Migrate your MP3 library to Spotify

6 min readNov 30, 2019

How to automatically identify your MP3s and upload them to a Spotify playlist using Python.

If, like me, you have a large library of MP3s that you have collected over the years, you might want to have them available on the music streaming service of your choice, so that you don’t have to lug them around everywhere with you. In this article, we are going to focus on Spotify, but it should be possible to do something similar with YouTube and Deezer.

We are going to use the music recognition service provided by ACRCloud and the Spotify Developer API. To begin with, you will need to register with these and obtain your credentials.

Get your credentials from ACRCloud

It’s important to check the box to enable 3rd party ID integration. You should now have a set of values for Host, Access Key and Access Secret. You’ll be needing these later. As part of a free 14 day trial, you will able to identify 10,000 MP3s a day.

Get your credentials from Spotify

Head over to the Spotify for Developers Dashboard and create a new client ID. (If you don’t already have a Spotify account, you can easily obtain a free one.) You’ll have to provide some blah blah about what you are doing but this is not important, as long as you don’t intend to use it for commercial purposes. You’ll now have access to your Client ID and your Client Secret. However, it is also important to edit the settings and include a Redirect URI. As part of the authentication process, Spotify will call you back at this web address with a URL that contains a magic code. You can choose any website that allows you to grab this information (e.g., https://github.com) without being redirected to a 404 error page. Make a note of your Client ID, Client Secret and Redirect URI.

TL;DR

Now that you have all the pieces you need, you can run my notebook in Google Colab and just follow the steps.

Google Colaboratory

Edit description

colab.research.google.com

Identify MP3s using Python

First, we have to import some libraries. It should be straightforward to install them if you have not done so already.

import os
import sys
import hmac
import time
import json
import tqdm
import base64
import hashlib
import urllib.request
import urllib.parse
import datetime
from pydub import AudioSegment

Now we define a couple of functions to handle POSTing requests to ACRCloud in the multi-part format which it expects. I copied this code from the ACRCloud Python SDK.

def post_multipart(url, fields, files):
    content_type, body = encode_multipart_formdata(fields, files)    req = urllib.request.Request(url, data=body)
    req.add_header('Content-Type', content_type)
    req.add_header('Referer', url)
    resp = urllib.request.urlopen(req)
    ares = resp.read().decode('utf8')
    return aresdef encode_multipart_formdata(fields, files):
    boundary = "*****2016.05.27.acrcloud.rec.copyright." + str(
        time.time()) + "*****"
    body = b''
    CRLF = '\r\n'
    L = []
    for (key, value) in list(fields.items()):
        L.append('--' + boundary)
        L.append('Content-Disposition: form-data; name="%s"' % key)
        L.append('')
        L.append(value)
    body = CRLF.join(L).encode('ascii')
    for (key, value) in list(files.items()):
        L = []
        L.append(CRLF + '--' + boundary)
        L.append('Content-Disposition: form-data; name="%s"; filename="%s"' %(key, key))
        L.append('Content-Type: application/octet-stream')
        L.append(CRLF)
        body = body + CRLF.join(L).encode('ascii') + value
    body = body + (CRLF + '--' + boundary + '--' + CRLF + CRLF).encode('ascii')
    content_type = 'multipart/form-data; boundary=%s' % boundary
    return content_type, body

Now we can define a function which identifies an MP3 and returns all the available information about it.

def get_track_info(sample):
    http_method = "POST"
    http_url_file = "/v1/identify"
    data_type = "audio"
    signature_version = "1"
    timestamp = int( time.mktime(datetime.datetime.utcfromtimestamp(time.time()).timetuple()))
    query_data = sample[:5000000]  # make sure sample is not too big
    sample_bytes = str(len(query_data))
    string_to_sign = http_method + "\n" + http_url_file + "\n" + access_key + "\n" + data_type + "\n" + signature_version + "\n" + str(timestamp)
    hmac_res = hmac.new(access_secret.encode('ascii'),
                        string_to_sign.encode('ascii'),
                        digestmod=hashlib.sha1).digest()
    sign = base64.b64encode(hmac_res).decode('ascii')    fields = {
        'access_key': access_key,
        'sample_bytes': sample_bytes,
        'timestamp': str(timestamp),
        'signature': sign,
        'data_type': data_type,
        "signature_version": signature_version
    }    res = post_multipart('http://' + host + http_url_file, fields,
                         {"sample": query_data})
    parsed_resp = json.loads(res)
    return parsed_resp

Note that I have truncated the size of the sample to 5 Mb in order to avoid an error.

This is where you should fill in your ACRCloud API credentials:

host = 'fill this in with your details'
access_key = 'fill this in with your details' 
access_secret = 'fill this in with your details'

We can try it out with an example like so:

f = open("01 Push It Along.mp3", "rb")
sample = f.read()
f.close()
get_track_info(sample)

We get the following results:

{'metadata': {'timestamp_utc': '2019-11-30 12:07:45',
  'music': [{'label': 'Jive',
    'play_offset_ms': 14480,
    'external_ids': {'isrc': 'USJI10300139', 'upc': '012414133120'},
    'artists': [{'name': 'A Tribe Called Quest'}],
    'result_from': 1,
    'acrid': '71678fbabfbf26d9d4ec1a85d0655631',
    'title': 'Push It Along',
    'duration_ms': 462200,
    'album': {'name': "Peoples' Instinctive Travels & the Paths of Rhythm"},
    'score': 100,
    'external_metadata': {'deezer': {'track': {'name': 'Push It Along',
       'id': '2467796'},
      'artists': [{'name': 'A Tribe Called Quest', 'id': '1862'}],
      'album': {'name': "Peoples' Instinctive Travels & the Paths of Rhythm",
       'id': '242435'}},
     'spotify': {'track': {'name': 'Push It Along',
       'id': '6RwONnsgzkvNEwwxoPmg04'},
      'artists': [{'name': 'A Tribe Called Quest',
        'id': '09hVIj6vWgoCDtT03h8ZCa'}],
      'album': {'name': "Peoples' Instinctive Travels & the Paths of Rhythm",
       'id': '4Qt1ZvWZ3DoKDimDMesZd5'}},
     'youtube': {'vid': 'qRPvKh4JCLg'},
     'musicstory': {'track': {'id': '1718770'}}},
    'release_date': '1990-04-11'}]},
 'cost_time': 1.3910000324249,
 'status': {'msg': 'Success', 'version': '1.0', 'code': 0},
 'result_type': 0}

Not only has it correctly recognized the song, but we have information about how to identify it on Spotify, Deezer, YouTube and Music Story! Now it is a simple matter to iterate over a directory and compile a list of Spotify IDs.

directory = '/path/to/your/mp3s/and/m4as'
ids = {}
mp3s = []
for root, dirs, files in os.walk(directory):
    for file in files:
        if file[-3:] == 'mp3' or file[-3:] == 'm4a':
            mp3s.append(root + '/' + file)

If the track is not in the appropriate format, it is re-coded as an MP3 by Pydub. If the track is still not recognized by ACRCloud or if there is no Spotify ID associated with it, a message is displayed.

for sound_file in tqdm.tqdm_notebook(mp3s):
    if sound_file in ids:
        continue    try:
        f = open(sound_file, "rb")
        sample = f.read()
        f.close()
        parsed_resp = get_track_info(sample)
        ids[sound_file] = parsed_resp['metadata']['music'][0][
            'external_metadata']['spotify']['track']['id']except Exception as e:
        if parsed_resp['status']['code'] == 2004:
            try:
                # re-encode sample as mp3
                audio = AudioSegment.from_file(sound_file, format=sound_file[-3:])
                audio.export("audio.mp3", format="mp3")
                f = open("audio.mp3", "rb")
                sample = f.read()
                f.close()
                parsed_resp = get_track_info(sample)
                ids[sound_file] = parsed_resp['metadata']['music'][0]['external_metadata']['spotify']['track']['id']
                continue
            except:
                pass        if 'limit exceeded' in parsed_resp['status']['msg']:
            print(
                f"{parsed_resp['status']['msg']}: Got to {mp3s.index(sound_file)}"
            )
            break
        if parsed_resp['status']['msg'] == 'Success':
            print(f'{e}: Skipping {sound_file}...')
        else:
            print(f"{parsed_resp['status']['msg']}: Skipping {sound_file}...")
        continue

Add the tracks to a new playlist in Spotify

For this, we need to install and import the Spotipy (not a typo) library.

import spotipy
import spotipy.util as util

Fill in your previously obtained API credentials, as well as your Spotify username and the name of the playlist you’d like to create.

client_id = 'fill this in with your details'
client_secret = 'fill this in with your details'
redirect_uri = 'fill this in with your details'
username = 'fill this in with your details'
playlist_name = 'fill this in with your details'

Unfortunately, there is a small bug in the current version of Spotipy (and it doesn’t look as though it will be corrected any time soon). To get around this, we have to make our own function to create a new playlist.

def user_playlist_create(sp,
                         username,
                         playlist_name,
                         description='',
                         public=True):
    data = {
        'name': playlist_name,
        'public': public,
        'description': description
    }
    return sp._post("users/%s/playlists" % (username, ), payload=data)['id']

If you run the following code, you will be directed to another website (the one you provided as a callback) and be asked to copy paste the full URL in order to authenticate yourself with Spotify. This is necessary because we are using the API to create and modify your playlists.

token = util.prompt_for_user_token(username, scope, client_id, client_secret, redirect_uri)
sp = spotipy.Spotify(token)
playlists = sp.user_playlists(username)
playlist_ids = [playlist['id'] for playlist in playlists['items'] if playlist['name'] == playlist_name]
if len(playlist_ids) == 0:
    user_playlist_create(sp, username, playlist_name)
else:
    playlist_ids = playlist_id

The final step is to iterate over the Spotify IDs we obtained earlier and add them to the new playlist. The Spotify API limits us to only being able to add 100 tracks at a time.

tracks = []
replace = True
for id in ids:
    tracks.append(ids[id])
    if len(tracks) == 100 or id == len(ids)-1:
        if replace:
            sp.user_playlist_replace_tracks(username, playlist_id, tracks)
            replace = False
        else:
            sp.user_playlist_add_tracks(username, playlist_id, tracks)
            tracks = []

Et voilà! You should now have a playlist in Spotify containing your library of MP3s.

The full code is available in my GitHub repository

teticio/Deej-A.I.

You can't perform that action at this time. You signed in with another tab or window. You signed out in another tab or…

github.com