Specifications
View full project on GitHub.
Muse was created using:
- Python Flask backend
- Spotify API for music attributes and recommendations
- YouTube API for video suggestions
- Material-UI and jQuery UI for front-end design
- Selenium for web scraping
Contributed by Spencer James Stebbins (http://blog.nycdatascience.com/author/sjstebbins/). He enrolled in the NYC Data Science Academy 12-week full-time Data Science Bootcamp program, which ran from July 5th to September 23rd, 2016. This post is based on his third project, Web Scraping, due in the 6th week of the program. The original article can be found here.
The average American spends 4 hours a day listening to music (SPIN), and 93% of Americans listen to music daily in some capacity (EDISON). Almost half of that listening is via AM/FM radio, but music streaming services are steadily gaining market share and have even turned around the declining revenues of the music industry.
This success breeds competition among streaming providers to increase customer acquisition and retention by creating music recommendation algorithms that have just the right ingredients to serve up the tunes their customers crave.
Let’s first take a look at one of the pioneers of music streaming: Pandora. Pandora was established in 2001 and has been losing market share to newer streaming competitors.
Why is this? I personally believe it is because of the nature of Pandora’s recommendations. Pandora only allows users to create playlists based on one artist or song, which often doesn’t capture the mood the listener was looking for and yields poor suggestions that are often repeated as little as 10 songs later.

On the other end of the streaming-service spectrum are platforms like 8tracks, which do suggest playlists a user may be interested in, but all of these playlists have to be created manually by users. Services like 8tracks have more dynamic playlists than Pandora, but they rely on users both to actively create playlists and to be matched with the right ones.

Somewhere in the middle of the spectrum between human-curated and algorithmically generated playlists is Spotify. This compromise is one of the main reasons Spotify had the largest market share as of March 2016. Spotify uses a combination of collaborative filtering and other playlist statistics to suggest remarkably apt songs, so that it feels like a trusted music-guru friend is making the suggestion. Spotify’s collaborative filtering works by analyzing a user’s playlist, finding other users’ playlists that contain the same songs plus a few more, and then suggesting those additional songs to the original user. Spotify does not rely on this method alone: it also weights songs based on whether a user favorited a song and then played it many times after the initial like, or whether a user skipped a suggested song within the first minute. Because of this combination of collaborative filtering and listening statistics, Spotify can make extremely accurate recommendations that feel eerily familiar. As great as Spotify’s recommendation engine is, what if there were a way to build on its already impressive algorithm and suggest songs that are even more playlist-specific?
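The playlist-overlap idea behind collaborative filtering can be sketched in a few lines of Python. This is a toy illustration under simplifying assumptions, not Spotify’s actual algorithm: playlists are plain lists of song IDs, each candidate song is scored by how many songs its playlist shares with the user’s, and the function name `playlist_overlap_recommendations` is hypothetical.

```python
from collections import Counter


def playlist_overlap_recommendations(user_playlist, other_playlists, top_n=5):
    """Suggest songs from other playlists that share songs with user_playlist.

    Each candidate song is scored by the total overlap (number of shared songs)
    of the playlists it appears in, so playlists that look more like the user's
    contribute more. Songs already in the user's playlist are skipped.
    """
    user_songs = set(user_playlist)
    scores = Counter()
    for playlist in other_playlists:
        songs = set(playlist)
        overlap = len(user_songs & songs)
        if overlap == 0:
            continue  # no shared songs, so this playlist says nothing about the user
        for song in songs - user_songs:
            scores[song] += overlap
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [song for song, _ in ranked[:top_n]]
```

Weighting by overlap size is one simple design choice; signals such as favorites, repeat plays, or early skips could be folded in as extra per-song weights, but how Spotify combines them is not public.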
The general concept behind Muse is simple. Where Pandora suggests songs based on a single seed artist or song, Muse takes into account an entire playlist, the attributes of each song within the playlist, and the play count of each song to form a better query against Spotify’s new recommendations API endpoint. In early 2014, Spotify acquired EchoNest, the industry’s leading music intelligence company, which provides developers with the deepest understanding of music content and music fans. Through Spotify’s API, which now exposes these EchoNest services, developers can provide seed information and target attributes in a query, and the API sends back recommended songs in its response. This EchoNest API endpoint is the backbone of many recommendation services, such as Shazam and Pandora. As you can probably imagine, choosing the best query parameters is therefore very important, because they determine the accuracy and relevance of the recommendations the API returns. As stated previously, Muse optimizes these query parameters to be as representative of an entire playlist as possible and therefore brings back more relevant recommendations. So how does Muse do it?
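The core idea (average a playlist’s audio features weighted by play counts, pick the tracks closest to that average as seeds, and send the average as target values) can be sketched offline with NumPy. This is a simplified illustration, not Muse’s code: the three-attribute `ATTRIBUTES` list, the track dictionaries, and both function names are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

import numpy as np

# Hypothetical three-attribute subset of the audio features Muse uses.
ATTRIBUTES = ["energy", "danceability", "tempo"]


def build_recommendation_params(tracks, play_counts, n_seeds=5):
    """Return query parameters for a seed/target recommendations request.

    tracks: list of dicts, each with an "artist_id" key and one key per attribute.
    play_counts: number of plays per track, used as weights for the average.
    """
    values = np.array([[t[a] for a in ATTRIBUTES] for t in tracks], dtype=float)
    weights = np.asarray(play_counts, dtype=float)
    # Play-count-weighted mean, equivalent to repeating each track once per play.
    phantom_average = np.average(values, axis=0, weights=weights)
    # Euclidean distance from every track to the phantom average track.
    distances = np.linalg.norm(values - phantom_average, axis=1)
    closest = np.argsort(distances)[:n_seeds]
    params = {"seed_artists": ",".join(tracks[i]["artist_id"] for i in closest)}
    for name, value in zip(ATTRIBUTES, phantom_average):
        params["target_" + name] = round(float(value), 2)
    params["limit"] = 20
    return params


def recommendations_url(api_url, params):
    """Join a base URL such as "https://api.spotify.com/v1" with the encoded query."""
    return api_url + "/recommendations?" + urlencode(params)
```

Because tempo (beats per minute) and loudness (decibels) are on much larger scales than the 0-to-1 attributes, they dominate a plain Euclidean distance; standardizing each column first is one possible refinement, not something Muse’s code does.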
Upon login via Spotify OAuth, users automatically see their local iTunes playlists and Spotify playlists in the left-hand column. If their iTunes Library XML file is not in a standard location on their computer, users can specify the correct path for Muse to look for the file.
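Reading playlists and play counts from the iTunes library file can be sketched with Python’s standard `plistlib`, since that XML export is an Apple property list. This is an illustrative sketch rather than Muse’s loader: the function names are hypothetical, and it assumes the usual keys of that file (`Tracks`, `Playlists`, `Playlist Items`, `Track ID`, `Name`, `Artist`, `Play Count`).

```python
import plistlib
from pathlib import Path


def parse_itunes_playlists(xml_bytes):
    """Map playlist name -> list of (artist, title, play_count) tuples.

    "Tracks" maps track IDs (stored as string keys) to track dictionaries,
    and each entry in "Playlists" references its tracks by "Track ID".
    """
    library = plistlib.loads(xml_bytes)
    tracks = library.get("Tracks", {})
    playlists = {}
    for playlist in library.get("Playlists", []):
        items = []
        for item in playlist.get("Playlist Items", []):
            track = tracks.get(str(item["Track ID"]))
            if track is None:
                continue  # skip references to tracks missing from "Tracks"
            items.append((track.get("Artist", ""), track.get("Name", ""),
                          track.get("Play Count", 0)))
        playlists[playlist.get("Name", "")] = items
    return playlists


def load_itunes_playlists(path):
    # The path comes from the user when the file is not in a standard location.
    return parse_itunes_playlists(Path(path).expanduser().read_bytes())
```

Tracks that were never played have no `Play Count` key, so the sketch defaults them to 0.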
From here, a user can click on a playlist, which is where the magic of Muse takes place. Clicking a playlist runs the following algorithmic process, shown step by step in the code below:
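# SPOTIFY_API_URL, GLOBAL and seed_data are set up elsewhere in the Flask app
import json
from itertools import repeat
import requests
from sklearn.metrics import DistanceMetric  # scikit-learn >= 1.0; older releases: sklearn.neighbors
# API: get collected song attributes
attributes_api_endpoint = SPOTIFY_API_URL + "/audio-features?ids=" + ",".join(seed_data['song_ids'])
attributes_response = requests.get(attributes_api_endpoint, headers=GLOBAL['authorization_header'])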
seed_data['attributes'] = json.loads(attributes_response.text)['audio_features']
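# repeat each track's attributes once per play so frequently played songs weigh more in the average
library_tracks = []
for i, attributes in enumerate(seed_data['attributes']):
    library_tracks.extend(repeat(attributes, seed_data['play_counts'][i]))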
# create phantom_average_track
phantom_average_track = {}
target_attributes = ['energy','liveness','tempo','speechiness','acousticness','instrumentalness','danceability','loudness']
for attribute in target_attributes:
    phantom_average_track[attribute] = sum(track[attribute] for track in library_tracks) / len(library_tracks)
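# get seed_distances from phantom_average_track
# assumed: one row per playlist track in target_attributes order, built like recommendation_track_attributes below
playlist_tracks_attributes = [[track[attribute] for attribute in target_attributes] for track in seed_data['attributes']]
# row 0 is the phantom average track, listed in the same attribute order
seed_distances = [[phantom_average_track[attribute] for attribute in target_attributes]] + playlist_tracks_attributes
dist = DistanceMetric.get_metric('euclidean')
distances = dist.pairwise(seed_distances)[0]
seed_data['distances'] = distances[1:len(distances)]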
# get attributes of the 5 closest tracks to the phantom_average_track
seed_indexes = seed_data['distances'].argsort()[:5]
seed_songs = [seed_data['song_ids'][i] for i in seed_indexes]
seed_artists = [seed_data['artist_ids'][i] for i in seed_indexes]
# get target attributes from phantom_average_track (round to two decimals)
target_energy = str(round(phantom_average_track['energy'],2))
target_liveness = str(round(phantom_average_track['liveness'],2))
target_tempo = str(round(phantom_average_track['tempo'],2))
target_speechiness = str(round(phantom_average_track['speechiness'],2))
target_acousticness = str(round(phantom_average_track['acousticness'],2))
target_instrumentalness = str(round(phantom_average_track['instrumentalness'],2))
target_danceability = str(round(phantom_average_track['danceability'],2))
target_loudness = str(round(phantom_average_track['loudness'],2))
# API: get recommended tracks data based on seed and target values
recommendations_api_endpoint = (SPOTIFY_API_URL + "/recommendations?seed_artists=" + ",".join(seed_artists)
    + "&target_energy=" + target_energy + "&target_liveness=" + target_liveness
    + "&target_tempo=" + target_tempo + "&target_speechiness=" + target_speechiness
    + "&target_acousticness=" + target_acousticness + "&target_instrumentalness=" + target_instrumentalness
    + "&target_danceability=" + target_danceability + "&target_loudness=" + target_loudness + "&limit=20")
recommendations_response = requests.get(recommendations_api_endpoint, headers=GLOBAL['authorization_header'])
recommendation_data['data'] = json.loads(recommendations_response.text)['tracks']
# set recommended track titles and ids
for track in recommendation_data['data']:
    # don't add duplicate recommendations or songs already in the playlist
    if track['id'] not in recommendation_data['ids'] and track['id'] not in seed_data['song_ids']:
        recommendation_data['titles'].append(track['name'])
        recommendation_data['artists'].append(track['artists'][0]['name'])
        recommendation_data['ids'].append(track['id'])
        recommendation_data['images'].append(track['album']['images'][0]['url'])
# API: get collected tracks attributes
attributes_api_endpoint = SPOTIFY_API_URL + "/audio-features?ids=" + ",".join(recommendation_data['ids'])
attributes_response = requests.get(attributes_api_endpoint, headers=GLOBAL['authorization_header'])
recommendation_data['attributes'] = json.loads(attributes_response.text)['audio_features']
# create tracks with just the float variables, in target_attributes order
recommendation_track_attributes = []
for track in recommendation_data['attributes']:
    recommendation_track_attributes.append([track[attribute] for attribute in target_attributes])
# get recommendation_distances from phantom_average_track (row 0)
recommendation_distances = [[phantom_average_track[attribute] for attribute in target_attributes]] + recommendation_track_attributes
dist = DistanceMetric.get_metric('euclidean')
distances = dist.pairwise(recommendation_distances)[0]
recommendation_data['distances'] = distances[1:len(distances)]
# sort recommendation_data object by distance, closest first
sorted_recommendation_indexes = recommendation_data['distances'].argsort()
for key in recommendation_data.keys():
    # 'data' holds the unfiltered API response, so it is not aligned with the other lists
    if key != 'data' and len(recommendation_data[key]) > 0:
        recommendation_data[key] = [recommendation_data[key][i] for i in sorted_recommendation_indexes]
Once the Muse algorithm has completed, users can click on any recommended song. The chosen song's title and artist name are then used to query YouTube's API, and the top related video is returned and played. Users can also click the play button, which plays both the playlist tracks and the recommendations in order of relevance score.
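The YouTube lookup described above can be sketched as building a YouTube Data API v3 `search.list` request from the song’s artist and title and reading the first result’s video ID. This is an illustrative sketch, not Muse’s code: the helper names are hypothetical, the `"{artist} {title}"` query format is an assumption, and the request is only built here (sending it, for example with `requests.get(url).json()`, needs your own API key).

```python
from urllib.parse import urlencode

# YouTube Data API v3 search endpoint
YOUTUBE_SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"


def youtube_search_url(title, artist, api_key):
    """Build a search.list request for the single best-matching video."""
    params = {
        "part": "snippet",
        "q": f"{artist} {title}",
        "type": "video",
        "maxResults": 1,
        "key": api_key,
    }
    return YOUTUBE_SEARCH_URL + "?" + urlencode(params)


def top_video_id(search_response):
    """Return the videoId of the first search result, or None if there are no results."""
    items = search_response.get("items", [])
    if not items:
        return None
    return items[0]["id"]["videoId"]
```

The returned ID can then be played in an embedded player or opened at `https://www.youtube.com/watch?v=<videoId>`.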
# if making request for billboard playlist
if request.args.get('day', None, type=str) is not None:
    # Selenium 4 API: the find_element_by_* helpers and executable_path were removed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    driver = webdriver.Chrome(service=Service("./chromedriver"))
    driver.get("http://www.umdmusic.com/default.asp?Chart=D")
    # fill in the chart date form from the request's day/month/year query parameters
    elem = driver.find_element(By.NAME, 'ChDay')
    elem.clear()
    elem.send_keys(request.args.get('day', 0, type=str))
    elem = driver.find_element(By.NAME, 'ChMonth')
    elem.clear()
    elem.send_keys(request.args.get('month', 0, type=str))
    elem = driver.find_element(By.NAME, 'ChYear')
    elem.clear()
    elem.send_keys(request.args.get('year', 0, type=str))
    driver.find_element(By.NAME, 'charts').submit()
    rows = driver.find_elements(By.TAG_NAME, 'tr')
    rows_data = []
    # table rows 15-39 of the results page hold the chart entries
    for row in rows[15:40]:
        cells = row.find_elements(By.TAG_NAME, 'td')
        artist = cells[4].get_attribute("innerHTML").split("</b>")[0].split('<b>')[1].rstrip()
        title = cells[4].get_attribute("innerHTML").split("<br>")[1].rstrip()
        plays = int(cells[3].get_attribute("innerHTML").strip())
        rows_data.append([artist, title, plays])
    driver.quit()
    playlist = rows_data
# extract playlist track data (getSong is defined elsewhere in the app)
for track in playlist:
    getSong(track, False)
Posted 1 March 2021
© 2021 TechTarget, Inc.