Category Archives: Tech

Insights from 500,000 Deezer playlists using Google’s BigQuery

A few days ago, Warner Music made an acquisition, and as TechCrunch pointed out, one reason may be its data, and the related insights.

But what can we learn from such a dataset? Quite a lot, actually: discovering top tracks, building content-based recommendations, mining new trends, and finding influencers to target during album releases. This can be invaluable for a record label or an artist, and it’s no surprise that companies like Musicmetric or The Next Big Sound tackle it from the analytics perspective, while Gracenote or The Echo Nest focus on data, recommendations, or user profiling.

To prove some of those points, I’ve run a small experiment using 500,000 playlists from Deezer, together with Google’s BigQuery infrastructure.

The setup

Analyzing playlists is not a new thing, and you can read about various Big Data architectures, such as Spark at Spotify, from the music-discovery standpoint. I used Google’s BigQuery in order to quickly get insights without setting up my own stack. As I had experimented with it in the past, it was a good time to try it with my own dataset.

With a few Python scripts, here are the steps to setup the experiment. [Update 2014-10-29: The scripts, as well as links to the dataset, are now available on Github]:

– First, gather about 500,000 playlists from the Deezer API [1], using a threaded crawler that randomly picks playlist IDs between 1 and 10M, for a total of 9.7 GB of JSON data;

– Then, prepare the playlists for Google’s BigQuery, concatenating the 500K original files into 9 gzipped JSON files ([1-9].json.gz) and uploading them to Google Cloud Storage, for a total of 1 GB;

– Finally, define a schema to map the data to tables, and load it from Cloud Storage into BigQuery. It took only 12 seconds to load the 1 GB of compressed data, for a total of 510,187 playlists, with 12M tracks (900K distinct ones) in total.
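Since BigQuery loads newline-delimited JSON from Cloud Storage, the preparation step boils down to re-packing the crawled files. Here’s a minimal sketch of that step (the actual scripts are on Github; the records and output filename below are made up for illustration):

```python
import gzip
import json

def pack_playlists(playlists, out_path):
    # write playlists as gzipped newline-delimited JSON,
    # the format BigQuery loads from Cloud Storage
    with gzip.open(out_path, "wt") as out:
        for playlist in playlists:
            # one JSON object per line, no pretty-printing
            out.write(json.dumps(playlist) + "\n")

# made-up records standing in for the crawled Deezer files
sample = [
    {"id": 1, "title": "Road trip", "tracks": []},
    {"id": 2, "title": "Workout", "tracks": []},
]
pack_playlists(sample, "1.json.gz")
```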

Defining a schema to load JSON data into BigQuery

Content recommendations

With such an amount of data – and not only in the music domain – it’s relatively easy to build a content-recommendation platform based on the “If you like X, you’ll like Y” principle. Using a simple SQL query, you can find the top related artists for anyone in the dataset:

  -- field names assume the Deezer playlist schema loaded above
  SELECT b.tracks.data.artist.name, COUNT(*) AS c
  FROM FLATTEN([Playlists.Playlists], tracks.data) a
  JOIN EACH FLATTEN([Playlists.Playlists], tracks.data) b
  ON a.id == b.id
  WHERE a.tracks.data.artist.id == <artist_id>
    AND b.tracks.data.artist.id != <artist_id>
  GROUP BY 1
  ORDER BY c DESC

For instance:

### Related to Rihanna
* Britney Spears
* Beyoncé
* The Black Eyed Peas
* David Guetta
* Justin Timberlake
### Related to Daft Punk
* Justice
* Muse
* David Guetta
* Moby
* The Chemical Brothers
### Related to Agnostic Front
* Blood for Blood
* Hatebreed
* Dropkick Murphys
* Helga Hahnemann
* Bad Religion

A good way to bootstrap an artist-based radio station!

Going further, building a song-to-song recommendation algorithm is not really complicated either. Here are, for instance, the most frequent tracks played together with “Harder Better Faster Stronger” that are not by Daft Punk.

### Related to Harder Better Faster Stronger, non Daft-Punk
* David Guetta: Cozi Baby When The Light
* Laurent Wolf: No Stress (Radio edit)
* David Guetta: Love Don't Let Me Go (Original Edit)
* David Guetta: Love Is Gone (Radio Edit Rmx)
* Mika: Relax, Take It Easy
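The same co-occurrence logic can be sketched in a few lines of plain Python – a toy in-memory version of the query, with made-up playlists rather than the real dump:

```python
from collections import Counter

def related_tracks(playlists, seed, exclude_artist):
    # count tracks co-occurring with `seed` across playlists,
    # skipping tracks by `exclude_artist` (e.g. the seed's own artist)
    counts = Counter()
    for tracks in playlists:
        if seed not in {t["title"] for t in tracks}:
            continue
        for t in tracks:
            if t["title"] != seed and t["artist"] != exclude_artist:
                counts[t["title"]] += 1
    return counts.most_common()

# made-up playlists standing in for the Deezer dataset
playlists = [
    [{"title": "Harder Better Faster Stronger", "artist": "Daft Punk"},
     {"title": "No Stress", "artist": "Laurent Wolf"},
     {"title": "Around The World", "artist": "Daft Punk"}],
    [{"title": "Harder Better Faster Stronger", "artist": "Daft Punk"},
     {"title": "No Stress", "artist": "Laurent Wolf"}],
]
print(related_tracks(playlists, "Harder Better Faster Stronger", "Daft Punk"))
# → [('No Stress', 2)]
```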

Top artists and tracks, popularity, and more

Besides recommendations, an obvious use-case is to identify top-tracks or top-artists. For instance, here are the top-tracks for some artists based on their popularity in the full dataset.

### Most popular tracks from Daft Punk
* Around The World
* Harder Better Faster Stronger
* Da Funk
* Technologic
* Around The World / Harder Better Faster Stronger
### Most popular tracks from Weezer
* Island In The Sun
* My Name Is Jonas
* Beverly Hills
* Buddy Holly
* Hash Pipe

Combined with temporal attributes (not available here unfortunately – more on this later), one could also identify how fast a track progresses from its release into a top-X.

Regarding top artists, the easy way is to simply rank them by the number of tracks they have in the full dataset (900K distinct ones).

### Top-artists by number of tracks
* Linkin Park (65,415)
* Muse (59,550)
* U2 (54,688)
* Rihanna (53,354)
* Queen (51,717)

But another way is to sort artists by the number of playlists they appear in:

 -- field names assume the Deezer playlist schema
 SELECT COUNT(id) AS c, artist_id, artist_name
 FROM (
  SELECT id, tracks.data.artist.id AS artist_id, tracks.data.artist.name AS artist_name
  FROM [Playlists.Playlists]
  GROUP EACH BY 1, 2, 3
 )
 GROUP BY 2, 3
 ORDER BY c DESC

Surprisingly, the most popular artist is then a karaoke cover band, included in 23,993 of the 510K playlists – more than Rihanna or U2!

### Top-artists by playlists appearance
* Studio Group (23,993)
* Rihanna (23,398)
* U2 (17,860)
* Queen (17,463)
* Linkin Park (17,232)

Another interesting insight – not surprising if you’re into music discovery and the long tail – concerns the way popular artists outweigh less popular ones in the distribution: 43,346 artists, i.e. about a third of them, appear only once in the dataset, and 37,864 appear between 2 and 10 times.
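That distribution is straightforward to compute once per-artist appearance counts are available – a quick sketch with made-up counts:

```python
from collections import Counter

def long_tail_buckets(appearances):
    # bucket artists by how often they appear in the dataset:
    # exactly once, 2-10 times, or more than 10 times
    buckets = Counter()
    for n in appearances.values():
        if n == 1:
            buckets["once"] += 1
        elif n <= 10:
            buckets["2-10"] += 1
        else:
            buckets["10+"] += 1
    return buckets

# made-up appearance counts per artist
appearances = {"A": 1, "B": 1, "C": 3, "D": 250}
print(long_tail_buckets(appearances))
```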

Trends, influencers and targeted recommendations

Finally, what about identifying trends and influencers?

One approach would be to identify which artists jump from the top-1000 to the top-100, and even to the top-50, in a given timeframe. Unfortunately, Deezer playlists do not contain any temporal information. Yet, coming back to the starting point of this post, that’s definitely something valuable that WMG could get from their acquisition.

They could then identify and target influencers – for instance, users who are among the top 10% of listeners of a given artist – which could be a goldmine when marketing new artists or releases.
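Picking those top-10% listeners is essentially a sort over per-user listen counts – a sketch with made-up users and counts:

```python
def top_decile_listeners(listen_counts):
    # return the users in the top 10% by listen count for an artist
    ranked = sorted(listen_counts, key=listen_counts.get, reverse=True)
    cutoff = max(1, len(ranked) // 10)
    return ranked[:cutoff]

# made-up per-user listen counts for one artist: u1 to u20
counts = {"u%d" % i: i for i in range(1, 21)}
print(top_decile_listeners(counts))
# → ['u20', 'u19']
```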

Definitely, this acquisition makes sense considering the trends in the industry, and the recent consolidation around various services (Rhapsody, etc.), most of them focusing on the analytics / discovery domain. A domain which matters for artists and labels, but also for streaming services and data providers, giving them valuable insights and ways to beat competitors, ensuring their users get the best listening experience they could possibly expect, depending on who they are and how they listen to music.

If you have an interesting dataset and want to run analytics or recommendation experiments, let’s get in touch! And if you’re mostly interested in the discovery / recommendation part, have a look at our turn-key solution.

[1] I used Deezer and not Spotify, even though the acquired service is Spotify-based, as there’s no rate limiting on the Deezer API for playlist search and retrieval (whether that’s a bug or a feature is another topic for discussion)

How can mood and tempo influence artist discovery?

If you log in to Deezer, Spotify, YouTube, etc. to listen to a particular artist, you can simply pick their top tracks. Yet, while those are the most popular, they don’t necessarily provide a good understanding of the artist’s style – or, on the contrary, they might not surprise you enough. Plus, depending on which platform you use, unexpected results can appear!

Using the Gracenote API, here’s an experiment using its mood and tempo detection features to answer questions like “What does a band generally play?”, “How eclectic is an album?” or “How can I listen to something unexpected from my favorite artist?”.

You Can’t, You Won’t And You Don’t Stop

First, let’s try to understand how eclectic an artist is: do they tend to play diverse styles, or do they stick to common patterns? In the first case, is that something we can experience through a single album, or did they simply switch genres over the years?

Take for instance the Beastie Boys, who played hardcore punk in their early years before becoming hip-hop stars in the 90’s. If you look at their old recordings, compiled in “Some Old Bullshit“, you’ll find the following top-3 tempos and moods.

Beastie Boys - Some Old Bullshit
# Tempo
- Medium Tempo: 8 (57.14%)
- Fast Tempo: 6 (42.86%)
- Medium Fast: 5 (35.71%)
# Mood
- Aggressive: 8 (57.14%)
- Cool Confidence: 4 (28.57%)
- Heavy Triumphant: 4 (28.57%)
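These tallies (which sum to more than 100%, since a track can carry several descriptors) are simple to reproduce: count each descriptor across an album’s tracks and divide by the number of tracks. A sketch with made-up per-track moods for a 14-track record:

```python
from collections import Counter

def top_descriptors(track_descriptors, n=3):
    # tally descriptors across an album's tracks and report counts
    # with the share of tracks carrying each one
    total = len(track_descriptors)
    counts = Counter(d for descs in track_descriptors for d in set(descs))
    return [(name, c, round(100.0 * c / total, 2))
            for name, c in counts.most_common(n)]

# made-up per-track mood lists for a 14-track record
moods = ([["Aggressive"]] * 8 + [["Cool Confidence"]] * 4
         + [["Heavy Triumphant"]] * 2)
print(top_descriptors(moods))
# → [('Aggressive', 8, 57.14), ('Cool Confidence', 4, 28.57), ('Heavy Triumphant', 2, 14.29)]
```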

A more recent record like “Hello Nasty” seems to leave the aggressive side of their early years behind, even though the defiant mood is definitely here to stay!

Beastie Boys - Hello Nasty
# Tempo
- Medium Tempo: 21 (95.45%)
- Medium Fast: 12 (54.55%)
- 90s: 7 (31.82%)
# Mood
- Attitude / Defiant: 7 (31.82%)
- Defiant: 7 (31.82%)
- Cool Confidence: 6 (27.27%)

Looking at individual albums, there are interesting patterns as well. London Calling from The Clash combines elements of Punk-Rock, Jazz, Ska, R&B and more. Consequently, lots of different moods are covered in the same album:

The Clash - London Calling
# Mood
- Rowdy: 5 (26.32%)
- Excited: 4 (21.05%)
- Ramshackle / Rollicking: 4 (21.05%)
- Cool: 4 (21.05%)
- Loud Celebratory: 3 (15.79%)
- Casual Groove: 3 (15.79%)
- Carefree Pop: 2 (10.53%)
- Upbeat: 2 (10.53%)
- Empowering: 1 (5.26%)
- Cool Confidence: 1 (5.26%)

Fear of the Dark

On the other hand, some bands haven’t significantly evolved over the decades. Running the same test on the first and most recent studio albums of Iron Maiden (“Iron Maiden” and “The Final Frontier“) shows that the tempo remains the same, while the Defiant mood is still a strong part of their style, 30 years after their first release.

Iron Maiden - Iron Maiden
# Tempo
- Medium Tempo: 6 (75.00%)
- Medium Fast: 5 (62.50%)
- 100s: 4 (50.00%)
# Mood
- Defiant: 5 (62.50%)
- Hard Positive Excitement: 3 (37.50%)
- Hard Dark Excitement: 2 (25.00%)
Iron Maiden - The Final Frontier
# Tempo
- Medium Tempo: 6 (60.00%)
- Medium Fast: 4 (40.00%)
- Fast: 3 (30.00%)
# Mood
- Defiant: 6 (60.00%)
- Heavy Brooding: 6 (60.00%)
- Brooding: 2 (20.00%)

Finally, for all their studio albums (143 tracks), we have:

Iron Maiden
# Tempo
- Medium Tempo: 88 (61.54%)
- Medium Fast: 55 (38.46%)
- Fast Tempo: 53 (37.06%)
- Fast: 44 (30.77%)
- Medium: 31 (21.68%)
- 100s: 26 (18.18%)
- 80s: 21 (14.69%)
- 90s: 14 (9.79%)
- 130s: 12 (8.39%)
- 140s: 12 (8.39%)
# Mood
- Defiant: 72 (50.35%)
- Heavy Brooding: 33 (23.08%)
- Hard Dark Excitement: 30 (20.98%)
- Brooding: 26 (18.18%)
- Rowdy: 18 (12.59%)
- Confident / Tough: 10 (6.99%)
- Hard Positive Excitement: 9 (6.29%)
- Aggressive: 9 (6.29%)
- Heavy Triumphant: 7 (4.90%)
- Alienated / Brooding: 7 (4.90%)

This becomes interesting in terms of discovery. If you want to listen to typical Maiden, just pick a mid-tempo track with a defiant mood: “The Trooper” is one of them. On the other hand, if you’re into something more obscure, pick an alienated 90s-BPM track, like “Mother Russia“.

If you want to run similar experiments on your favorite albums, simply set up an account with the Gracenote API, and get the small Python class I’ve built for the analysis.

Last night a DJ saved my life: What if Twitter could be your own DJ?

While the Twitter music app eventually failed, it’s still clear that people use Twitter’s data stream to share and/or discover new #music. Thanks to Twitter cards, you can directly watch a YouTube video, or listen to a SoundCloud clip, right from your feed, without leaving the platform. But what if Twitter could be your own DJ, playing songs on request?

Since it’s been a few months since I enjoyed my last Music Hack Day – oh, I definitely miss that! – I’ve hacked a proof of concept using the seevl API, combined with the Twitter and YouTube ones, to make Twitter act as your own personal DJ.

Hey @seevl, play something cool

The result is a twitter bot, running under our @seevl handle, which accepts a few (controlled) natural-language queries and replies with an appropriate track, embedded in a Tweet via a YouTube card. Here are a few patterns you can use:

Hey @seevl, play something like A

To play something that is similar to A. For instance, tweet “play something like New Order”, and you might get a reply with a Joy Division track in your feed.

Hey @seevl, play something from L

To play something from an artist signed to label L (or, at least, one that used to be on this label at some stage).

Hey @seevl, play some G

To play something from a given genre G.

Hey @seevl, play A

To simply play a track from A.

By the way, you can replace “Hey” with anything you want, as long as you politely ask your DJ to spin what you want. Here’s an example, with my tweet just posted (top of the timeline), and a reply from the bot (bottom left).

Twitter As A DJ

A little less conversation

As it’s all Twitter-based, not only can you send messages, you can have a conversation with your virtual DJ. Here’s, for instance, what I sent first:

And got this immediate reply – with the embedded YouTube video:

Followed by (“coo” meant to be “cool”)

To immediately listen to Bessie Smith in my stream.

It’s kind of fun, I have to say, especially due to the instantaneous nature of the conversation – it even reminds me of IRC bots!

Unfortunately, it’s likely that the bot will reach the API rate-limit when posting Tweets (and I’m not handling those errors in the current MVP), so you may not have a reply when you interact with it.

Twitter As A Service?

Besides the music-related hack, I also wanted to showcase the growth of intelligent services on the Web – and how a platform like Twitter can be part of it, using “Twitter As A Service” as a layer for an intelligent Web.

The recently-launched “Buy button” is a simple example of how Twitter can be a Siri-like interface to the world. But why not bring more intelligence into Twitter? What about “Hey @uber, pick me up in 10 minutes”, using the Tweet geolocation plus an Uber API integration to directly pick up – and bill – whoever #requested a black car? Or “Please @opentable, I’d love to have sushi tonight”, and get a reply with links to the top-rated places nearby, with in-tweet booking capability (via the previous Buy button)? The data is there, the tools and APIs are there, so…

Yes, this sounds a bit like what’s described in the seminal Semantic Web article by Tim Berners-Lee, James Hendler and Ora Lassila. Maybe it’s because we’re finally there, in an age where computers can be those social machines we’ve been dreaming about!


Google I/O 2014 Recap: Android, Knowledge Graph and more

Back in April, I was lucky enough to get a partner invite for Google I/O. Coupled with a stay at the Startup House, a co-working / housing space (ideal when you’re jet-lagged at 4AM and want a proper desk to code a few meters away from your bed) located only one block away from Moscone, I’m very glad I made the trip to my first I/O!

Google I/O after hours party in Yerba Buena Gardens

Here are a few highlights, in a conference which clearly confirmed the role of (1) Android as a global OS, and (2) the Knowledge Graph as a hub for everything AI-related, at Google and beyond.

Most of the videos of the sessions are online on Google Developers’ YouTube channel, and I’ve tried as much as possible to link to the relevant ones below.

Android – One OS to rule them all

While I’m not (yet) a full-time Android user (let alone a developer), it’s now clear that it goes far beyond a phone-only OS. With the introduction of AndroidWear, AndroidCar, and AndroidTV during the keynote, the OS is now the core of all hardware-related initiatives at Google.

With common SDKs and APIs to interact with, wherever the OS is used, this makes the life of developers much easier when building cross-device products. Relying on a single ecosystem also matters when building an engineering team, and I guess it may be a deciding factor for small start-ups when choosing which market to tackle.

Last but not least, the improvements in the OS itself, including a new runtime – see “What’s new in Android“ – make it even faster than before, a plus for embedded systems of all sorts.

Google’s Knowledge Graph – From search to voice controls and app indexing

So far, Google’s Knowledge Graph has been used mostly in search-related projects, including the snippets you see when searching for entities such as places, people, music and movies on Google. Several sessions showed how it is now used as a central hub for AI-related projects and products.

Search results getting richer with Google's Knowledge Graph

Using Android TV, you can ask your TV (literally, by talking to your Android watch) to suggest an Oscar-awarded movie from 2000, or who’s in the cast of X or Y – all answers coming from the Knowledge Graph. In the first case, results can be bought from Google Play, another nice piece of integration between the company’s different offerings.

Another interesting case is the use of the Knowledge Graph to connect the dots between previously isolated silos, namely mobile apps. One of the common issues with those apps is their lack of links and outside-world connections, in spite of recent efforts such as the Facebook-supported App Links. In the session “The Future of Apps and Search“, a combination of app indexing, JSON-LD and the Knowledge Graph was presented to link directly into an app from, e.g., Google’s search results or autocomplete search in Android, as well as to launch actions from search results – e.g. playing a track in Spotify, a use-case announced a few days before I/O – using the new actions I’ve recently blogged about.

As an early JSON-LD enthusiast who has worked on related technologies for almost a decade, you can’t imagine how excited I was to see this in something used by millions of users! Let’s bet that’s only the beginning, and that new verticals will follow.

Spotify, with real bits of JSON-LD inside

Google Cloud and DataFlow – Smarter, faster, easier

I’ve been recently using Google Cloud infrastructure in several projects (from GAE to Google Prediction – watch “Predicting the future with the Google Cloud Platform” for more about their ML infrastructure), and a few announcements made my day here:

  • Cloud Debugger – making DevOps and back-end engineers more efficient when debugging code. You can now add breakpoints, including conditional ones (e.g. user=X), in your live app without jeopardising its speed and, most importantly, without having to stop/restart/deploy anything. This means code can be debugged on production servers with live data, without patching or tracing multiple boxes, all in the comfort of your browser. A kind of New Relic on steroids, so a big thumbs-up here!
  • Dataflow – aiming to replace MapReduce, with a special focus on stream processing and scalability. A convincing use-case during the keynote was Twitter sentiment analysis, showing not only the simplicity of the interface, but also the orchestration of the services through the API. The service is not open yet, but you can check “Big data, the Cloud Way: Accelerated and simplified” to learn more. I’m looking forward to trying it on a few stream-processing tasks for content discovery!
Dataflow - Coming soon to a theater near you

The Web platform – Polymer, WebRTC and HTML5

Whether you’re accessing it from your desktop, your phone or, now, your watch or Glass, there’s only one Web. And far from just websites, it can be used as a platform to build powerful apps, as many sessions showed:

  • Polymer / Web components – or how to build your own HTML tags for quick prototyping and distribution. As an AngularJS user, I was immediately convinced by its two-way data bindings. Polymer (“Polymer and the Web Components revolution“) adds another elegant layer to the Web, allowing you to define tags that are then rendered as full components. Imagine a <my-recent-tracks> tag that automatically renders the top tracks you’ve played on all your favorite music platforms. Well, that’s exactly what Polymer can do;
  • HTML5 – the Web as a platform, from different perspectives. In particular, “HTML5 everywhere: How and why YouTube uses the Web platform” was a great intro talk to understand the benefits of HTML5 from different points of view: UX, scalability, cross-platform. Recommended to anyone who still has doubts about it.
  • WebRTC – building real-time systems in your browser. “Making music mobile with the Web” not only showed how to transform your Macbook into a Marshall JCM2000 with Soundtrap, but also how WebRTC was used for real-time collaborative music creation, with very low latency.

Wearables – It’s all about the UX

Then, a big part of the conference: Glass and smart watches. I had often thought that most of the effort to build those went into the hardware and OS side of things (reducing footprint, optimising battery life, gathering sensor data, etc.).

While some talks clearly focused on this (with some nice hacks such as a back-camera for biking in “Innovate with the Glass Platform“, and football-related ones), I was impressed by “Designing for wearables“, which focused on the role of UX in making sure wearables are devices that let you connect with the world, and not interfere with it the way a phone does.

Paris Saint-Germain represents at I/O 2014!

Showing some early prototypes and discussing how and why Glass / Wear notifications are so minimalistic, this was an inspiring session for anyone interested in UX and products. A must-watch for developers and entrepreneurs aiming to build appealing user-facing products, whether for wearables or more standard devices.

Google+ – Or how Google missed the spot

I may have missed it in other sessions, but none of the ones I attended mentioned Google+. I was not expecting much about it at I/O, given the departure of Vic Gundotra, Sergey Brin’s statements, and a plus-free agenda. Still, that was a big surprise, as it would have been a no-brainer use-case in many talks.

Using Dataflow to process streams from your social circles? Not a word about it. Using Glass to see what your friends are posting? Nope. Alerts on your Google TV to binge-watch a TV show together with friends at home, 5,000 km away? Neither.

G+ could have been an awesome social network – or should I say a social platform. Combined with Freebase / the Knowledge Graph, linking people to the things they like, the possibilities would be endless in terms of profiling, discovery and more. Yet, with a poor API and a lack of portability that could have differentiated it from its main competitors from day 1 (imagine PubSubHubbub / WebSockets as an easy way to integrate G+ into other platforms), I’m sad they’ve missed the spot.

Up to 2015?

Overall, a great conference, in spite of the queue mismanagement that forced me to miss about 30 minutes of the keynote, queueing twice around Moscone – a real shame when you travel 8,000 km for such an event.

I particularly enjoyed the focus around the 3D topics (Design, Develop, Distribute), the diversity of talks (watch the awesome “Robotics in a new world – Presented by Women Techmakers“), and the accessibility of the DevRel team between sessions at the Developer sandboxes.

Looking forward to the next one!

Enhancing the Freebase/YouTube API mappings… using Freebase and YouTube

The YouTube Data API v3 is one of those things you’ll definitely fall in love with if you’re into real-world Semantic Web applications, a.k.a. “things, not strings”. With its integration with Freebase – the core of Google’s Knowledge Graph – it’s a concrete and practical showcase of the Web as a distributed database of things and relations, and not only keywords and links between pages.

YouTube Data API v3 with Freebase mappings: the good, the bad, and the ugly

While relatively simple to use, it provides advanced features to let developers build data-driven applications. On the one hand, it allows searching for videos by Freebase entities, as you can try in a recent demo from YouTube themselves. On the other hand, it returns which entities are used/described in a video.

Yet, identifying topics from videos is a difficult task, and if you’re not convinced (and interested in all things Machine Learning related), check the following Google I/O talk from last year.

Google I/O talk on Semantic Annotations of YouTube videos, featuring our own seevl

While the API generally delivers correct information, it sometimes requires a bit of work to automatically use its results in a music-related context (to be exact, the issues might be in the underlying data, rather than in the API itself):

  • In some cases, it provides multiple artists – which is often correct, e.g. Blondie and Debbie Harry – but makes it difficult to find who’s the main one, as the API delivers them at the same level (topicIds).
  • In others, it returns empty results, like this (recently deleted, maybe as part of the YouTube music limbo?) Nirvana video.
  • Finally, when an awesome band like Weezer decides to cover Coldplay, both bands are returned by the API.

This is something we improved to build our former seevl for YouTube plug-in. While it’s not available anymore – as we’ve moved away from consumer-facing products to refocus on a B2B, turn-key music-discovery solution – I’ve decided to open-source the underlying library to find who’s playing, and what (yes, music only), in any YouTube video.

Introducing youplay – who’s and what’s playing in a YouTube music video

The result is youplay, available on PyPI and Github: an MIT-licensed Python library that works as an enhancement on top of the YouTube Data API v3 to automatically identify who’s and what’s playing in a music video. It uses different heuristics, data look-ups and more to find the correct artist when multiple ones are returned (unless they’re all playing in the video, like this RHCP + Snoop Dogg version of Scar Tissue), to filter ambiguous ones, or to find the correct artist and track when the API doesn’t deliver anything.

Here’s an example:

#!/usr/bin/env python
import youplay

(artists, tracks) = youplay.extract('0UjsXo9l6I8')
print '%s - %s' % (', '.join([artist.name for artist in artists]), tracks[0].name)

(artists, tracks) = youplay.extract('c-_vFlDBB8A')
print '%s - %s' % (artists[0].name, tracks[0].name)

will return

(env)marvin-7:youplay alex$ python
Jay-Z, Alicia Keys - Empire State of Mind
Dropkick Murphys - Worker's Song

The tool is also packaged with a command line script returning JSON data for easy integration into non-python apps.

(env)marvin-7:youplay alex$ ./bin/youplay ebBjGp7QOGc
{
  "tracks": [
    {
      "mid": "/m/0dt1kzp",
      "name": "For My Family"
    }
  ],
  "artists": [
    {
      "mid": "/m/022tqm",
      "name": "Agnostic Front"
    }
  ]
}

With a little help from my friends

The fun part? All the look-ups (if any) use the Freebase and YouTube APIs themselves, such as:

  • Finding the top tracks of an artist from Freebase and matching them against the video name, when the original API call returns only artist names;
  • Identifying if a song has been recorded by multiple artists;
  • Looking up related YouTube videos to identify the common topic between them all, and guess the current artist of a video with no API results.
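That last heuristic can be sketched as a simple majority vote over the Freebase topics of related videos (the topic lists below are only illustrative, not real API results):

```python
from collections import Counter

def guess_artist(related_topic_lists):
    # pick the Freebase topic shared by the most related videos
    votes = Counter(mid for topics in related_topic_lists for mid in set(topics))
    if not votes:
        return None
    mid, _ = votes.most_common(1)[0]
    return mid

# made-up topicIds from three related videos
related = [["/m/022tqm", "/m/0dt1kzp"],
           ["/m/022tqm"],
           ["/m/022tqm", "/m/0abcde"]]
print(guess_artist(related))
# → /m/022tqm
```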

Isn’t it a nice way to bridge the gap?

Even though I hope the library will be useful to other music-tech developers, I also wish for it to soon become obsolete, as Google’s Knowledge Graph and other structured-data efforts keep growing on the Web in terms of AI, infrastructure and APIs/toolkits – making it easier every day to build data-driven applications (if only I had had this 10 years ago when I started digging into the topic!).

Oh, and I’m attending Google I/O next week, and if you’re working on similar projects, ping me and let’s have a chat!

echoplot – Plot song loudness using the EchoNest API

As I’m working on the second part of my analysis of the Rolling Stone 500 Greatest Songs of All Time, I needed to draw the loudness representation of various songs extracted from the EchoNest API. I’ve been using matplotlib with pyechonest, and as the process is quite repetitive, I’ve packaged everything as echoplot, so you can easily plot song loudness using the EchoNest API.

pip install echoplot

Once you’ve set up your EchoNest API key as an environment variable ECHO_NEST_API_KEY, just run echoplot:

marvin-7:~ alex$ echoplot -h
usage: echoplot [-h] [-s START] [-e END] artist title

Plot loudness of a song using the EchoNest API.

positional arguments:
  artist                the song's artist, e.g. 'The Clash'
  title                 the song's title, e.g. 'London Calling'

optional arguments:
  -h, --help            show this help message and exit
  -s START, --start START
                        start analysis at a given time (seconds)
  -e END, --end END     end analysis at a given time (seconds)

For example

marvin-7:~ alex$ echoplot 'The Clash' 'London Calling'
The Clash - London Calling


marvin-7:~ alex$ echoplot Radiohead 'Paranoid Android'
Radiohead - Paranoid Android

The plot also displays the different segments of the song (chorus, verse, etc.), also provided by the EchoNest API. Echoplot’s source code is on github and the package is on PyPI.

The new Schema.org actions: What they mean for personalisation on the Web

The Schema.org initiative just announced the release of a new action vocabulary. As their blog post emphasises:

The Web is not just about static descriptions of entities. It is about taking action on these entities.

Whether they’re online or offline, publishing those actions in a machine-readable format follows TimBL’s “Weaving the Web” vision of the Web as a social machine.

It’s even more relevant as the online and offline worlds become one, whether through apps (4square, Uber, etc.) or via sensors and wearable tech (mobile phones, Glass, etc.). A particular aspect I’m interested in is how those actions can help personalise the Web.

The rise of dynamic content and structured data on the Web

This is not the first time actions – at least online ones – are used on the Web: think of Activity Streams, Web Intents, or the SIOC-Actions vocabulary I worked on with Pierre-Antoine Champin a few years ago.

Yet, considering the recent advances in structured Web data (Schema.org, Google’s Knowledge Graph, Facebook’s Open Graph, Twitter cards…), this addition is a timely move. Everyone can now publish their actions using a shared vocabulary, meaning that apps and services can consume them openly – pending the correct credentials and privacy settings. And that’s a big move for personalisation.

Personalising content from distributed data

Let’s consider my musical activity. Right now, I can plug my services into Facebook and use the Graph API to retrieve my listening history. Or query APIs such as Deezer’s. Or check my Twitter and Instagram feeds to remember some of the records I’ve put on my turntable. Yet, if all of them published actions using the new ListenAction type, I could use a single query engine to get the data from those different endpoints.

Deezer could describe actions using JSON-LD, as in the (simplified) snippet below, and Spotify could use RDFa – it doesn’t really matter, as both would agree on shared semantics through a single vocabulary.

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ListenAction",
  "object": {
    "@type": "MusicGroup",
    "name": "The Clash"
  }
}
</script>

Ultimately, that means every service could gather data from different sources to meaningfully extract information about me, and deliver a personalised experience as soon as I log in.

You might think Facebook already enables this with the Graph API. Indeed, but the data needs to be in Facebook. This is not always the case, either because the seed services haven’t implemented – or have removed – the proper connectors, or because you didn’t allow them to share your actions.

In this new configuration, I could decide, for every service I log in to, which sources it can access. Logging in to a music platform? Let it access my Deezer and Spotify profiles, where some actions can be found. Booking a restaurant? Check my OpenTable ones. From there, those services can quickly build my profile and start personalising my online experience.
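Aggregating those profiles could then be as simple as filtering each source’s action feed by @type – a toy sketch with made-up feeds standing in for the real endpoints:

```python
def listen_history(sources):
    # merge ListenAction objects from several JSON-LD action feeds
    history = []
    for actions in sources.values():
        for action in actions:
            if action.get("@type") == "ListenAction":
                history.append(action["object"]["name"])
    return history

# made-up feeds standing in for Deezer / OpenTable endpoints
sources = {
    "deezer": [{"@type": "ListenAction",
                "object": {"@type": "MusicGroup", "name": "The Clash"}}],
    "opentable": [{"@type": "EatAction",
                   "object": {"@type": "Restaurant", "name": "Sushi place"}}],
}
print(listen_history(sources))
# → ['The Clash']
```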

In addition, websites could decide to use background knowledge to enrich one’s profile, using vertical databases – e.g. Factual for geolocation data, or our recently relaunched seevl API for music metadata – combined with advanced heuristics such as time decay, action-object granularity and more, to enhance the profiling capabilities (if you’re interested in the topic, check the slides of Fabrizio Orlandi’s Ph.D. viva).

Privacy matters

This way of personalising content could also have important privacy implications. By selecting which sources a service can access, I implicitly block access to data that is irrelevant or too private for that particular service – as opposed to granting access to all my content.

Going further, we can imagine a privacy-control matrix where I select not only the sources, but also the action types to be used, keeping my data safe and avoiding freakomendations. I could provide my 4square eating actions (restaurants I’ve checked in to) to a food website, but offer my musical background (concerts I’ve been to) to a music app, keeping both separate.
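Such a matrix could be as simple as a mapping from (source, action type) pairs to the services allowed to read them – a toy sketch, with made-up service names:

```python
# made-up (source, action type) -> allowed services matrix
MATRIX = {
    ("4square", "EatAction"): {"food-site"},
    ("4square", "ListenAction"): {"music-app"},
    ("deezer", "ListenAction"): {"music-app"},
}

def can_access(service, source, action_type):
    # True if `service` may read `action_type` data from `source`
    return service in MATRIX.get((source, action_type), set())

print(can_access("food-site", "4square", "EatAction"))     # → True
print(can_access("food-site", "4square", "ListenAction"))  # → False
```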

Of course, websites should be smart enough to know which actions they require, doing the source/action pre-selection for me. This could ultimately solve some of the trust issues often discussed around personalisation, as Facebook’s Sam Lessin addressed in his keynote on the future of travel.

What’s next?

As you can see, I’m particularly interested in what’s going to happen with this new update, from both the publishers’ and the consumers’ point of view.

It will also be interesting to see how mappings could emerge between it and the Facebook Graph API, adding another level of interoperability in this quest to make the Web a social space.