24 - Discover YouTube music trends via Twitter

24 hours, 24 genres, 24 tracks: Discover YouTube music trends via Twitter

The idea behind 24 is simple: identify the top 24 tracks in each of the top 24 genres shared via YouTube on Twitter during the last 24 hours.

24 – Heavy Metal tracks

It’s based on the Twitter + Freebase + BigQuery pipeline I built and started running last weekend (using only a subset of the full Twitter stream), and uses Bootstrap and AngularJS for the UI. While the data is mined in real time, the page itself is refreshed every two hours.

Feedback? Comments? Say hello or check MDG’s portfolio for more. In the meantime, enjoy 24 at http://24.mdg.io, and check it out regularly for updates on its current MVP.

Sharing YouTube music on Twitter: Analytics using Freebase and BigQuery

Following my journey with Google Cloud, in particular BigQuery, I’m building a pipeline which mines Tweets containing YouTube videos and maps those videos to Freebase, in order to run various discovery / recommendation analytics and product experiments.

If you’re not yet familiar with it, Freebase is the core of Google’s Knowledge Graph and provides machine-readable, structured information about a large number of entities, or “real-world things described on the Web”.

 

To build this, I’ve been using the streaming APIs from both Twitter and BigQuery to get and save the data. In between, a middleware parses the tweets and calls the YouTube API and Freebase’s Knowledge Graph to extract additional data from each, with a bit of memcached to avoid rate-limiting on those APIs.
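
The caching step can be sketched as follows. This is a minimal illustration, not the pipeline's actual middleware: an in-memory dictionary stands in for memcached, and `TTLCache`, `get_or_fetch` and the fetch callback are hypothetical names chosen for the example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """In-memory stand-in for memcached: entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self.clock = clock
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_fetch(self, key: str, fetch: Callable[[str], Any]) -> Any:
        """Return the cached value for `key`, calling `fetch` only on a miss."""
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        # Miss or expired entry: this is the only place the remote API is called.
        value = fetch(key)
        self._store[key] = (now, value)
        return value
```

Since the same video is often shared in many tweets, looking it up through such a cache means the YouTube and Freebase APIs are queried once per video per expiry window rather than once per tweet.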

The infrastructure started to run a few days ago, and I’ve gathered (when starting this write-up) 1.2M Tweets so far, for a total of 516,056 distinct videos, of which 345,410 have been linked to Freebase entities. Using this sample, I’ll now describe a few things that we can learn using this data in the context of music.

But first, why music videos? Not only because I’m big into music-related data science and engineering, but also because Music is the most shared category in the sample, with 28.2% of the videos, followed by People & Blogs (19.5%) and Entertainment (14.4%). Another reason why YouTube Music Key definitely makes sense.

Popular videos: super fans or spammers?

One of the first queries I tried – focusing solely on BigQuery’s SQL capabilities – was to identify the most popular videos on Twitter, with their corresponding YouTube views (using data from the last YouTube API call).

SELECT
  tweet_youtube.youtube_id,
  tweet_youtube.youtube_title,
  COUNT(tweet_id) as num_tweets,
  MAX(tweet_youtube.youtube_views) as num_views,
FROM
  [Twitter.TwitterStream]
WHERE
  tweet_youtube.youtube_category_id = 10
GROUP EACH BY 1, 2
ORDER BY 3 DESC

Most popular videos in the dataset

I was surprised by the small difference between the number of tweets and views for some of them, so I decided to measure the number of tweets per user for each video. Using another simple SQL query, we can easily identify videos that are self-promoted, or should I say spammed, on Twitter: the lower the ratio of distinct users to tweets, the more the tweets come from the same few accounts.

SELECT
  tweet_youtube.youtube_id,
  tweet_youtube.youtube_title,
  COUNT(DISTINCT(tweet_user_id)) as num_users,
  COUNT(tweet_youtube.youtube_id) as num_tweets,
  MAX(tweet_youtube.youtube_views) as num_views,
  CAST(COUNT(DISTINCT(tweet_user_id)) as float)
    /CAST(COUNT(tweet_youtube.youtube_id) as float) as ratio
FROM
  [Twitter.TwitterStream]
WHERE
  tweet_youtube.youtube_category_id = 10
GROUP EACH BY 1, 2
ORDER BY 6 ASC

Number of tweets vs views on YouTube

On the other hand, limiting the first SQL query to allow only one tweet per video per user provides an easier way to identify top tracks based on their number of unique fans (whether or not some of them are spam accounts is another topic).
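
The query for this variant isn't reproduced here, so the following is only a sketch of the same logic in Python over a small in-memory sample. The function name and the `(video_id, user_id)` row shape are assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def rank_by_unique_users(tweets: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Rank videos by distinct tweeting users; repeats by one user count once."""
    users_per_video: Dict[str, Set[str]] = defaultdict(set)
    for video_id, user_id in tweets:
        users_per_video[video_id].add(user_id)
    ranked = [(video, len(users)) for video, users in users_per_video.items()]
    # Most unique users first; ties are broken by video id for a stable order.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))


# Hypothetical sample: "v1" is tweeted three times by the same account.
sample = [("v1", "u1"), ("v1", "u1"), ("v1", "u1"), ("v2", "u1"), ("v2", "u2")]
print(rank_by_unique_users(sample))
```

In this sample, "v1" has more tweets but "v2" has more unique users, so "v2" ranks first.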

Popular videos (one tweet per user)

Entities: better than tags

By linking videos to entities, rather than doing simple tag or keyword extraction, much more meaning can be derived from Tweets. As every entity is typed, additional filtering can be applied using the type of each entity. For instance, we can adapt the previous query to find not the top tracks, but the top artists (i.e. entities of type music artist).

Popular artists (via Freebase mappings)

Going deeper, you can also find the most popular music genres in the dataset.

SELECT
  tweet_youtube.youtube_relevant_topic.topic_id,
  tweet_youtube.youtube_relevant_topic.topic_name,
  COUNT(DISTINCT tweet_youtube.youtube_id) as num_videos,
FROM
  [Twitter.TwitterStream]
WHERE
  tweet_youtube.youtube_relevant_topic.topic_type = '/music/genre'
GROUP EACH BY 1, 2
ORDER BY 3 DESC

Top-10 genres in the dataset

From there, and using the same entity-filtering approach, we can build a genre-specific top 10, as below for heavy metal.

Top Heavy-metal videos
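
As a rough illustration of the entity-filtering idea behind such a genre chart, here is a Python sketch over in-memory records. The record layout loosely mirrors the column names used in the queries above, but the dictionary keys, the function name and the genre identifier are placeholders, not real Freebase ids or the actual query.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List, Tuple

GENRE_TYPE = "/music/genre"


def top_videos_for_genre(
    tweets: Iterable[Dict[str, Any]], genre_id: str, limit: int = 10
) -> List[Tuple[str, int]]:
    """Count tweets per video, keeping only videos linked to the given genre."""
    counts: Counter = Counter()
    for tweet in tweets:
        # A tweet qualifies if any linked topic is a genre entity with this id.
        if any(
            topic["topic_type"] == GENRE_TYPE and topic["topic_id"] == genre_id
            for topic in tweet["topics"]
        ):
            counts[tweet["youtube_id"]] += 1
    return counts.most_common(limit)


# Placeholder identifiers, for illustration only.
metal = {"topic_id": "genre:heavy-metal", "topic_type": GENRE_TYPE}
pop = {"topic_id": "genre:pop", "topic_type": GENRE_TYPE}
records = [
    {"youtube_id": "a", "topics": [metal]},
    {"youtube_id": "a", "topics": [metal]},
    {"youtube_id": "b", "topics": [pop]},
    {"youtube_id": "c", "topics": [metal, pop]},
]
print(top_videos_for_genre(records, "genre:heavy-metal"))
```

Filtering on the entity type as well as the id is what distinguishes this from plain tag matching: a topic only counts when it is typed as a genre.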

 

User profiling, semantic advertisement, and more

Besides analytics, an obvious use case of such an approach is user profiling. When I was at DERI, we learned that a lot of valuable content for user profiling can be mined from the things that people link to, by extracting structured data from those links (in this case, via the YouTube / Freebase mappings).

Using a similar process, Twitter users could be categorised more specifically through their music tastes, and get recommendations of artists to follow, videos to watch, or music to buy based on this data mined from external sources. This is definitely relevant in the context of the upcoming Twitter feed update! On the other side of the spectrum, we can imagine that advertisers, or bands that want to promote themselves on Twitter, could use those signals for specific user targeting – a constant struggle for music industry marketing.

As the pipeline is progressing, I’ll try to come up with some other interesting experiments, while I’m building a small hack / product in the meantime using this data, most likely combined with data from seevl, to be released soon.

@seevl’s DJ, Twitter, and the Semantic Web

Like most of my side projects, “seevl DJ” started as a quick hack on a Sunday afternoon. Yet, it was quickly picked up and featured on Fast Company and Hypebot, and also got some attention on Twitter itself.

With a little help from my friends

I’ve spent some time improving it so you can use additional commands, e.g. “a song by”, “play me something like”. In addition, it now uses the Freebase / YouTube mappings combined with the seevl API in order to find an artist’s videos (when using a genre / label / related query).

Last but not least, you can now use “/cc @user” and “for @user” in your Tweet to send a track to any of your friends, the music video being available directly on their feed through Twitter cards (Web and mobile).

Services, actions, and payments on Twitter

Thinking again about Twitter as an intelligent agent on the Web, let’s be bold and imagine this integrated with the buy / Stripe integration. While it’s now used to buy stuff, what about paying for services with it? “Hey @uber, bring @myfriend here”. “Hey @trycaviar, sushi for 6 please”. Both could answer with an automated tweet embedding a Buy button so you can validate the order, and get your black car or food home within minutes. All through Twitter.

Natural Language Processing is one way to enable this, but another is to pre-fill such “service-based tweets” so that users just have to complete a few fields (e.g. number of people when messaging @opentable). This makes things much easier on the processing side, while also providing a frictionless experience to users. Technically, the intelligence can be brought by schema.org actions, as I’ve written in the past, using JSON-LD as the supporting data serialisation.
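
To give a feel for what such a payload might look like, here is a small Python sketch that serialises a table-booking request as JSON-LD. The schema.org terms used (`ReserveAction`, `FoodEstablishmentReservation`, `partySize`) exist in the vocabulary, but how they are combined here, and the function name, are assumptions for illustration rather than a format used by Twitter or OpenTable.

```python
import json


def reservation_action_jsonld(restaurant: str, party_size: int) -> str:
    """Serialise a hypothetical table-booking request as a JSON-LD string."""
    action = {
        "@context": "https://schema.org",
        "@type": "ReserveAction",
        # The thing being reserved.
        "object": {"@type": "Restaurant", "name": restaurant},
        # The one field the user would complete in a pre-filled tweet.
        "result": {
            "@type": "FoodEstablishmentReservation",
            "partySize": party_size,
        },
    }
    return json.dumps(action, indent=2)


print(reservation_action_jsonld("Some Sushi Place", 6))
```

With a structure like this, the receiving service reads `partySize` directly instead of having to parse free text.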

A similar approach is used in Gmail (see for instance the GitHub integration). So, Twitter, what’s your next move to also embrace the Semantic Web?

Remove inactive Twitter followees with this tiny Python script

I recently reached the Twitter limit on adding new followees, so I wrote a tiny Python script, Twitter Cleaner, to remove people who haven’t sent anything for a number of days (30 by default) – and consequently be able to add new ones. It’s now available on GitHub.

Twitter Cleaner

Note that it might conflict with the previous Twitter TOS if you unfollow too many people at once. However, this will happen only on the first run if you put it into a daily crontab. It was safe in my case, but I can’t guarantee it will be in yours. You may also hit the API rate limit if you have too many followees.

It’s built using python-twitter, and is available under the MIT license.
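
The selection rule at the heart of the script can be sketched as below. This is not the code from the repository: it deliberately leaves out all python-twitter calls, and the input mapping (screen name to date of last tweet) and the function name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def inactive_followees(
    last_tweet_at: Dict[str, Optional[datetime]],
    now: datetime,
    max_idle_days: int = 30,
) -> List[str]:
    """Return followees whose last tweet is older than `max_idle_days`.

    Accounts with no tweet at all (None) are treated as inactive, which is an
    assumption of this sketch.
    """
    cutoff = now - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, last in last_tweet_at.items()
        if last is None or last < cutoff
    )


now = datetime(2014, 11, 1)
followees = {
    "active_user": datetime(2014, 10, 25),
    "quiet_user": datetime(2014, 8, 1),
    "never_tweeted": None,
}
print(inactive_followees(followees, now))
```

Run daily, only the accounts that crossed the threshold since the previous run would be selected, which keeps the number of unfollows per run small after the first one.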

 

Last night a DJ saved my life: What if Twitter could be your own DJ?

While the Twitter music app eventually failed, it’s still clear that people use Twitter’s data stream to share and/or discover new #music. Thanks to Twitter cards, a great thing is that you can directly watch a YouTube video, or listen to a SoundCloud clip, right from your feed, without leaving the platform. But what if Twitter could be your own DJ, playing songs on your request?

Since it’s been a few months since I enjoyed my last Music Hack Day – oh, I definitely miss that! – I’ve hacked a proof of concept using the seevl API, combined with the Twitter and YouTube ones, to make Twitter act as your own personal DJ.

Hey @seevl, play something cool

The result is a Twitter bot, running under our @seevl handle, which accepts a few (controlled) natural-language queries and replies with an appropriate track, embedded in a Tweet via a YouTube card. Here are a few patterns you can use:

Hey @seevl, play something like A

To play something that is similar to A. For instance, tweet “play something like New Order”, and you might get a reply with a Joy Division track in your feed.

Hey @seevl, play something from L

To play something from an artist signed on label L (or, at least, that used to be on this label at some stage)

Hey @seevl, play some G

To play something from a given genre G

Hey @seevl, play A

To simply play a track from A.

By the way, you can replace “Hey” with anything you want, as long as you politely ask your DJ what you want him to spin. Here’s an example, with my tweet just posted (top of the timeline), and a reply from the bot (bottom left).

Twitter As A DJ
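
The four request patterns listed above could be matched with a handful of regular expressions, as in this sketch. It is not the bot's actual parser: the pattern names, their ordering and the `parse_request` function are assumptions made for the example.

```python
import re
from typing import Optional, Tuple

# Most specific patterns first, so "play something like X" is not read as an artist.
_PATTERNS = [
    ("similar", re.compile(r"\bplay something like (?P<arg>.+)$", re.IGNORECASE)),
    ("label", re.compile(r"\bplay something from (?P<arg>.+)$", re.IGNORECASE)),
    ("genre", re.compile(r"\bplay some (?P<arg>.+)$", re.IGNORECASE)),
    ("artist", re.compile(r"\bplay (?P<arg>.+)$", re.IGNORECASE)),
]


def parse_request(tweet: str) -> Optional[Tuple[str, str]]:
    """Return (kind, argument) for a recognised request, or None otherwise."""
    text = tweet.strip()
    for kind, pattern in _PATTERNS:
        match = pattern.search(text)
        if match:
            return kind, match.group("arg").strip()
    return None


print(parse_request("Hey @seevl, play something like New Order"))
```

Because the expressions are searched for anywhere in the text rather than anchored at its start, whatever precedes “play” (“Hey”, “Please”, and so on) is simply ignored. Anything after “play” that fits none of the first three patterns falls through to the artist case.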

A little less conversation

As it’s all Twitter-based, you can not only send messages, but also have a conversation with your virtual DJ. Here’s for instance what I sent first:

And got this immediate reply – with the embedded YouTube video

Followed by (“coo” meant to be “cool”)

To immediately listen to Bettie Smith in my stream

It’s kind of fun, I have to say, especially due to the instantaneous nature of the conversation – and it even reminds me of IRC bots!

Unfortunately, it’s likely that the bot will reach the API rate-limit when posting Tweets (and I’m not handling those errors in the current MVP), so you may not have a reply when you interact with it.

Twitter As A Service?

Besides the music-related hack, I also wanted to showcase the growth of intelligent services on the Web – and how a platform like Twitter can be part of it, using “Twitter As A Service” as a layer for an intelligent Web.

The recently launched “Buy button” is a simple example of how Twitter can be a Siri-like interface to the world. But why not bring more intelligence into Twitter? What about “Hey @uber, pick me up in 10 minutes”, using the Tweet geolocation plus an Uber API integration to directly pick up – and bill – whoever #requested a black car? Or “Please @opentable, I’d love to have sushi tonight”, and get a reply with links to the top-rated places nearby, with in-tweet booking capability (via the previous buy button)? The data is there, the tools and APIs are there, so…

Yes, this sounds a bit like what’s described in the seminal Semantic Web article by Tim Berners-Lee, James Hendler and Ora Lassila. Maybe it’s because we’re finally there, in an age where computers can be those social machines that we’re dreaming about!

 

A proposal for Semantic OMB

From what I read on Twitter, it seems there’s a bit of confusion regarding SMOB. Indeed, while SMOB provides a framework for Open and Semantic Microblogging, it does not define a new protocol, but simply uses SPARQL/Update over HTTP to exchange information between hubs (posting / removing notices and following / followers). Hence, this is not something that competes against OMB, the OpenMicroBlogging specification.

Actually, OMB is something we have planned to look at for a long time, as briefly discussed when Status.net / OMB was presented in the W3C Social Web XG telco. I finally took the time to analyse the full spec and check how it compares with the distributed microblogging implementation of SMOB, and more generally with the vision of Semantic Web / Linked Data (SW/LD) microblogging services.

So here is a proposal for “Semantic OMB” (on Status.net wiki) that describes how the current OMB protocol fits with the previous idea. In particular, it aligns the terminology with existing classes / properties from well-known ontologies, and discusses how some current parts of the spec should be updated. It also discusses how OMB operations can be mapped to SPARQL/Update queries, based on the ones that currently happen in SMOB for cross-hub synchronisation.

As you can see when browsing it, besides the terminology mappings, most things are compliant and only a few points need to be discussed, in order to:

  • enable a better “distributed-ness” by keeping profiles owned by their users and not necessarily creating remote accounts;
  • make some mandatory elements optional, as they are contained in the data that is exchanged between services thanks to the Linked Data principles.

Thanks to these small updates, it could provide a protocol enabling SW/LD systems to be designed based on the OMB protocol, while having a sufficient abstraction level to comply with OMB systems using other technologies for data modeling and exchange. I’d be more than happy to see such features in an upcoming OMB release, and hopefully see deeper links between OMB and SW/LD efforts, as both aim to achieve the same goal of openness and interoperability. Comments and feedback are welcome on the related thread on the OMB mailing-list.
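
To make the mapping from an OMB operation to SPARQL/Update more concrete, here is a Python sketch that builds the update a hub could send when a notice is posted. It is an illustration only: it uses the `INSERT DATA` form from SPARQL 1.1 Update rather than whatever syntax SMOB actually emits, the choice of SIOC terms is an assumption, and the URIs are placeholders.

```python
def escape_literal(text: str) -> str:
    """Escape a string for use inside a double-quoted SPARQL literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def post_notice_update(post_uri: str, author_uri: str, content: str) -> str:
    """Build a SPARQL update describing one microblog post with SIOC."""
    return (
        "PREFIX sioc: <http://rdfs.org/sioc/ns#>\n"
        "PREFIX sioct: <http://rdfs.org/sioc/types#>\n"
        "INSERT DATA {\n"
        f"  <{post_uri}> a sioct:MicroblogPost ;\n"
        f"    sioc:has_creator <{author_uri}> ;\n"
        f'    sioc:content "{escape_literal(content)}" .\n'
        "}"
    )


print(post_notice_update(
    "http://example.org/post/1",
    "http://example.org/user/me",
    'Trying out "Semantic OMB"',
))
```

The resulting string would then be sent to the remote hub over HTTP. Removing a notice or updating follower lists would follow the same shape with a delete operation or different triples.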

SMOB v2.1: Using SMOB as a Twitter client

Here’s a new release of SMOB, the Semantic MicrOBlogging framework. This release includes various new features, the main one being the integration of Twitter messages in SMOB so that you can use your SMOB hub as a Twitter client, where each Tweet is represented in RDFa using SIOC, FOAF, etc.

In addition, the new release provides:

  • RSS feed for hub owner’s messages;
  • Automatic @reply when replying to a Twitter message (including sioc:addressed_to annotation);
  • Updated user-interface for #tags mappings, now done using tabs to avoid too much scrolling;
  • Ability to directly check @reply messages;
  • Starring system using the Review vocabulary.

SMOB v2.1 can be downloaded here. If you used a previous version, you will also need to apply this patch after the update. It may remove some of your following / followers (as there have been some changes in the related RDF data – this should be taken into account by the patch, but who knows…); in that case you’ll have to add them again, sorry for the inconvenience!

Hopefully, a 2.2 release will be out in the next few weeks, including geolocation of messages, advanced browsing features and other funky improvements. Feature requests can also be submitted on its dedicated bugtracker.