Export and structure your musical activity with schema.org

Following my recent post on schema.org and personalisation on the Web, I wrote a music actions exporter for various services, including Facebook, Deezer, and last.fm. Available at http://music-actions.appspot.com, it’s mostly a proof-of-concept, but it showcases the ability to uniformly export and structure your data (in this case, music listening actions) whatever service you initially used. Does that ring a bell?

As the previous post focused on why it matters, I’ll cover technical aspects of the exporter here, including the role of JSON-LD for representing content on the Web.

One model to rule them all

The Music Actions exporter is not rocket science. Basically, it translates application-specific JSON data into another JSON representation – an open one, with shared semantics – using JSON-LD. But that’s also where the power lies: it would take most platforms only a few engineering hours to expose their actions with schema.org, provided they already have a public API – or user profile pages (think RDFa or microdata). And they would probably enjoy the same benefits as when publishing factual data with schema.org.
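To make that translation step concrete, here is a minimal sketch in Python. The shape of the source record (field names like `track_url`) is made up for illustration and does not match any service’s actual API:

```python
import json

# Hypothetical service-specific record, as a music API might return it.
SOURCE = {
    "track": "Represent (Rocked Out Mix)",
    "track_url": "http://open.spotify.com/track/1B930FbwpwrJKKEQOhXunI",
    "artist": "Weezer",
}

def to_listen_action(record):
    """Map a service-specific listen record onto a schema.org ListenAction."""
    return {
        "@context": "http://schema.org",
        "@type": "ListenAction",
        "object": {
            "@type": "MusicRecording",
            "@id": record["track_url"],
            "name": record["track"],
            "byArtist": {"@type": "MusicGroup", "name": record["artist"]},
        },
    }

print(json.dumps(to_listen_action(SOURCE), indent=2))
```

The whole exporter is essentially this pattern repeated once per service, with a different key mapping each time.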

Moreover, it would make life easier for developers: understanding a single model and semantics, and learning a common set of tools, would be enough to get and use data from multiple sources, as opposed to handling multiple APIs as is currently the case – meaning, eventually, more exposure for the service. This is the grand Semantic Web promise, and I’m glad to see it more alive than ever.

In particular, let’s consider the music vertical: interoperable taste profiles, shared playlists, portable collections, death to cold-start… you name it, it could finally be done. The promise has been here for a while, many have tried, and it obviously reminds me of some earlier work I did circa 2008 (during and post-Ph.D.), including this initiative with Yves Raimond from the BBC using FOAF, SIOC, MO and more:

Coming back to the exporter, here’s an excerpt of my recent Facebook music.listens activity (mostly gathered from spotify here) exported as JSON-LD, with a longer feed here.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "agent_of": {
      "@reverse": "http://schema.org/agent"
    }
  },
  "@id": "http://facebook.com/alexandre.passant",
  "url": "http://facebook.com/alexandre.passant",
  "name": "Alexandre Passant",
  "@type": "Person",
  "agent_of": [{
    "@type": "ListenAction",
    "object": {
      "@id": "http://open.spotify.com/track/1B930FbwpwrJKKEQOhXunI",
      "url": "http://open.spotify.com/track/1B930FbwpwrJKKEQOhXunI",
      "@type": "MusicRecording",
      "name": "Represent (Rocked Out Mix)",
      "audio": "http://open.spotify.com/track/1B930FbwpwrJKKEQOhXunI",
      "byArtist": [{
        "@id": "http://open.spotify.com/artist/3jOstUTkEu2JkjvRdBA5Gu",
        "url": "http://open.spotify.com/artist/3jOstUTkEu2JkjvRdBA5Gu",
        "@type": "MusicGroup",
        "name": "Weezer"
      }],
      "inAlbum": [{
        "@id": "http://open.spotify.com/album/0s56sFx1BJMyE8GGskfYJX",
        "url": "http://open.spotify.com/album/0s56sFx1BJMyE8GGskfYJX",
        "@type": "MusicAlbum",
        "name": "Hurley"
      }]
    }
  }]
}

For every service, the exporter returns the most recent tracks listened to (as ListenAction instances), including – when available – additional data about artists and albums. In the case of Deezer and Last.fm, that information is already in the history feed, while for Facebook it requires additional calls to the Graph API, querying individual song entities in their data graph.
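The Facebook flow can be sketched as follows; `fetch_entity` and the record shapes are assumptions standing in for the real Graph API calls, not the exporter’s actual code:

```python
def enrich_actions(history, fetch_entity):
    """Resolve bare song ids from a listen history into full MusicRecording
    objects, via one extra lookup per item (fetch_entity stands in for a
    call to the Graph API)."""
    actions = []
    for item in history:
        song = fetch_entity(item["song_id"])
        actions.append({
            "@type": "ListenAction",
            "object": {
                "@type": "MusicRecording",
                "@id": song["url"],
                "name": song["title"],
            },
        })
    return actions

# A stand-in for the Graph API: a local lookup table keyed by song id.
GRAPH = {
    "10150500879645722": {
        "url": "http://graph.facebook.com/10150500879645722",
        "title": "My Name Is Jonas",
    },
}

actions = enrich_actions([{"song_id": "10150500879645722"}], GRAPH.get)
```

Injecting the fetch function keeps the mapping logic identical across services; only the lookup step differs.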

Using Google Cloud Endpoints as an API layer

Since the exporter works as a simple API, I’ve implemented it using Google Cloud Endpoints. As part of Google’s Cloud offering, it greatly facilitates the process of building a Web-based API. No need to build a full – albeit lightweight – application with routes and handlers (webapp2, etc.): document the API patterns (Request and Response messages), define the application logic, and let the infrastructure manage everything else.

It also automatically provides a Web-based front-end to test the API, plus the other advantages of the Google App Engine infrastructure, such as Web-based log management, so you can trace production errors without logging in to a remote box.

GAE Endpoints API Explorer


The only issue is that it can’t directly return JSON-LD, since it encapsulates everything into the following response.

{
  "kind": "musicactions#resourcesItem",
  "etag": "\"_oj1ynXDYJ3PHpeV8owlekNCPi4/NH17nWS3hMc3GSHWziswWp2pTFk\"",
  "data": "… the JSON-LD document, serialised as a string (see http://music-actions.appspot.com/static/data.json) …"
}

Thus, if you use the exporter, you’ll need to parse the response, extract the data string value, then parse that string in turn to get the “real” JSON-LD data. That’s not a big deal, as you probably won’t link to the API URL anyway since it contains your private authentication tokens. But it’s worth keeping in mind for some projects.
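That double parse can be sketched as follows, with a toy envelope in place of a real response:

```python
import json

def unwrap_jsonld(raw_response):
    """Extract the JSON-LD payload from the Cloud Endpoints envelope:
    parse the envelope once, then parse the 'data' string a second time."""
    envelope = json.loads(raw_response)
    return json.loads(envelope["data"])

# A miniature stand-in for the API response shown above.
RAW = '{"kind": "musicactions#resourcesItem", "data": "{\\"@type\\": \\"Person\\"}"}'
doc = unwrap_jsonld(RAW)
```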

JSON-LD and the beauty of RDF

Last but not least: the use of JSON-LD, augmenting JSON with the concept of “Linked Data“, i.e. “meanings, not strings”.

Let’s look at the representation of two ListenAction instances for the same user (using their Facebook IDs in this example). The JSON-LD serialisation is as follows. I’m using the @graph keyword to represent statements about distinct objects (as those are two different ListenActions) in the same document, but I could have used multiple contexts.

{
  "@context": "http://schema.org",
  "@graph": [{
    "@type": "ListenAction",
    "agent": {
      "@id": "http://graph.facebook.com/607513040",
      "name": "Alexandre Passant",
      "@type": "Person"
    },
    "object": {
      "@id": "http://graph.facebook.com/10150500879645722",
      "name": "My Name Is Jonas",
      "@type": "MusicRecording"
    }
  }, {
    "@type": "ListenAction",
    "agent": {
      "@id": "http://graph.facebook.com/607513040",
      "name": "Alexandre Passant",
      "@type": "Person"
    },
    "object": {
      "@id": "http://graph.facebook.com/10150142973310868",
      "name": "Buddy Holly",
      "@type": "MusicRecording"
    }
  }]
}

Below is the corresponding graph representation, with two nodes for the same agent (i.e. the user performing the actions).

Representing ListeningActions with JSON-LD


Yet, an interesting aspect of JSON-LD is its relation to RDF – the Resource Description Framework and its graph model, especially suited for the Web. As JSON-LD uses @id values as common node identifiers, a.k.a. URIs, those two agents are actually the same, and so the graph looks like:

Merging agents with JSON-LD

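As a rough illustration of that merge (real JSON-LD processors implement the flattening algorithm from the spec; this sketch only mimics the idea of “one URI, one node”):

```python
def merge_by_id(nodes):
    """Merge node objects that share an @id, mimicking what the RDF view of
    a JSON-LD document gives for free: two references to the same URI are
    one and the same node."""
    merged = {}
    for node in nodes:
        merged.setdefault(node["@id"], {}).update(node)
    return merged

# The two agent nodes from the document above, each carrying partial data.
agents = [
    {"@id": "http://graph.facebook.com/607513040", "name": "Alexandre Passant"},
    {"@id": "http://graph.facebook.com/607513040", "@type": "Person"},
]
people = merge_by_id(agents)
```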

Finally, an interesting property of RDF / JSON-LD graphs is their directed edges. Thus, instead of writing the previous statements from an action-centric perspective, with unidentified action instances (a.k.a. blank nodes), we can write them from a user-centric perspective using an inverse property (@reverse in the JSON-LD world), as follows.

Using inverse properties in JSON-LD


This leads to the following JSON-LD document, thanks to the definition of an additional reverse property in the context. IMO, this makes the document easier to understand, since it’s now user-centric: the user / Person is the core element of the document, with edges from it to the actions it contributes to.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "agent_of": {
      "@reverse": "http://schema.org/agent"
    }
  },
  "@id": "http://graph.facebook.com/607513040",
  "name": "Alexandre Passant",
  "@type": "Person",
  "agent_of": [{
    "@type": "ListenAction",
    "object": {
      "@id": "http://graph.facebook.com/10150500879645722",
      "name": "My Name Is Jonas",
      "@type": "MusicRecording"
    }
  }, {
    "@type": "ListenAction",
    "object": {
      "@id": "http://graph.facebook.com/10150142973310868",
      "name": "Buddy Holly",
      "@type": "MusicRecording"
    }
  }]
}
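A consumer could perform the same regrouping itself, going from action-centric statements to a user-centric shape; this sketch mimics, but is not, JSON-LD’s own compaction with an @reverse term:

```python
def to_user_centric(actions):
    """Regroup action-centric statements around their shared agent,
    producing the same shape as a context with a reverse 'agent_of' term."""
    people = {}
    for action in actions:
        agent = action["agent"]
        # First time we see this agent, copy it and give it an empty list.
        person = people.setdefault(agent["@id"], dict(agent, agent_of=[]))
        person["agent_of"].append({
            "@type": action["@type"],
            "object": action["object"],
        })
    return list(people.values())

actions = [
    {"@type": "ListenAction",
     "agent": {"@id": "http://graph.facebook.com/607513040",
               "name": "Alexandre Passant", "@type": "Person"},
     "object": {"@id": "http://graph.facebook.com/10150500879645722",
                "name": "My Name Is Jonas", "@type": "MusicRecording"}},
    {"@type": "ListenAction",
     "agent": {"@id": "http://graph.facebook.com/607513040",
               "name": "Alexandre Passant", "@type": "Person"},
     "object": {"@id": "http://graph.facebook.com/10150142973310868",
                "name": "Buddy Holly", "@type": "MusicRecording"}},
]
doc = to_user_centric(actions)
```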

From shared actions to shared entities

While it is (for now) a proof of concept, the exporter is a first step towards a common integration of musical actions on the Web. Of course, the same pattern could be applied to any other vertical. But, more interestingly, we can hope that services will directly publish their actions using schema.org, as they’ve been doing for other facts – for instance artist concert data, now enriching Google’s search results through the Knowledge Graph.

In addition, an interesting next step would be to use common object identifiers across services, in order to share common semantics not only about actions, but also about the objects used in those actions. This could be achieved by referring to open knowledge bases such as Freebase, or to vertical-specific ones such as our new seevl API in the music area. Oh, and there will be more to come about seevl and actions in the near future. Interested? Let’s connect.
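As a toy illustration of shared identifiers, here is a sketch using schema.org’s sameAs property; the mapping table and the Freebase URI below are entirely made up:

```python
# Hypothetical identity table: service-specific URIs mapped onto one shared
# identifier (a made-up Freebase URI here). In practice this would come from
# a knowledge base or an entity-resolution service.
SAME_AS = {
    "http://open.spotify.com/track/1B930FbwpwrJKKEQOhXunI":
        "http://www.freebase.com/m/0example",
}

def link_object(obj):
    """Attach schema.org's sameAs property when a shared identifier is known
    for the object's @id; leave the object untouched otherwise."""
    shared = SAME_AS.get(obj.get("@id"))
    if shared:
        return dict(obj, sameAs=shared)
    return obj

track = link_object({
    "@id": "http://open.spotify.com/track/1B930FbwpwrJKKEQOhXunI",
    "@type": "MusicRecording",
})
```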

Using Semantics to Improve Corporate Online Communities

I gave a talk on “Using Semantics to Improve Corporate Online Communities” yesterday at the COIN@MALLOW workshop. The talk was mainly based on the work done during my Ph.D. thesis, demonstrating how to manage and combine various layers of semantics on top of Enterprise 2.0 ecosystems and let users create and take advantage of the related semantic annotations. Here are the slides of the talk.


Social Data on the Web at ISWC2009

It has been announced in the past few weeks but I didn’t really blog about it so far. We’re hosting a second edition of the Social Data on the Web (SDoW) workshop at the next ISWC2009 in Washington. Here’s the call for papers (longer version here).

The 2nd Social Data on the Web workshop (SDoW2009), co-located with the 8th International Semantic Web Conference (ISWC2009), aims to bring together researchers, developers and practitioners involved in semantically-enhancing social media websites, as well as academics researching more formal aspects of these interactions between the Semantic Web and the Social Web.

Since its first steps in 2001, the Semantic Web community has tackled many research issues, such as data formalisms for knowledge representation, data querying and scalability, or reasoning and inference. More recently, Web 2.0 has offered new perspectives regarding information sharing, annotation, and social networking on the Web. It opens new research areas for the Semantic Web, which has an important role to play in the emergence of a Social Semantic Web that should provide novel services to end-users, combining the best of both the Semantic Web and Web 2.0 worlds. To achieve this goal, various tasks and features are needed, from data modeling and lightweight ontologies to knowledge and social-network portability, as well as ways to interlink data between social media websites, lifting proprietary data silos into a Giant Global Graph.

Following the successful SDoW2008 workshop at ISWC2008, SDoW2009 aims to bring together Semantic Web experts and Web 2.0 practitioners and users to discuss the application of semantic technologies to data from the Social Web.

The workshop welcomes submissions of short and full papers, as well as demos of applications combining Semantic Web and Social Web technologies – all due by the 10th of August.

CommonTag – An easy-to-use vocabulary for Semantic Tagging

I’m happy to announce CommonTag, a new RDFS vocabulary for semantic tagging, designed to bridge the gap between free-text tagging and Linked Data. In a similar way to what I’ve done in the past with MOAT, CommonTag allows one to create links between tags (as simple keywords) and the concepts they represent, defined as URIs of Semantic Web resources from public knowledge bases such as Freebase or DBpedia.

What is especially relevant about CommonTag is that the vocabulary aims to be simple to understand, easily accessible, and to offer an easy RDFa annotation process for end-users and Web developers. On the other hand, it features mappings to existing tagging vocabularies (the Tag Ontology, MOAT, SCOT, SIOC and SKOS) for those who want to go further or use their existing applications with this new model.

But most interestingly, as one can see when browsing the website, a key feature is that CommonTag is not an isolated initiative but is supported by various companies involved in the Semantic Web and the Social Web – and especially in both! – namely (for the initial nucleus, in alphabetical order, and I hope it will grow soon!) AdaptiveBlue, DERI (NUI Galway), Faviki, Freebase, Yahoo, Zemanta and ZigTag. I must add that it was a great experience to design this vocabulary together!

CommonTag is already supported in various applications, as you can see on the website and in the following picture – from Zemanta, to index your blog posts, to Sindice, to build applications on top of it. And there is more to come soon, stay tuned ;-)

Ph.D. defense: “Technologies du Web Sémantique pour l’Entreprise 2.0” (Semantic Web Technologies for Enterprise 2.0)

I will defend my Ph.D. thesis, “Technologies du Web Sémantique pour l’Entreprise 2.0” (Semantic Web Technologies for Enterprise 2.0), on Tuesday 9 June at 10:30 at the Maison de la Recherche, 28 rue Serpente, Paris.

Abstract:

The work presented in this thesis proposes various methods, reflections and implementations combining Web 2.0 and the Semantic Web. After introducing these two notions, we present the current limits of certain tools, such as blogs or wikis, and of tagging practices in an Enterprise 2.0 context. We then propose the SemSLATES method and the global vision of a mediation architecture based on Semantic Web standards (languages, models, tools and protocols) to overcome these limits. We next detail various ontologies (in the computer-science sense) developed to carry out this vision: on the one hand, through active contributions to the SIOC project – Semantically-Interlinked Online Communities – models for socio-structural metadata, and on the other hand models, extending public ontologies, for business data. In addition, the definition of the MOAT ontology – Meaning Of A Tag – allows us to couple the flexibility of tagging with the power of ontology-based indexing. We then describe several software implementations deployed at EDF R&D to enable, in an intuitive way, the production and use of semantic annotations enriching the original tools: semantic wikis, advanced visualisation interfaces (faceted browsing, semantic mash-ups, etc.) and a semantic search engine. Several contributions have been published as public ontologies or free software, contributing more broadly to this convergence between Web 2.0 and the Semantic Web, not only in the enterprise but on the Web as a whole.

The defense is public, so if the topic interests you, feel free to come!
The dissertation and the slides will also be posted on this site afterwards.

SIOC goes OWL-DL

Just sent this to sioc-dev, but I guess it’s worth a larger announcement:

We just made some changes to the SIOC Core ontology and to the related modules:

- Added OWL-DL compliance statements for SIOC Core and the Types / Access / Services modules
- Edited owl:disjointWith statements for some classes of SIOC Core
- Removed domain of sioc:note
- Removed domain of sioc:has_owner and range of sioc:owner_of
- Defined sioc:account_of as inverse property of foaf:holdsAccount
- Defined sioc:avatar as a subproperty of foaf:depiction

So, SIOC is now OWL-DL!
This change was motivated by the current SWAN/SIOC integration project that will be introduced during the upcoming ISWC tutorial on the Semantic Web for Health Care and Life Sciences.

The SIOC Core Ontology Specification has been updated according to the changes.

The other good news regarding SIOC is that Yahoo! SearchMonkey now supports (and recommends!) it in its developer documentation. Moreover, in case you did not already read it, John published Tales from the SIOC-o-sphere #8 about two weeks ago.

More generally, if you want to join the SIOC community by developing new applications or APIs, or if you need help implementing SIOC in your existing tools, feel free to come to #sioc on irc.freenode.net or ask on the sioc-dev mailing list.

Say hello to lodr.info

In one of my recent posts, I mentioned LODr, a semantic-tagging application based on MOAT. While I started it a few months ago, it’s finally online now. I put the code in SVN last Friday and tweeted about it, but did not make any official announcement yet, so here it is. I certainly should have released it before, but as the source code involves lots of classes, I wanted to be sure of the architecture.

So, what is it about?

LODr aims to apply the MOAT principles (in a few words: link your tags to concept URIs – people URIs, Musicbrainz artists, DBpedia resources… – share those relationships in a community, and then tag content with those URIs) to existing Web 2.0 content. So you can “re-tag” your existing Flickr pics, slideshare presentations, etc., using those principles, and make your social data enter the LOD cloud. I think focusing on the existing world is important here, as LODr lets you keep your Web 2.0 habits, using your favourite tools, but provides a separate service to semantically enrich them. I don’t want to go into too much detail here, but in brief, some interesting points regarding the application are:

  • While tag / URI relationships are shared within the LODr community in a central RDF base (following the MOAT architecture principles), LODr is a personal application, so you just need to install the software on your webserver to enjoy it. Moreover, as it’s local, you can re-use your data immediately for any mash-up;
  • LODr is completely RDF-based. It might be a bit geeky, but as some were recently wondering where all the RDF-based applications are, here’s one. And of course RDF-based means using standard vocabularies, such as SIOC, FOAF, DC, the Tag Ontology and of course MOAT. The RDF backend is powered by ARC2, so you can enjoy a SPARQL endpoint for your data. Last but not least, each item page features RDFa using the previous vocabularies, even if you decide not to use MOAT for a particular item (so any Web 2.0 item you aggregate is RDFa-ized);
  • Aggregated data will provide you with a complete tag cloud for your social activity (which might be SCOT-ed in the next updates), as seen here. Each tag link leads to a list of items rendered using Exhibit, and you can restrict by source (i.e. the service it’s from) or creation date. And if a tag has been assigned a URI, you’ll get a link to browse the related items using a similar interface;
  • When browsing all items tagged with a particular URI, you’ll get suggestions of related URIs. Related because of co-occurrence, as is usual in tag-based applications, but also because they’re directly interlinked, or because they share a common property. To avoid information overload, only the URIs you used to re-tag some of your items will be shown;
  • The application can be easily extended. LODr uses wrappers to retrieve your data, and each wrapper is only a few lines of code (e.g. 24 lines for the Flickr one). At the moment, wrappers use RSS to retrieve data, and the feeds are automatically discovered from the user’s FOAF profile – data portability rocks! Yet, the architecture also allows authenticated wrappers (to use services’ APIs) as well as SIOC exports for those tools;
  • As the MOAT process is more time-consuming than simple tagging (since you must define tag/URI relationships, at least the first time, as automated tagging is possible afterwards), URIs can be displayed as labels when you need to choose which one is relevant for your tag (using the inference capabilities described here, as not all resources have a direct rdfs:label property). When you need a new URI, the application relies on the Sindice search widget, as done in the Drupal MOAT module. The system then checks whether the new URI is valid, but I’ll blog about that particular point later;
  • Finally, in addition to the previous features, LODr can be used to discover all the community content. This feature is not provided by the local application, but by LODr.info, which aggregates your RDF data when you re-tag it, to provide search capabilities. Then, you can directly list all items linked to a particular URI. Want to find content related to the Forbidden City? Or to SPARQL? And to make it even more enjoyable, I added a Ubiquity command so that from any Wikipedia page (more services will be supported soon), you can get the list of all related items (going through DBpedia to find the concept URI from a document page). While it provides a really straightforward way to discover related Web 2.0 content when browsing the Web, I also hope it can convince people of the complete process.
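To give an idea of how small such a wrapper can be, here is a sketch of an RSS-based one using only the standard library; the feed content and field choices are illustrative, not LODr’s actual code:

```python
import xml.etree.ElementTree as ET

def parse_rss_items(rss_xml):
    """Minimal wrapper sketch: extract (title, link, tags) from an RSS feed,
    the way a LODr wrapper pulls a user's items before re-tagging."""
    root = ET.fromstring(rss_xml)
    return [{
        "title": item.findtext("title"),
        "link": item.findtext("link"),
        "tags": [c.text for c in item.findall("category")],
    } for item in root.iter("item")]

# A tiny illustrative feed, standing in for e.g. a Flickr RSS export.
FEED = """<rss version="2.0"><channel>
  <item><title>Forbidden City</title>
        <link>http://example.org/photo/1</link>
        <category>beijing</category><category>travel</category></item>
</channel></rss>"""

items = parse_rss_items(FEED)
```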

So, you can simply download the code from the website and install it. For those who just want to have a look, you can check my LODr instance (while you won’t be able to edit it, you can check the display interfaces). As there might be some bugs and I’m still adding features, please consider using the SVN version instead of the tgz. And then, enjoy the power of Linked Data for your Web 2.0 content ;-)