The new schema.org actions: What they mean for personalisation on the Web

The schema.org initiative just announced the release of a new action vocabulary. As their blog post emphasises:

The Web is not just about static descriptions of entities. It is about taking action on these entities.

Whether they’re online or offline, publishing those actions in a machine-readable format follows TimBL’s “Weaving the Web” vision of the Web as a social machine.

It’s even more relevant when the online and the offline worlds become one, whether it’s through apps (4square, Uber, etc.) or via sensors and wearable tech (mobile phones, Glass, etc.). A particular aspect I’m interested in is how those actions can help to personalise the Web.

The rise of dynamic content and structured data on the Web

This is not the first time that actions, at least online ones, have been used on the Web: think of Activity Streams, Web Intents, as well as SIOC-Actions, which I worked on with Pierre-Antoine Champin a few years ago.

Yet, considering the recent advances in structured Web data (schema.org, Google’s Knowledge Graph, Facebook OpenGraph, Twitter cards…), this addition is a timely move. Everyone can now publish their actions using a shared vocabulary, meaning that apps and services can consume them openly, given the correct credentials and privacy settings. And that’s a big move for personalisation.

Personalising content from distributed data

Let’s consider my musical activity. Right now, I can plug my services into Facebook and use the Graph API to retrieve my listening history. Or query APIs such as the Deezer one. Or check my Twitter and Instagram feeds to remember some of the records I’ve put on my turntable. Yet, if all of them published actions using the new ListenAction type, I could use a single query engine to get the data from those different endpoints.

Deezer could describe actions using the following JSON-LD, and Spotify with RDFa, but it doesn’t really matter – as both would agree on shared semantics through a single vocabulary.

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ListenAction",
  "agent": {
    "@type": "Person",
    "name": "Alex"
  },
  "object": {
    "@type": "MusicGroup",
    "name": "The Clash"
  },
  "instrument": {
    "@type": "WebApplication",
    "name": "Deezer",
    "url": "http://deezer.com"
  }
}
</script>

Ultimately, that means that every service could gather data from different sources to meaningfully extract information about me, and deliver a personalised experience as soon as I log in.
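To make the idea concrete, here is a minimal sketch of how a consumer could merge ListenAction descriptions coming from several sources into a single listening profile. The two payloads are hypothetical: they are hard-coded stand-ins for whatever Deezer or Spotify would actually publish, and the function names are mine, not part of any existing API.

```python
import json
from collections import Counter

# Hypothetical payloads, hard-coded for illustration. In practice each one
# would be fetched from a service publishing schema.org actions.
DEEZER_JSONLD = """
{"@context": "http://schema.org", "@type": "ListenAction",
 "agent": {"@type": "Person", "name": "Alex"},
 "object": {"@type": "MusicGroup", "name": "The Clash"},
 "instrument": {"@type": "WebApplication", "name": "Deezer"}}
"""
SPOTIFY_JSONLD = """
[{"@context": "http://schema.org", "@type": "ListenAction",
  "agent": {"@type": "Person", "name": "Alex"},
  "object": {"@type": "MusicGroup", "name": "The Clash"},
  "instrument": {"@type": "WebApplication", "name": "Spotify"}},
 {"@context": "http://schema.org", "@type": "ListenAction",
  "agent": {"@type": "Person", "name": "Alex"},
  "object": {"@type": "MusicGroup", "name": "Ramones"},
  "instrument": {"@type": "WebApplication", "name": "Spotify"}}]
"""


def load_actions(payload):
    """Parse a JSON-LD payload into a list of action dictionaries."""
    data = json.loads(payload)
    return data if isinstance(data, list) else [data]


def listening_profile(payloads, agent_name):
    """Count how often each artist was listened to, across all sources."""
    counts = Counter()
    for payload in payloads:
        for action in load_actions(payload):
            if action.get("@type") != "ListenAction":
                continue
            if action.get("agent", {}).get("name") != agent_name:
                continue
            counts[action["object"]["name"]] += 1
    return counts


profile = listening_profile([DEEZER_JSONLD, SPOTIFY_JSONLD], "Alex")
print(profile.most_common())
```

The point is that the consumer only needs to know the shared vocabulary, not the API of each source. A real consumer would use a proper JSON-LD or RDFa parser and identify the agent by URI rather than by name.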

You might think that Facebook enables this already with the Graph API. Indeed, but the data needs to be in Facebook. This is not always the case, either because the seed services haven’t implemented (or have removed) the proper connectors, or because you didn’t allow them to share your actions.

In this new configuration, I could decide, for every service I log in to, which sources it can access. Logging in to a music platform? Let it access my Deezer and Spotify profiles, where some schema.org Actions can be found. Booking a restaurant? Check my OpenTable ones. From there, those services can quickly build my profile and start personalising my online experience.

In addition, websites could decide to use background knowledge to enrich one’s profile, using vertical databases, e.g. Factual for geolocation data or our recently relaunched seevl API for music meta-data, combined with advanced heuristics such as time decay, actions-objects granularity and more to enhance the profiling capabilities (if you’re interested in the topic, check the slides of Fabrizio Orlandi’s Ph.D. viva).
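As an illustration of the time-decay heuristic, here is a minimal sketch in which each action counts less the older it is. The exponential form and the 30-day half-life are assumptions picked for the example, not values taken from any of the work mentioned above.

```python
import math


def decayed_weight(age_days, half_life_days=30.0):
    """Weight of an action performed `age_days` ago (1.0 for a brand new one)."""
    return math.pow(0.5, age_days / half_life_days)


def interest_scores(actions, half_life_days=30.0):
    """Sum the decayed weights per object.

    `actions` is a list of (object_name, age_in_days) pairs.
    """
    scores = {}
    for obj, age_days in actions:
        weight = decayed_weight(age_days, half_life_days)
        scores[obj] = scores.get(obj, 0.0) + weight
    return scores


# Two listens to The Clash (today and a month ago), one older Ramones listen.
scores = interest_scores([("The Clash", 0), ("The Clash", 30), ("Ramones", 60)])
print(scores)
```

With such a scheme, a record played every day last week outweighs one played a lot two years ago, which is usually what a recommender wants.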

Privacy matters

This way of personalising content could also have important privacy implications. By selecting which sources a service can access, I implicitly block access to data that is non-relevant or too private for that particular service – as opposed to granting access to all my content.

Going further, we can imagine a privacy-control matrix where I can select not only the sources, but also the action types to be used, keeping my data safe and avoiding freakomendations. I could provide my 4square eating actions (restaurants I’ve checked in to) to a food website, but offer my musical background (concerts I’ve been to) to a music app, keeping both separate.
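Such a matrix could be sketched as follows. Everything here is hypothetical: the service names, the `source` bookkeeping field (which is not a schema.org property) and the choice of action types are only there to show the filtering step.

```python
# Hypothetical privacy-control matrix: for each consuming service, the
# (source, action type) pairs the user has agreed to share with it.
PRIVACY_MATRIX = {
    "food-website": {("4square", "CheckInAction")},
    "music-app": {("Deezer", "ListenAction"), ("Spotify", "ListenAction")},
}

# Hypothetical actions gathered from the user's sources. "source" is a
# local bookkeeping field, not part of the schema.org vocabulary.
ACTIONS = [
    {"source": "4square", "@type": "CheckInAction", "object": "Some restaurant"},
    {"source": "Deezer", "@type": "ListenAction", "object": "The Clash"},
    {"source": "Spotify", "@type": "ListenAction", "object": "Ramones"},
]


def visible_actions(service, actions, matrix=PRIVACY_MATRIX):
    """Return only the actions the given service is allowed to see."""
    allowed = matrix.get(service, set())
    return [a for a in actions if (a["source"], a["@type"]) in allowed]


print(visible_actions("food-website", ACTIONS))
print(visible_actions("music-app", ACTIONS))
```

A service that is not listed in the matrix gets nothing by default, which is the safe behaviour here.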

Of course, websites should be smart enough to know which action they require, doing a source/action pre-selection for me. This could ultimately solve some of the trust issues often discussed when talking about personalisation, as Facebook’s Sam Lessin addressed in his keynote on the future of travel.

What’s next?

As you can see, I’m particularly interested in what’s going to happen with this new schema.org update, from both the publishers’ and the consumers’ points of view.

It will also be interesting to see how mappings could emerge between it and the Facebook Graph API, adding another level of interoperability in this quest to make the Web a social space.

dbrec – Intelligent music recommendations for and from the Web of Data

In addition to the Social Semantic Web, you probably know that one of my main research interests is Linked Data, not only publishing it but also consuming it. And well, I also enjoy music and the possibilities that LOD offers in that context, as we wrote with Yves in mid-2008.

So, I recently dug deeper into the use of Linked Data for music recommendations and I’m happy to announce dbrec, a service providing recommendations for the 39,000+ artists available in the DBpedia dataset (i.e. identified as instances of dbpedia-owl:MusicalArtist or dbpedia-owl:Band). The recommendations are computed using an algorithm for Linked Data Semantic Distance and take into account the various links that connect two resources, either directly (e.g. artists having played together) or indirectly (e.g. being on the same label or having covered the same song). Moreover, dbrec explains the recommendations to the user, by keeping track of the various links that have been used to compute them. For instance, the following screenshot shows why Big Brother and the Holding Company is suggested for a search on Janis Joplin.
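To give a feel for the idea of counting direct and indirect links, here is a deliberately simplified, unweighted sketch. It is not the algorithm that runs behind dbrec: the triples are a toy graph written by hand for the example (not a DBpedia extract), and the `1 / (1 + links)` form is an assumption that only captures the intuition that more shared links means a smaller distance.

```python
# Toy graph written by hand for illustration: (subject, link type, object).
TRIPLES = {
    ("Janis_Joplin", "associatedBand", "Big_Brother_and_the_Holding_Company"),
    ("Janis_Joplin", "recordLabel", "Columbia_Records"),
    ("Big_Brother_and_the_Holding_Company", "recordLabel", "Columbia_Records"),
    ("The_Clash", "recordLabel", "CBS_Records"),
}


def shared_links(a, b, triples):
    """Return the direct links between a and b, and the indirect ones."""
    # Direct: any triple linking a to b, in either direction.
    direct = [(s, p, o) for s, p, o in triples if (s, o) in ((a, b), (b, a))]
    # Indirect: a and b point to (or are pointed at by) the same resource
    # through the same link type.
    out_a = {(p, o) for s, p, o in triples if s == a}
    out_b = {(p, o) for s, p, o in triples if s == b}
    in_a = {(p, s) for s, p, o in triples if o == a}
    in_b = {(p, s) for s, p, o in triples if o == b}
    indirect = sorted(out_a & out_b) + sorted(in_a & in_b)
    return direct, indirect


def distance(a, b, triples):
    """Simplified distance in (0, 1]: 1.0 means no link was found."""
    direct, indirect = shared_links(a, b, triples)
    return 1.0 / (1 + len(direct) + len(indirect))


print(distance("Janis_Joplin", "Big_Brother_and_the_Holding_Company", TRIPLES))
print(shared_links("Janis_Joplin", "Big_Brother_and_the_Holding_Company", TRIPLES))
```

Because `shared_links` returns the links themselves and not just their count, the same data can be used both to rank a recommendation and to explain it to the user.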

dbrec is fully based on Semantic Web and Linked Data technologies and, in addition, exposes all the recommendations publicly (under a Creative Commons license) in RDFa using the dedicated LDSD ontology. For more details, you can check the homepage of the service, and start exploring the recommendations. Hey! Ho! Let’s Go!

SMOB v2.0


About 2 years ago, we designed SMOB, a Semantic Microblogging client and server application, in order to demonstrate how Semantic Web technologies and vocabularies like FOAF and SIOC could be used to provide a more open microblogging experience.

While we did not improve it much since then, there has been a lot of work on it these last months (about 250 SVN commits since the end of October, when we decided to revive it) and I’m happy to announce that SMOB v2.0 is now officially out, after some internal beta-testing during the last weeks.

Overall, it has been a complete code rewrite and architecture redesign since the previous release. While the initial version relied on clients and servers to respectively publish and aggregate data, this new version is based on the concept of distributed and independent hubs that communicate with each other to exchange data, both microblog posts and followers / following lists.

As you can guess, SMOB is entirely based on Semantic Web and Linked Data technologies. Each hub locally stores its data as native RDF (using ARC2, which also provides a SPARQL endpoint per hub) and the communication between hubs is done via SPARQL/Update over HTTP. In addition, each hub provides RDFa information about itself and the microblog posts it contains, using SIOC, FOAF and OPO, as well as interlinking with the Linking Open Data cloud using MOAT and CommonTag. Regarding that latter aspect, the UI has also been improved and the system now suggests URIs from DBpedia and Sindice (new wrappers can easily be added) as soon as you use any #tag when writing your posts, and the mappings between tags and URIs are remembered for further use in other posts. Finally, new content is posted to Sindice to enable discovering and querying microblog posts across the (Semantic) Web.
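As a rough sketch of what pushing a post from one hub to another over HTTP could look like, the snippet below builds (but does not send) an update request. It is not SMOB’s actual code: it uses the SPARQL 1.1 Update `INSERT DATA` syntax, which may differ from the SPARQL/Update dialect supported by ARC2, and the example.org URIs and function names are placeholders.

```python
import urllib.request


def escape_literal(text):
    """Escape a string so it can be used inside a double-quoted literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def build_update(post_uri, content, graph_uri):
    """Build an update that stores one microblog post in its own graph."""
    return (
        "PREFIX sioc: <http://rdfs.org/sioc/ns#>\n"
        "PREFIX sioct: <http://rdfs.org/sioc/types#>\n"
        f"INSERT DATA {{ GRAPH <{graph_uri}> {{\n"
        f"  <{post_uri}> a sioct:MicroblogPost ;\n"
        f'    sioc:content "{escape_literal(content)}" .\n'
        "} }"
    )


def build_request(endpoint, update):
    """Wrap the update in an HTTP POST request (nothing is sent here)."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


# Placeholder URIs: a real hub would use its own post and endpoint URIs.
update = build_update(
    "http://example.org/post/1",
    'Listening to "London Calling"',
    "http://example.org/post/1",
)
request = build_request("http://example.org/sparql", update)
print(update)
```

Sending it would be a matter of calling `urllib.request.urlopen(request)`, with whatever authentication the receiving hub expects.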

For those who want to get a preview before installing their own hub, here are two screenshots of the interface, the first one about publishing data, where you can see #tag mappings, as well as broadcasting to Twitter.

And in that second one, you can see a list of posts, with links to RDF data, hashtags mapped to URIs, etc.

You can also have a look at my SMOB hub here.

SMOB v2.0 is available through its download page and is licensed under the terms of the GNU GPL, like its previous release. In addition, we are happy to provide commercial support for it, such as development of new features or custom integration of SMOB for enterprise microblogging purposes. For any enquiry about these commercial services, simply send an e-mail to alexandre.passant[AT]deri.org, indicating [SMOB Support] in the subject line.

Oh, and finally, SMOB graduated and now has its own domain at http://smob.me. Enjoy Semantic Microblogging!

RDFa profile and new URI

I just added a short profile about myself embedding RDFa that aims to replace my old FOAF file, from which I had already moved some things (i.e. relationships) to external services.

I also gave myself a nicer URI, http://apassant.net/alex, that uses content negotiation to redirect either to the HTML version of this profile or to the extracted RDF one, depending on the Accept header, combining some rewrite rules that Ivan Herman defined for the SW Faq and the .htaccess used for the Flickr wrapper. My old foaf.rdf file is also now redirected to this extracted profile, and I’m using an owl:sameAs link in RDFa to be compliant with services that use my old URI.

# Rewrite rules need the engine switched on (harmless if already set)
RewriteEngine On
# Old foaf.rdf compliance
RedirectPermanent /foaf.rdf http://www.w3.org/2007/08/pyRdfa/extract?uri=http://apassant.net/about/
# RDF redirect for my URI (the "+" is escaped, as the pattern is a regular expression)
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^alex$ http://www.w3.org/2007/08/pyRdfa/extract?uri=http://apassant.net/about/ [R=303,L]
# HTML redirect for my URI
RewriteRule ^alex$ about [R=303,L]
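To spell out the decision these rules make, here is a small Python model of it. It is only a sketch for reasoning about the behaviour, not something the server runs: the function name is mine, and the `/about` target assumes the relative `about` substitution resolves at the site root.

```python
RDF_EXTRACT = (
    "http://www.w3.org/2007/08/pyRdfa/extract?uri=http://apassant.net/about/"
)


def redirect_for(path, accept):
    """Return the (status, location) the rules produce, or None if none applies."""
    # Old foaf.rdf compliance: permanent redirect, whatever the Accept header.
    if path == "/foaf.rdf":
        return (301, RDF_EXTRACT)
    if path == "/alex":
        # RDF clients are sent to the RDF extracted from the RDFa profile.
        if "application/rdf+xml" in accept:
            return (303, RDF_EXTRACT)
        # Everyone else gets the HTML profile (assumed to live at /about).
        return (303, "/about")
    return None


print(redirect_for("/alex", "application/rdf+xml"))
print(redirect_for("/alex", "text/html"))
```

The 303 status matters here: http://apassant.net/alex identifies a person, not a document, so the server answers by pointing to a document about that person instead of returning one directly.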

I’m also wondering, since this profile is used as the http://apassant.net homepage, which is also my OpenID URL, how it will work when logging in to websites using SparqlPress + OpenID: as ARC2 embeds an RDFa extractor, it should discover my FOAF data without using any autodiscovery link.