Social, mobile, semantic

Monday’s DBpedia Mobile presentation at LDOW2008 impressed me a lot. Although I have never worked on it, I’m really interested in ways to combine mobile applications, Semantic Web / Linked Data technologies and social networking. Here’s a use case I’ve had in mind for a long time and would like to share.

Imagine I can embed a FOAF profile on my mobile phone, or just a URI with owl:sameAs / rdfs:seeAlso links to my main URI / RDF file. When I arrive at a conference, a restaurant or any place where there are people around (and when I’m in a good mood), I allow my phone to broadcast my presence and this URI (+ related data) to anyone, while at the same time searching for available URIs and data.

Then I get a list of URIs, and my phone suggests people nearby that I should meet, based on some criteria and on how our URIs are interlinked. A simple way would be to configure the application with (statement, depth) tuples. For example, (foaf:interest, 2) would suggest all people for whom one of my foaf:interest values is linked to one of their foaf:interest values by a path of length 2 at most. And of course, those paths should be computed using Linked Data and considering the whole SW graph, or GGG, e.g. going through DBpedia, GeoNames or any dataset from the LOD cloud if needed.
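To make the (statement, depth) idea more concrete, here is a minimal sketch in Python. It assumes the links between interest URIs have already been collected into a plain adjacency map; the `LINKS` and `NEARBY` data and the function names are made up for illustration, and a real application would follow Linked Data links instead of reading a dictionary.

```python
from collections import deque

def within_depth(links, start, target, max_depth):
    """Return True if target is reachable from start in at most max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == target:
            return True
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return False

def suggest(my_interests, others, links, max_depth):
    """List people with at least one interest within max_depth of one of mine."""
    return sorted(
        person
        for person, interests in others.items()
        if any(within_depth(links, mine, theirs, max_depth)
               for mine in my_interests for theirs in interests)
    )

# Hypothetical toy graph of links between interest URIs (both directions listed).
LINKS = {
    "dbpedia:Semantic_Web": ["dbpedia:RDF", "dbpedia:Linked_Data"],
    "dbpedia:RDF": ["dbpedia:Semantic_Web", "dbpedia:SPARQL"],
    "dbpedia:Linked_Data": ["dbpedia:Semantic_Web"],
    "dbpedia:SPARQL": ["dbpedia:RDF", "dbpedia:SQL"],
    "dbpedia:SQL": ["dbpedia:SPARQL"],
}
# Hypothetical people nearby and their foaf:interest values.
NEARBY = {"alice": ["dbpedia:SPARQL"], "bob": ["dbpedia:SQL"]}

print(suggest(["dbpedia:Semantic_Web"], NEARBY, LINKS, 2))
```

With a depth of 2 only alice is suggested; raising the depth to 3 would also bring in bob.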

But in some cases paths are not enough, as they can lead to irrelevant results (depending on the start URI, the path may quickly head towards overly generic URIs), or sometimes to too many people. For example, at a SW event, I guess it would have suggested that I meet everyone, since I have dbpedia:Semantic_Web in my profile. A solution could be an intelligent context manager on the phone that checks my iCal, finds that I’m attending a workshop (or even better, uses GPS location and browses upcoming.org or other services to find which event I’m attending), retrieves the workshop homepage in which the organizers embedded some RDF data about the workshop topics (as they eat their own dogfood :), and excludes those URIs (and the paths that go through them). To be more accurate, instead of those path tuples, I could also define complex queries, for example: “People that will present a paper at a conference I’ll attend next month”.
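The exclusion step could be as simple as pruning the event's topic URIs from the link graph before any path is computed. The sketch below assumes the same kind of adjacency map as a path search would use; `prune`, `LINKS` and `EVENT_TOPICS` are hypothetical names and data.

```python
def prune(links, excluded):
    """Drop excluded URIs, and every edge touching them, from an adjacency map."""
    return {
        node: [n for n in neighbours if n not in excluded]
        for node, neighbours in links.items()
        if node not in excluded
    }

# Hypothetical toy data: two interests connected only through the workshop topic.
LINKS = {
    "dbpedia:Semantic_Web": ["dbpedia:FOAF", "dbpedia:SIOC"],
    "dbpedia:FOAF": ["dbpedia:Semantic_Web"],
    "dbpedia:SIOC": ["dbpedia:Semantic_Web"],
}
# Topics that the context manager would have read from the workshop homepage.
EVENT_TOPICS = {"dbpedia:Semantic_Web"}

print(prune(LINKS, EVENT_TOPICS))
```

Once the topic is removed, dbpedia:FOAF and dbpedia:SIOC are no longer connected, so sharing the event's theme alone no longer triggers a suggestion.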
Actually, it’s just a matter of providing all the data, opening it, and of course, interlinking it. But well, “Linked Data is the Semantic Web done as it should be. It is the Web done as it should be”, no?

SparqlPress and foaf:openid

This website now uses SparqlPress.

Morten did a lot of work to include a scutter with ARC2 integration in the plugin, so this blog now features an RDF backend that stores some data from my website and related documents (FOAF profile, related seeAlso’s) and also from people who commented here. After 3 great weeks at DERI, I finally took the time to dig into the source code of the plugin and start hacking.

A cool thing with this plugin is that the OpenID patch I wrote some time ago, which required hacking the original plug-in, is now powered by SparqlPress itself. Each time someone registers on the website, their OpenID URL is parsed by ARC’s SemHTMLParser, which retrieves the FOAF profile, which then goes into the scutter’s queue. There may be some delay before the file is fetched, but this issue should be addressed soon.

Yet, you may notice that some FOAF links disappeared from the comments. While the first version only retrieved the profile using auto-discovery links, I can now run a SPARQL query on the file to check whether there’s a foaf:openid link to the URL, which identifies that the FOAF profile belongs to (or mentions) the related user. So, if your FOAF link disappeared, that’s most likely because you don’t have this property in your profile, or it’s not the same as the URL you used to register here (be careful with the trailing /). On the other hand, thanks to SPARQL, comments now also feature links to homepage, blog and picture, and other things may come soon.

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?graph ?uri ?home ?blog
WHERE {
  GRAPH ?graph {
    ?uri rdf:type foaf:Person ;
      foaf:openid <$openid> .
    OPTIONAL { ?uri foaf:homepage ?home } .
    OPTIONAL { ?uri foaf:weblog ?blog } .
  }
} LIMIT 1
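As an illustration of how the $openid placeholder might be filled in, and of why the trailing / matters, here is a small Python sketch. It is not what the plugin does: `build_query` and `same_openid` are hypothetical helpers, and the lenient slash-insensitive comparison is just one possible choice.

```python
from string import Template

# Same query as above; $openid is replaced by the registered OpenID URL.
QUERY = Template("""\
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?graph ?uri ?home ?blog
WHERE {
  GRAPH ?graph {
    ?uri rdf:type foaf:Person ;
      foaf:openid <$openid> .
    OPTIONAL { ?uri foaf:homepage ?home } .
    OPTIONAL { ?uri foaf:weblog ?blog } .
  }
} LIMIT 1
""")

def build_query(openid):
    """Insert the OpenID URL, refusing characters that would break out of <...>."""
    if any(c in openid for c in '<>"{}|^`\\') or any(c.isspace() for c in openid):
        raise ValueError("not a safe IRI: %r" % openid)
    return QUERY.substitute(openid=openid)

def same_openid(a, b):
    """Compare two OpenID URLs, ignoring trailing slashes (an assumed policy)."""
    return a.rstrip("/") == b.rstrip("/")

print(same_openid("http://example.org/me/", "http://example.org/me"))
print(build_query("http://example.org/me"))
```

A strict string match, as in the query itself, treats http://example.org/me and http://example.org/me/ as two different resources, which is why a missing slash is enough for the FOAF link not to be found.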

There should also be some privacy settings in the future (in case you do not want your information to be used), as well as SIOC / FOAF / SKOS exports and some other widgets / social network stuff. If you’re interested, check this page (repositories are not merged yet).

Introducing MOAT

I’m happy to announce the MOAT project:

MOAT (Meaning Of A Tag) provides a Semantic Web framework to publish semantically-annotated content from free-tagging.

While tags are widely used in Web 2.0 services, their lack of machine-understandable meaning can be a problem for information retrieval, especially when people use tags that can have different meanings depending on the context.

MOAT aims to solve this by providing a way for users to define the meaning(s) of their tag(s) using URIs of Semantic Web resources (such as URIs from DBpedia, GeoNames … or any knowledge base), and then annotate content with those URIs rather than free-text tags, bringing content into the Semantic Web by linking data together. Moreover, tag meanings can be shared between people, providing an architecture of participation to define and exchange potential meanings of tags within a community of users.

To achieve this goal, MOAT relies on an architecture that can be deployed for any organisation or community and that involves a lightweight ontology, a MOAT server, and some third-party clients.

More details about the framework and its implementation are described on the project website. A demo server is available here, and updates should be done soon (code and documentation).

One FOAF fits all

A few years ago, I created my FOAF profile with FOAF-a-Matic. But actually, I almost never updated it or its foaf:knows list.

So now I’ll let external websites manage that information.
I have RDF exports of my Flickr, Twitter and Facebook accounts, as well as of this weblog (in progress), and most of them define relationships to the people I know. Since all those files define a URI for myself, I can use my main FOAF file (I mean the one I host myself) as a reference profile that links me to my other URIs using owl:sameAs, and also adds rdfs:seeAlso links to the related files (as described here), e.g.:

<owl:sameAs
  rdf:resource="http://apassant.net/home/2007/12/flickrdf/people/33669349@N00"
  rdfs:seeAlso="http://apassant.net/home/2007/12/flickrdf/data/people/33669349@N00"/>

<owl:sameAs
  rdf:resource="http://twitter.com/terraces"
  rdfs:seeAlso="http://tools.opiumfield.com/twitter/terraces/rdf"/>

And I’ll get a decentralized foaf:knows network, as shown in this graph:

onefoaf.png
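To show how such a decentralized network could be flattened back into a single contact list, here is a rough Python sketch. It groups URIs declared owl:sameAs with a small union-find and then collects every foaf:knows statement made about any of my URIs. All URIs and names below are placeholders, and a real implementation would read the triples from the RDF files rather than from Python lists.

```python
def find(parent, x):
    """Return the representative of x's identity set (union-find with path halving)."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merged_contacts(same_as, knows, me):
    """Collect everyone known by any URI that is owl:sameAs the reference URI."""
    parent = {}
    for a, b in same_as:
        parent[find(parent, a)] = find(parent, b)
    my_root = find(parent, me)
    return sorted({o for s, o in knows if find(parent, s) == my_root})

# Placeholder URIs: one reference profile and two service-specific URIs.
ME = "http://example.org/foaf#me"
SAME_AS = [(ME, "flickr:me"), (ME, "twitter:me")]
KNOWS = [
    ("flickr:me", "flickr:dan"),
    ("twitter:me", "twitter:morten"),
    ("flickr:dan", "flickr:someone"),  # not about me, so ignored
]

print(merged_contacts(SAME_AS, KNOWS, ME))
```

The sketch trusts every owl:sameAs statement it is given, so it inherits the trust question raised in NB2 below.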

Then I can load all these profiles into a local RDF store or, even better, use a dedicated Semantic Web “social graph manager” such as Knowee or Beatnik to get all my contacts locally, get their e-mail addresses, query profiles or, as David said, integrate them into other desktop apps and sync with my iPhone (ok, I don’t have one yet :) ) …

And if services such as LinkedIn or bibliography repositories export FOAF URIs for anyone, as the FOAF/DBLP service already does, I could even link to people I worked with. In case all those services export data with only foaf:knows and I want to be more precise, I can refine relationships in my profile using the RELATIONSHIP vocabulary, or maybe even include rules in my profile that could then be taken into account by agents that query it? Something like:

( GRAPH <http://linkedin/foafexport/mygraph> { #me foaf:knows ?x } )
=>
( #me rel:collaboratesWith ?x )

that will be in my profile itself.
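An agent honouring such a rule would only have to scan the statements of the named graph and emit the refined relationship. Here is a minimal sketch in Python, with quads represented as (graph, subject, predicate, object) tuples; the data and the `apply_rule` helper are invented for illustration and no existing rule engine is implied.

```python
FOAF_KNOWS = "foaf:knows"
REL_COLLABORATES = "rel:collaboratesWith"

def apply_rule(quads, graph, subject):
    """For each (subject foaf:knows ?x) in `graph`, infer (subject rel:collaboratesWith ?x)."""
    return [
        (subject, REL_COLLABORATES, o)
        for g, s, p, o in quads
        if g == graph and s == subject and p == FOAF_KNOWS
    ]

# Hypothetical quads: the same predicate asserted in two different graphs.
GRAPH = "http://linkedin/foafexport/mygraph"
QUADS = [
    (GRAPH, "#me", "foaf:knows", "#colleague"),
    ("http://example.org/flickr-export", "#me", "foaf:knows", "#friend"),
]

print(apply_rule(QUADS, GRAPH, "#me"))
```

Only the statement coming from the LinkedIn graph is refined; the one from the other graph is left as a plain foaf:knows.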

Finally, since I can create a link to this FOAF profile from my OpenID, I can reuse this graph in many applications and, when logging in to a new service, ask it “is there anyone here that I know from Flickr?”.

Beyond the social network, I can also link from the same profile (or, actually, from the profiles that have been linked to the reference one) to various things I wrote or did on the Web, such as data from last.fm, Revyu or Flickr, thanks to SIOC, as explained here.

So: One reference profile. Lots of distributed information. One Giant Global Graph.

NB: Also check some of Dan Brickley’s experiments about related topics.

NB2: I did not take trust issues into consideration in this post, i.e. how can we be sure that the owl:sameAs relationship points to a URI which is really *me*. I think one solution would be to authenticate on those websites using OpenID so that the site can find my FOAF file, then my URI, and add an owl:sameAs link in the other direction. Both files should also be signed, and I think that will be ok (?).

RDF export of Flickr profiles with FOAF and SIOC

I really think that the Semantic Web, and especially FOAF and SIOC can be an answer to the social graph and distributed social networks, as explained (and drawn) there: do not rely on proprietary APIs, but provide data in a way that can be universally linked and understood by software agents.

There are already RDF exporters for Twitter or Facebook, and here’s my contribution to the Giant Global Graph: a Flickr exporter, exporting accounts and groups - if you want some data about your pictures, check flickurl.

It uses the phpFlickr API and exports only publicly available data, to comply with Flickr privacy settings (i.e. if you set up your account so that users who are not logged in cannot see it in any contact list, it won’t appear in the RDF exports of those contact lists).

Basically, it exports one RDF file per user, including one foaf:Person and a related sioc:User, with basic properties (sioc:name, sioc:avatar) as well as a link to the user’s image gallery using the ImageGallery class from the SIOC types module. It also exports the user’s relationships (foaf:knows) with other users and the groups they are a member of (sioc:member_of). Groups are exported in another file, with related information (dc:description, foaf:depiction …). Files are related with seeAlso links.

In order to be compliant with the Linked Data principles, the script also defines a URI for each foaf:Person, so that anyone can link to it from his FOAF profile, using this pattern within his own foaf:Person description:

<owl:sameAs rdf:resource="http://apassant.net/home/2007/12/flickrdf/people/flickr_id"/>

It also defines a URI for each sioc:User and sioc:UserGroup. I could have used the Flickr account URL, but I think it’s better to distinguish between the account itself and the homepage of this account; same for the groups.

URIs are defined as follows:

While data is available at:

Eg with my profile:

It uses content negotiation and 303 redirects to the RDF file or to Flickr.com, depending on whether you access the page with an RDF-aware client or not.
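The redirect decision itself can be sketched in a few lines of Python. The URI and data patterns below are borrowed from the owl:sameAs example in the “One FOAF fits all” post; whether the script maps them exactly this way, the Flickr page URL, and the very simplified Accept handling (no q-values) are all assumptions.

```python
BASE = "http://apassant.net/home/2007/12/flickrdf"

def redirect_for(flickr_id, accept_header):
    """Return (status, Location) for a request on .../people/<flickr_id>."""
    wants_rdf = "application/rdf+xml" in accept_header.lower()
    if wants_rdf:
        # RDF-aware client: send it to the RDF document describing the person.
        location = f"{BASE}/data/people/{flickr_id}"
    else:
        # Regular browser: send it to the human-readable Flickr page (assumed URL).
        location = f"http://www.flickr.com/people/{flickr_id}/"
    return 303, location

print(redirect_for("33669349@N00", "application/rdf+xml"))
print(redirect_for("33669349@N00", "text/html,application/xhtml+xml"))
```

Answering with 303 See Other rather than 200 keeps the URI of the person distinct from the URIs of the documents that describe them.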

Finally, profiles are also linked to existing URIs thanks to:

Please note that each created page is kept on the server. If you want to rebuild one, go to the service homepage and create a profile (you’ll see it also works with user names if you don’t know the user ID, but be careful: the user name is not the screen name). A dataset of created profiles should be available soon.

Edit 22/12/2006: Check this post for more details on issues with how to deal with user name / user ID.

Updating doap:store architecture and rewriting queries

I recently changed the server that runs the various websites I host / maintain, moving from a Core2Duo 2×1.80GHz with 1GB of RAM to a Celeron 2.0GHz with only 256MB.

The main reason I had such a server was hosting doap:store, as I wanted to run SPARQL queries in a reasonable time. After moving to the new server (as you can guess, for pricing reasons), most of the queries were really slow, even with an optimized MySQL config. Some of them even froze the MySQL server, hanging on “Copying to tmp table on disk” operations and making the website almost unusable.

While I was looking for a solution to host the triplestore on a more powerful box, Kingsley Idehen and OpenLink kindly offered to host it using Virtuoso Open-Source Edition on EC2. I just made the changes, re-imported the ~4600 RDF files fetched until now (now including doapspace data), and the service is live again, with much better performance in both finding and browsing projects.

So, now, doap:store runs thanks to:

In the meantime, I optimized some queries by removing useless variables and reordering statements, after reading this technical report about OptARQ. Finally, I took advantage of Virtuoso’s aggregate functions to use count, instead of fetching all graphs / projects and counting them in PHP for project stats.
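To give an idea of what reordering statements means, here is a deliberately crude Python sketch that sorts triple patterns so that those with fewer variables come first. This variable-counting heuristic is only a stand-in: it is not OptARQ’s actual selectivity estimation, and the patterns and URIs are made up.

```python
def variable_count(pattern):
    """Count the terms of a (subject, predicate, object) pattern that are variables."""
    return sum(1 for term in pattern if term.startswith("?"))

def reorder(patterns):
    """Put patterns with fewer variables first; sorted() keeps ties in original order."""
    return sorted(patterns, key=variable_count)

# Hypothetical basic graph pattern from a project search.
PATTERNS = [
    ("?project", "doap:name", "?name"),
    ("?project", "rdf:type", "doap:Project"),
    ("?project", "doap:maintainer", "<http://example.org/me>"),
]

for pattern in reorder(PATTERNS):
    print(" ".join(pattern), ".")
```

The idea is that patterns with more bound terms usually match fewer triples, so evaluating them first keeps intermediate results small.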

Thanks again to Openlink for hosting the data and for their support !

Retrieving FOAF profile from OpenID

Following Dan Brickley’s experiments, I’ve just set up an OpenID plug-in for this weblog, allowing anyone to register using OpenID, since users now need to register to comment.

Actually, one of the reasons I installed it is that I thought it would be an easy way to retrieve a FOAF profile for any registered user on this weblog. I’m using auto-discovery to get it from the OpenID URL. I don’t know if any OpenID provider offers such a feature, but it can easily be done if you’re delegating your OpenID to your own webpage, by adding the auto-discovery line in the header of your OpenID URL (check the source of this page if needed). BTW, WP users, here’s another plugin that helps to delegate your OpenID to your weblog.

The method I used to retrieve the profile is the following:

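// Find the FOAF auto-discovery link in the page at $url and return it as an absolute URL.
function fetch_foaf_profile($url) {
  $html = @file_get_contents($url);
  if($html === false) return null; // the page could not be fetched
  // Look for <link rel="meta" title="foaf" href="..."/> inside <head>
  preg_match_all('/<head.*<link.*rel="meta".*title="foaf".*href="(.*)".*\/>.*<\/head>/Usi', $html, $links);
  if(!empty($links[1][0])) {
    $foaf = $links[1][0];
    $ex = parse_url($foaf);
    // Already an absolute URL
    if(!empty($ex['scheme'])) return $foaf;
    // Path relative to the root of the host
    if(isset($ex['path']) && substr($ex['path'], 0, 1) == '/') {
      $ex = parse_url($url);
      return $ex['scheme'].'://'.$ex['host'].$foaf;
    }
    // Path relative to the OpenID URL itself
    return $url.$foaf;
  }
  return null;
}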

Then the profile is added to the wp_usermeta table. The complete svn diff for the latest version of the plug-in is available here.
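For comparison, the same auto-discovery step can be sketched in Python without a regular expression, using the standard library’s HTML parser and `urljoin` to resolve relative links. This is not the plug-in’s code: the class and function names are mine, the HTML is passed in as a string instead of being fetched, and unlike the regex it does not check that the link sits inside the head element.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class FoafLinkFinder(HTMLParser):
    """Remember the href of the first <link rel="meta" title="foaf"> element."""

    def __init__(self):
        super().__init__()
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if (attributes.get("rel", "").lower() == "meta"
                and attributes.get("title", "").lower() == "foaf"
                and attributes.get("href")):
            self.href = attributes["href"]

def find_foaf_profile(page_url, html):
    """Return the absolute URL of the FOAF profile advertised in html, or None."""
    finder = FoafLinkFinder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None

# Hypothetical OpenID page delegating to a FOAF file at the root of the host.
PAGE = ('<html><head><link rel="meta" type="application/rdf+xml" '
        'title="FOAF" href="/foaf.rdf" /></head><body></body></html>')
print(find_foaf_profile("http://example.org/blog/", PAGE))
```

Letting `urljoin` do the resolution covers absolute URLs, host-relative paths and document-relative paths in one call.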

So, now that I can have FOAF profiles for users, I’ll be able to experiment with the FOAF-avatar idea I had a long time ago. Other steps would be to combine people URIs with Morten Frederiksen’s FOAF output plugin (I cannot find how to make it work with WP2.3), using owl:sameAs to link users’ URIs created from this weblog to their profile URIs, and also to use it in the SIOC output plugin.

Regarding the social networking Dan is talking about, I think that would be a great way to answer questions such as: do people commenting on this blog already know each other in the SemWeb world? Are they friends according to their FOAF profiles? According to Facebook? Do they share interests?

So, if you have FOAF and OpenID connected thanks to auto-discovery, I’ll be happy if you try it so that I can start experimenting with some of these ideas.

Update: The script now also retrieves the SIOC profile if available.

AllegroGraph v2.2

While answering a comment on my latest post about some SW tools I use almost daily, I saw that a new version of AllegroGraph has just been released.

Actually, I tried it last week while looking for triple stores that support the following requirements:

Even if it was relatively fast, I had to remove it from my shortlist as the needed inference capabilities were supported only with its own query language.

But … according to its changelog regarding this new release:

Get-triples and SPARQL now both work with AllegroGraph’s RDFS++ reasoner

Wow ! Plus other cool features as freetext indexing and SPARQL over HTTP support… let’s try it again !

A few tools for discovering the Semantic Web

Following several French-language posts about the Semantic Web (one of them from a returning blogger), here is a short list of tools and services I regularly use to work with this kind of data. If you want to get started with RDF, RDFS and/or OWL, you will probably find something here to play with.

If you need a specific vocabulary to manage your data, and before you embark on a more or less lengthy modelling phase, make sure your needs are not already covered by an existing ontology. Swoogle and SchemaWeb let you find a vocabulary based on the classes you want to use. For example, a search for “Person” will logically lead you, among others, to FOAF. You can also look at the list of the 100 most used RDF namespaces on the Web, and at the Linking Open Data project’s recommendations on “reference” vocabularies.

If nothing fits, it is up to you to create your own ontology. Protégé is the reference tool in this area: many plug-ins, support for the different levels of OWL and so on, all of it open source. However, it only supports the XML serialization of RDF data. On the other hand, Protégé lets you start from an existing ontology when creating a new model, and thus use sub-classes (rdfs:subClassOf) and sub-properties (rdfs:subPropertyOf) across ontologies (as SIOC does with FOAF through sioc:User), which will later let you benefit from the inference capabilities of some SPARQL engines. In general, it is good practice to refine existing vocabularies (in terms of classes or properties) rather than start from scratch, with data interlinking in mind.
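To illustrate what sub-class inference buys you, here is a toy Python sketch that answers “which resources are instances of this class?” by walking rdfs:subClassOf links. The single sub-class statement reflects how sioc:User related to foaf:OnlineAccount in SIOC at the time; treat it, along with the example resources, as an assumption rather than a reference.

```python
def superclasses(cls, sub_class_of):
    """Return every class reachable from cls through rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def instances_of(target, types, sub_class_of):
    """List resources typed with target or with any of its sub-classes."""
    return sorted(
        resource
        for resource, cls in types.items()
        if cls == target or target in superclasses(cls, sub_class_of)
    )

# Assumed schema statement and made-up instance data.
SUB_CLASS_OF = {"sioc:User": ["foaf:OnlineAccount"]}
TYPES = {
    "ex:alex_account": "sioc:User",
    "ex:other_account": "foaf:OnlineAccount",
    "ex:some_page": "foaf:Document",
}

print(instances_of("foaf:OnlineAccount", TYPES, SUB_CLASS_OF))
```

A query for foaf:OnlineAccount thus also returns data that was only ever described with SIOC, which is the interlinking benefit of refining an existing vocabulary.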

Once the ontology is in place, you can use Protégé again to create the corresponding instances, or simply use a text editor. Protégé will help you reduce errors, though, and you can even combine it with a reasoner to avoid inconsistencies (for example, to avoid creating an instance that violates constraints you defined in an OWL ontology).

If you edit by hand, and for a file serialized in XML, you can use the W3C validator. Note that it will not check that the data conforms to the ontologies used, only that the file is well-formed, with a graphical representation as a bonus.

Once you have your RDF files, how do you manipulate and query them? Here my preference goes to Redland, its command-line tools and its Python bindings. Manipulating triples and querying with SPARQL is also very simple. Speaking of SPARQL, it is the unavoidable language for querying RDF data, and also for reshaping it using CONSTRUCT.

On the PHP side, the RAP and ARC libraries also offer manipulation and querying of RDF data. In both cases, setting up an online RDF store is quick, and RAP has the good idea of providing an inference engine. For a more powerful store, I regularly use 3store, which also supports some RDFS rules, with the advantage that they are automatically derived from the ontologies imported into the store. It is fairly fast but requires a compilation step, and therefore a server you control. Virtuoso is also promising, but I have not yet had the chance to compare it with 3store.

Finally, if you simply want to visualize RDF data without going through a data store, you can use Exhibit, whose version 2.0 has just been released in beta, and take the opportunity to look at the other tools of the SIMILE project.

To conclude, for a more complete inventory of Semantic Web tools, you can consult this list of more than 500 tools, whose interface happens to use Exhibit.

How many triples ?

I just found a simple way to count triples in an RDF file, without needing to script anything, using rapper (included in the raptor-utils package for Debian and Ubuntu):

rapper --count http://apassant.net/feed 2>&1 | awk 'NR==2 {print $4}'

Redland is definitely great to manage RDF in many ways !
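If the count is needed inside a script rather than in a shell pipeline, the same message can be parsed with a regular expression. The Python sketch below works on captured rapper output; the sample text is illustrative and assumes the message layout implied by the awk command above (the count being the fourth field of the second line).

```python
import re

def triple_count(rapper_output):
    """Extract the number of triples from rapper's 'Parsing returned N triples' line."""
    match = re.search(r"returned (\d+) triples?", rapper_output)
    return int(match.group(1)) if match else None

# Illustrative output, not captured from a real run.
SAMPLE = (
    "rapper: Parsing URI http://apassant.net/feed with parser rdfxml\n"
    "rapper: Parsing returned 42 triples\n"
)

print(triple_count(SAMPLE))
```

Matching on the word “returned” instead of a line and field position keeps the parsing working if rapper prints extra warnings before the summary line.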
