From RSS to SIOC using SPARQL

Recently, Danny Ayers asked on sioc-dev:

My blog has RDF inside, core stuff is RSS 1.0 vocab. Does anyone happen to [have] a SPARQL CONSTRUCT for RSS 1.0 to SIOC?

I had never looked at the CONSTRUCT feature of SPARQL before, so this was a good reason to try it. The goal of CONSTRUCT is to build an RDF graph from a SPARQL query (instead of returning XML- or JSON-formatted results). It can therefore be used to translate RDF data from one vocabulary to another, as long as the source data can be retrieved with a SPARQL query.
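
To make the idea concrete, here is a toy sketch in plain Python of what CONSTRUCT does: find every set of variable bindings that satisfies the WHERE patterns, then fill in the template once per solution. It is not a real SPARQL engine; triples are plain tuples, variables are strings starting with "?", and the ex:feed data is made up for illustration.

```python
# Toy illustration of CONSTRUCT semantics; not a real SPARQL engine.
# Triples are tuples of strings, variables are strings starting with "?".

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # Bind the variable, or check it against its earlier binding.
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def solve(patterns, graph):
    """Return every binding that satisfies all WHERE patterns."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match(pattern, triple, binding)) is not None
        ]
    return solutions

def construct(template, where, graph):
    """Instantiate the template once per solution and collect the triples."""
    out = set()
    for binding in solve(where, graph):
        for triple in template:
            filled = tuple(binding.get(term, term) for term in triple)
            # Template triples with an unbound variable are skipped,
            # which is also what SPARQL does.
            if not any(term.startswith("?") for term in filled):
                out.add(filled)
    return out

# Made-up sample data: one channel with a title.
graph = {
    ("ex:feed", "rdf:type", "rss:channel"),
    ("ex:feed", "rss:title", "My blog"),
}
where = [("?c", "rdf:type", "rss:channel"), ("?c", "rss:title", "?t")]
template = [("?c", "rdf:type", "sioc:Forum"), ("?c", "dc:title", "?t")]
result = construct(template, where, graph)
```

Here result holds the two rewritten triples: ex:feed typed as sioc:Forum, and its title under dc:title.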

Regarding RSS 1.0 to SIOC, the mappings can be defined as:

  • rss:channel is a sioc:Forum (I think that’s the broader concept, as it’s the one that contains the posts), and rss:title, rss:link and rss:description can be mapped to dc:title, sioc:link and dc:description;
  • rss:item is a sioc:Post. rss:title and rss:link are mapped the same way as for the channel, while rss:description becomes sioc:content.
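
The same mappings can be written down as data and turned into query lines mechanically. This is only a sketch: CHANNEL_MAP, ITEM_MAP and build_patterns are hypothetical names, and the variable naming (?channel_url, ?item_title, ...) follows the queries in this post, where the item description goes to sioc:content.

```python
# Hypothetical helper: the tables restate the mappings used in this post.
# Each entry is (RSS property, target property, variable suffix).
CHANNEL_MAP = [
    ("rss:link", "sioc:link", "url"),
    ("rss:title", "dc:title", "title"),
    ("rss:description", "dc:description", "description"),
]
ITEM_MAP = [
    ("rss:link", "sioc:link", "url"),
    ("rss:title", "dc:title", "title"),
    ("rss:description", "sioc:content", "content"),
]

def build_patterns(var, rss_class, sioc_class, mapping):
    """Return (construct_lines, where_lines) for one resource variable."""
    construct = [f"?{var} rdf:type {sioc_class} ."]
    where = [f"?{var} rdf:type {rss_class} ."]
    for rss_prop, target_prop, suffix in mapping:
        construct.append(f"?{var} {target_prop} ?{var}_{suffix} .")
        where.append(f"?{var} {rss_prop} ?{var}_{suffix} .")
    return construct, where

channel_construct, channel_where = build_patterns(
    "channel", "rss:channel", "sioc:Forum", CHANNEL_MAP)
item_construct, item_where = build_patterns(
    "item", "rss:item", "sioc:Post", ITEM_MAP)
```

The link between the channel and its items (rss:items on one side, sioc:container_of on the other) is not covered by this helper and still has to be written by hand.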

So, the RSS 1.0 “core” to SIOC CONSTRUCT query is:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rss: <http://purl.org/rss/1.0/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

CONSTRUCT {
  ?channel rdf:type sioc:Forum .
  ?channel sioc:link ?channel_url .
  ?channel dc:title ?channel_title .
  ?channel dc:description ?channel_description .
  ?channel sioc:container_of ?item .
  ?item rdf:type sioc:Post .
  ?item sioc:link ?item_url .
  ?item dc:title ?item_title .
  ?item sioc:content ?item_content .
} WHERE {
  ?channel rdf:type rss:channel .
  ?channel rss:link ?channel_url .
  ?channel rss:title ?channel_title .
  ?channel rss:description ?channel_description .
  ?channel rss:items ?items .
  ?items ?li ?item .
  ?item rdf:type rss:item .
  ?item rss:link ?item_url .
  ?item rss:title ?item_title .
  ?item rss:description ?item_content .
}

As an example, here’s my RSS feed (core only) translated to SIOC, and rendered in the SIOC browser.

Yet, most blogs also use the DC and content RSS modules to add more information to each rss:item, so I mapped dc:date to dcterms:created and kept content:encoded as is, both using OPTIONAL in the query [1].

Finally, I also wanted to map dc:creator, but this is trickier, as there are different use cases:

  • If you are querying the posts feed, or if your blog requires registration to comment, then dc:creator is a registered user of the system, so it is a sioc:User, with dc:creator used as the sioc:name. It can also be linked to a foaf:Person, where dc:creator could be used as foaf:name (or maybe foaf:nick or rdfs:label, I’m still unsure about this). Yet, since dc:creator is just a plain string, nothing identifies the creator unambiguously, so the query creates one sioc:User / foaf:Person pair for each post. I’m not sure it really makes sense, but here’s the part to add to make it work:
?item foaf:maker _:foaf .
_:foaf foaf:name ?item_creator .
_:foaf foaf:holdsAccount _:sioc .
_:foaf rdf:type foaf:Person .
?item sioc:has_creator _:sioc .
_:sioc rdf:type sioc:User .
_:sioc sioc:name ?item_creator .

in the CONSTRUCT part, and

?item dc:creator ?item_creator

in the WHERE clause (e.g. my RSS feed + browsing it);

  • If you translate a comments feed that allows unregistered users, then, as was decided earlier, just use foaf:Person, with this CONSTRUCT part:
 
?item foaf:maker _:foaf .
_:foaf rdf:type foaf:Person .
_:foaf foaf:name ?item_creator .

(eg: my comments feed + browsing it)

  • Finally, if your RSS feed uses both dc:creator and foaf:maker, like this one, you can get more information about the foaf:Person:
 
?item foaf:maker _:foaf .
_:foaf rdf:type foaf:Person .
_:foaf foaf:name ?item_creator .
_:foaf foaf:holdsAccount _:sioc .
_:foaf foaf:nick ?foaf_nick .
_:foaf foaf:mbox_sha1sum ?foaf_sha1 .
_:foaf foaf:homepage ?foaf_homepage .
_:foaf rdfs:seeAlso ?foaf_seealso .
?item sioc:has_creator _:sioc .
_:sioc rdf:type sioc:User .
_:sioc sioc:name ?item_creator .

in the CONSTRUCT part, and

?item foaf:maker ?foaf .
?foaf rdf:type foaf:Person .
?foaf foaf:nick ?foaf_nick .
?foaf foaf:mbox_sha1sum ?foaf_sha1 .
?foaf foaf:homepage ?foaf_homepage .
?foaf rdfs:seeAlso ?foaf_seealso

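in the WHERE clause, with the rdfs prefix (http://www.w3.org/2000/01/rdf-schema#) declared as well (see the result here + browsing it). Yet, I still can’t see how to merge all the foaf:Person nodes into a single one using only CONSTRUCT. If someone knows how, I’d be happy to hear it.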
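
The one-person-per-post effect comes from CONSTRUCT minting fresh blank nodes for every query solution. One possible workaround, outside CONSTRUCT itself, is a post-processing pass that merges the blank nodes sharing the same foaf:name. The sketch below does this in plain Python on triples stored as tuples. It assumes that equal names mean the same person, which is a strong assumption for a free-text dc:creator, and the sample data (posts, name) is made up.

```python
def merge_by_name(triples, name_prop="foaf:name"):
    """Collapse blank nodes (terms starting with '_:') that share a name."""
    canonical = {}  # name -> first blank node seen with that name
    rename = {}     # blank node -> blank node that replaces it
    # Sorting makes the choice of the surviving node deterministic.
    for s, p, o in sorted(triples):
        if p == name_prop and s.startswith("_:"):
            rename[s] = canonical.setdefault(o, s)
    # Rewrite subjects and objects; other terms are left untouched.
    return {(rename.get(s, s), p, rename.get(o, o)) for s, p, o in triples}

# Made-up output of a CONSTRUCT run over two posts by the same author.
triples = {
    ("ex:post1", "foaf:maker", "_:b1"),
    ("_:b1", "foaf:name", "Alex"),
    ("ex:post2", "foaf:maker", "_:b2"),
    ("_:b2", "foaf:name", "Alex"),
}
merged = merge_by_name(triples)
```

After the merge both posts point to the same node, so the four input triples shrink to three.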

So, basically, as most feeds contain only the RSS, DC, and content vocabularies, here’s a SPARQL CONSTRUCT that should fit most of them (remove the SIOC part in CONSTRUCT for comments feeds):

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sioc: <http://rdfs.org/sioc/ns#>
PREFIX rss: <http://purl.org/rss/1.0/>
PREFIX content: <http://purl.org/rss/1.0/modules/content/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
CONSTRUCT {
  ?channel rdf:type sioc:Forum .
  ?channel sioc:link ?channel_url .
  ?channel dc:title ?channel_title .
  ?channel dc:description ?channel_description .
  ?channel sioc:container_of ?item .
  ?item rdf:type sioc:Post .
  ?item sioc:link ?item_url .
  ?item dc:title ?item_title .
  ?item dcterms:created ?item_created .
  ?item sioc:content ?item_content .
  ?item content:encoded ?item_content_encoded .
  ?item dc:subject ?item_subject .
  ?item foaf:maker _:foaf .
  _:foaf foaf:name ?item_creator .
  _:foaf foaf:holdsAccount _:sioc .
  _:foaf rdf:type foaf:Person .
  ?item sioc:has_creator _:sioc .
  _:sioc rdf:type sioc:User .
  _:sioc sioc:name ?item_creator .
} WHERE {
  ?channel rdf:type rss:channel .
  ?channel rss:link ?channel_url .
  ?channel rss:title ?channel_title .
  ?channel rss:description ?channel_description .
  ?channel rss:items ?items .
  ?items ?li ?item .
  ?item rdf:type rss:item .
  ?item rss:link ?item_url .
  ?item rss:title ?item_title .
  ?item rss:description ?item_content .
  OPTIONAL { ?item dc:date ?item_created }
  OPTIONAL { ?item content:encoded ?item_content_encoded }
  OPTIONAL { ?item dc:subject ?item_subject }
  OPTIONAL { ?item dc:creator ?item_creator }
}
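
To run such a query against a remote endpoint, the SPARQL Protocol passes the query text in a query parameter and, optionally, the graph to load in default-graph-uri. Below is a minimal sketch using only the Python standard library. The endpoint and feed URLs are placeholders, not the ones used for this post, the short query stands in for the full one above, and a given endpoint may refuse to load remote graphs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request, urlopen

# Placeholder values, to be replaced with a real endpoint and feed.
ENDPOINT = "http://example.org/sparql"
FEED_URL = "http://example.org/feed.rdf"
# Stand-in query; paste the full CONSTRUCT query here.
QUERY = "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"

def build_request_url(endpoint, query, graph_url):
    """Build a SPARQL Protocol GET URL for `query` over `graph_url`."""
    params = {"query": query, "default-graph-uri": graph_url}
    return endpoint + "?" + urlencode(params)

def run_construct(endpoint, query, graph_url):
    """Send the query and return the response body (not called here)."""
    request = Request(
        build_request_url(endpoint, query, graph_url),
        headers={"Accept": "application/rdf+xml"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")

url = build_request_url(ENDPOINT, QUERY, FEED_URL)
# Decode the URL again to check that the parameters survive encoding.
params = parse_qs(urlsplit(url).query)
```

Calling run_construct(ENDPOINT, QUERY, FEED_URL) would perform the actual HTTP request; it is left out here because the URLs are placeholders.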

My translated RSS feed there, and the same one in the browser.

Notes

[1] I first struggled with CONSTRUCT / OPTIONAL, before realising, thanks to the #swig guys, that it was a librdf bug, not a SPARQL one; it runs fine with Jena/ARQ, which these examples use through the sparql.org service.
