About JSON-LD and Content-Negotiation

NB: This post was originally written on blog.seevl.net – I crosspost it here now as I’m starting to blog here again, and I think both the approach and the data serialisation format are definitely worth using.

I had wanted to write about our use of Content-Negotiation for a long time, and as we recently switched to JSON-LD as the single format to represent our data, I decided to cover both in this blog post.

As you may know (or can guess when checking the team page), we have been involved in efforts around the Semantic Web and Linked Data for many years, and are big fans of the graph model that these technologies offer. It’s flexible, it’s agile, and it’s a straightforward way to integrate and mash up data from different sources. But when it comes to providing this data back to developers, things are more complex. One could choose to offer RDF/XML or Turtle, but that generally requires a new skillset. Why? Because most platforms provide JSON, and developers are consequently more used to it than to other formats. Check in particular these recent stats (slides 22-24) from a talk that John Musser (ProgrammableWeb) gave at SemTech this year.

So, we decided to use JSON-LD for our data, after an initial home-grown JSON modelling that was not that far from the current JSON-LD spec. One thing I particularly like is that it lets us send “objects” over the wire rather than a set of triples. For instance, consider this representation of facts about the Beatles:

{
    "@context": {
        "collaborated_with": "http://purl.org/ontology/mo/collaborated_with",
        "id": "http://purl.org/dc/terms/identifier",
        "origin": "http://purl.org/ontology/mo/origin",
        "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
        "uri": "@subject"
    },
    "collaborated_with": [
        {
            "id": "hSmwe4Dq",
            "prefLabel": "The Quarrymen",
            "uri": "http://data.seevl.net/entity/hSmwe4Dq#id"
        }
    ],
    "origin": [
        {
            "id": "px6UYEPh",
            "prefLabel": "England",
            "uri": "http://data.seevl.net/entity/px6UYEPh#id"
        },
        {
            "id": "xMgUSM9b",
            "prefLabel": "Liverpool",
            "uri": "http://data.seevl.net/entity/xMgUSM9b#id"
        }
    ],
    "prefLabel": "The Beatles",
    "uri": "http://data.seevl.net/entity/pzkVnsbP#id"
}

If you are used to JSON, you can probably understand it with no additional effort, and can use any JSON toolkit to parse it. If you are aware of the Linked Data principles, you directly see that every entity has its own URI, which can also be dereferenced to get more info about it. And if you care about triples, you can use a JSON-LD parser or the public playground, which will understand the @context values to translate this JSON-LD feed into raw triples (using MO and SKOS here). Clearly, the best of both worlds.
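To illustrate the first point, here is a minimal sketch in plain JavaScript that consumes the feed above with nothing but JSON.parse – no RDF tooling involved (the feed is abridged to the origin facts):

```javascript
// Any JSON toolkit can consume the JSON-LD feed as-is; the field
// names simply come from the @context shown above.
const feed = JSON.parse(`{
  "prefLabel": "The Beatles",
  "uri": "http://data.seevl.net/entity/pzkVnsbP#id",
  "origin": [
    {"id": "px6UYEPh", "prefLabel": "England", "uri": "http://data.seevl.net/entity/px6UYEPh#id"},
    {"id": "xMgUSM9b", "prefLabel": "Liverpool", "uri": "http://data.seevl.net/entity/xMgUSM9b#id"}
  ]
}`);

// Collect the label of each origin entity, plus the URI one can
// dereference to get more data about it.
const origins = feed.origin.map(o => `${o.prefLabel} <${o.uri}>`);
console.log(origins.join(', '));
```

A JSON-LD-aware client can do more (expand against the @context, get triples back), but the point is that a plain JSON consumer loses nothing.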

Then, Content-negotiation. When using a website and then deciding to develop around it, I am often frustrated by the need to learn new URLs, new paths, new parameters. Why the hell should humans and machines have different ways to access the same data, albeit in different formats? This is exactly why we rely on Content-negotiation on data.seevl.net. By default, every page is rendered as HTML, but if you ask for JSON, you’ll get a JSON-LD representation of the same entity, separated into different “slices” (infos, link, facts, topics and related artists, as detailed on our dev zone). No need to learn new URIs, no additional parameters. Just tell us you want JSON, and we’ll serve you what you need!

curl "http://data.seevl.net/entity/?prefLabel=beatles" \
  -H "Accept: application/json" \
  -H "X_APP_ID: 1c55b80a" \
  -H "X_APP_KEY: 65e7fbe154e8cee6c1704a9358dd8939"
Of course, we still want users to authenticate, and we want to gather metrics about data usage, but content-negotiation (versus a separate api.mydomain.org) does not prevent this at all. We are using 3scale and, unlike most API enablers, they do not proxy the API calls. This means that we can simply implement content-negotiation on our side (using our existing URIs), and just call them when authenticating and reporting metrics.
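The server-side logic boils down to inspecting the Accept header and picking a representation of the same entity. A minimal, illustrative sketch (the entity data and function name are made up for the example; in a real handler the header would come from the incoming request):

```javascript
// Illustrative sketch of content negotiation: the same entity, the
// same URL, two representations selected by the Accept header alone.
function representation(acceptHeader, entity) {
  const wantsJson = (acceptHeader || '').includes('application/json');
  if (wantsJson) {
    // Machines asked for JSON: serve the JSON-LD view
    return { contentType: 'application/json', body: JSON.stringify(entity) };
  }
  // Everyone else gets HTML, rendered from the very same data
  return { contentType: 'text/html', body: `<h1>${entity.prefLabel}</h1>` };
}

const entity = { prefLabel: 'The Beatles', uri: 'http://data.seevl.net/entity/pzkVnsbP#id' };
console.log(representation('application/json', entity).contentType); // application/json
console.log(representation('text/html', entity).contentType);        // text/html
```

Authentication and metrics reporting (the 3scale calls) sit alongside this, orthogonal to which representation is served.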
Overall, this combination of Content-negotiation and JSON-LD works as pictured below (alongside some other usual suspects such as Django, memcached, Apache, Varnish and Virtuoso – all on AWS):
[Diagram: Content-negotiation + JSON-LD]
To conclude this post, there are two things that really matter here:
  • First, WYSIWYM – What You See Is What You Mean. Using JSON-LD, we provide a view of our data directly mapped to our underlying model – in a simple JSON format. This helps in understanding how the data is represented and how one can query it later (for example, “reversing” the previous representation to get a list of bands originating from Liverpool).
  • Then, we save costs. By implementing a Content-negotiation strategy, we have a single layer to maintain between users (humans and machines) and our data. That largely simplifies the deployment process and minimises overhead. Also, every new feature is immediately available on both sides at no added cost.
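That “reversing” query is straightforward once the data arrives as objects. A sketch, over an invented dataset of entities shaped like the Beatles feed above:

```javascript
// "Reversing" the representation: given entity objects of the shape
// shown earlier, list the bands whose origin includes Liverpool.
// The dataset here is invented for the example.
const entities = [
  { prefLabel: 'The Beatles', origin: [{ id: 'xMgUSM9b', prefLabel: 'Liverpool' }] },
  { prefLabel: 'Oasis',       origin: [{ id: 'k9XxWq2x', prefLabel: 'Manchester' }] }
];

const fromLiverpool = entities
  .filter(e => (e.origin || []).some(o => o.prefLabel === 'Liverpool'))
  .map(e => e.prefLabel);

console.log(fromLiverpool); // [ 'The Beatles' ]
```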
Enjoy a Web designed both for humans and machines. Enjoy a Web of Data.

Timeout for HTML5 localStorage

HTML5 localStorage is a nice way to store data on the client side. However, when using it to cache remote API calls, you may want to purge it from time to time. As per its spec, there’s no way to automatically set up a timeout, so here’s a tiny bit of JavaScript that does the job:

var hours = 24; // reset when the stored data is more than 24 hours old
var now = new Date().getTime();
var setupTime = localStorage.getItem('setupTime');
if (setupTime === null) {
    localStorage.setItem('setupTime', now);
} else if (now - setupTime > hours * 60 * 60 * 1000) {
    // Stored data is too old: purge everything and restart the clock
    localStorage.clear();
    localStorage.setItem('setupTime', now);
}
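The same idea can be applied per key rather than wiping the whole store. A sketch, with helper names of my own; `storage` is any object exposing the localStorage API (getItem/setItem/removeItem), so the functions are easy to test outside a browser:

```javascript
// Per-key variant of the timeout trick: store an expiry timestamp
// alongside each value, and treat stale entries as cache misses.
function setWithExpiry(storage, key, value, ttlMs) {
  storage.setItem(key, JSON.stringify({ value: value, expires: Date.now() + ttlMs }));
}

function getWithExpiry(storage, key) {
  const raw = storage.getItem(key);
  if (raw === null) return null;
  const item = JSON.parse(raw);
  if (Date.now() > item.expires) {
    storage.removeItem(key); // stale: purge the entry and report a miss
    return null;
  }
  return item.value;
}
```

In a browser you would simply pass `localStorage` as the `storage` argument.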

Back to blogging

Well, not really, even though I should find a way to restore all the posts of my previous blog somewhere on that domain.

I’ve set up this Tumblr for random notes, pics, etc., and a few longer posts. Yet, I guess that most of the blogging will happen on http://blog.seevl.net – both on the tech and the business side, discussing what we’re building over there in the music and data space. I’ll try to link them – or crosspost some bits and pieces – from here.