Updating doap:store architecture and rewriting queries
I recently changed the server that runs the various websites I host / maintain, moving from a Core2Duo (2 × 1.80 GHz) with 1 GB of RAM to a Celeron 2.0 GHz with only 256 MB.
The main reason I had such a server was hosting doap:store, as I wanted to run SPARQL queries in a reasonable time. After moving to the new server (as you can guess, for pricing reasons), most queries became really slow, even with an optimized MySQL configuration. Some of them even froze the MySQL server, hanging on “Copying to tmp table on disk”, which made the website almost unusable.
While I was looking for a way to host the triple store on a more powerful box, Kingsley Idehen and OpenLink kindly offered to host it using Virtuoso Open-Source Edition on EC2. I have just made the changes and re-imported the ~4600 RDF files fetched so far (now including doapspace data), and the service is live again, with much better performance for both finding and browsing projects.
So doap:store now runs on:
- an Amazon EC2 server, using Virtuoso to store the RDF data and provide a SPARQL endpoint for it;
- a Debian GNU/Linux box, using Apache2 and PHP5 to build the interfaces. This one also fetches new project descriptions via Ping The Semantic Web (with a Python cron job), and updates / queries the triple store on the former.
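The update step of the cron job could look roughly like the following minimal Python sketch. It is not the actual doap:store script: it assumes that each fetched DOAP file is kept in a named graph identified by its own URL, and that the Virtuoso endpoint accepts SPARQL 1.1 Update requests sent as a form-encoded POST with an "update" parameter. The function names and the endpoint URL are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Iterable

# Characters that may not appear inside a SPARQL IRI reference <...>.
_FORBIDDEN_IRI_CHARS = '<>" {}|\\^`'


def load_statement(doc_url: str) -> str:
    """Hypothetical helper: SPARQL 1.1 Update that (re)loads one DOAP file
    into a named graph called after the file's own URL."""
    if any(c in doc_url for c in _FORBIDDEN_IRI_CHARS):
        raise ValueError("not a valid IRI: %r" % doc_url)
    # Dropping the graph first means a re-fetched file replaces its old triples.
    return "DROP SILENT GRAPH <%s> ;\nLOAD <%s> INTO GRAPH <%s>" % (
        doc_url, doc_url, doc_url)


def update_request(endpoint: str, statement: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol update request (form-encoded POST)."""
    body = urllib.parse.urlencode({"update": statement}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def push_files(endpoint: str, doc_urls: Iterable[str]) -> None:
    """Hypothetical cron step: load every newly pinged DOAP file into the store."""
    for url in doc_urls:
        request = update_request(endpoint, load_statement(url))
        with urllib.request.urlopen(request) as response:
            response.read()  # drain the response; errors raise HTTPError
```

Keeping one graph per source file makes re-imports simple (drop and reload a single graph), at the cost of more graphs to scan when listing projects.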
In the meantime, after reading a technical report about OptARQ, I optimized some queries by removing unused variables and reordering triple patterns. Finally, for the project statistics, I took advantage of Virtuoso's aggregate functions to use count directly in the queries, instead of fetching all graphs / projects and counting them in PHP.
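To illustrate the difference between counting in the client and counting in the store, here is a minimal Python sketch. The queries use standard SPARQL 1.1 aggregate syntax (COUNT, GROUP BY), which current Virtuoso accepts; the exact syntax used by doap:store at the time may have differed, and the query and function names are hypothetical. The functions read results in the SPARQL JSON results format.

```python
from typing import Dict

DOAP_PREFIX = "PREFIX doap: <http://usefulinc.com/ns/doap#>\n"

# Before: fetch every project, then count the rows on the client side.
LIST_QUERY = DOAP_PREFIX + (
    "SELECT DISTINCT ?project WHERE { ?project a doap:Project }")

# After: let the store do the counting and return a single row.
COUNT_QUERY = DOAP_PREFIX + (
    "SELECT (COUNT(DISTINCT ?project) AS ?n) WHERE { ?project a doap:Project }")

# Per-language statistics, also computed entirely by the store.
LANG_STATS_QUERY = DOAP_PREFIX + (
    "SELECT ?lang (COUNT(DISTINCT ?project) AS ?n) WHERE {\n"
    "  ?project a doap:Project ;\n"
    "           doap:programming-language ?lang .\n"
    "} GROUP BY ?lang ORDER BY DESC(?n)"
)


def count_client_side(results: dict) -> int:
    """Old approach: every binding is transferred, then counted locally."""
    return len(results["results"]["bindings"])


def count_server_side(results: dict) -> int:
    """New approach: read the single aggregate value computed by the store."""
    return int(results["results"]["bindings"][0]["n"]["value"])


def stats_by_language(results: dict) -> Dict[str, int]:
    """Turn the GROUP BY result rows into a {language: project count} dict."""
    return {
        row["lang"]["value"]: int(row["n"]["value"])
        for row in results["results"]["bindings"]
    }
```

With the aggregate version, the amount of data sent from the endpoint to the PHP box stays constant, whatever the number of projects in the store.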
Thanks again to OpenLink for hosting the data and for their support!
Tags: doap, doapstore, rdf, scalability, virtuoso
Finding doap projects with YubNub
I was writing an OpenSearch plug-in for doapstore.org, to allow searching projects directly from the Firefox or IE7 search box, when I remembered yubnub.org, which already has such a plug-in. YubNub allows anyone to create command lines for the web, e.g. typing "gim rdf" will search Google Images for "rdf".
So rather than creating a plug-in for doapstore, I created a YubNub command, simply called "doap". It works as follows:
- doap foo will search all projects with a name (doap:name) or description (doap:shortdesc or doap:description) containing foo;
- doap name=foo will search by name only;
- doap desc=foo will search by description (both long and short);
- doap lang=foo will search by programming language (doap:programming-language);
- doap host=foo will search by hostname (i.e. project URI).
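One possible way to turn these command arguments into SPARQL is shown in the minimal Python sketch below. It is not how doapstore.org actually implements the search: the key-to-property mapping follows the list above, while the use of SPARQL 1.1 CONTAINS/LCASE for case-insensitive matching and all function names are assumptions made for illustration.

```python
from typing import Optional, Tuple

DOAP_NS = "http://usefulinc.com/ns/doap#"

# Hypothetical mapping from command keys to the DOAP properties they search.
FIELDS = {
    "name": ["name"],
    "desc": ["shortdesc", "description"],
    "lang": ["programming-language"],
}
DEFAULT_FIELDS = ["name", "shortdesc", "description"]


def parse_command(arg: str) -> Tuple[Optional[str], str]:
    """Split 'lang=python' into ('lang', 'python'); plain text has no key."""
    key, sep, value = arg.partition("=")
    if sep and (key in FIELDS or key == "host"):
        return key, value
    return None, arg


def sparql_string(text: str) -> str:
    """Quote text as a SPARQL string literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_query(arg: str) -> str:
    """Build a case-insensitive substring search over DOAP projects."""
    key, term = parse_command(arg)
    needle = "LCASE(" + sparql_string(term) + ")"
    lines = [
        f"PREFIX doap: <{DOAP_NS}>",
        "SELECT DISTINCT ?project WHERE {",
        "  ?project a doap:Project .",
    ]
    if key == "host":
        # host=... matches against the project URI itself.
        lines.append(f"  FILTER(CONTAINS(LCASE(STR(?project)), {needle}))")
    else:
        props = FIELDS.get(key, DEFAULT_FIELDS)
        union = " UNION ".join("{ ?project doap:%s ?text }" % p for p in props)
        lines.append("  " + union)
        lines.append(f"  FILTER(CONTAINS(LCASE(STR(?text)), {needle}))")
    lines.append("}")
    return "\n".join(lines)
```

For example, build_query("lang=python") only matches doap:programming-language, while build_query("rdf") searches the name and both descriptions through a UNION.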
The other advantage of YubNub is that it can be used not only as a search engine in your favourite browser, but also through various front-ends, such as Tiger widgets or command-line scripts for the shell. Really useful!
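As an example of such a front-end, a tiny command-line script could forward its arguments to YubNub and open the result in the default browser. This is a minimal Python sketch; the parser URL below is an assumption based on how YubNub commands have been exposed to search plug-ins, so check yubnub.org before relying on it.

```python
import sys
import urllib.parse
import webbrowser

# Assumed YubNub parser URL (verify against yubnub.org).
YUBNUB_PARSE = "http://yubnub.org/parser/parse?"


def yubnub_url(command: str) -> str:
    """Encode a full YubNub command line, e.g. 'doap lang=python'."""
    return YUBNUB_PARSE + urllib.parse.urlencode({"command": command})


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python yn.py doap lang=python
    webbrowser.open(yubnub_url(" ".join(sys.argv[1:])))
```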
NB: I also thought about a generic SPARQL command for YubNub that would query different endpoints, as Danny Ayers suggested, and return a single set of results thanks to a SPARQL dispatcher, but I have not written anything for it.
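The core of such a dispatcher could be as simple as sending the same SELECT query to each endpoint and merging the JSON result sets. The following minimal Python sketch is hypothetical (nothing like it exists in doapstore): it uses the standard SPARQL protocol (HTTP GET with a "query" parameter, JSON results) and removes duplicate rows, but does no error handling, so a single unreachable endpoint makes the whole dispatch fail.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List


def fetch(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Run a SELECT query via the SPARQL protocol and parse the JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def merge_results(result_sets: Iterable[dict]) -> dict:
    """Union several SPARQL JSON result sets into one, dropping duplicate rows."""
    variables: List[str] = []
    seen = set()
    rows: List[dict] = []
    for result_set in result_sets:
        for var in result_set["head"]["vars"]:
            if var not in variables:
                variables.append(var)
        for row in result_set["results"]["bindings"]:
            # Two rows are duplicates if every variable has the same RDF term.
            key = tuple(sorted(
                (var, term["type"], term["value"],
                 term.get("datatype", ""), term.get("xml:lang", ""))
                for var, term in row.items()))
            if key not in seen:
                seen.add(key)
                rows.append(row)
    return {"head": {"vars": variables}, "results": {"bindings": rows}}


def dispatch(endpoints: Iterable[str], query: str) -> dict:
    """Send one query to every endpoint and return a single merged result set."""
    return merge_results(fetch(endpoint, query) for endpoint in endpoints)
```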
