Archive for Tech

A startup founder should …

I recently came across this infographic about startup founders.

A Startup Founder Should be Able To…

Before starting to work on seevl, I would probably have thought that this was nuts and/or unrealistic: too many things, different skill-sets, lack of focus, etc. But I’ve actually realised that, during the last 7 days, I’ve:

  • Interacted with users to gather product feedback
  • Posted job offers, reviewed CVs and conducted interviews
  • Learned about RabbitMQ and deployed it between EC2 instances
  • Met with solicitors to complete a fundraising round
  • Pitched the company, adjusted our deck
  • Debugged cross-domain jQuery calls on Firefox
  • Dived into market research and studied related reports
  • Tracked analytics, checked metrics, and analysed conversion funnels
  • Brainstormed about marketing and customer acquisition strategy
  • Sent tons of e-mails

After all, this infographic seems to be the perfect representation of a founder’s life in an early-stage start-up. It reminds me of a quote from Emilie Olson that I read in the excellent “Do More Faster”:

It’s like I’m walking into a final exam every day that is composed of essay questions on topics I’ve never studied

Obviously, that’s challenging. But super exciting at the same time.

My experience with Hailo

I attended LeWeb in London and decided to give Hailo a try. As they are launching soon in Dublin, this was an opportunity to test it in a place where it is already widely deployed, with more than 20,000 black cabs using it.

If you don’t know it, Hailo is a (well-funded) mobile taxi booking app that lets you get a taxi wherever you are (and pay for it) simply by using your iPhone or Android phone. The app is very simple to set up: download it, provide a phone number to validate your account, enter a few personal details, and that’s it.

It’s the first mobile app I’ve used to get a taxi. I wanted to use Uber in the past when I was in the US, but the prices are much higher than a taxi’s, so I just gave up (they said at LeWeb that they’ll provide taxi services soon, though, which makes it interesting).

So, I used it 4 times in 2 days. I managed to hail a taxi immediately 3 times, while I had to retry twice the fourth time, trying to get a taxi at 10PM from Soho (the app simply said that it was a busy time). In all cases, the cab arrived between 2 and 5 minutes after the booking was confirmed. Geo-location (you don’t need to enter any address) worked fine 3 times, and was slightly wrong the fourth time, locating me at the wrong street corner. Still, the app gives you the plate number of the taxi so you can easily spot it when it’s around.

Hailo taxi booking interface

Yet, even if that’s super easy, I wouldn’t consider the booking to be the main value of the app. In all cases, I was in places where I might have been able to find a cab within 5 minutes anyway – but I understand the value in less busy places. Managing everything from the phone is what really made me use and love it, especially the payment. You don’t have to do anything once your ride is completed: just leave the cab while the driver confirms the ride, then Hailo takes care of the payment (as long as you entered your card details beforehand), and you receive a receipt in your inbox a few minutes later. No need to keep paper receipts, then scan or photocopy them to get refunded or forward them to your accountant. Super smart and simple. No fees for users (they take a cut from taxi drivers).

Payment and rating interface

I also talked to the drivers, who confirmed what Hailo’s CEO Jay Bregman said at LeWeb, i.e. they see a lot of value in it as the service brings them new jobs, either in locations where they can’t easily get customers, or when people don’t want to pay cash but drivers don’t have a card machine. Plus some other driver-related features such as alerts, etc.

Overall, a very good experience. I’ll definitely use it again, and am looking forward to their launch in Dublin, as well as Uber entering the taxi domain in the US, and Hailo expanding over there!

seevl v2 is out

A brand new release: social graph integration, advanced play-listing, updated liner notes, and much more.



Get it now at http://seevl.net for hours of music discovery on YouTube, and check our blog post for more details. For the semweb-inclined, a tech blog post will follow.

Editorial tips (for students) on writing a research paper

During the past few years, I (co-)authored and reviewed a lot of research papers, from workshops to journals. Having been on both sides of the table, here are some editorial tips that I’d like to share on writing a convincing (Computer Science / Web Technologies) research paper, and hopefully getting it accepted (don’t blame me if it’s not!).

Before starting, there are a few things that every student should know about the process:

- First, and that’s the law (game?) of blind peer review, you don’t know who reviews your paper. This means that the reviewer can be a super-busy professor who’ll make a decision in 2 minutes just by reading your abstract, or a pedantic junior researcher complaining that you don’t cite his poster presentation at yet-another-unknown-workshop. Hopefully, reviewers are generally careful researchers who take the time to read the paper properly and form an opinion based on your research and their knowledge of the topic. In this case, even if the paper is rejected, you should end up with a useful review that will let you improve the paper for the re-submission, so don’t be upset.

- Then, keep in mind that your conference/workshop paper (journals are different) will not be the only one assigned to the reviewer for this event. In some cases, reviewers are assigned up to 10 papers; here, reviews are generally spread among colleagues or students – and the main reviewer should ensure that the quality of reviews is OK. But even by delegating, a reviewer could have to read 5+ reviews. So, your paper must catch the attention of the reviewer, and stand out from the crowd. Are you aware of the 5-second rule for websites? There’s something similar for research papers: you have to wow the reviewers/readers from the abstract onwards. Personally, I get a first feeling about a paper within the first minute of reading it (e.g. compelling abstract with clear contributions vs. topic that has been discussed 10+ times in the same venue, etc.). While reading the full paper is what makes the final decision, the first impression counts and can influence the decision.

That being said, here are the tips (they are about neither the content or value of your research, nor the writing style):

- Write a compelling abstract and introduction. Convey a “WOW!” effect to the reviewer, rather than an “OK, yet-another-paper on …”. This means: clearly identify the problem, the gap in the state-of-the-art and your contribution. If you can do that in a paragraph, you’ve done the hard part of the job: giving the reviewer an incentive to carefully read your paper. And you’ve shown that you can clearly articulate your research in just a few sentences.

- If your paper describes some kind of software, include at least screenshots and – unless that’s something confidential – links to a demo, source code or video. Show that you’ve really done what you promise in the paper, and that it’s not just vaporware. It doesn’t need to be a super-fancy system with bullet-proof source code (at least if you’re in academia, that’s just a research prototype), but it shows that the conclusions of your experiments (or the experiments themselves) are based on something concrete. Oh, and make sure that the links work during the review process (no 404 please).

- Show that your research matters and that it’s a problem worth solving. That should be in the intro, by referencing related scientific work as well as general reports (e.g. studies by mainstream media, business reports, etc.). That’s particularly true for competitive events that value practical research with real-world impact. If you can show that your research can solve a large-scale problem (not a CS problem, but a general issue where CS is just a means to an end, e.g. environment, policy, etc.), that’s great.

- Make sure that any pictures / screenshots print well in black and white (unless, hmmm, you’re ready to pay for the journal’s extra costs). Make sure as well that their text (depiction, legend, etc.) can be properly read on the printed version, and ideally does not require zooming into the PDF. And if you’re writing your paper/figures using Word, make sure that the red underlining from the spell-checker doesn’t appear in your submission!

- Choose which side of the Atlantic you are on, i.e. decide if your paper uses UK English or US English. You know: centre vs center, personalisation vs personalization, etc. There may be some guidelines by the chairs/editors, so check those before your submission, in addition to other stylistic requirements (uppercase in titles, etc.). That’s a bit pedantic, but it shows that you care about all the details.

- Check the references. In particular, be sure there are no typos in the authors’ names, paper titles, etc. As with the previous point, that seems minor, but it showcases how much attention you pay to details, and how precise and accurate you are in your overall article. A bibliography done at the last minute and full of typos is generally not a good sign. If your BibTeX file is properly done, that should not be an issue, though.

- Adapt to your audience. If you were going to a party, would you explain your work/research the same way to a bunch of geeks or to some MBAs? Probably not, and the same rule applies when writing a paper. Know the community/event you’re targeting and adapt the paper accordingly. That could mean explaining some acronyms that are obvious in one community but not in another, adapting the background section (e.g. do not explain Linked Data if you’re submitting to LDOW, but do so for a general CS-Web conference), etc.

- Finally (easier said than done): wait one or two days between the final write-up and the submission. Ideally, do not touch the paper for a day or two, and read it again before the submission. You may see that you don’t understand some parts of what you wrote – so you can guess the reviewer will have no clue either! Then, it’s time to rewrite these sections.

Remember, your research may be very good, but you have to present it in the best possible light to stand out from the crowd. In events where the acceptance rate is under 25%, a minor detail can make the difference. So yes, whether you like it or not, there’s some marketing involved in getting your paper accepted.

Easy “copy and paste” from the Web to LaTeX with SPARQL

One thing I like about the Web is that content can be distributed, but still easily referenced and integrated using URIs. One thing I don’t like about moving content from the Web to the desktop is copy and paste (actually, I also hate this when moving / syncing from one service to another).

Let’s take the example of my resume. I generally update my LinkedIn profile, but almost never the “desktop version” (LaTeX or doc) of my CV. So, when the time comes to forward an up-to-date resume in PDF, I used to copy and paste content from my LinkedIn profile. But wait, I also need to add my publications (even though BibTeX makes the integration easy), the list of my research activities and talks, etc.

So, I’ve built sparqlTeX - a LaTeX class / python script to easily embed SPARQL results into TeX files. Using it, one can directly integrate data from SPARQL endpoints, from RDFa-enabled content (such as the previous talk pages), but also from any microformatted page (any LinkedIn profile, as they’re using hResume) into a LaTeX document. For the last one, I’m using any23 to convert such data into RDF (actually, any23 is used for any file-based query as it extracts RDF from HTML pages even if they’re not W3C-valid).

The scripts are available on github/sparqltex. They require roqet, and the SPARQLWrapper lib if you want to query remote endpoints. That’s a simple on-demand hack, so corner cases and complex structures are probably not handled, but feel free to clone the code and update it (and push it back); this is public domain.
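
To give a rough idea of the approach, here is a minimal JavaScript sketch. It is not the actual sparqlTeX code (which is a LaTeX class plus a python script): it simply takes results in the standard SPARQL JSON results format and fills a one-line LaTeX template per row. The `?name` placeholder syntax, the function names and the sample data are all assumptions made for illustration.

```javascript
// Escape characters that have a special meaning in LaTeX.
function escapeLatex(text) {
  return text.replace(/[\\&%$#_{}~^]/g, (ch) => {
    if (ch === "\\") return "\\textbackslash{}";
    if (ch === "~") return "\\textasciitilde{}";
    if (ch === "^") return "\\textasciicircum{}";
    return "\\" + ch;
  });
}

// Fill the template once per result row; "?name" placeholders are
// replaced by the (escaped) value bound to that variable.
// Hypothetical helper, not part of sparqlTeX.
function renderRows(sparqlJson, template) {
  return sparqlJson.results.bindings
    .map((row) =>
      template.replace(/\?(\w+)/g, (match, name) =>
        name in row ? escapeLatex(row[name].value) : match
      )
    )
    .join("\n");
}

// Made-up sample in the SPARQL JSON results layout.
const sample = {
  head: { vars: ["title", "org"] },
  results: {
    bindings: [
      {
        title: { type: "literal", value: "Research & Development Engineer" },
        org: { type: "literal", value: "Example Corp" }
      }
    ]
  }
};

const latex = renderRows(sample, "\\item ?title, ?org");
console.log(latex);
```

Escaping the values matters here because content coming from the Web (an ampersand in a job title, an underscore in a URL) would otherwise break the LaTeX build.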

Here’s the kind of output that it generates; as you can see, it’s synced with my LinkedIn profile and talk pages. And you can obviously also use hyperlinks in the templates, bringing the Web back to your resume!

Work section of my resume from LinkedIn to LaTeX through RDF+SPARQL

Talks section of my resume from RDFa-enabled content to LaTeX with SPARQL

SeatTrip – concert listing for your next trip (or “seatwave meets tripit”)

Another week-end, another MusicHackDay. This time, I’ve tried two new APIs:

  • seatwave – which launched just a few days ago and gives access to a wide range of events, including (obviously) concerts. Search by location, time-frame, venue (including coordinates!), and redirect to the seatwave website to get event tickets. Interestingly, they do rev-share if some tickets are bought in one’s app using their API.
  • SendGrid – cloud-based e-mail services. Sending mails, but also – the most interesting part – receiving and parsing them. Simply configure an MX record and a callback URL, and parse any incoming e-mail, including header, content and attachments – all in a REST-ful way.

So, with those 2 APIs in mind, I’ve built SeatTrip – it’s like seatwave meets tripit. Send your plane ticket by e-mail, and get a listing of events that will happen in the area a few minutes later.

After sending an Aer Lingus ticket for a trip to London, I got the following e-mail in my inbox a few minutes later. First, a featured artist. I’m using our own seevl data to identify the featured artist using its meta-data, and display her/his biography.

Featured artist for your next trip!

Then, the listing of all concerts for the city at that time.

All concerts happening during your trip

For each event, it provides additional information from the seatwave API. First, it features a Google Map link to the venue (useful to book your hotel nearby!).

Use GoogleMap to display the venue map

Also, it links to the seatwave website so that you can directly book your concert – and lists the number of remaining tickets if the show is almost sold-out!

Buy tickets from the seatwave website

Here’s now the fun part, about how this hack works:

  • First, once the e-mail is sent to an address mapped to the SendGrid API, a PHP script extracts the location and the timeframe of the trip from the e-mail. The extraction is airline-specific, and so far the hack works only on Aer Lingus tickets (however, an abstraction layer makes it easy to create new wrappers – a similar strategy to the one used in TripIt).
  • Then, the seatwave API is used to get the list of all events in the area for that period, including all event details.
  • Once we have the events, seevl is used to identify the featured artist from the list of available concerts.
  • Finally, the e-mail is rendered in HTML, and sent via SendGrid.
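
The airline-specific extraction step could be organised roughly as below. This is a JavaScript sketch of the idea only (the hack itself is a PHP script): one wrapper object per airline, each knowing how to recognise and parse its own e-mails. The airline name, the e-mail layout and every identifier here are invented for illustration; they are not the real ticket format or the actual SeatTrip code.

```javascript
// One wrapper per airline. Supporting a new airline means adding one entry.
// "ExampleAir" and its e-mail layout are hypothetical.
const wrappers = [
  {
    airline: "ExampleAir",
    // Does this e-mail look like a ticket from this airline?
    matches: (email) => /ExampleAir booking/i.test(email.subject),
    // Pull the destination and the travel dates out of the body.
    extract: (email) => {
      const m = email.text.match(
        /To:\s*(.+)\nDepart:\s*(\d{4}-\d{2}-\d{2})\nReturn:\s*(\d{4}-\d{2}-\d{2})/
      );
      return m ? { city: m[1].trim(), from: m[2], to: m[3] } : null;
    }
  }
];

// Try each wrapper in turn; null means "no wrapper understands this e-mail".
function extractTrip(email) {
  const wrapper = wrappers.find((w) => w.matches(email));
  return wrapper ? wrapper.extract(email) : null;
}

const trip = extractTrip({
  subject: "Your ExampleAir booking confirmation",
  text: "To: London\nDepart: 2012-06-18\nReturn: 2012-06-20"
});
console.log(trip);
```

The location and timeframe returned by such a function are what the later steps (event lookup, featured artist, e-mail rendering) would consume.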

It takes around 2 minutes to do the whole processing; check this short video to see it in action. Note that I changed the trip date as that was an old ticket for which seatwave didn’t have any data, and that the video has been cut to avoid the delay of receiving the e-mail with the listing.

Also, don’t forget to check this impressive list of 62 hacks - especially Buddhafy (mind-control for Spotify!) and Concerts2021 – the future of live gigs (or not, thankfully ;-)!

Mixture – real-world music-discovery

MIDEM HackDay was – as expected – a wonderful event, where 18 hacks were built over the week-end (screenshots on this MIDEM blog post).

In particular, I’d give my thumbs-up for Tourrent - helping bands to set-up their next tours based on Torrent downloads of their tracks, FlatDrop - Micropayments for Soundcloud tracks, and Badgify - audioscrobbler meets Arduino.

Together with Ian from rd.io and Guillaume from Webdoc, we’ve worked on a new music discovery approach, leveraging real-world data. Various 4-square hacks enabling geolocation-based discovery have been built at previous MHDs, so we’ve decided to take another route: take a picture of anything, and play the songs that reference this thing. Whether it’s a bottle, some brown sugar, or a house.

So here’s Mixture, a simple hack / proof-of-concept of the approach, combining APIs from IQ Engines (image recognition – give it a try if you’re looking for something similar, even though queries can be a bit slow), musiXmatch (lyrics identification), rd.io (music streaming) and seevl (artist data). It may still be buggy (we’ll work on it) and some APIs have a daily-rate limit that could block the application, but you should be able to get the overall idea!

MIDEM HackDay teasing

When music meets the real world. It’s gonna be awesome.

 

About JSON-LD and Content-Negotiation

NB: This post was originally written on blog.seevl.net - I crosspost it here now as I’m starting to blog here again, and I think that’s definitely an approach and a data-serialisation worth using.

I wanted to write about our use of Content-Negotiation for a long time, and as we recently switched to JSON-LD as the unique format to represent our data, I decided to talk about both in this blog post.

As you may know (or can guess when checking the team page), we have been involved in efforts around the Semantic Web and Linked Data for many years, and are big fans of the graph model that these technologies offer. It’s flexible, it’s agile, and it’s a straightforward way to integrate and mash-up data from different sources. But when it comes to providing this data back to developers, things are more complex. One could choose to offer RDF/XML or Turtle, but that generally requires a new skillset. Why? Because most platforms provide JSON, and developers are consequently more used to this than to other formats. Check in particular these recent stats (slides 22-24) from a talk that John Musser (ProgrammableWeb) gave at SemTech this year.

So, we decided to use JSON-LD for our data, after an initial home-grown JSON model that was not that far from the current JSON-LD spec. One thing I particularly like is that it lets us send “objects” over the wire rather than a set of triples. For instance, consider this representation of facts about the Beatles:

{
    "@context": {
        "collaborated_with": "http://purl.org/ontology/mo/collaborated_with",
        "id": "http://purl.org/dc/terms/identifier",
        "origin": "http://purl.org/ontology/mo/origin",
        "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
        "uri": "@subject"
    },
    "collaborated_with": [
        {
            "id": "hSmwe4Dq",
            "prefLabel": "The Quarrymen",
            "uri": "http://data.seevl.net/entity/hSmwe4Dq#id"
        }
    ],
    "origin": [
        {
            "id": "px6UYEPh",
            "prefLabel": "England",
            "uri": "http://data.seevl.net/entity/px6UYEPh#id"
        },
        {
            "id": "xMgUSM9b",
            "prefLabel": "Liverpool",
            "uri": "http://data.seevl.net/entity/xMgUSM9b#id"
        }
    ],
    "prefLabel": "The Beatles",
    "uri": "http://data.seevl.net/entity/pzkVnsbP#id"
}

If you are used to JSON, you probably understand it with no additional effort, and can use any JSON toolkit to parse it. If you are aware of the Linked Data principles, you directly see that every entity has its own URI, which can also be accessed to get more information about it. And if you care about triples, you can use a JSON-LD parser or the public playground, which will use the @context values to translate this JSON-LD feed into raw triples (using MO and SKOS here). Clearly, the best of both worlds.
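
As a small illustration of both readings, the sketch below first treats an abbreviated copy of the Beatles document as plain JSON, then walks it with the @context to produce triples. The `toTriples` helper is hypothetical and deliberately naive: it only handles the flat shape shown above and is in no way a substitute for a real JSON-LD processor.

```javascript
// Abbreviated copy of the Beatles document shown above.
const beatles = {
  "@context": {
    origin: "http://purl.org/ontology/mo/origin",
    prefLabel: "http://www.w3.org/2004/02/skos/core#prefLabel",
    uri: "@subject"
  },
  origin: [
    { prefLabel: "Liverpool", uri: "http://data.seevl.net/entity/xMgUSM9b#id" }
  ],
  prefLabel: "The Beatles",
  uri: "http://data.seevl.net/entity/pzkVnsbP#id"
};

// 1. Plain JSON reading: no Linked Data knowledge needed.
console.log(beatles.prefLabel, "-", beatles.origin[0].prefLabel);

// 2. Triple reading: map each key to its predicate URI via the context.
// Hypothetical helper, not a spec-compliant JSON-LD expansion.
function toTriples(node, context, triples = []) {
  // Find the key aliased to "@subject" in the context ("uri" here).
  const subjectKey = Object.keys(context).find((k) => context[k] === "@subject");
  const subject = node[subjectKey];
  for (const [key, value] of Object.entries(node)) {
    if (key === "@context" || key === subjectKey) continue;
    const predicate = context[key];
    if (!predicate) continue; // ignore keys the context does not define
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      if (v !== null && typeof v === "object") {
        triples.push([subject, predicate, v[subjectKey]]);
        toTriples(v, context, triples); // describe the nested entity too
      } else {
        triples.push([subject, predicate, String(v)]);
      }
    }
  }
  return triples;
}

const triples = toTriples(beatles, beatles["@context"]);
triples.forEach((t) => console.log(t.join(" ")));
```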

Then, Content-negotiation. When using a website and then deciding to develop around it, I am often frustrated by the need to learn new URLs, new paths, new parameters. Why the hell should humans and machines have different ways to access the same data, albeit in different formats? This is exactly why we rely on Content-negotiation on data.seevl.net. By default, every page is rendered as HTML, but if you ask for JSON, you’ll get a JSON-LD representation of the same entity, separated into different “slices” (infos, link, facts, topics and related artists, as detailed on our dev zone). No need to learn new URIs, no additional parameters. Just tell us you want JSON, and we’ll serve you what you need!

curl "http://data.seevl.net/entity/?prefLabel=beatles" \
  -H "Accept: application/json" \
  -H "X_APP_ID: 1c55b80a" \
  -H "X_APP_KEY: 65e7fbe154e8cee6c1704a9358dd8939"

Of course, we still want users to authenticate, and we still want to gather metrics about data usage, but content-negotiation (versus a separate api.mydomain.org) does not prevent this at all. We are using 3scale and, unlike most API enablers, they do not proxy the API calls. This means that we can simply implement content-negotiation on our side (using our existing URIs), and just call them when authenticating and reporting metrics.
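
On the server side, the core of content-negotiation is a small decision based on the HTTP Accept header. The sketch below shows one way to make that decision in JavaScript. It is an illustration only, not our actual implementation: the `negotiate` function and the list of supported types are assumptions, and it only looks at media types and their `q` weights.

```javascript
// Representations this hypothetical server can produce for an entity URI.
const SUPPORTED = ["text/html", "application/json"];

// Pick a media type from an Accept header, defaulting to HTML.
function negotiate(acceptHeader) {
  if (!acceptHeader) return "text/html"; // browsers and bare requests get the page

  // Split "type;q=0.9, other" into { type, q } entries.
  const ranges = acceptHeader.split(",").map((part, index) => {
    const [type, ...params] = part.trim().split(";").map((s) => s.trim());
    const qParam = params.find((p) => p.startsWith("q="));
    const q = qParam ? parseFloat(qParam.slice(2)) : 1;
    return { type: type.toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
  });

  // Highest weight first; ties keep the order the client sent.
  ranges.sort((a, b) => b.q - a.q || a.index - b.index);

  for (const { type, q } of ranges) {
    if (q <= 0) continue; // q=0 means "not acceptable"
    if (SUPPORTED.includes(type)) return type;
    if (type === "*/*" || type === "text/*") return "text/html";
    if (type === "application/*") return "application/json";
  }
  // Nothing matched: this sketch falls back to HTML; a stricter server
  // could answer "406 Not Acceptable" instead.
  return "text/html";
}

console.log(negotiate("application/json"));
console.log(negotiate("text/html,application/xhtml+xml,*/*;q=0.8"));
```

The same URI then serves both audiences: a request handler would call `negotiate` on the incoming header and render either the HTML page or the JSON-LD document for the entity.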

Overall, this combination of Content-negotiation and JSON-LD works like this (plus some other usual suspects such as Django, memcached, Apache, Varnish and Virtuoso - all on AWS):

Content-negotiation + JSON-LD

To conclude this post, there are two things that really matter here:
  • First, WYSIWYM – What You See Is What You Mean. Using JSON-LD, we provide a view of our data directly mapped to our underlying model – in a simple JSON format. This helps to understand how data is represented and how one can query it later (for example, “reversing” the previous representation to get a list of bands originating from Liverpool).
  • Then, we save costs. By implementing a Content-negotiation strategy, we have a single layer to maintain between users (humans and machines) and our data. That largely simplifies the deployment process, and minimises overhead. Also, every new feature is immediately available from both sides with no added cost.
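
To make the “reversing” idea concrete, here is a tiny JavaScript sketch that filters a handful of entity documents, shaped like the Beatles example, by the URI of their origin. It only illustrates how directly the JSON maps to the model: `artistsFrom` is a hypothetical helper, the second entity is placeholder data, and a real query of this kind would more likely be answered on the server.

```javascript
// URI of Liverpool, taken from the Beatles document shown earlier.
const LIVERPOOL = "http://data.seevl.net/entity/xMgUSM9b#id";

// Sample documents (contexts omitted). The second one is a made-up
// placeholder, not real seevl data.
const entities = [
  {
    prefLabel: "The Beatles",
    uri: "http://data.seevl.net/entity/pzkVnsbP#id",
    origin: [{ prefLabel: "Liverpool", uri: LIVERPOOL }]
  },
  {
    prefLabel: "Some Other Band",
    uri: "http://example.org/entity/other#id",
    origin: [{ prefLabel: "Elsewhere", uri: "http://example.org/entity/elsewhere#id" }]
  }
];

// Keep the documents whose "origin" list contains the given place URI.
function artistsFrom(docs, placeUri) {
  return docs
    .filter((doc) => (doc.origin || []).some((place) => place.uri === placeUri))
    .map((doc) => doc.prefLabel);
}

const fromLiverpool = artistsFrom(entities, LIVERPOOL);
console.log(fromLiverpool);
```
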
Enjoy a Web designed both for humans and machines. Enjoy a Web of Data.

Timeout for HTML5 localStorage

HTML5 localStorage is a nice way to store data on the client side. However, when using it to cache remote API calls, you may want to purge it from time to time. As per its spec, there’s no built-in way to make stored data expire automatically, so here’s a tiny bit of JavaScript that does the job:

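var hours = 24; // Reset when the storage is more than 24 hours old
var now = new Date().getTime(); // Current time, in milliseconds
var setupTime = localStorage.getItem('setupTime');
if (setupTime === null) {
    // First run: remember when the cache was created
    localStorage.setItem('setupTime', now);
} else if (now - Number(setupTime) > hours * 60 * 60 * 1000) {
    // Stored values are strings, hence the Number() conversion above.
    // Too old: wipe everything stored for this origin and restart the clock
    localStorage.clear();
    localStorage.setItem('setupTime', now);
}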
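
If you need this check in several places, it can be wrapped in a function that takes the storage object as a parameter, which also makes it testable outside a browser. The following is a sketch under that assumption: `purgeIfExpired` and `createMemoryStorage` are hypothetical names, and the in-memory object only mimics the three Web Storage methods used here.

```javascript
// Clears `storage` when its contents are older than maxAgeMs.
// Returns true when a purge happened. `storage` is anything exposing the
// Web Storage getItem/setItem/clear methods (e.g. window.localStorage).
function purgeIfExpired(storage, maxAgeMs, now = Date.now()) {
  const setupTime = storage.getItem("setupTime");
  if (setupTime === null) {
    // First run: remember when the cache was created.
    storage.setItem("setupTime", String(now));
    return false;
  }
  if (now - Number(setupTime) > maxAgeMs) {
    // Too old: wipe everything and restart the clock.
    storage.clear();
    storage.setItem("setupTime", String(now));
    return true;
  }
  return false;
}

// Minimal in-memory stand-in for running the function outside a browser.
function createMemoryStorage() {
  const data = new Map();
  return {
    getItem: (key) => (data.has(key) ? data.get(key) : null),
    setItem: (key, value) => { data.set(key, String(value)); },
    clear: () => { data.clear(); }
  };
}

// In a browser, call this once at page load:
//   purgeIfExpired(window.localStorage, 24 * 60 * 60 * 1000);
```

Passing `now` explicitly is optional; it is only there so that the expiry logic can be exercised with fixed timestamps.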