Editorial tips (for students) on writing a research paper

During the past few years, I (co-)authored and reviewed a lot of research papers, from workshops to journals. Having been on both sides of the table, I’d like to share some editorial tips on writing a convincing (Computer Science / Web Technologies) research paper, and hopefully getting it accepted (don’t blame me if it’s not!).

Before starting, there are a few things that every student should know about the process:

- First, and that’s the law (game?) of blind peer review, you don’t know who reviews your paper. This means that the reviewer can be a super-busy professor who’ll make a decision in 2 minutes just by reading your abstract, or a pedantic junior researcher complaining that you don’t cite his poster presentation at yet-another-unknown-workshop. Fortunately, most reviewers are careful researchers who take the time to read the paper properly and form an opinion based on your research and their knowledge of the topic. In this case, even if the paper is rejected, you should end up with a useful review that will let you improve the paper for the resubmission, so don’t be upset.

- Then, keep in mind that your conference/workshop paper (journals are different) will not be the only one assigned to the reviewer for this event. In some cases, reviewers are assigned up to 10 papers; here, reviews are generally spread among colleagues or students – and the main reviewer should ensure that the quality of reviews is OK. But even when delegating, a reviewer could still have 5+ reviews to read. So, your paper must catch the attention of the reviewer, and stand out from the crowd. Are you aware of the 5-second rule for websites? There’s something similar for research papers: you have to wow the reviewers/readers from the abstract onwards. Personally, I get a first feeling about a paper within the first minute of reading it (e.g. a compelling abstract with clear contributions vs. a topic that has been discussed 10+ times in the same venue, etc.). While reading the full paper is what makes the final decision, the first impression counts and can influence the decision.

That being said, here are the tips (they are not about the content or value of your research, nor about the writing style):

- Write a compelling abstract and introduction. Convey a “WOW!” effect to the reviewer, rather than an “OK, yet-another-paper on …”. This means: clearly identify the problem, the gap in the state-of-the-art and your contribution. If you can do that in a paragraph, you’ve done the hard part of the job: giving the reviewer an incentive to carefully read your paper. And you’ve shown that you can clearly articulate your research in just a few sentences.

- If your paper describes some kind of software, include at least screenshots and – unless that’s something confidential – links to a demo, source code or video. Show that you’ve really done what you promise in the paper, and that it’s not just vaporware. It doesn’t need to be a super-fancy system with bullet-proof source code (at least if you’re in academia, that’s just a research prototype), but it shows that the conclusions of your experiments (or the experiments themselves) are based on something concrete. Oh, and make sure that the links work during the review process (no 404 please).

- Show that your research matters and that it’s a problem worth solving. That should be in the intro, by referring to related scientific work but also to general reports (e.g. studies by mainstream media, business reports, etc.). That’s particularly true for competitive events that value practical research with real-world impact. If you can show that your research can solve a large-scale problem (not a CS problem, but a general issue where CS is just a means to an end, e.g. environment, policy, etc.), that’s great.

- Make sure that any pictures / screenshots print well on black and white paper (unless, hmmm, you’re ready to pay for the journal extra costs). Make sure as well that their text (depiction, legend, etc.) can be properly read on the printed version, and ideally does not require zooming into the PDF. And if you’re writing your paper/figures using Word, make sure that the red underlining of words that the spell-checker didn’t recognise doesn’t appear in your submission!

- Choose which side of the Atlantic you are on, i.e. decide if your paper uses UK English or US English. You know: centre vs center, personalisation vs personalization, etc. There may be some guidelines from the chairs/editors, so check those before your submission, in addition to other stylistic requirements (uppercase in titles, etc.). That’s a bit pedantic, but it shows that you care about all the details.

- Check the references. In particular, be sure there are no typos in the authors’ names, paper titles, etc. As with the previous point, that seems minor, but it shows how much attention you pay to details, and how precise and accurate you are in your overall article. A last-minute bibliography full of typos is generally not a good sign. If your BibTeX file is properly done, that should not be an issue, though.

- Adapt to your audience. If you were going to a party, would you explain your work/research the same way to a bunch of geeks or to some MBAs? Probably not, and the same rule applies when writing a paper. Know the community/event you’re targeting and adapt the paper accordingly. That could mean explaining some acronyms that are obvious in one community but not in another, adapting the background section (e.g. do not explain Linked Data if you’re submitting to LDOW, but do so for a general CS-Web conference), etc.

- Finally (easier said than done): wait one or two days between the final write-up and the submission. Ideally, do not touch the paper for a day or two, and read it again before the submission. You may find that you don’t understand some parts of what you wrote – so you can guess the reviewer will have no clue either! Then, it’s time to rewrite these sections.

Remember, your research may be very good, but you have to present it in the best possible light to stand out from the crowd. In events where the acceptance rate is under 25%, a minor detail can make the difference. So yes, whether you like it or not, there’s some marketing involved in getting your paper accepted.

Easy “copy and paste” from the Web to LaTeX with SPARQL

One thing I like about the Web is that content can be distributed, but still easily referenced and integrated using URIs. One thing I don’t like about moving content from the Web to the desktop is copy and paste (actually, I also hate this when moving / syncing from one service to another).

Let’s take the example of my resume. I generally update my LinkedIn profile, but almost never the “desktop version” (LaTeX or doc) of my CV. So, when the time comes to forward an up-to-date resume in PDF, I used to copy and paste content from my LinkedIn profile. But wait, I also need to add my publications (even though BibTeX makes the integration easy), the list of my research activities and talks, etc.

So, I’ve built sparqlTeX - a LaTeX class / Python script to easily embed SPARQL results into TeX files. Using it, one can directly integrate data from SPARQL endpoints, from RDFa-enabled content (such as the previous talk pages), but also from any microformatted page (any LinkedIn profile, as they’re using hResume) into a LaTeX document. For the last one, I’m using any23 to convert such data into RDF (actually, any23 is used for any file-based query, as it extracts RDF from HTML pages even if they’re not W3C-valid).

The scripts are available on github/sparqltex. They require roqet, and the SPARQLWrapper lib if you want to query remote endpoints. That’s a simple on-demand hack, so corner cases and complex structures are probably not handled, but feel free to clone the code and update it (and push it back): this is public domain.
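To give an idea of the principle (this is not the actual sparqlTeX code), here is a minimal Python sketch that takes rows shaped like the variable bindings of a SPARQL SELECT query and renders them through a LaTeX template. The `escape_latex` and `render_rows` helpers, the template and the sample rows are all hypothetical, and the rows are hard-coded rather than fetched from an endpoint.

```python
# Hypothetical sketch: turn SPARQL-like result rows into LaTeX lines.
# Not the actual sparqlTeX implementation; rows are hard-coded here.

# Characters that LaTeX treats specially, with their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape LaTeX special characters, one character at a time."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def render_rows(rows, template):
    """Fill the template once per row, escaping every value first."""
    lines = []
    for row in rows:
        safe = {key: escape_latex(value) for key, value in row.items()}
        lines.append(template.format(**safe))
    return "\n".join(lines)


# Made-up rows, shaped like the variable bindings of a SELECT query.
rows = [
    {"title": "Research & Development Intern", "org": "Example Labs"},
    {"title": "PhD Student", "org": "Some_University"},
]

# Doubled braces are literal braces for str.format.
template = r"\item \textbf{{{title}}} at {org}"

print(render_rows(rows, template))
```

Escaping each value before it reaches the template matters here: text coming from the Web routinely contains characters such as `&` or `_` that would otherwise break the LaTeX build.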

Here’s the kind of output that it generates. As you can see, it’s synced with my LinkedIn profile and talk pages. And you can obviously also use hyperlinks in the templates, bringing the Web back to your resume!

Work section of my resume from LinkedIn to LaTeX through RDF+SPARQL

Talks section of my resume from RDFa-enabled content to LaTeX with SPARQL

SeatTrip – concert listing for your next trip (or “seatwave meets tripit”)

Another week-end, another MusicHackDay. This time, I’ve tried two new APIs:

  • seatwave – that just launched a few days ago and that gives access to a wide range of events, including (obviously) concerts. Search by location, time-frame, venue (including coordinates!), and redirect to the seatwave website to get event tickets. Interestingly, they do rev-share if some tickets are bought in one’s app using their API.
  • SendGrid – cloud-based e-mail services. Sending mails, but also – the most interesting part – receiving them and parsing them. Simply configure an MX record and a callback URL, and parse any incoming e-mail, including headers, content and attachments – all in a RESTful way.

So, with those 2 APIs in mind, I’ve built SeatTrip – it’s like seatwave meets tripit. Send your plane ticket by e-mail, and get a listing of events that will happen in the area a few minutes later.

After sending an AerLingus ticket for a trip to London, I got the following e-mail in my inbox a few minutes later. First, a featured artist. I’m using our own seevl data to identify the featured artist using its meta-data, and display her/his biography.

Featured artist for your next trip!

Then, the listing of all concerts for the city at that time.

All concerts happening during your trip

For each event, it provides additional information from the seatwave API. First, it features a Google Map link to the venue (useful to book your hotel nearby!).

Use GoogleMap to display the venue map

Also, it links to the seatwave website so that you can directly book your concert – and lists the number of remaining tickets if the show is almost sold-out!

Buy tickets from the seatwave website

Here’s now the fun part, about how this hack works:

  • First, once the e-mail is sent to an address mapped to the SendGrid API, a PHP script extracts the location and the timeframe of the trip from the e-mail. The extraction is airline-specific, and so far the hack works only on AerLingus tickets (however, an abstraction layer makes it easy to create new wrappers – a similar strategy to the one used in TripIt).
  • Then, the seatwave API is used to get the list of all events in the area for that period, including all event details.
  • Once we have the events, seevl is used to identify the featured artist from the list of available concerts.
  • Finally, the e-mail is rendered in HTML, and sent via SendGrid.
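
The airline-specific extraction behind the first step can be sketched as a small registry of wrappers, one per airline. This is a hypothetical illustration in Python (the hack itself uses PHP): the e-mail layout, the `aerlingus.example` domain, the dates and the `airline_parser`, `parse_aerlingus` and `extract_trip` names are all invented for the example.

```python
import re
from datetime import date

# Hypothetical sketch of the "one wrapper per airline" idea.
# The e-mail layout below is invented for illustration; real
# confirmation e-mails will look different.

AIRLINE_PARSERS = {}


def airline_parser(sender_domain):
    """Register a parsing function for e-mails from a given domain."""
    def register(func):
        AIRLINE_PARSERS[sender_domain] = func
        return func
    return register


@airline_parser("aerlingus.example")
def parse_aerlingus(body):
    """Extract destination and dates from the made-up layout."""
    city = re.search(r"^To: (.+)$", body, re.MULTILINE)
    dates = re.findall(r"(\d{4})-(\d{2})-(\d{2})", body)
    if city is None or len(dates) < 2:
        raise ValueError("unrecognised ticket layout")
    start, end = (date(*map(int, d)) for d in dates[:2])
    return {"city": city.group(1).strip(), "start": start, "end": end}


def extract_trip(sender, body):
    """Dispatch to the wrapper matching the sender's domain."""
    domain = sender.rsplit("@", 1)[-1].lower()
    try:
        parser = AIRLINE_PARSERS[domain]
    except KeyError:
        raise LookupError("no wrapper for " + domain) from None
    return parser(body)


email_body = """From: Dublin
To: London
Outbound: 2011-12-02
Return: 2011-12-05
"""

trip = extract_trip("tickets@aerlingus.example", email_body)
print(trip["city"], trip["start"], trip["end"])
```

With this shape, supporting another airline only means writing one more decorated function; the dispatch code and the later steps (event lookup, featured artist, e-mail rendering) stay untouched.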

The whole processing takes around 2 minutes; check this short video to see it in action. Note that I changed the trip date, as that was an old ticket for which seatwave didn’t have any data, and that the video has been cut to avoid the delay of receiving the e-mail with the listing.

Also, don’t forget to check this impressive list of 62 hacks - especially Buddhafy (mind-control for Spotify!) and Concerts2021 – the future of live gigs (or not, thankfully ;-)!

Mixture – real-world music-discovery

MIDEM HackDay was – as expected – a wonderful event, where 18 hacks were built over the week-end (screenshots on this MIDEM blog post).

In particular, I’d give my thumbs-up to Tourrent - helping bands to set up their next tours based on Torrent downloads of their tracks, FlatDrop - Micropayments for Soundcloud tracks, and Badgify - audioscrobbler meets Arduino.

Together with Ian from rd.io and Guillaume from Webdoc, we’ve worked on a new music discovery approach, leveraging real-world data. Various 4-square hacks enabling geolocation-based discovery have been built at previous MHDs, so we’ve decided to take another route: take a picture of anything, and play the songs that reference this thing. Whether it’s a bottle, some brown sugar, or a house.

So here’s Mixture, a simple hack / proof-of-concept of the approach, combining APIs from IQ Engines (image recognition – give it a try if you’re looking for something similar, even though queries can be a bit slow), musiXmatch (lyrics identification), rd.io (music streaming) and seevl (artist data). It may still be buggy (we’ll work on it) and some APIs have a daily rate limit that could block the application, but you should be able to get the overall idea!

MIDEM HackDay teasing

When music meets the real world. It’s gonna be awesome.

 

About JSON-LD and Content-Negotiation

NB: This post was originally written on blog.seevl.net - I crosspost it here now as I’m starting to blog here again, and I think that’s definitely an approach and a data-serialisation worth using.

I have wanted to write about our use of Content-Negotiation for a long time, and as we recently switched to JSON-LD as the only format to represent our data, I decided to talk about both in this blog post.

As you may know (or can guess when checking the team page), we have been involved in efforts around the Semantic Web and Linked Data for many years, and are big fans of the graph model that these technologies offer. It’s flexible, it’s agile, and it’s a straightforward way to integrate and mash up data from different sources. But when it comes to providing this data back to developers, things are more complex. One could choose to offer RDF/XML or Turtle, but that generally requires a new skillset. Why? Because most platforms provide JSON, and developers are consequently more used to this than to other formats. Check in particular these recent stats (slides 22-24) from a talk that John Musser (ProgrammableWeb) gave at SemTech this year.

So, we decided to use JSON-LD for our data, after an initial home-grown JSON model that was not that far from the current JSON-LD spec. One thing I particularly like is that it lets us send “objects” over the wire rather than sets of triples. For instance, consider this representation of facts about the Beatles:

{
    "@context": {
        "collaborated_with": "http://purl.org/ontology/mo/collaborated_with",
        "id": "http://purl.org/dc/terms/identifier",
        "origin": "http://purl.org/ontology/mo/origin",
        "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
        "uri": "@subject"
    },
    "collaborated_with": [
        {
            "id": "hSmwe4Dq",
            "prefLabel": "The Quarrymen",
            "uri": "http://data.seevl.net/entity/hSmwe4Dq#id"
        }
    ],
    "origin": [
        {
            "id": "px6UYEPh",
            "prefLabel": "England",
            "uri": "http://data.seevl.net/entity/px6UYEPh#id"
        },
        {
            "id": "xMgUSM9b",
            "prefLabel": "Liverpool",
            "uri": "http://data.seevl.net/entity/xMgUSM9b#id"
        }
    ],
    "prefLabel": "The Beatles",
    "uri": "http://data.seevl.net/entity/pzkVnsbP#id"
}

If you are used to JSON, you probably understand it with no additional effort, and can use any JSON toolkit to parse it. If you are aware of the Linked Data principles, you directly see that every entity has its own URI, which can also be accessed to get more information about it. And if you care about triples, you can use a JSON-LD parser or the public playground, which will understand the @context values to translate this JSON-LD feed into raw triples (using MO and SKOS here). Clearly, the best of both worlds.
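
Both readings can be illustrated with a few lines of Python. This is a minimal sketch and not a conforming JSON-LD processor: the hypothetical `to_triples` helper only handles the shape used in the example above (a flat term-to-IRI context, an alias for `@subject`, nested objects and plain string values), and the document is a trimmed copy of that example.

```python
import json

# Minimal sketch, NOT a conforming JSON-LD processor: it only handles
# the shape used in the example (a flat term-to-IRI context, an alias
# for "@subject", nested objects, and plain string values).

doc = json.loads("""
{
    "@context": {
        "origin": "http://purl.org/ontology/mo/origin",
        "prefLabel": "http://www.w3.org/2004/02/skos/core#prefLabel",
        "uri": "@subject"
    },
    "origin": [
        {
            "prefLabel": "Liverpool",
            "uri": "http://data.seevl.net/entity/xMgUSM9b#id"
        }
    ],
    "prefLabel": "The Beatles",
    "uri": "http://data.seevl.net/entity/pzkVnsbP#id"
}
""")

# Reading 1: plain JSON, no knowledge of RDF needed.
print(doc["prefLabel"], "/", doc["origin"][0]["prefLabel"])


def to_triples(node, context):
    """Yield (subject, predicate, object) tuples for node and its children."""
    # Find which key is aliased to the subject keyword in the context.
    subject_key = next(k for k, v in context.items() if v == "@subject")
    subject = node[subject_key]
    for key, value in node.items():
        if key in ("@context", subject_key):
            continue
        predicate = context[key]
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, dict):
                # Nested entity: link to its URI, then describe it.
                yield (subject, predicate, item[subject_key])
                yield from to_triples(item, context)
            else:
                yield (subject, predicate, item)


# Reading 2: the same document as triples, via the context.
for triple in to_triples(doc, doc["@context"]):
    print(triple)
```

Note that the sketch does not distinguish URIs from literals in object position; a real JSON-LD parser does that, and much more, from the context.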

Then, Content-negotiation. When using a website and then deciding to develop around it, I am often frustrated by the need to learn new URLs, new paths, new parameters. Why the hell should humans and machines have different ways to access the same data, albeit in different formats? This is exactly why we rely on Content-negotiation on data.seevl.net. By default, every page is rendered as HTML, but if you ask for JSON, you’ll get a JSON-LD representation of the same entity, separated into different “slices” (infos, link, facts, topics and related artists, as detailed on our dev zone). No need to learn new URIs, no additional parameters. Just tell us you want JSON, and we’ll serve you what you need!

curl "http://data.seevl.net/entity/?prefLabel=beatles" \
  -H "Accept: application/json" \
  -H "X_APP_ID: 1c55b80a" \
  -H "X_APP_KEY: 65e7fbe154e8cee6c1704a9358dd8939"
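
On the server side, the decision behind such a request boils down to reading the Accept header and picking a representation for the same URI. Here is a simplified, hypothetical Python sketch of that step; it is not our production code, the `parse_accept` and `negotiate` names are made up, and it deliberately ignores wildcard subtypes such as `text/*` and any media-type parameter other than `q`.

```python
# Hypothetical sketch of server-side content negotiation: same URI,
# representation chosen from the Accept header. Simplified on purpose
# (no wildcard subtypes such as "text/*", no media-type parameters
# other than q).

SUPPORTED = ("text/html", "application/json")  # first entry is the default


def parse_accept(header):
    """Return (media_type, q) pairs, highest preference first."""
    prefs = []
    for part in header.split(","):
        pieces = [p.strip() for p in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in pieces[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep the header's order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header):
    """Pick the representation to serve for an Accept header value."""
    for media_type, q in parse_accept(accept_header or ""):
        if q <= 0:
            continue
        if media_type == "*/*":
            return SUPPORTED[0]
        if media_type in SUPPORTED:
            return media_type
    return SUPPORTED[0]  # fall back to HTML, as for a regular browser


print(negotiate("application/json"))
print(negotiate("text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
```

Falling back to HTML when nothing matches is an assumption made for this sketch, in line with HTML being the default rendering; a stricter server could answer 406 Not Acceptable instead.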

Of course, we still want users to authenticate, and we still want to gather metrics about data usage, but content-negotiation (versus a separate api.mydomain.org) does not prevent this at all. We are using 3scale and, as opposed to most API enablers, they do not proxy the API calls. This means that we can simply implement content-negotiation on our side (using our existing URIs), and just call them when authenticating and reporting metrics.

Overall, this combination of Content-negotiation and JSON-LD works like this (plus some other usual suspects such as Django, memcached, Apache, Varnish and Virtuoso - all on AWS):

Content-negotiation + JSON-LD

To conclude this post, there are two things that really matter here:

  • First, WYSIWYM – What You See Is What You Mean. Using JSON-LD, we provide a view of our data directly mapped to our underlying model – in a simple JSON format. This helps to understand how data is represented and how one can query it later (for example, “reversing” the previous representation to get a list of bands originating from Liverpool).
  • Then, we save costs. By implementing a Content-negotiation strategy, we have a single layer to maintain between users (humans and machines) and our data. That largely simplifies the deployment process, and minimises overhead. Also, every new feature is immediately available from both sides with no added cost.
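
The “reversing” idea from the first point can be illustrated on the client side with a small inverted index. This is only a hypothetical Python sketch: the entity documents are trimmed, hand-written samples in the shape of the JSON-LD example, the second band and its URI are invented, and `bands_by_origin` is a made-up helper rather than part of our API.

```python
from collections import defaultdict

# Hypothetical sketch: invert "band -> origin" facts into
# "place -> bands". The entity documents are trimmed, hand-written
# samples in the shape of the JSON-LD example; the second band and its
# URI are invented.

LIVERPOOL = "http://data.seevl.net/entity/xMgUSM9b#id"
ENGLAND = "http://data.seevl.net/entity/px6UYEPh#id"

entities = [
    {
        "prefLabel": "The Beatles",
        "uri": "http://data.seevl.net/entity/pzkVnsbP#id",
        "origin": [
            {"prefLabel": "England", "uri": ENGLAND},
            {"prefLabel": "Liverpool", "uri": LIVERPOOL},
        ],
    },
    {
        "prefLabel": "Some Other Band",
        "uri": "http://example.org/entity/other#id",
        "origin": [
            {"prefLabel": "Liverpool", "uri": LIVERPOOL},
        ],
    },
]


def bands_by_origin(docs):
    """Map each origin URI to the labels of the bands that list it."""
    index = defaultdict(list)
    for doc in docs:
        for place in doc.get("origin", []):
            index[place["uri"]].append(doc["prefLabel"])
    return dict(index)


print(bands_by_origin(entities)[LIVERPOOL])
```

Because the JSON keys map directly onto the underlying properties, the same `origin` key that appears in an artist's document is the one to follow backwards here.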

Enjoy a Web designed both for humans and machines. Enjoy a Web of Data.

JSON-LD – JavaScript Object Notation for Linking Data

The best of both worlds.

Timeout for HTML5 localStorage

HTML5 localStorage is a nice way to store data on the client side. However, when using it to cache remote API calls, you may want to purge it from time to time. The spec offers no way to set an expiry time automatically, so here’s a tiny bit of JavaScript that does the job:

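var hours = 24; // purge the cache once it is more than 24 hours old
var now = new Date().getTime();
// getItem returns a string, or null on the first visit
var setupTime = localStorage.getItem('setupTime');
if (setupTime === null) {
    // first visit: record when the cache was set up
    localStorage.setItem('setupTime', now);
} else if (now - Number(setupTime) > hours * 60 * 60 * 1000) {
    // cache is stale: clear everything and restart the clock
    localStorage.clear();
    localStorage.setItem('setupTime', now);
}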

Back to blogging

Well, not really, even though I should find a way to restore all the posts of my previous blog somewhere on that domain.

I’ve set up this Tumblr for random notes, pics, etc. and a few longer posts. Yet, I guess that most of the blogging side will be on http://blog.seevl.net – both from the tech and the business side, discussing what we’re building over there in the music and data space. I’ll try to link them – or crosspost some bits and pieces – from here.

Internship positions: Web Engineering and BizDev

Originally posted on blog.seevl.net, and probably also relevant for the readers of this website or planetrdf.

You’re passionate about music? Want to join a team working on state-of-the-art Linked Data, Social Semantic Web and NoSQL technologies? Want to work with third-party services to let them know what they can do with our data? Join the team! We are currently looking for interns in Web Engineering and Business Development. The internships last between three and six months and are based in Galway, Ireland.

About Us

Seevl provides a new way to explore the cultural and musical universe of your favorite artists, and lets you discover new ones by understanding how they are connected. We are a spin-out of the Digital Enterprise Research Institute (DERI), at the National University of Ireland, Galway. DERI is the world’s largest Semantic Web research institute, and Seevl brings together several years of R&D in the Semantic Web, Linked Data, Social Web and Web Science areas.

About You

You are passionate about Music, Social Web and Linked Data technologies. You think that the Web could be a better place for music discovery if data was more integrated, structured and interlinked – and also more open. You want to investigate the related engineering or business development challenges.

Required Skills

  • Good written and oral communication
  • Motivation, autonomy and self-organisation
  • Ongoing appetite for knowledge
  • Good musical knowledge, whatever the style

Additional Skills for the Engineering Position

  • Python, Django, HTML, JavaScript
  • RDF(S), Linked Data, APIs, REST architectures

Additional Skills for the BizDev Position

  • Knowledge of the Music and / or Web industries
  • Interest in Open Source / Open Data and related business models

Benefits

  • The opportunity to work on exciting and emerging Web standards and to integrate Music and technology in a brand-new product
  • A stimulating, dynamic and multi-cultural workplace combined with a spin-out environment, having strong ties with world-class researchers
  • Flexible working hours and free coffee

Want to apply?

Great. Please send us your resume (or LinkedIn profile), a short statement on why you are interested and some links to your blog and Twitter account to jobs[AT]seevl.net. For the engineering position, please also send a link to your GitHub account (or equivalent).