Geocoding with Ruby on Rails and Google Maps

As Google Maps now offers geocoding services (which work really well for European streets!), you can use them in any web application to get coordinates for a given address. The service takes REST-style queries, and you can get results in XML or JSON.

I’m currently hacking with RoR, so I first tested geocoding with a short Ruby snippet:

    require 'open-uri'
    require 'rubygems'
    require 'hpricot'

    key = 'xxx'
    address = "105+avenue+de+la+Republique,+Paris,+France"
    url = "http://maps.google.com/maps/geo?q=#{address}&output=xml&key=#{key}"

    open(url) do |file|
      @body = file.read
      doc = Hpricot(@body)
      # Coordinates come back as "longitude,latitude,altitude"
      (doc/:point/:coordinates).each do |link|
        long, lat = link.inner_html.split(',')
      end
    end

I used Hpricot to parse the XML results; even though it’s designed for HTML, it works fine in this use case.
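
For reference, the element the snippet extracts sits in the response roughly as follows (a KML-like structure; the values here are placeholders, and coordinates are returned in longitude,latitude,altitude order, hence the split on commas):

    <Placemark>
      <Point>
        <coordinates>2.38,48.86,0</coordinates>
      </Point>
    </Placemark>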

Yet, my initial goal was to automatically get coordinates for any address in a RoR application. I decided to add longitude and latitude properties to the class I wanted to geolocate, and to add a before_save method to the model.

    # Get lat/long from the address before saving
    def before_save
      require 'open-uri'
      require 'hpricot'
      key = 'xxx' # your Google Maps API key (undefined in the original snippet)
      address = CGI::escape("#{self.address},#{self.city},#{self.state},#{self.country}")
      url = "http://maps.google.com/maps/geo?q=#{address}&output=xml&key=#{key}"
      open(url) do |file|
        @body = file.read
        doc = Hpricot(@body)
        (doc/:point/:coordinates).each do |link|
          self.longitude, self.latitude = link.inner_html.split(',')
        end
      end
    end

That’s it: every time a user creates or edits an instance, its coordinates will be added. Well, I certainly should run more tests, but as my knowledge of Ruby and Rails is limited at the moment, I’ll start with this. If you think something should be changed, feel free to comment on this post :) (By the way, can someone tell me why I must use require 'open-uri' here, even though it’s already included in my environment.rb?)


Groumph… (or how to install gems with Locomotive)

Two evenings now I’ve been struggling with Locomotive, trying to understand why the gems I install are inaccessible from my Rails apps.

And tonight, the revelation!

    Marvin:~/Work/rails/rateitems alex$ gem install rfuzz
    Attempting local installation of 'rfuzz'
    Local gem file not found: rfuzz*.gem
    Attempting remote installation of 'rfuzz'
    Select which gem to install for your platform (i686-darwin8.6.1)
     1. rfuzz 0.8 (mswin32)
     2. rfuzz 0.8 (ruby)
     3. rfuzz 0.7 (ruby)
     4. rfuzz 0.7 (mswin32)
     5. rfuzz 0.6 (ruby)
     6. Cancel installation
    > 2
    Building native extensions.  This could take a while...
    /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/fileutils.rb:243:
      command not found: make
    /Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/fileutils.rb:243:
      command not found: make install
    [...]

Obviously, a compilation without make doesn’t go too well…

So, Xcode installed, let’s try again:

    Marvin:~/Work/rails/rateitems alex$ gem install rfuzz
    Attempting local installation of 'rfuzz'
    Local gem file not found: rfuzz*.gem
    Attempting remote installation of 'rfuzz'
    Select which gem to install for your platform (i686-darwin8.6.1)
     1. rfuzz 0.8 (mswin32)
     2. rfuzz 0.8 (ruby)
     3. rfuzz 0.7 (ruby)
     4. rfuzz 0.7 (mswin32)
     5. rfuzz 0.6 (ruby)
     6. Cancel installation
    > 2
    Building native extensions.  This could take a while...
    dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
      Referenced from: /usr/bin/gcc
      Reason: image not found
    make: *** [http11_client.o] Trace/BPT trap
    dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
      Referenced from: /usr/bin/gcc
      Reason: image not found
    [...]

A quick Google search eventually turned up this thread, and the fix is to adjust the library path:

 export DYLD_FALLBACK_LIBRARY_PATH=$DYLD_FALLBACK_LIBRARY_PATH:/usr/lib

And presto, it works! (even though a few /Users/ryan/Desktop/building/min/framework/lib paths remain, which must be hard-coded somewhere).

Time for a bug report to flag all this, and back to playing with Rails.

Overview of the SIOC browser

I promised it about two months ago; here, finally, is an overview of my SIOC browser.

Since Uldis and I will be working on it for our BlogTalk presentation with John, it’s time for some details.

So, the browser is only one part of a larger architecture, which consists of:

  • SIOC data, created with some of the currently available exporters. Data can be produced by weblogs (dotclear example), forums (anime.ie example, using Drupal) or online services such as TalkDigger;
  • Uldis’s SIOC crawler, to fetch the data locally;
  • an RDF store (I’m currently using 3store – I’ll try to blog about it soon – but I started with Joseki) to hold this data;
  • the browser, as a query / visualization interface.

(While writing this post, Uldis showed me this other view of the SIOC architecture.)

The browser itself consists of a set of query pages (well, currently only 3 of them ;)) to visualize the content of the store through a user-friendly (I hope) AJAX interface.

Each query is defined in an XML config file, which contains:

  • A title
  • A page description
  • The main SPARQL query
  • Query filters
  • A results-formatting function
  • An item SPARQL query
  • An item results-formatting function

So the files contain both SPARQL fragments and PHP code, as can be seen in the comments.xml file.
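
To make that concrete, here is a purely hypothetical sketch of what such a config file could look like; the element names and the query are mine, not the actual contents of comments.xml:

    <query>
      <title>Latest comments</title>
      <description>Browse comments stored as SIOC data</description>
      <sparql>
        SELECT ?post ?content
        WHERE { ?post <http://rdfs.org/sioc/ns#content> ?content }
      </sparql>
      <format><![CDATA[ return '<li>' . $row['content']['value'] . '</li>'; ]]></format>
    </query>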

These config files are rendered by the browser engine – a PHP 5 class that dynamically creates rendering functions from the code embedded in the config files. The engine displays the browsing page, runs the main query without filters, and uses the formatting function to display the results. Users can then apply filters, which reload the query through AJAX, or select a result item to launch the item query, whose results are formatted with the other method defined in the config file.
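
As a minimal sketch of that dynamic-rendering step (this is my illustration, not the actual engine code; it assumes a <format> element like the one in the hypothetical config above):

    // Sketch: load the config file and turn the PHP code it embeds
    // into a callable results formatter (PHP 5, hence create_function()
    // rather than the anonymous functions of later releases).
    $config = simplexml_load_file('comments.xml');
    $formatRow = create_function('$row', (string) $config->format);

    // $results would hold the rows returned by the main SPARQL query.
    foreach ($results as $row) {
        echo $formatRow($row);
    }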

Regarding the interface with the store, queries are sent over HTTP using SPARQL, and the results come back JSON-formatted, to be parsed with a PHP extension.
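
In sketch form, a query round-trip could look like this; the endpoint URL and its output parameter are assumptions (they depend on the store), and json_decode() stands in for the JSON extension mentioned above:

    // Hypothetical endpoint; 3store / Joseki setups will differ.
    $endpoint = 'http://localhost:8080/sparql';
    $query = 'SELECT ?post ?content ' .
             'WHERE { ?post <http://rdfs.org/sioc/ns#content> ?content }';

    // Send the SPARQL query over HTTP and decode the JSON results.
    $json = file_get_contents($endpoint . '?query=' . urlencode($query) . '&output=json');
    $data = json_decode($json, true);

    // Standard SPARQL JSON results layout: results.bindings[...]
    foreach ($data['results']['bindings'] as $row) {
        echo $row['content']['value'], "\n";
    }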

So, what still needs to be done?

A few ideas:

  • Move the XML config files to RDF ones?
  • Find cool queries;
  • Graphical queries / interfaces;
  • Add more data :)

We should then release the browser, so that you can play with your own data. Hopefully everything will be ready for BlogTalk!

PEAR HTTP Request and Apache redirections

I’ve just noticed that my SIOCwiki-2-rdf script didn’t resolve the SIOC data URLs for a few blogs from SIOC enabled sites. I first thought the autodiscovery regexp was failing (it was the case for only one blog), but actually the error came from the part of the script that fetches pages.

Indeed, on the wiki page, I mentioned my blog URL as http://www.apassant.net/blog/. Yet this page is now redirected to http://apassant.net/blog using Apache’s RedirectMatch.
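
As an illustration, the directive behind such a redirection looks something like this (the exact pattern used on my server is an assumption); note that RedirectMatch answers with a 302 by default, which is precisely the status the script runs into:

    # In the virtual host serving www.apassant.net (pattern assumed)
    RedirectMatch ^/blog/(.*)$ http://apassant.net/blog/$1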

So when HTTP_Request is used to get the content of the page, it doesn’t return the expected body, but this page instead:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>302 Found</title>
    </head><body>
    <h1>Found</h1>
    <p>The document has moved <a href="http://apassant.net/blog/">here</a>.</p>
    </body></html>

in which it’s difficult to find any reference to a SIOC link…

But HTTP_Request provides a getResponseCode() method – which returns 302 in this case – and a getResponseHeader() method, which gives the following information:

    Array
    (
      [date] => Fri, 04 Aug 2006 11:10:33 GMT
      [server] => Apache/2.0.55 (Debian) mod_python/3.2.8 Python/2.4.4c0 PHP/5.1.4-0.1
      [location] => http://apassant.net/blog/
      [content-length] => 209
      [connection] => close
      [content-type] => text/html; charset=iso-8859-1
      [x-pad] => avoid browser bug
    )

So, using these, I can get the new location of the page. Yet there are cases where the location contains a relative URL (see http://www.openlinksw.com/blog/~kidehen).

So, finally, here’s the code I now use to fetch URL content, whether the page has moved or not:

    function url_get_content($url, $visited = array()) {
        // Guard against redirection loops
        if (in_array($url, $visited)) {
            return "Error: infinite redirection";
        }
        $visited[] = $url;

        $req =& new HTTP_Request($url);
        $response = $req->sendRequest();
        if (PEAR::isError($response)) {
            return "Error: " . $response->getMessage();
        }

        if (in_array($req->getResponseCode(), array(301, 302, 303))) {
            $headers = $req->getResponseHeader();
            $location = $headers['location'];
            if (array_key_exists('scheme', parse_url($location))) {
                // Absolute URL: follow it directly
                return url_get_content($location, $visited);
            } else {
                // Relative URL: rebuild an absolute one from the original
                $parsed = parse_url($url);
                $scheme = $parsed['scheme'];
                $host   = $parsed['host'];
                return isset($parsed['port']) ?
                    url_get_content("$scheme://$host:{$parsed['port']}$location", $visited) :
                    url_get_content("$scheme://$host$location", $visited);
            }
        }
        return $req->getResponseBody();
    }
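
For instance, with the redirected URL mentioned above:

    $body = url_get_content('http://www.apassant.net/blog/');

The $visited array is what prevents two URLs that redirect to each other from looping forever.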

I’ve fixed this in the script, which now returns a complete RDF file with SIOC data URLs for each blog that used an auto-discovery link.

Edit 09/08/2006 @ 13:30: Fixed a bug with infinite loops, see Richard’s comment about this point.