Hosting problems

As you may have seen, I had a lot of hosting problems recently. Moreover, since I was away, I did not have time to check what was wrong, and since the Gandi beta hosting still has some issues with rebooting servers, I was not able to restart it easily. Now that I’m back, I finally restarted my box and updated the Apache config (fingers crossed now), but I’m still looking at how to optimize it, as it takes most of the memory and seems to make other services unavailable (especially ssh and mysql).

All my apologies to those who were trying to access this website, or one of the other websites hosted there (MOAT, foafmap, doapstore), during the last two weeks.

Flickr feeds and feedparser

Gunnar asked this morning on #swig how to handle Flickr feeds, and especially how to get images and thumbnails with feedparser. I remembered I had hacked it once to make it work, but lost my changes in a server crash … fortunately, I had submitted a bug report about it.

Adding these two methods makes it work as needed[1]:

def _start_media_content(self, attrsD):
  url = attrsD.get('url')
  if url:
    self._save('media_content', url)
def _start_media_thumbnail(self, attrsD):
  url = attrsD.get('url')
  if url:
    self._save('media_thumbnail', url)
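If you’d rather not edit feedparser.py directly, the same two handlers can also be attached from the outside. Here is a small sketch, assuming they belong on feedparser’s internal _FeedParserMixin class (where the other _start_* handlers and _save live), which may of course change between versions:

import feedparser

def _start_media_content(self, attrsD):
  url = attrsD.get('url')
  if url:
    self._save('media_content', url)

def _start_media_thumbnail(self, attrsD):
  url = attrsD.get('url')
  if url:
    self._save('media_thumbnail', url)

# Monkey-patch the handlers onto the parser mixin instead of editing feedparser.py
feedparser._FeedParserMixin._start_media_content = _start_media_content
feedparser._FeedParserMixin._start_media_thumbnail = _start_media_thumbnail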

Then, you can get thumbnails from Flickr feeds:

 
>>> import feedparser 
>>> d = feedparser.parse("http://api.flickr.com/services/feeds/photos_public.gne?id=33669349@N00&format=rss_200")
>>> print d['entries'][0]['media_thumbnail']
http://farm1.static.flickr.com/83/268382663_a5102bc6dd_s.jpg

Notes

[1] In case you don’t have any Python XML parser installed, you will need to rename the second one _start_thumbnail; I didn’t check how to make the first one work in that case.

Dotclear2 plugins and URL patterns

After adding RSS 1.0 feed support to Dotclear 2 by modifying the source (yes, I know, that’s ugly), I’m now tackling the same thing more cleanly, as a plug-in, which will probably be the first building block of a Semantic Web plugin for Dotclear2 (wow!).

The DC2 plugin API lets you, among other things, define URL patterns for which actions will be triggered. The problem: defining a URL pattern that is already handled by another DC2 component through another regular expression. In my case, ^feed/(.+)/rdf$ is already matched by ^feed/(.+)$ (in the feed handling). The solution lies in the name you give to the base form of the URL, namely the second parameter of the register function.

Indeed, DC2 sorts the list of URL patterns in reverse alphabetical order of these identifiers.

In the present case, since the form used for feed matching is simply feed, registering the new URL as feed/rdf means its regular expression will be matched before the standard feed-handling one, so the plug-in will “override” the DC2 core at this level, the new pattern matching being tried before the one for regular feeds:

Edit 15:30: … I had forgotten the pattern for tags, named feed/tag. To get the ordering right in this case, you therefore have to name the URL, for instance, feed_rdf, so that its pattern is tried before the one named feed/tag (since “_” sorts after “/”, feed_rdf comes first in reverse alphabetical order):

 $core->url->register('rdf_feed','feed_rdf','^feed((.+)rdf(.*))$',array('dcSemwebURL','rdf_feed')); 
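For completeness, the handler referenced in that registration is just a class with a static method; a minimal, hypothetical skeleton (not the actual plugin code) would look like this:

class dcSemwebURL
{
  // Called by DC2 when the URL matches ^feed((.+)rdf(.*))$;
  // $args holds the part of the URL captured by the pattern.
  public static function rdf_feed($args)
  {
    // build and output the RSS 1.0 (RDF) feed here
  }
}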

Geocoding with Ruby On Rails and Google Maps

As GoogleMaps now offers geocoding services (which work really well for European streets!), you can use it in any web application to get coordinates from a given address. The service uses REST for queries, and you can get results in XML or JSON.

I’m currently hacking with RoR, so I first tested geocoding with a short Ruby snippet:

require 'open-uri'
require 'rubygems'
require 'hpricot'

key = 'xxx'
address = "105+avenue+de+la+Republique,+Paris,+France"
url = "http://maps.google.com/maps/geo?q=#{address}&output=xml&key=#{key}"

open(url) do |file|
  @body = file.read
  doc = Hpricot(@body)
  (doc/:point/:coordinates).each do |link|
    long, lat = link.inner_html.split(',')
  end
end

I used hpricot to parse the XML results, and even though it’s designed for HTML, it works fine for this use case.

Yet, my initial goal was to automatically get coordinates for any address in a RoR application. I decided to add longitude and latitude properties to the class I wanted to geolocate, and to add a before_save method to the model.
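For the columns themselves, a plain migration does the job; here’s a sketch, assuming the model is called Place (the table and class names are made up for the example):

class AddCoordinatesToPlaces < ActiveRecord::Migration
  def self.up
    # Filled in by the before_save callback below
    add_column :places, :longitude, :float
    add_column :places, :latitude,  :float
  end

  def self.down
    remove_column :places, :longitude
    remove_column :places, :latitude
  end
end

And the before_save callback itself: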

# Get lat/long from the address before saving
def before_save
  require 'open-uri'
  address = CGI::escape("#{self.address},#{self.city},#{self.state},#{self.country}")
  # key holds the Google Maps API key (defined elsewhere in the application)
  url = "http://maps.google.com/maps/geo?q=#{address}&output=xml&key=#{key}"
  open(url) do |file|
    @body = file.read
    doc = Hpricot(@body)
    (doc/:point/:coordinates).each do |link|
      self.longitude, self.latitude = link.inner_html.split(',')
    end
  end
end

That’s it: every time a user creates or edits an instance, its coordinates will be added. Well, I certainly should do more tests, but as my knowledge of Ruby and Rails is limited at the moment, I’ll start with this. If you think something should be changed, feel free to comment on this post :) (btw, can someone tell me why I must use require 'open-uri' here, even though it’s already required in my environment.rb?)

Groumph … (or how to install gems with Locomotive)

I’ve spent two evenings wrestling with Locomotive, trying to understand why the gems I install are not accessible from my Rails apps.

And tonight, the revelation!

Marvin:~/Work/rails/rateitems alex$ gem install rfuzz
Attempting local installation of 'rfuzz'
Local gem file not found: rfuzz*.gem
Attempting remote installation of 'rfuzz'
Select which gem to install for your platform (i686-darwin8.6.1)
 1. rfuzz 0.8 (mswin32)
 2. rfuzz 0.8 (ruby)
 3. rfuzz 0.7 (ruby)
 4. rfuzz 0.7 (mswin32)
 5. rfuzz 0.6 (ruby)
 6. Cancel installation
> 2
Building native extensions.  This could take a while...
/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/fileutils.rb:243: command not found: make
/Applications/Locomotive2/Bundles/rails112.locobundle/i386/lib/ruby/1.8/fileutils.rb:243: command not found: make install
[...]

Unsurprisingly, compiling without make doesn’t work that well …

So, install XCode, and off we go again:

Marvin:~/Work/rails/rateitems alex$ gem install rfuzz
Attempting local installation of 'rfuzz'
Local gem file not found: rfuzz*.gem
Attempting remote installation of 'rfuzz'
Select which gem to install for your platform (i686-darwin8.6.1)
 1. rfuzz 0.8 (mswin32)
 2. rfuzz 0.8 (ruby)
 3. rfuzz 0.7 (ruby)
 4. rfuzz 0.7 (mswin32)
 5. rfuzz 0.6 (ruby)
 6. Cancel installation
> 2
Building native extensions.  This could take a while...
dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
  Referenced from: /usr/bin/gcc
  Reason: image not found
make: *** [http11_client.o] Trace/BPT trap
dyld: Library not loaded: /usr/i686-apple-darwin8/lib/libgcc_s.1.dylib
  Referenced from: /usr/bin/gcc
  Reason: image not found
[...]

A quick trip to Google eventually led me to this thread, and to fixing the path:

 export DYLD_FALLBACK_LIBRARY_PATH=$DYLD_FALLBACK_LIBRARY_PATH:/usr/lib

And there we go, it works! (even though there are still a few /Users/ryan/Desktop/building/min/framework/lib paths that must be hard-coded somewhere).

Right, a bug report to flag this, and back to playing with Rails.

PEAR HTTP Request and Apache redirections

I’ve just noticed that my SIOCwiki-2-rdf script didn’t resolve the SIOC data URL for a few blogs from SIOC-enabled sites. I first thought the autodiscovery regexp was failing (it was the case for only one blog), but actually the error came from the part of the script that fetches pages.

Indeed, on the wiki page, I mentioned my blog URL as http://www.apassant.net/blog/. Yet, this page is now redirected to http://apassant.net/blog using Apache RedirectMatch.
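The redirection itself is just a RedirectMatch in the vhost serving www.apassant.net; I don’t have the exact rule at hand, but it’s something along these lines:

RedirectMatch ^/blog(.*)$ http://apassant.net/blog$1

(Without the permanent keyword, Apache answers with a 302, which is what shows up below.)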

So, when using HTTP_Request to get the content of the page, it doesn’t return the expected body, but this page:

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://apassant.net/blog/">here</a>.</p>
</body></html>

in which it’s difficult to find any reference to a SIOC link …
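For reference, the naive fetch that produces this page looks more or less like the following (a minimal sketch using PEAR’s HTTP_Request, not the full script):

require_once 'HTTP/Request.php';

$req =& new HTTP_Request('http://www.apassant.net/blog/');
if (!PEAR::isError($req->sendRequest())) {
  // Since the redirection is not followed, this prints the 302 page above,
  // not the real content of the blog homepage.
  echo $req->getResponseBody();
}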

But HTTP_Request provides a getResponseCode() method (which returns 302 in this case) and a getResponseHeader() method, which gives the following information:

Array
(
  [date] => Fri, 04 Aug 2006 11:10:33 GMT
  [server] => Apache/2.0.55 (Debian) mod_python/3.2.8 Python/2.4.4c0 PHP/5.1.4-0.1
  [location] => http://apassant.net/blog/
  [content-length] => 209
  [connection] => close
  [content-type] => text/html; charset=iso-8859-1
  [x-pad] => avoid browser bug
)

So, using it, I can get the new location of the page. Yet, there are use cases where the location contains a relative URL (see http://www.openlinksw.com/blog/~kidehen).

So, finally, here’s the code I now use to get a URL’s content, whether it has moved or not:

function url_get_content($url, $visited = array()) {
  // Guard against infinite redirection loops
  if (in_array($url, $visited)) {
    return "Error: infinite redirection";
  } else {
    $visited[] = $url;
  }
  $req =& new HTTP_Request($url);
  if (!PEAR::isError($req->sendRequest())) {
    if (in_array($req->getResponseCode(), array('301', '302', '303'))) {
      $headers  = $req->getResponseHeader();
      $location = $headers['location'];
      if (array_key_exists('scheme', parse_url($location))) {
        // Absolute URL: follow it directly
        return url_get_content($location, $visited);
      } else {
        // Relative URL: rebuild an absolute one from the original request
        $parsed = parse_url($url);
        $scheme = $parsed['scheme'];
        $host   = $parsed['host'];
        return isset($parsed['port']) ?
          url_get_content("$scheme://$host:{$parsed['port']}$location", $visited) :
          url_get_content("$scheme://$host$location", $visited);
      }
    }
    return $req->getResponseBody();
  } else {
    return "Error: " . $req->getResponseCode();
  }
}

I’ve fixed it in the script, which now returns a complete RDF file with the SIOC data URLs for each blog that used an auto-discovery link.

Edit 09/08/2006 @ 13:30: Fixed a bug about infinite loops, see Richard’s comment about this point.