Hosting problems
As you may have seen, I've had a lot of hosting problems recently. Since I was away, I did not have time to check what was wrong, and since Gandi's beta hosting still has some issues with rebooting servers, I was not able to restart it easily. Now that I'm back, I've finally restarted my box and updated the Apache config (fingers crossed), but I'm still looking into how to optimize it, as it takes most of the memory and seems to make other services (especially ssh and mysql) unavailable.
My apologies to everyone who was trying to access this website or one of the other websites hosted there (MOAT, foafmap, doapstore) during the last two weeks.
Server fixed
I've just fixed some configuration problems on this Debian box (sid), which had made my website and other hosted services (SIOC browser, foafmap …) unavailable since yesterday. The trouble came from apache2.2 and libapache2-mod-php5, so I switched back to Apache 2.0.
Sorry to those who were looking for something here!
PEAR HTTP Request and Apache redirections
I've just noticed that my SIOCwiki-2-rdf script didn't resolve the SIOC data URLs of a few blogs from SIOC-enabled sites. I first thought the autodiscovery regexp had failed (that was the case for only one blog), but the error actually came from the part of the script that fetches pages.
Indeed, on the wiki page, I mentioned that my blog URL was http://www.apassant.net/blog/. However, that page now redirects to http://apassant.net/blog using Apache's RedirectMatch.
So, when HTTP_Request fetches that URL, it doesn't return the expected body but this page:
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>302 Found</title>
</head><body>
<h1>Found</h1>
<p>The document has moved <a href="http://apassant.net/blog/">here</a>.</p>
</body></html>
in which it's difficult to find any reference to a SIOC link…
But HTTP_Request provides a getResponseCode() method, which returns 302 in this case, and a getResponseHeader() method, which gives the following information:
Array
(
    [date] => Fri, 04 Aug 2006 11:10:33 GMT
    [server] => Apache/2.0.55 (Debian) mod_python/3.2.8 Python/2.4.4c0 PHP/5.1.4-0.1
    [location] => http://apassant.net/blog/
    [content-length] => 209
    [connection] => close
    [content-type] => text/html; charset=iso-8859-1
    [x-pad] => avoid browser bug
)
Using these, I can get the new location of the page. However, in some cases the Location header contains a relative URL (see http://www.openlinksw.com/blog/~kidehen).
So, finally, here's the code I now use to get the content of a URL, whether it has moved or not:
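A relative Location value can take a few forms: an absolute path (/blog/), a scheme-relative URL (//host/path) or a path relative to the current directory (index.html). The following is a minimal sketch of a hypothetical helper, resolve_location(), that turns each of these into an absolute URL while keeping a non-default port. It is not part of HTTP_Request, and it deliberately does not normalize "../" segments or query-only references, which a full RFC 3986 resolver would handle.

```php
<?php
// Hypothetical helper (not part of PEAR HTTP_Request): resolve the value of
// a Location header against the URL that was originally requested.
// Sketch only: "../" segments and query-only references are not normalized.
function resolve_location($base, $location)
{
    // Already absolute (has a scheme such as http:), nothing to do
    if (parse_url($location, PHP_URL_SCHEME) !== null) {
        return $location;
    }

    $parts  = parse_url($base);
    $scheme = $parts['scheme'];
    // Keep the port of the original URL if it had one
    $host   = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Scheme-relative URL: //example.org/path
    if (substr($location, 0, 2) === '//') {
        return "$scheme:$location";
    }

    // Absolute path: /blog/
    if (substr($location, 0, 1) === '/') {
        return "$scheme://$host$location";
    }

    // Relative path: resolve against the directory part of the base path
    $path = isset($parts['path']) ? $parts['path'] : '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return "$scheme://$host$dir$location";
}
```

For instance, resolve_location('http://example.org:8080/a/b', '/c') gives http://example.org:8080/c, and resolve_location('http://example.org/a/b', 'c') gives http://example.org/a/c.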
require_once 'HTTP/Request.php';

// Fetch the body of $url, following 301/302/303 redirections.
// $visited holds the URLs already requested, to detect infinite redirection loops.
function url_get_content($url, $visited = array())
{
    if (in_array($url, $visited)) {
        return "Error: infinite redirection";
    }
    $visited[] = $url;

    $req = new HTTP_Request($url);
    $result = $req->sendRequest();
    if (PEAR::isError($result)) {
        // The request itself failed, so there is no response code to report
        return "Error: " . $result->getMessage();
    }

    if (in_array($req->getResponseCode(), array('301', '302', '303'))) {
        $headers = $req->getResponseHeader();
        $location = $headers['location'];
        if (!array_key_exists('scheme', parse_url($location))) {
            // Relative Location (absolute path such as /blog/): rebuild a full
            // URL from the requested one, keeping its port if it had one
            $parsed = parse_url($url);
            $port = isset($parsed['port']) ? ':' . $parsed['port'] : '';
            $location = $parsed['scheme'] . '://' . $parsed['host'] . $port . $location;
        }
        return url_get_content($location, $visited);
    }

    return $req->getResponseBody();
}
I've fixed the script, which now returns a complete RDF file with the SIOC data URLs of every blog that uses an autodiscovery link.
Edit 09/08/2006 @ 13:30: fixed a bug causing infinite loops; see Richard's comment on this point.
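Once the real page body is fetched, the autodiscovery step has something to work on. Below is a minimal regexp-based sketch of that step; the function name find_sioc_link() is hypothetical, and it assumes the link is marked up as <link rel="meta" type="application/rdf+xml" title="SIOC" href="..."/>, matching on the type and title attributes. It is not the script's actual regexp.

```php
<?php
// Hypothetical sketch: return the href of the first SIOC autodiscovery link
// in an HTML page, or null if there is none. Assumes markup such as
// <link rel="meta" type="application/rdf+xml" title="SIOC" href="..."/>
function find_sioc_link($html)
{
    // Collect every <link ...> tag in the page
    if (!preg_match_all('/<link\b[^>]*>/i', $html, $tags)) {
        return null;
    }
    foreach ($tags[0] as $tag) {
        // Keep the first tag whose type and title identify a SIOC document
        if (preg_match('/\btype\s*=\s*["\']application\/rdf\+xml["\']/i', $tag)
            && preg_match('/\btitle\s*=\s*["\']SIOC["\']/i', $tag)
            && preg_match('/\bhref\s*=\s*["\']([^"\']+)["\']/i', $tag, $m)) {
            return $m[1];
        }
    }
    return null;
}
```

Run against the "302 Found" body shown earlier, this returns null, since that page contains no link element at all; that is exactly why the redirection has to be followed first.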
Back to Apache2
As I got a lot of errors while running PHP scripts in FastCGI mode under LightTPD, and don't have enough time to find out what's wrong (the scripts run fine from the command line, but the web server returns a 500 error when running them), I finally switched back to Apache2.
URL rewriting should be enabled as it used to be, but let me know if you run into any 404 errors.
