Curling for web sites
I wanted information about ISO 3166-1 alpha-2 country codes. Google found me the definitive link (http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm) but clicking on it showed the ISO website to be temporarily down for maintenance.
Rather than check back every few minutes or hunt for stale information in the Google cache, I got curl and bash to notify me when the site went live.
$ url=http://www.iso.org/iso/country_codes/iso_3166_code_lists.htm
$ curl -I $url
HTTP/1.1 302 Found
Date: Tue, 27 May 2008 08:00:44 GMT
Server: BIG-IP
Location: http://www.iso.org/error/sitedown.html
Via: 1.1 www.iso.org
Connection: close
Content-Type: text/html
curl -I fetches the page headers only, which in this case carry a 302 status code temporarily redirecting clients to the sitedown.html page. Using this information I wrote a simple while loop to poll the site every minute and determine when this status changed.
$ http_status() { curl -I -s $1 | head -1 | cut -d " " -f 2; }
$ while [ $(http_status $url) == 302 ]; do sleep 60; done; open $url
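As a quick sanity check, the helper can be pointed at any healthy page, where it should print 200 (illustrative output; example.com is just a stand-in URL):
$ http_status http://www.example.com/
200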
open is an OS X thing: when the loop completes, open just opens the web page in a browser tab.
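On Linux, xdg-open (from the xdg-utils package) plays the same role, so a one-line shim keeps the loop portable, assuming that package is installed:
$ alias open=xdg-open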
To run this command in the background, wrap it in parentheses, so the loop and open form a single job, and & it.
$ (while [ $(http_status $url) == 302 ]; do sleep 60; done; open $url)&
[1] 808
Here, the job has a handle of 1 and a process ID of 808. You can recover this information using jobs.
$ jobs
[1]+ Running ( while [ $(http_status $url) == 302 ]; do
    sleep 60;
done; open $url ) &
If you need to kill the job, kill %1 does the trick.
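If this comes up often, the whole pipeline can be rolled into a function. watchsite is a name I'm inventing here, a sketch rather than anything canonical; it reuses the http_status helper defined above:
$ watchsite() { while [ "$(http_status $1)" == 302 ]; do sleep 60; done; open $1; }
$ watchsite $url &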