Code Musings and Such

04 Mar 09 Scraping and Saving Flickr Images with Ruby

I've been playing around a bit more with the black arts of spidering and scraping in Ruby and I'm still amazed by how easy it is to do. For fun I whipped up a little script that will spider a Flickr photostream and download all the images.

Flickr provides a wonderful API, and there's even a great Ruby interface for it, so this script is entirely futile. But it was fun and educational.

Usage

 
ruby init.rb yourusername /location/to/save
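
The script itself is behind the download link, so what follows is only a rough sketch of the "save" half, not the actual init.rb. It assumes you already have a list of image URLs scraped from the photostream. The names local_path_for, http_fetch and save_images are made up for illustration, and URI.open needs Ruby 2.5 or newer.

```ruby
require 'fileutils'
require 'open-uri'
require 'uri'

# Hypothetical helper: map an image URL to a file path inside save_dir,
# using only the last segment of the URL path (any query string is ignored).
def local_path_for(url, save_dir)
  File.join(save_dir, File.basename(URI.parse(url).path))
end

# Default fetcher: read the response body over HTTP with open-uri.
def http_fetch(url)
  URI.open(url) { |io| io.read }
end

# Download every URL into save_dir and return the list of paths written.
# The fetcher is injectable so the loop can be exercised without a network.
def save_images(urls, save_dir, fetcher = method(:http_fetch))
  FileUtils.mkdir_p(save_dir)
  urls.map do |url|
    path = local_path_for(url, save_dir)
    File.binwrite(path, fetcher.call(url))
    path
  end
end
```

With something like this in place, the second command-line argument would simply be passed along as save_dir. Two images with the same file name would overwrite each other, which a real script would want to guard against.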
 

Download it!


01 Mar 09 Finding and Fixing Broken Images with Ruby

A family member was having a problem with some mixed-up image names on a static HTML site. I could have fixed it manually in a few shakes, but that's no fun. Instead I used hpricot to scrape, open-uri to test for broken-ness, Find to search, and some good old-fashioned regex to correct.

This was my first time messing around with hpricot, and I found it to be powerful and easy to use. Two thumbs up. I foresee some scraping and spidering posts in the near future.

On to the code:

My final script was a bit hairy so I broke out the bit I used to find the broken images.

If you run the script it'll print the offending paths to the screen:

ruby image_scanner.rb http://site.com/busted.html

Or you can call the get_broken_images method to get an array back:

require 'image_scanner'
scanner = Image_Scanner.new
broken_images = scanner.get_broken_images "http://site.com/busted.html"
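
For the curious, here is a minimal sketch of how a scanner along these lines could work. It is not the downloadable image_scanner.rb: the method names are hypothetical, and a plain regex stands in for hpricot so that only the standard library is needed. The open-uri check is the same idea described above, where any error while opening the URL counts as broken.

```ruby
require 'open-uri'
require 'uri'

# Pull every <img src="..."> value out of the HTML and resolve it against
# the page URL. A regex keeps the sketch dependency-free; an HTML parser
# copes far better with messy markup.
def image_urls(html, page_url)
  html.scan(/<img\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']/i)
      .flatten
      .map { |src| URI.join(page_url, src).to_s }
      .uniq
end

# An image is treated as broken when open-uri cannot open it
# (HTTP error status, unknown host, malformed URL, timeout, ...).
def broken?(url)
  URI.open(url).close
  false
rescue StandardError
  true
end

# Return the image URLs on the page that the checker reports as broken.
# The checker is injectable so the logic can be tried without a network.
def broken_image_urls(html, page_url, checker = method(:broken?))
  image_urls(html, page_url).select { |url| checker.call(url) }
end
```

Fetching the page itself and handing its HTML to broken_image_urls is left out here. Note that open-uri downloads each image in full, so on a large page a HEAD request via Net::HTTP may be the kinder option.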

In case you're interested, I've also uploaded the full code that I used to search for and correct the images, although it's implementation-specific, riddled with laziness, and poorly tested. Read the disclaimer!

Just run it and be amazed!

ruby image_scanner.rb http://site.com/busted.html /media_folder /busted.html /fixed.html
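
The correcting step depends entirely on how the names got mixed up, so treat the following as a sketch under one big assumption: the right file exists somewhere under the media folder and differs only in letter case or directory. The helpers index_media and fix_image_paths are hypothetical, not taken from the full script. Find walks the folder, and a regex swaps in the corrected path.

```ruby
require 'find'

# Index every file under media_dir by its lowercased base name.
# If two files share a name, the first one found wins.
def index_media(media_dir)
  index = {}
  Find.find(media_dir) do |path|
    next unless File.file?(path)
    index[File.basename(path).downcase] ||= path
  end
  index
end

# Rewrite each src listed in broken_srcs (raw src values as they appear in
# the HTML) to point at the matching file, expressed relative to web_root.
# Anything without a match is left untouched.
def fix_image_paths(html, broken_srcs, media_dir, web_root)
  index = index_media(media_dir)
  html.gsub(/(<img\b[^>]*?\ssrc\s*=\s*["'])([^"']+)(["'])/i) do
    before, src, quote = $1, $2, $3
    found = broken_srcs.include?(src) ? index[File.basename(src).downcase] : nil
    next "#{before}#{src}#{quote}" unless found
    relative = found.delete_prefix(media_dir).sub(%r{\A/}, '')
    "#{before}#{File.join(web_root, relative)}#{quote}"
  end
end
```

Reading the busted file, passing its contents through fix_image_paths and writing the result to the fixed file would complete the round trip.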

Download only the broken image scanner
Download the full script
