This happened because, for a period, twitpic was serving the wrong user's content in the RSS feed, and my feed was populated with those entries. The user who actually owned the pictures then deleted them, so the download of the thumbnail now fails. The code doesn't detect that the image download failed and goes on to read the body length from the Content-Length header, which is missing on this non-200 response.
(store_entry) wrote entry e/7c972289a80e5d3d9fdbc5b527f03796
(download_media_file) downloading 'http://twitpic.com/show/thumb/5scayj'
(download) downloading http://twitpic.com/show/thumb/5scayj
Traceback (most recent call last):
File "./friendfeed-archive", line 571, in <module>
File "./friendfeed-archive", line 250, in download_feed_entries
updated_entries = save_entries(ffdata['entries'])
File "./friendfeed-archive", line 237, in save_entries
updated_entries += ffdb.store_entry(v2e, force=options.iterateall)
File "./friendfeed-archive", line 393, in store_entry
File "./friendfeed-archive", line 454, in download_entrys_media
(downloaded, sizebytes) = self.download_media_file(t['url'], force)
File "./friendfeed-archive", line 498, in download_media_file
sizebytes = int(headers.getheader('content-length'))
TypeError: int() argument must be a string or a number, not 'NoneType'
This also occurred with S3-hosted content, specifically from brightkite.com, which removed its S3 bucket.
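As a stopgap for the TypeError in the traceback above, the header could be read defensively. This is only a sketch, written for Python 3 rather than the Python 2 the script appears to use; the function name parse_content_length is made up for illustration, and it assumes the headers object has a dict-style .get() (as email.message.Message and Python 3's http.client.HTTPMessage do).

```python
def parse_content_length(headers):
    """Return the Content-Length as an int, or None if absent or malformed.

    Hypothetical helper, not part of friendfeed-archive. `headers` is
    assumed to offer a dict-style .get(), like email.message.Message.
    """
    value = headers.get('Content-Length')
    if value is None:
        # This is the case that crashed: int(None) raises TypeError.
        return None
    try:
        return int(value)
    except ValueError:
        # Header present but not a number.
        return None
```

A caller would then treat None as "size unknown" instead of crashing. This does not detect the failed download itself; it only avoids the crash.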
One possibility is to show a filler image when no image can be downloaded. But one might still have the image in question, even if it is no longer available from the original service, so a command-line option to supply an image file for a missing one would be good.
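The option described above might look roughly like this. It is a sketch in Python 3 using argparse; the flag name --missing-image and the helper store_placeholder are invented for illustration and do not exist in the script, which may well parse its options differently.

```python
import argparse
import shutil


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag; the real script defines no such option.
    parser.add_argument('--missing-image', metavar='FILE', default=None,
                        help='image file to store when a download fails')
    return parser


def store_placeholder(dest_path, missing_image):
    """Copy the placeholder to dest_path; return True if a file was written."""
    if missing_image is None:
        # No placeholder configured: leave the media file absent.
        return False
    shutil.copyfile(missing_image, dest_path)
    return True
```

The download code would call store_placeholder only after a fetch has failed, so a real image is never overwritten by the filler.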
Unfortunately, urllib.urlretrieve, which is what downloads the content, doesn't appear to expose the HTTPResponse object. It returns only the headers, as an instance of httplib.HTTPMessage, and the value of .status on that object is ''. So the code that does the downloading will have to change.
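One way the download code could change is sketched below. It is written for Python 3, where urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so a deleted image shows up as an exception rather than as a missing header. The function name fetch_media and its opener parameter are assumptions for illustration, and the return value only mimics the (downloaded, sizebytes) tuple seen in the traceback.

```python
import os
import shutil
import urllib.error
import urllib.request


def fetch_media(url, dest_path, opener=urllib.request.urlopen):
    """Download url to dest_path; return (downloaded, sizebytes).

    Hypothetical replacement for the urlretrieve-based code. `opener`
    exists only so the function can be exercised without a network.
    """
    try:
        response = opener(url)
    except urllib.error.URLError:
        # HTTPError (e.g. 404 for a deleted image) subclasses URLError,
        # so this also covers DNS and connection failures.
        return (False, 0)
    with response:
        with open(dest_path, 'wb') as out:
            shutil.copyfileobj(response, out)
    # Measure what was written instead of trusting Content-Length.
    return (True, os.path.getsize(dest_path))
```

Taking the size from the file on disk sidesteps the Content-Length header entirely, at the cost of not noticing a truncated transfer.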