external image links that can't be downloaded cause crash #2

Open
thwarted opened this Issue Jan 22, 2012 · 3 comments

@thwarted
Owner

This occurred because, for a period, twitpic was serving the wrong user's content in the RSS feed, and my feed was populated with those entries. The user who actually owned the pictures then deleted them, so the download of the thumbnail now fails. The code doesn't detect that the image download failed and goes on to read the length of the body from the content-length header, which is not present on this non-200 response, hence the TypeError at the end of the traceback.

(store_entry) wrote entry e/7c972289a80e5d3d9fdbc5b527f03796
(download_media_file) downloading 'http://twitpic.com/show/thumb/5scayj'
(download) downloading http://twitpic.com/show/thumb/5scayj
Traceback (most recent call last):
  File "./friendfeed-archive", line 571, in <module>
    download_feed_entries(feedname, options)
  File "./friendfeed-archive", line 250, in download_feed_entries
    updated_entries = save_entries(ffdata['entries'])
  File "./friendfeed-archive", line 237, in save_entries
    updated_entries += ffdb.store_entry(v2e, force=options.iterateall)
  File "./friendfeed-archive", line 393, in store_entry
    self.download_entrys_media(mergedentry)
  File "./friendfeed-archive", line 454, in download_entrys_media
    (downloaded, sizebytes) = self.download_media_file(t['url'], force)
  File "./friendfeed-archive", line 498, in download_media_file
    sizebytes = int(headers.getheader('content-length'))
TypeError: int() argument must be a string or a number, not 'NoneType'
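
The failing line calls int() directly on the header value. A minimal sketch of a guard, assuming the headers object supports a dict-style .get lookup; content_length is a hypothetical helper for illustration and not a function in friendfeed-archive:

```python
def content_length(headers):
    """Return Content-Length as an int, or None if it is missing or malformed.

    `headers` is assumed to be any mapping-like object with a .get method.
    """
    value = headers.get('content-length')
    if value is None:
        # The header is absent, as in the traceback above.
        return None
    try:
        return int(value)
    except ValueError:
        # The header is present but is not a number.
        return None
```

A caller could then treat a None result as "size unknown" rather than crashing. This only avoids the TypeError; it does not by itself detect that the download failed.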

@thwarted
Owner

This also occurred with S3-hosted content, specifically from brightkite.com, which removed its S3 bucket.

@thwarted
Owner

One possibility is to show a filler image when an image can't be downloaded. But someone might still have a copy of the image in question even if it is no longer available from the original service, so a command-line option to supply a replacement file for a missing image would be good.
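
A rough sketch of how such an option might look. Everything here is an assumption for illustration: the --missing-image flag, the build_parser and store_fallback names, and the use of argparse (the script's actual option parser may differ).

```python
import argparse
import shutil


def build_parser():
    """Build a parser with a hypothetical --missing-image option."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--missing-image', metavar='FILE', default=None,
        help='image file to store in place of one that cannot be downloaded')
    return parser


def store_fallback(destpath, missing_image):
    """Copy the replacement image to destpath.

    Returns True if a replacement was copied, False if none was configured.
    """
    if missing_image is None:
        return False
    shutil.copyfile(missing_image, destpath)
    return True
```

With this shape, the download code would call store_fallback only after a download has failed, so entries whose media is still available are unaffected.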

@thwarted
Owner

Unfortunately, urllib.urlretrieve, which is what downloads the content, doesn't appear to expose the HTTPResponse object. It returns only the headers, as an instance of httplib.HTTPMessage, and the value of .status on that object is ''. So the code that does the downloading will have to change.
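
One possible direction, sketched in Python 3 with urllib.request.urlopen, which raises HTTPError for error statuses such as 404 (in Python 2 the analogous call would be urllib2.urlopen). The download function, its (downloaded, sizebytes) return shape, and the log messages are assumptions modelled on the traceback, not the script's actual code:

```python
import shutil
import urllib.error
import urllib.request


def download(url, destpath, timeout=30):
    """Fetch url into destpath.

    Returns (True, size_in_bytes) on success, or (False, 0) when the
    server answers with an error status or cannot be reached.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            with open(destpath, 'wb') as out:
                shutil.copyfileobj(response, out)
                # Count the bytes actually written instead of trusting
                # a Content-Length header that may be absent.
                sizebytes = out.tell()
    except urllib.error.HTTPError as err:
        # Error statuses (e.g. 404 for a deleted picture) land here.
        print('(download) %s failed with HTTP status %d' % (url, err.code))
        return (False, 0)
    except urllib.error.URLError as err:
        # DNS failures, refused connections, and similar problems.
        print('(download) %s failed: %s' % (url, err.reason))
        return (False, 0)
    return (True, sizebytes)
```

HTTPError is a subclass of URLError, so it has to be caught first. Because urlopen raises before the destination file is opened, a failed request leaves no empty file behind.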
