Change robots.txt to exclude only media proxy URLs #10038
I am not sure many people would be thrilled about being archived by archive.org in the first place. Making this specific to Googlebot is not a good idea, because other crawlers exist too (Bing, Yandex, and so on). I guess it would be easier to add a whitelist entry for archive.org instead.
Okay, @nightpool provided compelling arguments about why artists might expect their art to show up in Google Image Search, and why the followers/following pages should be excluded via a noindex meta tag instead. The media_proxy URL is, however, a valid exclusion.
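For illustration, here is a rough sketch of how a crawler would interpret a robots.txt that excludes only the media proxy. The `/media_proxy/` path prefix, the sample URLs, and the `is_allowed` helper are assumptions made for this example, not taken from the actual change. It uses Python's standard `urllib.robotparser`, which may not match every crawler's behaviour exactly.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the only exclusion is the media proxy.
# The "/media_proxy/" prefix is an assumption for this sketch.
ROBOTS_TXT = """\
User-agent: *
Disallow: /media_proxy/
"""


def is_allowed(path, user_agent="*"):
    """Return True if the given user agent may fetch the path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # Sample paths are made up for illustration.
    for path in ("/media_proxy/123/original", "/@artist/media"):
        print(path, is_allowed(path, "Googlebot"))
```

Because the rule is declared under `User-agent: *`, it applies to Googlebot, Bing, Yandex, and any other crawler that has no dedicated entry, so nothing is specific to one search engine. Pages such as followers/following lists are not covered by this file at all; under the approach described above they would carry a noindex meta tag instead.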