Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Failed to load latest commit information.|
RUST :: corroding misbehaving web bots since 2012 =================================== WARNING: killing a web spider will not do good things for your SEO. make sure you only trap bots which are not respecting the rules of the web: robots.txt. =================================== introduction - - - - - - - RUST is designed to hamper web crawlers that refuse to respect robots.txt. It does this by providing several examples of responses which do not end. RUST hamper efforts to crawl your site in an automated manner because it: - stops syncronous from downloading any actual resources - will tie up (at least one) of the crawlers' worker threads, which will limit the numbers of concurrent connections which can be made to your site. - adds many junk resources to the crawler's queue, creating processing bottlenecks These relies on the fact that most crawlers are written without worrying too much about the possibility that someone might attempt to prevent them. RUST acts as a honeypot. It uses 0% CPU when inactive, and uses gevent to handle potentially thousands of concurrent connections. installation - - - - - - $ pip install gevent web.py $ git clone https://github.com/timClicks/rust.git usage - - - Step 1: Warn good bots Add a disallowed resource to your application's robots.txt. Here is an example. Further details are available at http://www.robotstxt.org/robotstxt.html User-agent: * Disallow: /rust/ Step 2: Set link bait Add a hidden hyperlink to one of the disallowed resources. <a href="/rust/flood/" style="display:none;"></a> <a href="/rust/infinidom/" rel="nofollow noindex"></a> Well-behaved bots, like search engines, will ignore this link. The others, who we are targetting, will not. Step 3: Configure RUST Configure the routes within `rust.py`. With the current setup, you should chance the `routes` variable from routes = ( "/bounce/", "Bounce", "/bounce/(.*)", "Bounce", "/infinidom/", "InfiniDOM", "/flood/", "Flood", "/mute/", "Mute", "/trickle/", "Trickle", "/junkmail/", "Junkmail" ) to routes = ( "/rust/flood/", "Flood" ) Step 4: Deployment You will probably want to configure your web server to proxy any requests to disallowed URLs to a running instance of RUST. Refer to your server's documentation on WSGI deployment. further development - - - - - - - - - - It is very plausible that the HashDoS and Slowloris attacks could be included in RUST. Pull requests are welcome. _Slowloris_ could be implemented easily by streaming incorrect HTTP response headers. This would require breaking PEP-333. _HashDoS_ could be implemented by sending form data with unique name fields. Form names and values are generally stored as hash maps. legal notices - - - - - - - - Consumer Guarantees Act 1993 If you use this software for personal use, you may be entitled to certain guarantees provided by New Zealand law. - Copyright Software is owned by Tim McNamara <firstname.lastname@example.org>. - Licence Software is released under the GNU Affero General Public License (AGPL) <http://www.gnu.org/licenses/agpl.html>. - Trade marks RUST is an unregistered trade mark of Tim McNamara.