RUST :: corroding misbehaving web bots since 2012


===================================
WARNING: killing a web spider will
not do good things for your SEO.
Make sure you only trap bots which
are not respecting the rules of the
web: robots.txt.
===================================


introduction
- - - - - - -

RUST is designed to hamper web crawlers that refuse to respect
robots.txt. It does this by providing several examples of
responses which do not end. RUST hampers efforts to crawl
your site in an automated manner because it:

- stops synchronous crawlers from downloading any actual
  resources

- will tie up at least one of the crawler's worker
  threads, which will limit the number of concurrent
  connections which can be made to your site

- adds many junk resources to the crawler's queue,
  creating processing bottlenecks

This relies on the fact that most crawlers are written
without worrying too much about the possibility that
someone might attempt to stop them.

RUST acts as a honeypot. It uses 0% CPU when inactive,
and uses gevent to handle potentially thousands of
concurrent connections.
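
To illustrate what "a response which does not end" can look
like, here is a minimal sketch using only the standard library.
It is not the code in `rust.py`; the names `trickle` and
`trap_app` and the chunk contents are assumptions made for this
example. Under gevent you would presumably swap `time.sleep` for
`gevent.sleep` (or use monkey patching) so that a sleeping
response does not block the other connections.

```python
import itertools
import time


def trickle(delay=1.0, chunk=b"<!-- -->\n"):
    """Yield the same small chunk forever, pausing after each one."""
    while True:
        yield chunk
        time.sleep(delay)


def trap_app(environ, start_response):
    """Hypothetical WSGI app whose response body never finishes."""
    # No Content-Length is sent, so the client cannot tell when to stop.
    start_response("200 OK", [("Content-Type", "text/html")])
    return trickle()


# Peek at the first three chunks without waiting on the real delay.
first_chunks = list(itertools.islice(trickle(delay=0), 3))
```

A crawler that reads the body to completion stays stuck on this
one URL for as long as it keeps the connection open.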


installation
- - - - - -

    $ pip install gevent web.py
    $ git clone https://github.com/timClicks/rust.git



usage
- - -


Step 1: Warn good bots

Add a disallowed resource to your application's robots.txt.
Here is an example. Further details are available at
http://www.robotstxt.org/robotstxt.html

    User-agent: *
    Disallow: /rust/
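
If you want to check that the rule says what you intend, the
following sketch feeds the same two lines to Python's standard
`urllib.robotparser` module. The user agent name "ExampleBot" is
a made-up placeholder.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /rust/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks this question before fetching a URL.
trap_allowed = parser.can_fetch("ExampleBot", "/rust/flood/")
home_allowed = parser.can_fetch("ExampleBot", "/")
```

Here `trap_allowed` is False and `home_allowed` is True, so a
well-behaved bot never reaches the trap.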


Step 2: Set link bait

Add a hidden hyperlink to one of the disallowed resources.

    <a href="/rust/flood/" style="display:none;"></a>

    <a href="/rust/infinidom/" rel="nofollow noindex"></a>

Well-behaved bots, like search engines, will ignore this
link. The others, which we are targeting, will not.


Step 3: Configure RUST

Configure the routes within `rust.py`. With the current
setup, you should change the `routes` variable from

    routes = (
        "/bounce/", "Bounce",
        "/bounce/(.*)", "Bounce",
        "/infinidom/", "InfiniDOM",
        "/flood/", "Flood",
        "/mute/", "Mute",
        "/trickle/", "Trickle",
        "/junkmail/", "Junkmail"
    )

to

    routes = (
        "/rust/flood/", "Flood"
    )
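
The `routes` tuple is flat: it alternates a URL pattern with the
name of the handler class. The sketch below shows roughly how
such a tuple can be resolved against a request path. It is an
approximation written for this README, not web.py's actual
dispatch code; the function `resolve` is hypothetical, and it
assumes each pattern must match the whole path.

```python
import re

routes = (
    "/rust/flood/", "Flood",
)


def resolve(routes, path):
    """Return (handler_name, captured_groups) for the first match."""
    # Pair up the flat tuple: pattern, handler name, pattern, handler name...
    pairs = zip(routes[0::2], routes[1::2])
    for pattern, handler_name in pairs:
        match = re.fullmatch(pattern, path)
        if match:
            return handler_name, match.groups()
    return None, ()
```

Under that assumption, "/rust/flood/" resolves to "Flood", while
the old "/flood/" path no longer resolves to anything.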


Step 4: Deployment

You will probably want to configure your
web server to proxy any requests to disallowed
URLs to a running instance of RUST. Refer to
your server's documentation on WSGI deployment.
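
The routing decision itself is simple: anything under the
disallowed prefix goes to RUST, everything else goes to your
real application. The sketch below expresses that decision as
plain WSGI middleware rather than as a web server proxy
configuration, so treat it as an illustration of the idea and
not as a recommended deployment. `dispatch_by_prefix` and
`make_stub` are hypothetical names, and the stub apps stand in
for your site and for RUST.

```python
def dispatch_by_prefix(main_app, trap_app, prefix="/rust/"):
    """Build a WSGI app that sends `prefix` URLs to trap_app."""
    def app(environ, start_response):
        path = environ.get("PATH_INFO", "")
        target = trap_app if path.startswith(prefix) else main_app
        return target(environ, start_response)
    return app


def make_stub(label):
    """Return a tiny WSGI app that answers with a fixed body."""
    def stub(environ, start_response):
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [label]
    return stub


# Stand-ins for the real site and for a running RUST instance.
site = dispatch_by_prefix(make_stub(b"main"), make_stub(b"trap"))
```

A request for "/rust/flood/" is answered by the trap stub, and
every other path is answered by the main stub.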


further development
- - - - - - - - - -

It is very plausible that the HashDoS and Slowloris attacks
could be included in RUST. Pull requests are welcome.

_Slowloris_ could be implemented easily by streaming
incorrect HTTP response headers. This
would require breaking PEP 333 (the WSGI specification).

_HashDoS_ could be implemented by sending form data
with many unique field names. Form names and values
are generally stored in hash maps, which slow down
badly when many keys collide.


legal notices
- - - - - - -

- Consumer Guarantees Act 1993
  If you use this software for personal use, you may
  be entitled to certain guarantees provided by
  New Zealand law.

- Copyright
  Software is owned by Tim McNamara <code@timmcnamara.co.nz>.

- Licence
  Software is released under the GNU Affero General Public
  License (AGPL) <http://www.gnu.org/licenses/agpl.html>.

- Trade marks
  RUST is an unregistered trade mark of Tim McNamara.