*) more correct robots.txt validation
   - isDisallowed now uses getFile instead of getPath

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@1870 6c8d7289-2bf4-0310-a012-ef5d649a1542
theli committed Mar 9, 2006
1 parent f046e18 commit 734d18f
Showing 1 changed file with 6 additions and 1 deletion.
7 changes: 6 additions & 1 deletion source/de/anomic/data/robotsParser.java
@@ -67,6 +67,11 @@
* So far, it only parses the Disallow part.
*
* http://www.robotstxt.org/wc/norobots-rfc.html
*
* TODO:
* - If a request attempt results in a temporary failure, a robot
*   should defer visits to the site until such time as the resource
*   can be retrieved.
*/
public final class robotsParser{
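
The TODO above restates the robots.txt RFC draft's guidance on temporary failures. Below is a minimal sketch, not part of YaCy's code, of what such a deferral policy could look like; the class name, the one-hour delay, and the status-code check are all assumptions made for illustration.

import java.util.Date;

// Hypothetical helper, not present in robotsParser.java.
final class RobotsFetchDeferral {
    // arbitrary retry delay assumed for this sketch: one hour
    private static final long RETRY_DELAY_MILLIS = 60L * 60L * 1000L;

    // treat HTTP 5xx responses as temporary failures
    static boolean isTemporaryFailure(int httpStatusCode) {
        return httpStatusCode >= 500 && httpStatusCode <= 599;
    }

    // earliest time the host may be visited again, or null if no deferral is needed
    static Date deferUntil(int httpStatusCode) {
        if (!isTemporaryFailure(httpStatusCode)) return null;
        return new Date(System.currentTimeMillis() + RETRY_DELAY_MILLIS);
    }
}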

@@ -263,7 +268,7 @@ public static boolean isDisallowed(URL nexturl) {
}
}

- if (robotsTxt4Host.isDisallowed(nexturl.getPath())) {
+ if (robotsTxt4Host.isDisallowed(nexturl.getFile())) {
return true;
}
return false;
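
Why getFile() is the more correct choice: on java.net.URL, getPath() returns only the path component, while getFile() returns the path plus the query string, so Disallow rules that include a query string can only ever match against getFile(). A small standalone example (the URL and the rule are made up for illustration):

import java.net.MalformedURLException;
import java.net.URL;

public class GetFileVsGetPath {
    public static void main(String[] args) throws MalformedURLException {
        URL nexturl = new URL("http://example.org/search?q=yacy");
        System.out.println(nexturl.getPath()); // "/search"        -- query string is dropped
        System.out.println(nexturl.getFile()); // "/search?q=yacy" -- path plus query string
        // A rule such as "Disallow: /search?q=" can therefore only match
        // when the check runs against getFile(), as in the changed line above.
    }
}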
