Skip to content

Commit

Permalink
better must-match pattern for intranet file-crawls
Browse files Browse the repository at this point in the history
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@7218 6c8d7289-2bf4-0310-a012-ef5d649a1542
  • Loading branch information
orbiter committed Oct 4, 2010
1 parent aacf572 commit 574346f
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion htroot/Crawler_p.java
Expand Up @@ -144,7 +144,7 @@ public static serverObjects respond(final RequestHeader header, final serverObje
if (newcrawlingMustMatch.length() < 2) newcrawlingMustMatch = CrawlProfile.MATCH_ALL; // avoid that all urls are filtered out if bad value was submitted
// special cases:
if (crawlingStartURL!= null && fullDomain) {
newcrawlingMustMatch = crawlingStartURL.isFile() ? "file:///.*" : crawlingStartURL.isSMB() ? "smb://.*" : ".*" + crawlingStartURL.getHost() + ".*";
newcrawlingMustMatch = crawlingStartURL.isFile() ? "file://" + crawlingStartURL.getPath() + ".*" : crawlingStartURL.isSMB() ? "smb://" + crawlingStartURL.getPath() + ".*" : ".*" + crawlingStartURL.getHost() + ".*";
}
if (crawlingStart!= null && subPath && (pos = crawlingStart.lastIndexOf('/')) > 0) {
newcrawlingMustMatch = crawlingStart.substring(0, pos + 1) + ".*";
Expand Down

0 comments on commit 574346f

Please sign in to comment.