Commit
added crawl queue monitor for global crawls
git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@372 6c8d7289-2bf4-0310-a012-ef5d649a1542
orbiter
committed
Jul 4, 2005
1 parent
e2d8766
commit 252c6e4
Showing 8 changed files with 186 additions and 58 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> | ||
<html> | ||
<head> | ||
<title>YaCy: Index Creation / WWW Global Crawl Queue</title> | ||
#[metas]# | ||
</head> | ||
<body marginheight="0" marginwidth="0" leftmargin="0" topmargin="0"> | ||
#[header]# | ||
#[submenuIndexCreate]# | ||
<br> | ||
<h2>Index Creation: WWW Global Crawl Queue</h2> | ||
<p> | ||
This queue stores the URLs that will be sent to other peers for remote crawling. | ||
If no peer is available for remote crawling, the links are crawled locally. | ||
</p> | ||
<p> | ||
#(crawler-queue)# | ||
The global crawler queue is empty<br><br> | ||
:: | ||
<form action="IndexCreateWWWGlobalQueue_p.html" method="post" enctype="multipart/form-data"> | ||
<input type="submit" name="clearcrawlqueue" value="clear global crawl queue"> | ||
</form> | ||
<br> | ||
There are #[num]# entries in the global crawler queue. Showing #[show-num]# most recent entries: | ||
<table border="0" cellpadding="2" cellspacing="1"> | ||
<tr class="TableHeader"> | ||
<th class="small">Initiator</th> | ||
<th class="small">Depth</th> | ||
<th class="small">Modified Date</th> | ||
<th class="small">Anchor Name</th> | ||
<th class="small">URL</th> | ||
</tr> | ||
#{list}# | ||
<tr class="TableCell#(dark)#Light::Dark#(/dark)#"> | ||
<td width="60" class="small">#[initiator]#</td> | ||
<td width="10" class="small">#[depth]#</td> | ||
<td width="80" class="small">#[modified]#</td> | ||
<td width="180" class="small">#[anchor]#</td> | ||
<td class="small"><a class="small" href="#[url]#">#[url]#</a></td> | ||
</tr> | ||
#{/list}# | ||
</table> | ||
#(/crawler-queue)# | ||
</p> | ||
|
||
#[footer]# | ||
</body> | ||
</html> |
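The template above switches between alternatives with `#(crawler-queue)# ... :: ... #(/crawler-queue)#` and `#(dark)#Light::Dark#(/dark)#`. Judging from the servlet in this commit, the value 0 selects the text before `::` and 1 the text after it. The sketch below mimics only that selection step. The class `AlternativeSketch` and the helper `pickAlternative` are hypothetical names for illustration, not part of YaCy's template engine.

```java
import java.util.regex.Pattern;

public class AlternativeSketch {

    // Returns the alternative at the given index from a "::"-separated body,
    // or an empty string if the index is out of range (an assumption of this sketch).
    static String pickAlternative(String body, int index) {
        String[] alternatives = body.split(Pattern.quote("::"), -1);
        if (index < 0 || index >= alternatives.length) return "";
        return alternatives[index];
    }

    public static void main(String[] args) {
        // dark = 1 selects the second alternative, giving the CSS class "TableCellDark"
        System.out.println("TableCell" + pickAlternative("Light::Dark", 1));
    }
}
```

With this reading, a row marked `dark = 1` is rendered with the class `TableCellDark`, and the next row, marked `dark = 0`, with `TableCellLight`.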
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,118 @@ | ||
// IndexCreateWWWGlobalQueue_p.java | ||
// ------------------------------- | ||
// part of the AnomicHTTPD caching proxy | ||
// (C) by Michael Peter Christen; mc@anomic.de | ||
// first published on http://www.anomic.de | ||
// Frankfurt, Germany, 2004, 2005 | ||
// last major change: 04.07.2005 | ||
// | ||
// This program is free software; you can redistribute it and/or modify | ||
// it under the terms of the GNU General Public License as published by | ||
// the Free Software Foundation; either version 2 of the License, or | ||
// (at your option) any later version. | ||
// | ||
// This program is distributed in the hope that it will be useful, | ||
// but WITHOUT ANY WARRANTY; without even the implied warranty of | ||
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | ||
// GNU General Public License for more details. | ||
// | ||
// You should have received a copy of the GNU General Public License | ||
// along with this program; if not, write to the Free Software | ||
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA | ||
// | ||
// Using this software in any meaning (reading, learning, copying, compiling, | ||
// running) means that you agree that the Author(s) is (are) not responsible | ||
// for cost, loss of data or any harm that may be caused directly or indirectly | ||
// by usage of this softare or this documentation. The usage of this software | ||
// is on your own risk. The installation and usage (starting/running) of this | ||
// software may allow other people or application to access your computer and | ||
// any attached devices and is highly dependent on the configuration of the | ||
// software which must be done by the user of the software; the author(s) is | ||
// (are) also not responsible for proper configuration and usage of the | ||
// software, even if provoked by documentation provided together with | ||
// the software. | ||
// | ||
// Any changes to this file according to the GPL as documented in the file | ||
// gpl.txt aside this file in the shipment you received can be done to the | ||
// lines that follows this copyright notice here, but changes must not be | ||
// done inside the copyright notive above. A re-distribution must contain | ||
// the intact and unchanged copyright notice. | ||
// Contributions and changes to the program code must be marked as such. | ||
|
||
// You must compile this file with | ||
// javac -classpath .:../classes IndexCreateWWWGlobalQueue_p.java | ||
// if the shell's current path is HTROOT | ||
|
||
import java.text.SimpleDateFormat; | ||
import java.util.Date; | ||
import java.util.Locale; | ||
|
||
import de.anomic.http.httpHeader; | ||
import de.anomic.plasma.plasmaCrawlNURL; | ||
import de.anomic.plasma.plasmaSwitchboard; | ||
import de.anomic.server.serverObjects; | ||
import de.anomic.server.serverSwitch; | ||
import de.anomic.yacy.yacyCore; | ||
import de.anomic.yacy.yacySeed; | ||
|
||
public class IndexCreateWWWGlobalQueue_p { | ||
|
||
private static SimpleDateFormat dayFormatter = new SimpleDateFormat("yyyy/MM/dd", Locale.US); | ||
private static String daydate(Date date) { | ||
if (date == null) return ""; else return dayFormatter.format(date); | ||
} | ||
|
||
public static serverObjects respond(httpHeader header, serverObjects post, serverSwitch env) { | ||
plasmaSwitchboard switchboard = (plasmaSwitchboard) env; | ||
// return variable that accumulates replacements | ||
serverObjects prop = new serverObjects(); | ||
|
||
if (post != null) { | ||
if (post.containsKey("clearcrawlqueue")) { | ||
String urlHash; | ||
int c = 0; | ||
while (switchboard.urlPool.noticeURL.stackSize(plasmaCrawlNURL.STACK_TYPE_LIMIT) > 0) { | ||
urlHash = switchboard.urlPool.noticeURL.pop(plasmaCrawlNURL.STACK_TYPE_LIMIT).hash(); | ||
if (urlHash != null) { switchboard.urlPool.noticeURL.remove(urlHash); c++; } | ||
} | ||
prop.put("info", 3);//crawling queue cleared | ||
prop.put("info_numEntries", c); | ||
} | ||
} | ||
|
||
int stackSize = switchboard.urlPool.noticeURL.stackSize(plasmaCrawlNURL.STACK_TYPE_LIMIT); | ||
if (stackSize == 0) { | ||
prop.put("crawler-queue", 0); | ||
} else { | ||
prop.put("crawler-queue", 1); | ||
plasmaCrawlNURL.entry[] crawlerList = switchboard.urlPool.noticeURL.top(plasmaCrawlNURL.STACK_TYPE_LIMIT, 100); | ||
prop.put("crawler-queue_num", stackSize); // total number of entries in the queue | ||
prop.put("crawler-queue_show-num", crawlerList.length); // number of most recent entries shown | ||
plasmaCrawlNURL.entry urle; | ||
boolean dark = true; | ||
yacySeed initiator; | ||
int i; | ||
for (i = 0; i < crawlerList.length; i++) { | ||
urle = crawlerList[i]; | ||
if (urle != null) { | ||
initiator = yacyCore.seedDB.getConnected(urle.initiator()); | ||
prop.put("crawler-queue_list_"+i+"_dark", ((dark) ? 1 : 0) ); | ||
prop.put("crawler-queue_list_"+i+"_initiator", ((initiator == null) ? "proxy" : initiator.getName()) ); | ||
prop.put("crawler-queue_list_"+i+"_depth", urle.depth()); | ||
prop.put("crawler-queue_list_"+i+"_modified", daydate(urle.loaddate()) ); | ||
prop.put("crawler-queue_list_"+i+"_anchor", urle.name()); | ||
prop.put("crawler-queue_list_"+i+"_url", urle.url()); | ||
dark = !dark; | ||
} | ||
} | ||
prop.put("crawler-queue_list", i); | ||
} | ||
|
||
// return rewrite properties | ||
return prop; | ||
} | ||
|
||
} | ||
|
||
|
||
|
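The servlet above fills the `#{list}#` rows through flat keys of the form `crawler-queue_list_<i>_<field>` and finally stores the row count under `crawler-queue_list`. The following is a minimal sketch of that naming convention only. It uses a plain `Map` in place of `serverObjects`, and the class `QueueRowsSketch`, the method `toProperties`, and the two-column row layout are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class QueueRowsSketch {

    // Builds flat template keys for each {anchor, url} pair in rows.
    static Map<String, String> toProperties(String[][] rows) {
        Map<String, String> prop = new LinkedHashMap<>();
        boolean dark = true;
        int count = 0;
        for (String[] row : rows) {
            if (row == null) continue; // skip missing entries without leaving an index gap
            String prefix = "crawler-queue_list_" + count + "_";
            prop.put(prefix + "dark", dark ? "1" : "0");
            prop.put(prefix + "anchor", row[0]);
            prop.put(prefix + "url", row[1]);
            dark = !dark; // alternate row shading
            count++;
        }
        // the list key itself carries the number of rows to render
        prop.put("crawler-queue_list", Integer.toString(count));
        return prop;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"start page", "http://example.org/"},
            null,
            {"about", "http://example.net/about"}
        };
        toProperties(rows).forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Unlike the loop in the commit, which reuses the array index `i` for the key, the sketch numbers rows with a separate counter so that skipped `null` entries leave no gap. Whether the template engine tolerates such gaps is not stated in the source.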
17 changes: 10 additions & 7 deletions
htroot/IndexCreateWWWLocalCrawlQueue_p.html → htroot/IndexCreateWWWLocalQueue_p.html