added a 'transition feature' that shall lower the barrier to move from g**gle to yacy (yes!):

This introduces a new concept called 'search heuristics'. In IT, a heuristic is a kind of 'shortcut' to good results; here, a shortcut to good search results. In this case it provides a very transparent way to compare the search results YaCy is able to produce with those g**gle produces. Here is what you can do now:
- add the phrase 'heuristic:scroogle' to your search query, as in 'oil spill heuristic:scroogle'; YaCy then calls Scroogle to get anonymous search results from g**gle.
- these results are _not_ used as meta-search results; instead they instantly feed a crawling and indexing process. This happens very fast: 20 results from Scroogle are taken, all loaded simultaneously, parsed and indexed immediately, and the parsed content is fed into the search result alongside the normal p2p search.
- when results from this heuristic (more heuristics are to come) become part of the search results, YaCy checks whether they are redundant (they had been part of the normal YaCy search result anyway) or completely new to YaCy.
- in the search results, new results from heuristics are marked with 'H ++', and results from heuristics that YaCy had already found are marked with 'H ='. That means:
- you can now see YaCy and Scroogle search results on one result page, and you also see that you would not have 'missed' the g**gle results if you had used only YaCy.

- in short: YaCy now subsumes g**gle results. If you use only YaCy, you miss nothing.
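
The list above describes a pipeline: take a batch of result URLs from an external source, load, parse and index all of them at once, then label each one as new ('H ++') or redundant ('H =') relative to what YaCy had already returned. The following is a minimal Java sketch of that flow, not the code in this commit. It assumes a hypothetical `loadAndIndex` predicate standing in for YaCy's loader, parser and indexer (none of which appear in this diff), and it represents URLs as plain strings instead of URL hashes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

public class HeuristicFeedSketch {

    // Marker shown next to a result, mirroring the 'H ++' / 'H =' labels.
    enum Mark { NEW, REDUNDANT }

    /**
     * Loads every URL concurrently with the hypothetical loadAndIndex predicate,
     * then marks each successfully loaded URL as NEW if YaCy's own result set did
     * not contain it, or REDUNDANT if it did. Failed loads produce no entry.
     */
    static Map<String, Mark> feed(List<String> heuristicUrls,
                                  Set<String> yacyResultUrls,
                                  Predicate<String> loadAndIndex)
            throws InterruptedException {
        // One thread per URL, so all loads run simultaneously.
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, heuristicUrls.size()));
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (String url : heuristicUrls) {
                futures.add(pool.submit(() -> loadAndIndex.test(url)));
            }
            Map<String, Mark> marks = new LinkedHashMap<>();
            for (int i = 0; i < heuristicUrls.size(); i++) {
                String url = heuristicUrls.get(i);
                try {
                    if (!futures.get(i).get()) continue; // load or parse failed
                } catch (ExecutionException e) {
                    continue; // loader threw an exception: skip this URL
                }
                marks.put(url, yacyResultUrls.contains(url) ? Mark.REDUNDANT : Mark.NEW);
            }
            return marks;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> fromScroogle = List.of("http://a.example/", "http://b.example/", "http://c.example/");
        Set<String> alreadyFound = Set.of("http://b.example/");
        // Pretend loader: every URL except c.example loads and indexes fine.
        Map<String, Mark> marks = feed(fromScroogle, alreadyFound, url -> !url.contains("c.example"));
        System.out.println(marks); // {http://a.example/=NEW, http://b.example/=REDUNDANT}
    }
}
```

Sizing the pool to the batch matches the "all loaded simultaneously" description; a real deployment would more likely cap the thread count.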

to come: a configuration page that lets you configure the use of heuristics and turn this feature on by default.

git-svn-id: https://svn.berlios.de/svnroot/repos/yacy/trunk@6944 6c8d7289-2bf4-0310-a012-ef5d649a1542
orbiter committed Jun 25, 2010
1 parent d5d48b8 commit dcd0169
Showing 13 changed files with 255 additions and 164 deletions.
1 change: 0 additions & 1 deletion htroot/Crawler_p.java
@@ -376,7 +376,6 @@ public static serverObjects respond(final RequestHeader header, final serverObje
final Map.Entry<MultiProtocolURI, String> e = linkiterator.next();
if (e.getKey() == null) continue;
nexturl = new DigestURI(e.getKey());
-if (nexturl == null) continue;

// enqueuing the url for crawling
sb.crawlStacker.enqueueEntry(new Request(
6 changes: 3 additions & 3 deletions htroot/env/base.css
@@ -490,9 +490,9 @@ a:hover.MenuItemLink {

div.urlactions a {
display:block;
-width: 12px;
-height: 12px;
-margin: 2px 0px;
+width: 11px;
+height: 11px;
+margin: 0px 0px 0px 3px;
}

a.bookmarklink:hover, div.searchresults:hover a.bookmarklink, div.searchresults.hover a.bookmarklink {
Binary file added htroot/env/grafics/heuristic_new.gif
Binary file added htroot/env/grafics/heuristic_redundant.gif
10 changes: 9 additions & 1 deletion htroot/yacysearch.java
@@ -332,6 +332,12 @@ public static serverObjects respond(final RequestHeader header, final serverObje
while (sitehost.endsWith(".")) sitehost = sitehost.substring(0, sitehost.length() - 1);
sitehash = DigestURI.domhash(sitehost);
}
+
+int heuristic = querystring.indexOf("heuristic:scroogle");
+if (heuristic >= 0) {
+querystring = querystring.replace("heuristic:scroogle", "");
+}
+
int authori = querystring.indexOf("author:");
String authorhash = null;
if (authori >= 0) {
@@ -503,7 +509,9 @@ public static serverObjects respond(final RequestHeader header, final serverObje
final SearchEvent theSearch = SearchEventCache.getEvent(theQuery, sb.peers, sb.crawlResults, (sb.isRobinsonMode()) ? sb.clusterhashes : null, false, sb.loader);
try {Thread.sleep(global ? 100 : 10);} catch (InterruptedException e1) {} // wait a little time to get first results in the search

-if (sitehost != null && authenticated) sb.quickFillSite(sitehost, theSearch);
+if (sitehost != null && authenticated) sb.heuristicSite(theSearch, sitehost);
+if (heuristic >= 0 && authenticated) sb.heuristicScroogle(theSearch);
+
// generate result object
//serverLog.logFine("LOCAL_SEARCH", "SEARCH TIME AFTER ORDERING OF SEARCH RESULTS: " + (System.currentTimeMillis() - timestamp) + " ms");
//serverLog.logFine("LOCAL_SEARCH", "SEARCH TIME AFTER RESULT PREPARATION: " + (System.currentTimeMillis() - timestamp) + " ms");
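
The first yacysearch.java hunk detects the `heuristic:scroogle` token with `indexOf` and removes it with `String.replace`, which can leave stray spaces in the query (for example a trailing space in "oil spill "); whether later query parsing ignores them is not shown in this diff. Below is a minimal sketch of the same step as a self-contained helper. The class, record and method names are hypothetical, and the whitespace cleanup is an added assumption, not something this commit does.

```java
public class HeuristicQuerySketch {

    static final String SCROOGLE_TOKEN = "heuristic:scroogle";

    /** Cleaned query plus a flag telling whether the Scroogle heuristic was requested. */
    record ParsedQuery(String query, boolean scroogle) {}

    static ParsedQuery parse(String rawQuery) {
        // Same detection as the commit: a plain substring search for the token.
        boolean scroogle = rawQuery.indexOf(SCROOGLE_TOKEN) >= 0;
        String cleaned = rawQuery.replace(SCROOGLE_TOKEN, "");
        // Assumption (not in the commit): collapse the spaces the removal leaves behind.
        cleaned = cleaned.trim().replaceAll("\\s+", " ");
        return new ParsedQuery(cleaned, scroogle);
    }

    public static void main(String[] args) {
        System.out.println(parse("oil spill heuristic:scroogle")); // ParsedQuery[query=oil spill, scroogle=true]
        System.out.println(parse("oil spill"));                    // ParsedQuery[query=oil spill, scroogle=false]
    }
}
```

In the commit itself, the flag is only acted upon for authenticated requests: `if (heuristic >= 0 && authenticated) sb.heuristicScroogle(theSearch);`.
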
24 changes: 14 additions & 10 deletions htroot/yacysearchitem.html
@@ -1,20 +1,24 @@
#(content)#::
<div class="searchresults">
<h4 class="linktitle">
-<img src="ViewImage.png?width=16&amp;height=16&amp;code=#[faviconCode]#" id="f#[urlhash]#" class="favicon" style="width:16px; height:16px;" alt="" />
+<img width="16" height="16" src="ViewImage.png?width=16&amp;height=16&amp;code=#[faviconCode]#" id="f#[urlhash]#" class="favicon" style="width:16px; height:16px;" alt="" />
<a href="#[link]#">#[title]#</a></h4>
-#(authorized)#::
-<div class="urlactions">
-<a href="/Bookmarks.html?edit=#[urlhash]#" class="bookmarklink" title="bookmark"><img src="/env/grafics/empty.gif" title="bookmark" alt="bookmark" class="bookmarkIcon" /></a>
+<div class="urlactions">
+#(heuristic)#::
+<img width="16" height="9" src="/env/grafics/heuristic_redundant.gif" title="heuristic:#[name]# (redundant)" style="width:16px; height:9px;" alt="heuristic#[name]# (redundant)"/>::
+<img width="16" height="9" src="/env/grafics/heuristic_new.gif" title="heuristic:#[name]# (new link)" style="width:16px; height:9px;" alt="heuristic#[name]# (new link)"/>
+#(/heuristic)#
+#(authorized)#::
+<a href="/Bookmarks.html?edit=#[urlhash]#" class="bookmarklink" title="bookmark"><img width="11" height="11" src="/env/grafics/empty.gif" title="bookmark" alt="bookmark" class="recommendIcon" /></a>
#(recommend)#
-<img src="/env/grafics/empty.gif" title="" alt="recommend" class="recommendIcon" />
-<img src="/env/grafics/empty.gif" title="" alt="delete" class="deleteIcon" />
+<img width="11" height="11" src="/env/grafics/empty.gif" title="" alt="recommend" class="recommendIcon" />
+<img width="11" height="11" src="/env/grafics/empty.gif" title="" alt="delete" class="deleteIcon" />
::
-<a href="#[recommendlink]#" class="recommendlink" title="recommend"><img src="/env/grafics/empty.gif" title="recommend" alt="recommend" class="recommendIcon" /></a>
-<a href="#[deletelink]#" title="delete" class="deletelink" ><img src="/env/grafics/empty.gif" title="delete" alt="delete" class="deleteIcon" /></a>
+<a href="#[recommendlink]#" class="recommendlink" title="recommend"><img width="11" height="11" src="/env/grafics/empty.gif" title="recommend" alt="recommend" class="recommendIcon" /></a>
+<a href="#[deletelink]#" title="delete" class="deletelink" ><img width="11" height="11" src="/env/grafics/empty.gif" title="delete" alt="delete" class="deleteIcon" /></a>
#(/recommend)#
-</div>
-#(/authorized)#
+#(/authorized)#
+</div>
<p class="snippet"><span class="snippetLoaded" id="h#[urlhash]#">#[description]#</span></p>
<p class="url"><a href="#[link]#" id="url#[urlhash]#">#[urlname]#</a></p>
<p class="urlinfo">#[date]# | #[sizename]# | <a href="api/yacydoc.html?urlhash=#[urlhash]#" onclick="return hs.htmlExpand(this, { objectType: 'ajax'} )">Metadata</a> | <a href="ViewFile.html?urlHash=#[urlhash]#&amp;words=#[words]#&amp;display=#[display]#">Parser</a> | <a href="yacysearch.html?cat=image&amp;url=#[link]#&amp;query=#[former]#&amp;display=#[display]#">Pictures</a></p>
11 changes: 11 additions & 0 deletions htroot/yacysearchitem.java
@@ -117,6 +117,17 @@ public static serverObjects respond(final RequestHeader header, final serverObje
prop.putHTML("content_authorized_recommend_deletelink", "/yacysearch.html?query=" + theQuery.queryString.replace(' ', '+') + "&Enter=Search&count=" + theQuery.displayResults() + "&offset=" + (theQuery.neededResults() - theQuery.displayResults()) + "&order=" + crypt.simpleEncode(theQuery.ranking.toExternalString()) + "&resource=local&time=3&deleteref=" + new String(result.hash()) + "&urlmaskfilter=.*");
prop.putHTML("content_authorized_recommend_recommendlink", "/yacysearch.html?query=" + theQuery.queryString.replace(' ', '+') + "&Enter=Search&count=" + theQuery.displayResults() + "&offset=" + (theQuery.neededResults() - theQuery.displayResults()) + "&order=" + crypt.simpleEncode(theQuery.ranking.toExternalString()) + "&resource=local&time=3&recommendref=" + new String(result.hash()) + "&urlmaskfilter=.*");
prop.put("content_authorized_urlhash", new String(result.hash()));
+SearchEvent.HeuristicResult heuristic = theSearch.getHeuristic(result.hash());
+if (heuristic == null) {
+prop.put("content_heuristic", 0);
+} else {
+if (heuristic.redundant) {
+prop.put("content_heuristic", 1);
+} else {
+prop.put("content_heuristic", 2);
+}
+prop.put("content_heuristic_name", heuristic.heuristicName);
+}
String resulthashString = new String(result.hash());
prop.putHTML("content_title", result.title());
prop.putXML("content_title-xml", result.title());
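
Taken together, the yacysearchitem.html and yacysearchitem.java changes use a numbered template alternative. Judging from this diff, `#(heuristic)#...::...::...#(/heuristic)#` renders the branch whose position matches the integer stored under `content_heuristic`: 0 renders nothing, 1 the redundant icon ('H ='), 2 the new-link icon ('H ++'). The sketch below imitates that selection for a single, non-nested block. It is a simplified stand-in written for illustration, not YaCy's template engine; the `Heuristic` record and both method names are hypothetical, and it assumes `::` only ever appears as a branch separator.

```java
public class TemplateAlternativeSketch {

    /** Hypothetical mirror of the heuristic result: its name (for #[name]#) and redundant flag. */
    record Heuristic(String name, boolean redundant) {}

    /** 0 = no heuristic, 1 = redundant, 2 = new link, as in the yacysearchitem.java hunk. */
    static int heuristicState(Heuristic h) {
        if (h == null) return 0;
        return h.redundant() ? 1 : 2;
    }

    /** Replaces a #(key)#...#(/key)# block with its branch number 'state'. */
    static String select(String template, String key, int state) {
        String open = "#(" + key + ")#";
        String close = "#(/" + key + ")#";
        int start = template.indexOf(open);
        int end = template.indexOf(close, start);
        if (start < 0 || end < 0) return template; // no such block: leave untouched
        String body = template.substring(start + open.length(), end);
        String[] branches = body.split("::", -1); // -1 keeps empty branches such as branch 0
        String chosen = state >= 0 && state < branches.length ? branches[state] : "";
        return template.substring(0, start) + chosen + template.substring(end + close.length());
    }

    public static void main(String[] args) {
        String tpl = "[#(heuristic)#::H =::H ++#(/heuristic)#]";
        System.out.println(select(tpl, "heuristic", heuristicState(null)));                            // []
        System.out.println(select(tpl, "heuristic", heuristicState(new Heuristic("scroogle", true))));  // [H =]
        System.out.println(select(tpl, "heuristic", heuristicState(new Heuristic("scroogle", false)))); // [H ++]
    }
}
```

Keeping the empty branch 0 is what lets results without any heuristic render no marker at all.
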
