Commits on Feb 24, 2017
  1. Fix htmlParser <script> text extraction on code containing an expression recognized as a tag, such as 1<a

    Reported in #109.
    
    Script content is ignored by default, but its text was still filtered for
    HTML tags. Modified the scraper to skip tag filtering while within a
    <script> section (until a closing </script> tag is detected).
    Possible side effect: a missing </script> end tag will truncate the
    trailing content text.
    reger24 committed Feb 24, 2017
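    The <script> handling described above can be sketched as follows. This is a hypothetical, simplified extractor (ScriptAwareTextExtractor and extractText are illustrative names, not YaCy's htmlParser API): once an opening <script> tag is seen, it stops looking for tags and jumps straight to the closing </script>, so script code such as 1<a is never taken for a tag. It also reproduces the side effect noted in the commit: without an end tag, everything after <script> is dropped.

```java
import java.util.Locale;

/**
 * Hypothetical sketch, not YaCy's htmlParser code: strips tags from HTML text,
 * but performs no tag detection inside a script section, so script code such
 * as "1<a" cannot be mistaken for the start of a tag.
 */
final class ScriptAwareTextExtractor {
    private static final String SCRIPT_END = "</script>";

    static String extractText(String html) {
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < html.length()) {
            char c = html.charAt(i);
            if (c != '<') {
                text.append(c); // ordinary text outside any tag
                i++;
                continue;
            }
            int gt = html.indexOf('>', i);
            if (gt < 0) {
                // No closing '>': keep the remainder as plain text.
                text.append(html, i, html.length());
                break;
            }
            String tag = html.substring(i + 1, gt).trim().toLowerCase(Locale.ROOT);
            i = gt + 1; // the tag itself is dropped
            if (tag.equals("script") || tag.startsWith("script ")) {
                // Inside a script section: no tag filtering at all, jump to the end tag.
                // Script content is ignored; a missing end tag drops all the rest.
                int close = indexOfIgnoreCase(html, SCRIPT_END, i);
                i = (close < 0) ? html.length() : close + SCRIPT_END.length();
            }
        }
        return text.toString();
    }

    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int k = from; k + needle.length() <= s.length(); k++) {
            if (s.regionMatches(true, k, needle, 0, needle.length())) {
                return k;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // prints: Hithere
        System.out.println(extractText("<p>Hi</p><script>if (1<a) f();</script><b>there</b>"));
    }
}
```

    With an unclosed script, extractText("Hi<script>var a = 1<b;") returns only "Hi", which is the truncation trade-off mentioned above.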
Commits on Feb 23, 2017
  1. Improved MultiProtocolURL support for non-ASCII characters.

    After @sinkuu's pull request #108 added JUnit tests, updated some JavaDoc
    and also improved URL tokenization to support non-ASCII characters.
    luccioman committed Feb 23, 2017
  2. Merge pull request #110 from goofy-bz/patch-1

    Fixing some typos
    luccioman committed on GitHub Feb 23, 2017
  3. Fixing some typos

    up to line #1000 only
    goofy-bz committed on GitHub Feb 23, 2017
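    For the MultiProtocolURL tokenization change listed above, a minimal sketch of Unicode-aware URL tokenization follows, assuming tokens are simply maximal runs of Unicode letters and digits. UnicodeUrlTokenizer is a hypothetical name; YaCy's actual tokenization rules are not described in this log.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not MultiProtocolURL's code: splits a URL into word
 * tokens on every code point that is neither a Unicode letter nor a digit,
 * so non-ASCII words stay whole.
 */
final class UnicodeUrlTokenizer {

    static List<String> tokenize(String url) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        url.codePoints().forEach(cp -> {
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);   // still inside a word
            } else if (current.length() > 0) {
                words.add(current.toString()); // a separator ends the current word
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            words.add(current.toString());     // trailing word with no separator after it
        }
        return words;
    }

    public static void main(String[] args) {
        // Non-ASCII letters are written as Unicode escapes to keep the source ASCII-only.
        System.out.println(tokenize("http://example.org/M\u00fcnchen/stra\u00dfe_1.html"));
    }
}
```

    Character.isLetterOrDigit(int) works on code points, so supplementary characters are handled too; an ASCII-only test such as [A-Za-z0-9] would instead split "München" into "M" and "nchen".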
Commits on Feb 22, 2017
  1. Correct dublincore title property text to lowercase in htmlresponsewriter,

    remove unused (carried-over) local variable.
    Do the same for the other response writers.
    reger24 committed Feb 22, 2017
  2. Update SearchEvent.java

    Fix NPE with a disabled local SolrIndex, occurring when a search moves to the 2nd result page.
    The debug-only setting for disabling the local SolrIndex (System Admin -> Debug Settings) should probably be removed from production code in the long term.
    reger24 committed on GitHub Feb 22, 2017
Commits on Feb 21, 2017
  1. Switched some Solr fields from mandatory to optional

    These fields are enabled by default but are clearly not strictly
    mandatory with the current code base.
    
    As reported by @reger24, the split between essential mandatory fields
    and optional fields still needs to be improved to reflect current YaCy
    needs.
    luccioman committed Feb 21, 2017
Commits on Feb 20, 2017
  1. Add extract of queries.log in the form of a top search word cloud (last 7 days)

    to AccessTracker_p.html (Network Access -> Local Search Log page).
    It displays the top 20 words of search queries.
    reger24 committed Feb 20, 2017
  2. Refactored and enforced Solr mandatory fields for proper operation

    - Added a new method to check activation of mandatory fields on
    Collection Configuration commit, consistent with the checks previously
    performed at Switchboard startup and with the mandatory fields in the
    default schema.
    - Reorganized the default schema and the CollectionConfiguration enumeration:
    moved fields that are no longer mandatory to a dedicated section, and moved
    fields enabled at startup to the mandatory section.
    - Marked mandatory fields as required and displayed them in a bolder font
    on the IndexSchema_p.html page.
    luccioman committed Feb 20, 2017
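    The commit-time check for mandatory fields could look roughly like the sketch below. The Field enum, its Solr names and its mandatory flags are illustrative assumptions, not YaCy's CollectionSchema or default schema; only the idea of refusing a Collection Configuration commit that disables a mandatory field comes from the entry above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of a commit-time check for mandatory schema fields.
 * The enum is illustrative only: it is not YaCy's CollectionSchema and its
 * mandatory flags are not taken from YaCy's default schema.
 */
final class MandatoryFieldCheck {

    enum Field {
        ID("id", true),
        LOAD_DATE("load_date_dt", true),
        DESCRIPTION("description_txt", false);

        final String solrName;
        final boolean mandatory;

        Field(String solrName, boolean mandatory) {
            this.solrName = solrName;
            this.mandatory = mandatory;
        }
    }

    /** Returns the names of mandatory fields that are not enabled. */
    static List<String> missingMandatory(Set<String> enabledFields) {
        List<String> missing = new ArrayList<>();
        for (Field field : Field.values()) {
            if (field.mandatory && !enabledFields.contains(field.solrName)) {
                missing.add(field.solrName);
            }
        }
        return missing;
    }

    /** Rejects a configuration commit that would disable a mandatory field. */
    static void checkBeforeCommit(Set<String> enabledFields) {
        List<String> missing = missingMandatory(enabledFields);
        if (!missing.isEmpty()) {
            throw new IllegalStateException("Mandatory fields must stay enabled: " + missing);
        }
    }

    public static void main(String[] args) {
        System.out.println(missingMandatory(Set.of("id", "description_txt")));
    }
}
```

    Keeping the mandatory flag on the enumeration itself means the startup check and the commit check can share one source of truth, which matches the consistency goal stated in the commit.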
Commits on Feb 19, 2017
  1. Correct fromDate initial value on missing param in api/timeline_p servlet

    Revert test modification from the last commit in AccessTracker.main.
    reger24 committed Feb 19, 2017
  2. Add a hint of the query syntax to the AccessTracker log (qs=normal querystring, sq=solr-querystring)

    to allow filtering simple text queries for processing,
    remove toString for counter parameter,
    use more predefined constants in solrservlet.
    reger24 committed Feb 19, 2017
Commits on Feb 17, 2017
  1. Fixed a NullPointerException case possible on Index Export

    As reported by Palulukas in YaCy forum
    (http://forum.yacy-websuche.de/viewtopic.php?f=18&t=5944&sid=dcef5b899ab4aa9b40e3a3d158c13aed#p33454)
    the Index Export operation can fail, notably when the Solr index
    contains one or more documents with an empty (despite being required)
    "load_date_dt" field.
    
    This fixes the export failure when the situation occurs, but more
    should be done to harden verification of minimum required fields.
    luccioman committed Feb 17, 2017
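    A minimal sketch of the kind of null guard that avoids this export failure, assuming a document is available as a Map<String, Object> whose date fields hold java.util.Date values. NullSafeExport and formatLoadDate are hypothetical names; the actual fix in the Index Export code may differ.

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of null-safe handling of a required but empty date field
 * during export. A Solr document is modeled as a plain Map, assuming date
 * fields hold java.util.Date values; this is not YaCy's export code.
 */
final class NullSafeExport {

    static String formatLoadDate(Map<String, Object> doc) {
        Object value = doc.get("load_date_dt");
        if (!(value instanceof Date)) {
            // Missing (null) or unexpected value: export an empty string instead of
            // dereferencing null and aborting the whole export with an NPE.
            return "";
        }
        return DateTimeFormatter.ISO_INSTANT.format(((Date) value).toInstant());
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        System.out.println("[" + formatLoadDate(doc) + "]"); // prints: []
        doc.put("load_date_dt", new Date(0L));
        System.out.println(formatLoadDate(doc));             // prints: 1970-01-01T00:00:00Z
    }
}
```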
Commits on Feb 16, 2017
  1. Reduce self-generated content for text_t (visible text index field)

    to avoid repeating the tokenized URL as description,
    continuation of 7e09bff
    1409cab
    Add some JavaDoc, and drop the unneeded removal of omitted fields in postprocessing.
    reger24 committed Feb 16, 2017
Commits on Feb 15, 2017
  1. Added robots.txt support for heuristics federated search.

    As noticed by @reger24, abusive use of OpenSearch systems should be
    prevented, especially when HTML results can be parsed and reused.
    The robots.txt file is now checked before requesting an external OpenSearch
    system, to respect host exclusions and any crawl-delay value.
    The check is also performed when trying to add a new OpenSearch URL
    template through the /ConfigHeuristics_p.html admin page.
    luccioman committed Feb 15, 2017
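    A deliberately simplified robots.txt check, as a sketch of what respecting host exclusions and crawl-delay involves. SimpleRobotsTxt is a hypothetical class, not YaCy's robots.txt implementation: it only honors Disallow prefixes and Crawl-delay for the matching group(s), and ignores Allow rules, wildcards and the RFC 9309 rule that a group naming the agent specifically replaces the "*" group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Simplified robots.txt sketch (not YaCy's robots.txt classes and not a full
 * RFC 9309 implementation): collects Disallow prefixes and Crawl-delay for
 * every group matching the given agent token or "*".
 */
final class SimpleRobotsTxt {
    final List<String> disallowed = new ArrayList<>();
    double crawlDelaySeconds = 0;

    static SimpleRobotsTxt parse(String content, String agent) {
        SimpleRobotsTxt rules = new SimpleRobotsTxt();
        boolean applies = false;       // does the current group apply to this agent?
        boolean inAgentLines = false;  // consecutive User-agent lines form one group
        for (String rawLine : content.split("\\r?\\n")) {
            String line = rawLine.replaceFirst("#.*", "").trim(); // strip comments
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue;
            }
            String key = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (key.equals("user-agent")) {
                boolean match = value.equals("*") || value.equalsIgnoreCase(agent);
                applies = inAgentLines ? (applies || match) : match;
                inAgentLines = true;
                continue;
            }
            inAgentLines = false;
            if (!applies) {
                continue;
            }
            if (key.equals("disallow") && !value.isEmpty()) {
                rules.disallowed.add(value);
            } else if (key.equals("crawl-delay")) {
                try {
                    rules.crawlDelaySeconds = Double.parseDouble(value);
                } catch (NumberFormatException ignored) {
                    // Malformed delay: keep the previous value.
                }
            }
        }
        return rules;
    }

    /** True when no collected Disallow prefix matches the request path. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String robots = "User-agent: *\nDisallow: /private/\nCrawl-delay: 2\n\nUser-agent: otherbot\nDisallow: /\n";
        SimpleRobotsTxt rules = parse(robots, "examplebot");
        System.out.println(rules.isAllowed("/search?q=yacy") + " " + rules.crawlDelaySeconds); // prints: true 2.0
    }
}
```

    A caller would check isAllowed(path) before sending the OpenSearch request and wait at least crawlDelaySeconds between requests to the same host.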
Commits on Feb 14, 2017
  1. Use java.net.URLDecoder

    sinkuu committed Feb 11, 2017
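    For reference, a minimal use of java.net.URLDecoder, assuming UTF-8 percent-encoding. The log does not say which code this commit changed, so UrlDecoderDemo is only an illustration. The URLDecoder.decode(String, Charset) overload requires Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical usage sketch of java.net.URLDecoder; not YaCy's code. */
final class UrlDecoderDemo {

    static String decode(String encoded) {
        // Decodes %XX sequences as UTF-8 bytes. URLDecoder follows
        // application/x-www-form-urlencoded rules, so '+' also becomes a space.
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("M%C3%BCnchen+Stadt"));
    }
}
```

    Because of the form-encoding rules, a literal plus sign must arrive as %2B, and an incomplete escape such as "%E" makes decode throw IllegalArgumentException.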
Commits on Feb 13, 2017
  1. update opensearch conf - remove suche.sueddeutsche.de

    Apparently they have withdrawn from the OpenSearch initiative.
    reger24 committed Feb 13, 2017
  2. Added support for HTML OpenSearch results.

    Many OpenSearch systems do not provide results as standard RSS/Atom
    feeds but only as HTML. 
    
    This modification adds some support for custom OpenSearch HTML results
    through the use of mapping files (as already done for federated Solr
    search) relying on CSS-like selectors to retrieve information from HTML
    content.
    
    An example mapping file is provided to map results from the
    www.npmjs.com OpenSearch URL.
    luccioman committed Feb 13, 2017
Commits on Feb 11, 2017
  1. Update to Jetty 9.2.21.v20170120

    reger24 committed Feb 11, 2017
Commits on Feb 10, 2017
  1. Upgraded Apache Ant to 1.10.1 in the Docker alpine flavor image

    For a more reliable Docker image build, also switched to the Ant archive
    repository to fetch the needed binary, as other repositories only provide
    the latest versions.
    luccioman committed Feb 10, 2017
Commits on Feb 9, 2017
  1. Replaced absolute redirection locations by relative ones when possible.

    This makes integration of YaCy behind a reverse proxy subfolder easier.
    luccioman committed Feb 9, 2017
  2. Added a new Debug/Analysis advanced settings subsection.

    As discussed in PR #93 with @JeremyRand and @reger24, this new advanced
    settings page includes:
     - a new setting to control the encoding of remote Solr responses
     - some existing debug settings that could not be set through the admin
    user interface
    luccioman committed Feb 9, 2017
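    For the relative-redirection change in this list, the sketch below shows one way to turn an absolute target path into a reference relative to the current request, so that a reverse proxy serving YaCy under a subfolder keeps its prefix. RelativeRedirect is a hypothetical helper, not YaCy's code, and it ignores query strings and "." or ".." segments. HTTP/1.1 (RFC 7231, section 7.1.2) allows relative references in the Location header.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: converts an absolute redirect target path into a path
 * relative to the current request path. Query strings and dot segments are
 * not handled.
 */
final class RelativeRedirect {

    static String relativize(String requestPath, String targetPath) {
        // Directory segments of the current request: "/a/b/Page.html" -> [a, b]
        List<String> base = segments(requestPath.substring(0, requestPath.lastIndexOf('/') + 1));
        List<String> target = segments(targetPath);
        int common = 0;
        while (common < base.size() && common < target.size() - 1
                && base.get(common).equals(target.get(common))) {
            common++;
        }
        StringBuilder rel = new StringBuilder();
        for (int i = common; i < base.size(); i++) {
            rel.append("../"); // climb out of each directory not shared with the target
        }
        rel.append(String.join("/", target.subList(common, target.size())));
        if (targetPath.endsWith("/") && rel.length() > 0 && rel.charAt(rel.length() - 1) != '/') {
            rel.append('/');   // keep directory targets as directories
        }
        return rel.length() == 0 ? "./" : rel.toString();
    }

    private static List<String> segments(String path) {
        List<String> out = new ArrayList<>();
        for (String s : path.split("/")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String rel = relativize("/a/b/Page.html", "/a/c/Other.html");
        System.out.println(rel); // prints: ../c/Other.html
        // Resolved by the client against the proxied URL, the "/yacy" prefix survives.
        System.out.println(URI.create("http://proxy.example/yacy/a/b/Page.html").resolve(rel));
    }
}
```

    An absolute Location such as http://localhost:8090/a/c/Other.html would instead bypass the proxy prefix entirely, which is the problem the commit addresses.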
Commits on Feb 6, 2017
  1. Improved termination of timed out remote solr requests to peers.

    On timeout, closing remote Solr requests is cleaner than simply calling
    Thread.interrupt(), which is not effective in most cases. Closing does not
    ask the remote Solr to commit, but releases HTTP connection resources and
    is more likely to end those threads, which could otherwise wait indefinitely.
    
    Other related improvements included:
     - a remote peer is no longer marked as unavailable when a remote search is
    interrupted before timeout by the cleanup job.
     - added a short FINE log level trace of failing remote Solr requests.
    luccioman committed Feb 6, 2017
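    The difference between interrupting and closing can be reproduced with a plain java.net.ServerSocket instead of the remote Solr client YaCy uses (an assumption made to keep the example self-contained). A platform thread blocked in classic socket I/O keeps waiting after Thread.interrupt(), while closing the socket releases it with a SocketException. CloseVersusInterrupt is a hypothetical demo class; virtual threads in recent JDKs behave differently and are not used here.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.SocketException;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical demo, not YaCy code: a platform thread blocked in
 * ServerSocket.accept() ignores interruption but is released by close().
 */
final class CloseVersusInterrupt {

    static String demo() {
        try {
            ServerSocket server = new ServerSocket(0); // ephemeral port; nobody connects
            AtomicReference<String> outcome = new AtomicReference<>("still blocked");
            Thread worker = new Thread(() -> {
                try {
                    server.accept();                  // blocks indefinitely
                    outcome.set("accepted");
                } catch (SocketException e) {
                    outcome.set("released by close"); // thrown once the socket is closed
                } catch (IOException e) {
                    outcome.set("other I/O error");
                }
            });
            worker.start();
            Thread.sleep(200);
            worker.interrupt();                       // only sets the flag; accept() keeps blocking
            worker.join(500);
            boolean aliveAfterInterrupt = worker.isAlive();
            server.close();                           // this is what releases the blocked thread
            worker.join(2000);
            return "alive after interrupt: " + aliveAfterInterrupt + ", outcome: " + outcome.get();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```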
Commits on Feb 3, 2017
  1. Removed deprecated "localMissCount" prop from yacysearchlatestinfo.json.

    This property was deprecated four years ago by commit
    d74472f. For any active search event
    id, it was then always filled with "-UNRESOLVED_PATTERN-".
    luccioman committed Feb 3, 2017
Commits on Feb 1, 2017
  1. Refactored the DHT-Trigger section in Performance_p.html page.

    This makes the section easier to understand and more accurately reflects
    the current memory strategy implementations, which may set the "proper"
    state for reasons other than DHT reception.
    luccioman committed Feb 1, 2017
Commits on Jan 31, 2017
  1. Updated French translation for the /Performance_p.html page.

    Also updated the master xliff file with missing recent changes.
    luccioman committed Jan 31, 2017
  2. Fixed unresolved pattern on directory entries in HostBrowser.xml api.

    As described in mantis 725 (http://mantis.tokeek.de/view.php?id=725), the
    HostBrowser.xml API directory entries had an incorrect count attribute
    value.
    This was because the HostBrowser HTML page and its backing template servlet
    evolved, but the modifications were not carried over to the XML API.
    luccioman committed Jan 31, 2017