URL Viewer : apply crawler size limits when adding to local index.

This allows large files to be parsed and previewed, while preventing the
OutOfMemory errors that are likely to occur when resources larger than the
configured crawler limits are added to the Solr index.
luccioman committed Jul 16, 2017
1 parent eda7b0a commit 8100c033a23d2f205e784dca6280f0b332ffb12a
Showing with 9 additions and 4 deletions.
  1. +9 −4 htroot/ViewFile.java
@@ -363,10 +363,15 @@ public static serverObjects respond(final RequestHeader header, final serverObje
         prop.put("showSnippet_teasertext", desc);
         prop.put("showSnippet", 1);
     }
-    // update index with parsed resouce if index entry is older or missing
-    if (urlEntry == null || urlEntry.loaddate().before(response.lastModified())) {
-        Switchboard.getSwitchboard().toIndexer(response);
-    }
+    // update index with parsed resource if index entry is older or missing
+    final long responseSize = response.size();
+    if (urlEntry == null || urlEntry.loaddate().before(response.lastModified())) {
+        /* Also check resource size is lower than configured crawler limits */
+        if (responseSize >= 0
+                && responseSize <= Switchboard.getSwitchboard().loader.protocolMaxFileSize(response.url())) {
+            Switchboard.getSwitchboard().toIndexer(response);
+        }
+    }
     if (document != null) document.close();
 }
 prop.put("error", "0");
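
The size guard in this hunk can be sketched in isolation. This is a minimal illustration, not YaCy code: `IndexSizeGuard`, `withinCrawlerLimit`, and `shouldIndex` are hypothetical names, and the limit is passed in as a plain `long` rather than being looked up through `protocolMaxFileSize`. It assumes, as the `responseSize >= 0` check suggests, that a negative size means the size is unknown and the resource should not be indexed.

```java
/**
 * Hypothetical stand-alone sketch of the indexing guard shown in the diff.
 * None of these names exist in YaCy; they are for illustration only.
 */
public final class IndexSizeGuard {

    private IndexSizeGuard() {
        // static helpers only
    }

    /**
     * Returns true when the size is known (non-negative) and does not
     * exceed the given limit. The limit stands in for the value that the
     * commit obtains from the loader for the resource's protocol.
     */
    public static boolean withinCrawlerLimit(final long responseSize, final long maxFileSize) {
        return responseSize >= 0 && responseSize <= maxFileSize;
    }

    /**
     * Combines both conditions from the diff: the index entry must be
     * missing or older than the response, and the size must be acceptable.
     */
    public static boolean shouldIndex(final boolean entryMissingOrOlder,
            final long responseSize, final long maxFileSize) {
        return entryMissingOrOlder && withinCrawlerLimit(responseSize, maxFileSize);
    }
}
```

Under these assumptions, a resource larger than the limit can still be parsed and previewed by the viewer; it is only the hand-off to the indexer that is skipped.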
