Commit

Merged changes from master.
luccioman committed Jul 19, 2016
2 parents 0c327d0 + 774b390 commit 47d4862
Showing 10 changed files with 331 additions and 74 deletions.
5 changes: 5 additions & 0 deletions locales/de.lng
@@ -104,6 +104,11 @@ Augmented Browsing:==Angereichertes Browsing:
Enables or disables augmented browsing. If enabled, all websites will be modified during loading.==Schaltet angereichertes Browsing an oder ab. Wenn aktiviert werden alle Webseite während des Ladens modifiziert.
#-----------------------------

#File: Autocrawl_p.html
#---------------------------
"Save"=="Speichern"
#-----------------------------

#File: Blacklist_p.html
#---------------------------
Blacklist Administration==Blacklist Verwaltung
224 changes: 178 additions & 46 deletions locales/fr.lng

Large diffs are not rendered by default.

59 changes: 59 additions & 0 deletions locales/master.lng.xlf
@@ -190,6 +190,62 @@
</body>
</file>

<file original="Autocrawl_p.html" source-language="en" datatype="html">
<body>
<trans-unit id="263005b5" xml:space="preserve" approved="no" translate="yes">
<source>&gt;Autocrawler&lt;</source>
</trans-unit>
<trans-unit id="7015ea9" xml:space="preserve" approved="no" translate="yes">
<source>Autocrawler automatically selects and adds tasks to the local crawl queue.</source>
</trans-unit>
<trans-unit id="173d9787" xml:space="preserve" approved="no" translate="yes">
<source>This will work best when there are already quite a few domains in the index.</source>
</trans-unit>
<trans-unit id="ef85f111" xml:space="preserve" approved="no" translate="yes">
<source>Autocralwer Configuration</source>
</trans-unit>
<trans-unit id="45fd99f0" xml:space="preserve" approved="no" translate="yes">
<source>You need to restart for some settings to be applied</source>
</trans-unit>
<trans-unit id="7b631d2" xml:space="preserve" approved="no" translate="yes">
<source>Enable Autocrawler:</source>
</trans-unit>
<trans-unit id="66a1bd2c" xml:space="preserve" approved="no" translate="yes">
<source>Deep crawl every:</source>
</trans-unit>
<trans-unit id="2291c65d" xml:space="preserve" approved="no" translate="yes">
<source>Warning: if this is bigger than "Rows to fetch" only shallow crawls will run.</source>
</trans-unit>
<trans-unit id="46c18c30" xml:space="preserve" approved="no" translate="yes">
<source>Rows to fetch at once:</source>
</trans-unit>
<trans-unit id="6b6b7b1b" xml:space="preserve" approved="no" translate="yes">
<source>Recrawl only older than # days:</source>
</trans-unit>
<trans-unit id="1472a55c" xml:space="preserve" approved="no" translate="yes">
<source>Get hosts by query:</source>
</trans-unit>
<trans-unit id="6dd8103f" xml:space="preserve" approved="no" translate="yes">
<source>Can be any valid Solr query.</source>
</trans-unit>
<trans-unit id="bc75d794" xml:space="preserve" approved="no" translate="yes">
<source>Shallow crawl depth (0 to 2):</source>
</trans-unit>
<trans-unit id="6c1bc4ce" xml:space="preserve" approved="no" translate="yes">
<source>Deep crawl depth (1 to 5):</source>
</trans-unit>
<trans-unit id="5c70dfbf" xml:space="preserve" approved="no" translate="yes">
<source>Index text:</source>
</trans-unit>
<trans-unit id="25aff004" xml:space="preserve" approved="no" translate="yes">
<source>Index media:</source>
</trans-unit>
<trans-unit id="3ec44343" xml:space="preserve" approved="no" translate="yes">
<source>"Save"</source>
</trans-unit>
</body>
</file>

<file original="BlacklistCleaner_p.html" source-language="en" datatype="html">
<body>
<trans-unit id="da2f8473" xml:space="preserve" approved="no" translate="yes">
@@ -560,6 +616,9 @@
<trans-unit id="824e78ee" xml:space="preserve" approved="no" translate="yes">
<source>Please select the XML-file you want to import:</source>
</trans-unit>
<trans-unit id="4cf708d" xml:space="preserve" approved="no" translate="yes">
<source>Text:</source>
</trans-unit>
</body>
</file>

19 changes: 9 additions & 10 deletions locales/sk.lng
@@ -137,15 +137,6 @@ Show==Zobraz
Bookmarks per page.==z&aacute;loziek na str&aacute;nku.
#-----------------------------

#File: ConfigAdvanced_p.html
#---------------------------
Advanced Config==Pokrocile nastavenia
Here are all configuration options from YaCy.==Tu sa nachadzaju vsetky konfiguracne nastavenia YaCy.
You can change anything, but some options need a restart, and some options can crash YaCy, if wrong values are used.==Vsetky konfiguracne nastavenia mozu byt zmenene, avsak niektore volby vyzaduju restart a niektore mozu sposobit pad YaCy v pripade zadnia nespravnych hodnot.
For explanation please look into defaults/yacy.init==Vysvetlenie najdete v subore defaults/yacy.init
"Save"=="Uloz"
#-----------------------------

#File: ConfigBasic.html
#---------------------------
Select a language for the interface==Zvolte jazyk web rozhrania
@@ -223,6 +214,15 @@ Comment==Koment&aacute;r
"Save"=="Uloz profil"
#-----------------------------

#File: ConfigProperties_p.html
#---------------------------
Advanced Config==Pokrocile nastavenia
Here are all configuration options from YaCy.==Tu sa nachadzaju vsetky konfiguracne nastavenia YaCy.
You can change anything, but some options need a restart, and some options can crash YaCy, if wrong values are used.==Vsetky konfiguracne nastavenia mozu byt zmenene, avsak niektore volby vyzaduju restart a niektore mozu sposobit pad YaCy v pripade zadnia nespravnych hodnot.
For explanation please look into defaults/yacy.init==Vysvetlenie najdete v subore defaults/yacy.init
"Save"=="Uloz"
#-----------------------------

#File: ConfigSkins_p.html
#---------------------------
Skin Selection==Vyber skinov
@@ -1852,7 +1852,6 @@ user</a> page.==stranky pouzivatelov</a>.

#File: ViewFile.html
#---------------------------
YaCy '#[clientname]#': View URL Content==YaCy '#[clientname]#': Zobraz obsah URL adresy
View URL Content==Zobraz obsah URL adresy
#URL==URL
#Hash==Hash
14 changes: 9 additions & 5 deletions source/net/yacy/cora/date/GenericFormatter.java
@@ -137,14 +137,18 @@ public Calendar parse(final String timeString, final int timezoneOffset) throws
* @throws ParseException
*/
public Calendar parse(final String timeString, final String UTCOffset) throws ParseException {
// FIXME: This method returns an incorrect date, check callers!
// ex: de.anomic.server.serverDate.parseShortSecond("20070101120000", "+0200").toGMTString()
// => 1 Jan 2007 13:00:00 GMT
if (timeString == null || timeString.isEmpty()) { return Calendar.getInstance(UTCtimeZone); }
if (UTCOffset == null || UTCOffset.isEmpty()) { return Calendar.getInstance(UTCtimeZone); }
return parse(timeString, UTCDiff(UTCOffset));
return parse(timeString, UTCDiff(UTCOffset)); // offset expected in minutes
}

/**
* Converts a time zone offset string (diffString) to an offset in minutes,
* e.g. "+0300" returns 180
*
* @param diffString with fixed timezone format
* @return parsed timezone string in minutes
*/
private static int UTCDiff(final String diffString) {
if (diffString.length() != 5) throw new IllegalArgumentException("UTC String malformed (wrong size):" + diffString);
boolean ahead = true;
@@ -153,7 +157,7 @@ private static int UTCDiff(final String diffString) {
else throw new IllegalArgumentException("UTC String malformed (wrong sign):" + diffString);
final int oh = NumberTools.parseIntDecSubstring(diffString, 1, 3);
final int om = NumberTools.parseIntDecSubstring(diffString, 3);
return (int) ((ahead) ? 1 : -1 * (oh * AbstractFormatter.hourMillis + om * AbstractFormatter.minuteMillis));
return (int) ( ((ahead) ? 1 : -1) * (oh * 60 + om));
}

/**
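The `UTCDiff` change above fixes two things at once: the result is now in minutes rather than milliseconds, and the sign is applied to the whole sum. A minimal stand-alone sketch of both points follows. The class and method names are hypothetical and not part of YaCy, and `Integer.parseInt` stands in for `NumberTools.parseIntDecSubstring`.

```java
// Hypothetical illustration only; not YaCy code.
final class UtcOffsetSketch {

    /** Parses a fixed-format offset such as "+0300" or "-0130" into signed minutes. */
    static int utcDiffMinutes(final String diffString) {
        if (diffString == null || diffString.length() != 5) {
            throw new IllegalArgumentException("UTC String malformed (wrong size):" + diffString);
        }
        final int sign;
        if (diffString.charAt(0) == '+') sign = 1;
        else if (diffString.charAt(0) == '-') sign = -1;
        else throw new IllegalArgumentException("UTC String malformed (wrong sign):" + diffString);
        final int oh = Integer.parseInt(diffString.substring(1, 3)); // hours
        final int om = Integer.parseInt(diffString.substring(3));    // minutes
        // The sign is parenthesized so that it multiplies the whole sum.
        return sign * (oh * 60 + om);
    }

    /**
     * Same shape as the removed expression: the conditional operator binds
     * more loosely than '*', so this reads as ahead ? 1 : (-1 * (...)).
     */
    static int unparenthesized(final boolean ahead, final int oh, final int om) {
        return ahead ? 1 : -1 * (oh * 60 + om);
    }
}
```

With this sketch, `utcDiffMinutes("+0300")` yields 180, while `unparenthesized(true, 3, 0)` yields 1, which shows why every positive offset collapsed to a constant before the parentheses were added.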
5 changes: 3 additions & 2 deletions source/net/yacy/cora/document/id/DigestURL.java
@@ -244,11 +244,12 @@ private final byte[] urlHashComputation() {
// find rootpath
int rootpathStart = 0;
int rootpathEnd = this.path.length() - 1;
if (!this.path.isEmpty() && this.path.charAt(0) == '/')
if (!this.path.isEmpty() && (this.path.charAt(0) == '/' || this.path.charAt(0) == '\\'))
rootpathStart = 1;
if (this.path.endsWith("/"))
rootpathEnd = this.path.length() - 2;
p = this.path.indexOf('/', rootpathStart);
if (this.isFile() && p < 0) p = this.path.indexOf('\\', rootpathStart); // double-check for windows path (if it's a file url)
String rootpath = "";
if (p > 0 && p < rootpathEnd) {
rootpath = this.path.substring(rootpathStart, p);
@@ -264,7 +265,7 @@ private final byte[] urlHashComputation() {
final StringBuilder hashs = new StringBuilder(12);
assert hashs.length() == 0;
// form the 'local' part of the hash
final String normalform = toNormalform(true, true);
final String normalform = toNormalform(true, true); // also normalizes Windows backslashes in the path to '/' for file URLs
final String b64l = Base64Order.enhancedCoder.encode(Digest.encodeMD5Raw(normalform));
if (b64l.length() < 5) return null;
hashs.append(b64l.substring(0, 5)); // 5 chars
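The idea behind the `DigestURL` change is that both spellings of a Windows file URL should be reduced to one normal form before hashing. The sketch below illustrates that idea only. It assumes plain MD5 with the JDK's URL-safe Base64 encoder instead of YaCy's `Base64Order.enhancedCoder`, so its output is not a YaCy URL hash, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical illustration only; not YaCy code and not the YaCy hash format.
final class FileUrlHashSketch {

    /** Replaces Windows backslashes with '/' for file: URLs, leaves other URLs untouched. */
    static String normalizeFileUrl(final String url) {
        if (url.regionMatches(true, 0, "file:", 0, 5) && url.indexOf('\\') >= 0) {
            return url.replace('\\', '/');
        }
        return url;
    }

    /** Digests the normalized URL so that equivalent notations give the same result. */
    static String digest(final String url) {
        try {
            final MessageDigest md5 = MessageDigest.getInstance("MD5");
            final byte[] raw = md5.digest(normalizeFileUrl(url).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
        } catch (final NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

Normalizing before digesting, rather than comparing afterwards, keeps the hash usable as a lookup key: `digest("file:///C:\\tmp\\test.html")` and `digest("file:///C:/tmp/test.html")` return the same string.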
21 changes: 18 additions & 3 deletions source/net/yacy/cora/document/id/MultiProtocolURL.java
@@ -216,7 +216,13 @@ public MultiProtocolURL(String url) throws MalformedURLException {
if (!this.protocol.equals("file") && url.substring(p + 1, p + 3).equals("//")) {
// identify host, userInfo and file for http and ftp protocol
int q = url.indexOf('/', p + 3);
if (q < 0) q = url.indexOf("?", p + 3); // check for www.test.com?searchpart
if (q < 0) { // check for www.test.com?searchpart
q = url.indexOf("?", p + 3);
} else { // check that '/' was not in searchpart (example http://test.com?data=1/2/3)
if (url.lastIndexOf("?", q) >= 0) {
q = url.indexOf("?", p + 3);
}
}
int r;
if (q < 0) {
if ((r = url.indexOf('@', p + 3)) < 0) {
@@ -832,7 +838,7 @@ public String getFileName() {
}

/**
* Get extension out of a filename
* Get extension out of a filename in lowercase
* cuts off query part
* @param fileName
* @return extension or ""
@@ -1064,8 +1070,14 @@ public String toNormalform(final boolean excludeAnchor) {
return toNormalform(excludeAnchor, false);
}

/**
* Generates a normal form of the URL.
* For file: URLs it also normalizes the path delimiter to '/' (replacing any Windows '\').
* @param excludeAnchor
* @param removeSessionID
* @return
*/
public String toNormalform(final boolean excludeAnchor, final boolean removeSessionID) {
// generates a normal form of the URL
boolean defaultPort = false;
if (this.protocol.equals("mailto")) {
return this.protocol + ":" + this.userInfo + "@" + this.host;
@@ -1096,6 +1108,9 @@ public String toNormalform(final boolean excludeAnchor, final boolean removeSess
u.append(":");
u.append(this.port);
}
if (isFile() && urlPath.indexOf('\\') >= 0) { // normalize windows backslash (important for hash computation)
urlPath = urlPath.replace('\\', '/');
}
u.append(urlPath);
String result = u.toString();

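The constructor change above deals with URLs whose query contains a '/' but which have no path, such as `http://test.com?data=1/2/3`. A compact way to express the same decision is sketched below. The names are hypothetical, and the sketch deliberately ignores user info and ports, which the real constructor handles separately.

```java
// Hypothetical illustration only; not YaCy code.
final class AuthorityEndSketch {

    /**
     * Returns the index at which the host part ends: the first '/' or '?'
     * after hostStart, whichever comes first, or -1 if neither is present.
     */
    static int authorityEnd(final String url, final int hostStart) {
        final int slash = url.indexOf('/', hostStart);
        final int query = url.indexOf('?', hostStart);
        if (slash < 0) return query;                    // e.g. www.test.com?searchpart
        if (query >= 0 && query < slash) return query;  // '/' belongs to the search part
        return slash;
    }

    /** Extracts the host of a scheme://host... URL (no user info or port handling). */
    static String host(final String url) {
        final int p = url.indexOf("://");
        if (p < 0) throw new IllegalArgumentException("no scheme separator: " + url);
        final int start = p + 3;
        final int end = authorityEnd(url, start);
        return end < 0 ? url.substring(start) : url.substring(start, end);
    }
}
```

For the two cases added to `MultiProtocolURLTest`, `host("http://www.yacy.net?data=1/2/3")` and `host("http://www.yacy.net?url=http://test.com")` both give `www.yacy.net`.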
35 changes: 27 additions & 8 deletions source/net/yacy/document/Document.java
@@ -638,10 +638,13 @@ public static Map<MultiProtocolURL, String> allSubpaths(final Collection<?> link
return v;
}

/**
* Finds all links that appear as a reference inside a URL.
*
* @param links either a Set of AnchorURL, Strings (with urls) or htmlFilterImageEntries
* @return map with contained urls as key and "ref" as value
*/
private static Map<AnchorURL, String> allReflinks(final Collection<?> links) {
// links is either a Set of Strings (with urls) or
// htmlFilterImageEntries
// we find all links that are part of a reference inside a url
final Map<AnchorURL, String> v = new HashMap<AnchorURL, String>();
final Iterator<?> i = links.iterator();
Object o;
@@ -663,7 +666,9 @@ else if (o instanceof ImageEntry)
continue loop;
}
u = url.toNormalform(true);
if ((pos = u.toLowerCase().indexOf("http://", 7)) > 0) {

// find start of a referenced http url
if ((pos = u.toLowerCase().indexOf("http://", 7)) > 0) { // 7 = skip the protocol part of the source url
i.remove();
u = u.substring(pos);
while ((pos = u.toLowerCase().indexOf("http://", 7)) > 0)
@@ -673,16 +678,30 @@ else if (o instanceof ImageEntry)
v.put(url, "ref");
continue loop;
}
if ((pos = u.toLowerCase().indexOf("/www.", 7)) > 0) {

// find start of a referenced https url
if ((pos = u.toLowerCase().indexOf("https://", 7)) > 0) { // 7 = skip the protocol part of the source url
i.remove();
u = "http:/" + u.substring(pos);
while ((pos = u.toLowerCase().indexOf("/www.", 7)) > 0)
u = "http:/" + u.substring(pos);
u = u.substring(pos);
while ((pos = u.toLowerCase().indexOf("https://", 7)) > 0)
u = u.substring(pos);
url = new AnchorURL(u);
if (!(v.containsKey(url)))
v.put(url, "ref");
continue loop;
}

if ((pos = u.toLowerCase().indexOf("/www.", 11)) > 0) { // 11 = skip protocol part + www of source url "http://www."
i.remove();
u = url.getProtocol()+":/" + u.substring(pos);
while ((pos = u.toLowerCase().indexOf("/www.", 11)) > 0)
u = url.getProtocol()+":/" + u.substring(pos);

AnchorURL addurl = new AnchorURL(u);
if (!(v.containsKey(addurl)))
v.put(addurl, "ref");
continue loop;
}
} catch (final MalformedURLException e) {
}
return v;
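`allReflinks` looks for a second URL embedded in a link, for example a redirect target, and this commit extends the search to `https://`. The sketch below shows the core of that search in simplified form. It is an assumption-laden illustration: the names are hypothetical, it treats `http://` and `https://` in one pass rather than in separate branches, and it leaves out the `/www.` case.

```java
import java.util.Optional;

// Hypothetical illustration only; not YaCy code.
final class RefLinkSketch {

    /** Case-insensitive lastIndexOf that never changes string length. */
    static int lastIndexOfIgnoreCase(final String s, final String needle) {
        for (int i = s.length() - needle.length(); i >= 0; i--) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) return i;
        }
        return -1;
    }

    /**
     * Returns the innermost http:// or https:// URL embedded in the given URL,
     * or empty if the only protocol prefix is the one at position 0.
     */
    static Optional<String> embeddedUrl(final String url) {
        final int pos = Math.max(lastIndexOfIgnoreCase(url, "http://"),
                                 lastIndexOfIgnoreCase(url, "https://"));
        // Position 0 is the protocol of the source url itself, so it does not count.
        return pos > 0 ? Optional.of(url.substring(pos)) : Optional.empty();
    }
}
```

Taking the last occurrence corresponds to the repeated `substring` loop in the diff, which keeps stripping outer URLs until only the innermost reference remains.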
20 changes: 20 additions & 0 deletions test/java/net/yacy/cora/document/id/DigestURLTest.java
@@ -2,6 +2,7 @@

import java.net.MalformedURLException;
import junit.framework.TestCase;
import net.yacy.cora.document.encoding.ASCII;
import org.junit.Test;

public class DigestURLTest extends TestCase {
@@ -30,4 +31,23 @@ public void testIdentPort() throws MalformedURLException {
}
}

/**
* Tests that DigestURL.hash() on the file protocol delivers the same hash for
* the allowed Windows and Java notations of the same file
*/
@Test
public void testHash_ForFile() throws MalformedURLException {
String winUrlStr = "file:///C:\\tmp\\test.html"; // allowed Windows notation
String javaUrlStr = "file:///C:/tmp/test.html"; // allowed Java notation for Windows file system

DigestURL winUrl = new DigestURL(winUrlStr);
DigestURL javaUrl = new DigestURL(javaUrlStr);

String winHashResult = ASCII.String(winUrl.hash());
String javaHashResult = ASCII.String(javaUrl.hash());

assertEquals("hash for same file url", javaHashResult, winHashResult);

}

}
3 changes: 3 additions & 0 deletions test/java/net/yacy/cora/document/id/MultiProtocolURLTest.java
@@ -127,6 +127,9 @@ public void testGetHost() throws MalformedURLException {
new String[]{"http://www.yacy.net?query=test", "www.yacy.net"},
new String[]{"http://www.yacy.net:?query=test", "www.yacy.net"},
new String[]{"//www.yacy.net:?query=test", "www.yacy.net"},

new String[]{"http://www.yacy.net?data=1/2/3", "www.yacy.net"},
new String[]{"http://www.yacy.net?url=http://test.com", "www.yacy.net"}
};

for (int i = 0; i < testStrings.length; i++) {
