docs/synonym.md (3 additions & 4 deletions)
@@ -30,15 +30,16 @@ You can partially make use of the Sudachi synonym resource's detailed informatio
### Punctuation Symbols
-You may need to remove certain synonym words such as `€` and `&` when you use the analyzer with setting `"discard_punctuation": true` (Otherwise you will be get an error, e.g., `"term: € was completely eliminated by analyzer"`). Alternatively, you can set `"lenient": true` for the synonym filter to ignore the exceptions.
+You may need to remove certain synonym words such as `€` and `&` when you use the analyzer with setting `"discard_punctuation": true` (Otherwise you will be get an error, e.g., `"term: € was completely eliminated by analyzer"`). If you are using [ssyn2es.py](./ssyn2es.py), use `--discard-punctuation` option to skip those words. Alternatively, you can set `"lenient": true` for the synonym filter to ignore the exceptions.
-These symbols are defined as punctuations; See [SudachiTokenizer.java](https://github.com/WorksApplications/elasticsearch-sudachi/blob/develop/src/main/java/com/worksap/nlp/lucene/sudachi/ja/SudachiTokenizer.java#L140) for the detail.
+These symbols are defined as punctuations; See [Strings.java](https://github.com/WorksApplications/elasticsearch-sudachi/blob/develop/src/main/java/com/worksap/nlp/lucene/sudachi/ja/util/Strings.java) for the detail.
## Synonym Filter
You can use the converted Solr format file with Elasticsearch's default synonym filters, [Synonym token filter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-tokenfilter.html) or [Synonym graph filter](https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-synonym-graph-tokenfilter.html).
+As `sudachi_split` filter produces token graph, you *cannot* use it with synonym filter.
### Example: Set up
@@ -73,8 +74,6 @@ You can use the converted Solr format file with Elasticsearch's default synonym
Here we assume that the converted synonym file is placed as `$ES_PATH_CONF/sudachi/synonym.txt`.
-If you would like to use `sudachi_split` filter, set it *after* the synonym filter (otherwise you will get an error, e.g., `term: 不明確 analyzed to a token (不) with position increment != 1 (got: 0)`).
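
For reference, the `"lenient": true` alternative described in the Punctuation Symbols section could look roughly like the sketch below. It builds index settings as a Python dict and prints them as JSON. The names `sudachi_synonym` and `sudachi_synonym_analyzer` and the helper `build_settings` are hypothetical and only for illustration. The synonym path assumes the file is placed at `$ES_PATH_CONF/sudachi/synonym.txt` as stated above. This is a sketch under those assumptions, not the project's documented setup.

```python
import json


def build_settings(lenient: bool = True) -> dict:
    """Sketch of index settings: a Sudachi tokenizer that discards
    punctuation, combined with a synonym filter.

    With lenient=True the synonym filter is asked to ignore synonym
    rules it cannot parse (for example a term such as "€" that the
    analyzer eliminates) instead of raising an error.
    """
    return {
        "settings": {
            "analysis": {
                "tokenizer": {
                    "sudachi_tokenizer": {
                        "type": "sudachi_tokenizer",
                        "discard_punctuation": True,
                    }
                },
                "filter": {
                    # Hypothetical filter name, chosen for this example.
                    "sudachi_synonym": {
                        "type": "synonym",
                        # Relative to $ES_PATH_CONF.
                        "synonyms_path": "sudachi/synonym.txt",
                        "lenient": lenient,
                    }
                },
                "analyzer": {
                    # Hypothetical analyzer name, chosen for this example.
                    "sudachi_synonym_analyzer": {
                        "type": "custom",
                        "tokenizer": "sudachi_tokenizer",
                        # No sudachi_split here: it produces a token graph
                        # and cannot be used with the synonym filter.
                        "filter": ["sudachi_synonym"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_settings(), indent=2, ensure_ascii=False))
```

The printed JSON can be used as the request body when creating an index. Setting `lenient` to `False` restores the strict behaviour, in which an eliminated term causes an error.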