support for ES 7 (fixes #155) #161
Conversation
| @@ -0,0 +1,96 @@ | |||
| {"index":{"_index":"shakespeare","_id":2}} | |||
| {"line_id":3,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"} | |||
| {"index":{"_index":"shakespeare","_id":3}} | |||
jameslamb
Aug 22, 2019
Author
Member
it was necessary to make this new file because _type was removed in ES7. Having documents with multiple types within one index is now not allowed and attempts to set _type will raise errors complaining about multiple document types (unless you explicitly include a default document type in your mapping).
It's possible this file could be irrelevant and that es7_mapping.json could just have some argument added to it that specifies the default document type as line, but I couldn't figure out how to do that in 5 minutes and this worked. Definitely would like to come back to it.
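For readers following along, here is a minimal sketch of the difference being described: the action line in a bulk payload carries `_type` for older clusters and omits it for ES7. The helper name `bulk_action_line` and its `major_version` argument are hypothetical and not part of this PR; the `line` type name is taken from the comment above.

```python
import json


def bulk_action_line(index, doc_id, major_version, doc_type="line"):
    """Build the action line that precedes a document in a _bulk payload.

    Hypothetical helper, for illustration only.
    """
    action = {"_index": index, "_id": doc_id}
    # Assumption: only pre-7 clusters get an explicit _type in the action.
    if major_version < 7:
        action["_type"] = doc_type
    return json.dumps({"index": action})


print(bulk_action_line("shakespeare", 2, major_version=7))
print(bulk_action_line("shakespeare", 2, major_version=6))
```

The ES7 form matches the shape of the action lines in the new test data file shown in the diff.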
austin3dickey
Aug 22, 2019
Member
nice, want to make an issue for that after this is merged?
|
|
||
| ## Features | ||
|
|
||
| ### Full support for ES7.x |
austin3dickey
Aug 22, 2019
Member
Maybe just "Support" without "Full"? Since new features aren't accounted for yet?
jameslamb
Aug 22, 2019
Author
Member
yeah good call
austin3dickey
Sep 4, 2019
Member
still have to do this I think
jameslamb
Sep 4, 2019
Author
Member
ugh you're right. I want Bitbucket tasks back :(
| expect_true(data.table::is.data.table(outDT)) | ||
| expect_true(nrow(outDT) == 4) | ||
| expect_true(nrow(outDT) == 3) |
austin3dickey
Aug 22, 2019
Member
Looks like this is failing checks?
jameslamb
Aug 23, 2019
Author
Member
Ha yeah I kind of thought it might! This is a weird inconsistency between ES <7 and ES 7. I think what's happening is that previously you were able to have multiple document types in one index, and that plus something weird in the way we write test data was causing duplicate entries for some indices. I'll have to figure that out to get this merged.
jameslamb
Aug 31, 2019
Author
Member
nope I was totally wrong. It's a different thing. In the ES7.x one, I'm using the keyword type (first introduced in ES6.x) for field speaker. That tells Elasticsearch "don't pass the values through a tokenizer, treat the full text as a single level of a categorical".
For ES6 I'm using the text type and for ES5 I'm using the string type. Those default to breaking down their inputs into tokens with a whitespace tokenizer.
So basically for the ES7 test it thinks there are three unique speakers: henry iv, king, and westmoreland. But for all earlier versions, the same terms agg gives you four levels:
thing doc_count
1: henry 34
2: iv 34
3: king 34
4: westmoreland 13
I chose the keyword type for this field in the ES7 mapping because Elasticsearch removed the use of fielddata = true to say "make it possible to do a terms agg".
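A rough way to see why the bucket counts differ: the sketch below imitates the two behaviours in plain Python, with no Elasticsearch involved. The data is made up for illustration and is not the test fixture, and the lowercasing step is an assumption added only to mimic the lowercase terms in the table above.

```python
from collections import Counter

# Toy data, made up for illustration; not the test fixture used in this PR.
speakers = ["KING HENRY IV"] * 3 + ["WESTMORELAND"] * 2


def terms_as_keyword(values):
    """Count each full value as one bucket, as an untokenized field would."""
    return Counter(values)


def terms_as_tokens(values):
    """Count lowercased, whitespace-split tokens, one hit per document."""
    counts = Counter()
    for value in values:
        # set() so a token repeated inside one value counts that document once
        counts.update(set(value.lower().split()))
    return counts


print(terms_as_keyword(speakers))
print(terms_as_tokens(speakers))
```

With the toy data, the untokenized count yields one bucket per distinct speaker, while the tokenized count yields one bucket per distinct word, which is the same kind of split the table shows.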
For now, I'm going to go with the approach of adding an explicit check on the version around this test. It's gross so I'll open a "come back and make this less gross" bug, but I feel like it's a thing that will:
- Give us confidence that this PR doesn't break backwards compatibility of our library with all earlier Elasticsearch versions
- Give us confidence that uptasticsearch can process the result of a terms agg from ES7.x correctly
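The version gate described above could look roughly like the following. This is a sketch only: the helper name `expected_speaker_levels` is hypothetical, it is written in Python although the test in the diff is R, and the counts 3 and 4 are taken from the diff and the discussion above.

```python
def expected_speaker_levels(es_version):
    """Return how many terms-agg buckets the test should expect.

    Hypothetical helper; assumes a version string like "7.3.0".
    """
    major = int(es_version.split(".")[0])
    # ES7 mapping uses an untokenized field, earlier mappings tokenize.
    return 3 if major >= 7 else 4


print(expected_speaker_levels("7.3.0"))
print(expected_speaker_levels("6.8.2"))
```

Keeping the version logic in one small function would at least confine the "gross" part to a single place until the follow-up bug is addressed.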
|
I couldn't find Travis checking 7.3.0; is that expected? |
|
TIL the "work in progress" feature! Haven't seen that before |
yep it's a new-ish feature! I of course appreciate the review, but I did open it in WIP so you'd know you didn't have to review yet |
Nope that's an omission, thank you! |
|
Ok I rebased to catch the changes in #163 , so the diff for |
Codecov Report
@@ Coverage Diff @@
## master #161 +/- ##
========================================
+ Coverage 92.8% 93.1% +0.3%
========================================
Files 8 8
Lines 556 595 +39
========================================
+ Hits 516 554 +38
- Misses 40 41 +1
|
This is working! Going to add ALL of the versions back and make this an official PR. @austin3dickey take a look whenever you have time. |
| response to a ``POST /_search`` request, return the total | ||
| number of docs matching the query | ||
| """ | ||
| return response_json['hits']['total']['value'] |
austin3dickey
Sep 4, 2019
Member
Nice
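For context on the line under review: in ES 7.x search responses, `hits.total` is an object with `value` and `relation` keys, whereas earlier versions return a plain integer. The sketch below shows a version-tolerant variant; it is not the approach taken in this PR, and the function name `get_total_hits` is hypothetical.

```python
def get_total_hits(response_json):
    """Return the total number of matching docs from a search response.

    Hypothetical helper; handles both the integer and the object form.
    """
    total = response_json["hits"]["total"]
    # ES 7.x returns an object like {"value": 10, "relation": "eq"}
    if isinstance(total, dict):
        return total["value"]
    # Earlier versions return the count directly as an integer
    return total


print(get_total_hits({"hits": {"total": {"value": 10, "relation": "eq"}}}))
print(get_total_hits({"hits": {"total": 10}}))
```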
|
LGTM! Just that small NEWS tweak |
|
Ok one more review por favor (I also miss the bitbucket "leave tasks plus approve" thing) |
|
nice!! |

This pull request aims to add support in uptasticsearch for Elasticsearch 7.x. See the linked issue and changes to NEWS.md for details on what has changed in ES7.x.

This PR's scope is limited to "get uptasticsearch code working with ES7.x". It does not include taking advantage of any ES7-specific features like new types of aggregations.

Opening this as a draft PR as it currently only addresses the R side. The Python side needs to be updated in this PR as well.