support for ES 7 (fixes #155) #161

Merged
jameslamb merged 15 commits into uptake:master from jameslamb:es7 on Sep 4, 2019

Member

@jameslamb jameslamb commented Aug 22, 2019

This pull request aims to add support in uptasticsearch for Elasticsearch 7.x. See the linked issue and the changes to NEWS.md for details on what has changed in ES7.x.

This PR's scope is limited to "get uptasticsearch code working with ES7.x". It does not include taking advantage of any ES7-specific features like new types of aggregations.

Opening this as a draft PR as it currently only addresses the R side. The Python side needs to be updated in this PR as well.

@@ -0,0 +1,96 @@
{"index":{"_index":"shakespeare","_id":2}}
{"line_id":3,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"}
{"index":{"_index":"shakespeare","_id":3}}

jameslamb Aug 22, 2019
Author Member

it was necessary to make this new file because _type was removed in ES7. Having documents with multiple types within one index is now not allowed and attempts to set _type will raise errors complaining about multiple document types (unless you explicitly include a default document type in your mapping).

It's possible this file could be irrelevant and that es7_mapping.json could just have some argument added to it that specifies line as the default document type, but I couldn't figure out how to do that in 5 minutes and this worked. Definitely would like to come back to it.
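For readers following along, here is a minimal Python sketch of how the action line in a bulk file differs once `_type` is dropped. The helper name `bulk_action_line` and its parameters are hypothetical and not part of uptasticsearch; the only things taken from this thread are the shape of the ES7 action line in the diff above and the `line` document type mentioned in the comment.

```python
import json


def bulk_action_line(index, doc_id, major_version, doc_type="line"):
    """Build the action/metadata line that precedes a document in a bulk file.

    Hypothetical helper, for illustration only. For ES 7.x the "_type" key is
    left out, matching the new test data file; for older versions it is set.
    """
    meta = {"_index": index, "_id": doc_id}
    if major_version < 7:
        meta["_type"] = doc_type
    # Compact separators reproduce the formatting used in the test data file
    return json.dumps({"index": meta}, separators=(",", ":"))


print(bulk_action_line("shakespeare", 2, major_version=7))
print(bulk_action_line("shakespeare", 2, major_version=6))
```

Under these assumptions, the first call reproduces the typeless action line from the new file and the second adds `"_type":"line"` for older clusters.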

austin3dickey Aug 22, 2019
Member

nice, want to make an issue for that after this is merged?

jameslamb Sep 4, 2019
Author Member

created #167

NEWS.md Outdated

## Features

### Full support for ES7.x

austin3dickey Aug 22, 2019
Member

Maybe just "Support" without "Full"? Since new features aren't accounted for yet?

jameslamb Aug 22, 2019
Author Member

yeah good call

austin3dickey Sep 4, 2019
Member

still have to do this I think

jameslamb Sep 4, 2019
Author Member

ugh you're right. I want Bitbucket tasks back :(

expect_true(data.table::is.data.table(outDT))
expect_true(nrow(outDT) == 4)
expect_true(nrow(outDT) == 3)

austin3dickey Aug 22, 2019
Member

Looks like this is failing checks?

jameslamb Aug 23, 2019
Author Member

Ha yeah I kind of thought it might! This is a weird inconsistency between ES <7 and ES 7. I think what's happening is that previously you were able to have multiple document types in one index, and that plus something weird in the way we write test data was causing duplicate entries for some indices. I'll have to figure that out to get this merged.

jameslamb Aug 31, 2019
Author Member

nope I was totally wrong. It's a different thing. In the ES7.x one, I'm using the keyword type (first introduced in ES6.x) for field speaker. That tells Elasticsearch "don't pass the values through a tokenizer, treat the full text as a single level of a categorical".

For ES6 I'm using the text type and for ES5 I'm using the string type. Those default to breaking down their inputs into tokens with a whitespace tokenizer.

So basically for the ES7 test it thinks there are three unique speakers: henry iv, king, and westmoreland. But for all earlier versions, the same terms agg gives you four levels:

          thing doc_count
1:        henry        34
2:           iv        34
3:         king        34
4: westmoreland        13

I chose the keyword type for this field in the ES7 mapping because Elasticsearch removed the use of fielddata = true to say "make it possible to do a terms agg".
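To illustrate why the bucket counts differ, here is a rough Python sketch that mimics a terms agg over a field that is either kept whole (like `keyword`) or split into tokens. It is not how Elasticsearch works internally: lowercasing and splitting on whitespace is only a stand-in for a real analyzer, the function name `terms_buckets` is hypothetical, and the speaker values are made up rather than taken from the test data.

```python
from collections import Counter


def terms_buckets(values, analyzed):
    """Count documents per term, roughly the way a terms aggregation would.

    analyzed=True  -> lowercase and split on whitespace (a rough stand-in for
                      an analyzed "text"/"string" field; real analyzers do more)
    analyzed=False -> keep each value whole, like a "keyword" field
    """
    counts = Counter()
    for value in values:
        terms = set(value.lower().split()) if analyzed else {value}
        counts.update(terms)  # each document counts once per distinct term
    return dict(counts)


# Made-up speaker values, not the real test data
speakers = ["KING HENRY IV", "KING HENRY IV", "WESTMORELAND"]

print(terms_buckets(speakers, analyzed=False))
print(terms_buckets(speakers, analyzed=True))
```

With these made-up values the unanalyzed field yields two buckets and the analyzed one yields four, which is the same kind of mismatch the test ran into.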

For now, I'm going to go with the approach of adding an explicit check on the version around this test. It's gross so I'll open a "come back and make this less gross" bug, but I feel like it's a thing that will:

  1. Give us confidence that this PR doesn't break backwards compatibility of our library with all earlier Elasticsearch versions
  2. Give us confidence that uptasticsearch can process the result of a terms agg from ES7.x correctly
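
A version check of the kind described above could look roughly like the following Python sketch. This is not the code from the PR (the test in question is on the R side); the function names are hypothetical, and the expected counts of three and four are taken from the explanation in this comment.

```python
def major_version(version_number):
    """Return the major version from a string such as "7.3.0"."""
    return int(version_number.split(".")[0])


def expected_speaker_buckets(version_number):
    """Expected number of terms-agg buckets in the test discussed above.

    Three buckets on ES 7.x (keyword field), four on earlier versions
    (analyzed field), per the explanation in this thread.
    """
    return 3 if major_version(version_number) >= 7 else 4


print(expected_speaker_buckets("7.3.0"))
print(expected_speaker_buckets("6.2.4"))
```

The version string itself can be read from the `version.number` field that Elasticsearch returns from `GET /` on the cluster root; how the test suite actually obtains it is not shown in this thread.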
Member

@austin3dickey austin3dickey commented Aug 22, 2019

I couldn't find Travis checking 7.3.0; is that expected?

Member

@austin3dickey austin3dickey commented Aug 22, 2019

TIL the "work in progress" feature! Haven't seen that before

Member Author

@jameslamb jameslamb commented Aug 22, 2019

> TIL the "work in progress" feature! Haven't seen that before

yep it's a new-ish feature! I of course appreciate the review, but I did open it in WIP so you'd know you didn't have to review yet

Member Author

@jameslamb jameslamb commented Aug 22, 2019

> I couldn't find Travis checking 7.3.0; is that expected?

Nope that's an omission, thank you!

@jameslamb jameslamb mentioned this pull request Aug 30, 2019
@jameslamb jameslamb force-pushed the jameslamb:es7 branch from 5d16a17 to b9c53bd Sep 4, 2019
Member Author

@jameslamb jameslamb commented Sep 4, 2019

Ok I rebased to catch the changes in #163, so the diff for .travis.yml looks a lot different now. I also commented out all but one 6.2.x build and one 7.3.x build for each language, to speed up the feedback cycle on this PR. I'll uncomment them all whenever I feel ready to move this from draft to official open PR.


@codecov-io codecov-io commented Sep 4, 2019

Codecov Report

Merging #161 into master will increase coverage by 0.3%.
The diff coverage is n/a.

@@           Coverage Diff            @@
##           master    #161     +/-   ##
========================================
+ Coverage    92.8%   93.1%   +0.3%     
========================================
  Files           8       8             
  Lines         556     595     +39     
========================================
+ Hits          516     554     +38     
- Misses         40      41      +1

| Impacted Files | Coverage Δ |
|---|---|
| R/es_search.R | 87.87% <0%> (-0.25%) ⬇️ |
| R/get_fields.R | 94.69% <0%> (+2.48%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a6b39b...19ae702. Read the comment docs.

Member Author

@jameslamb jameslamb commented Sep 4, 2019

> Ok I rebased to catch the changes in #163, so the diff for .travis.yml looks a lot different now. I also commented out all but one 6.2.x build and one 7.3.x build for each language, to speed up the feedback cycle on this PR. I'll uncomment them all whenever I feel ready to move this from draft to official open PR.

This is working! Going to add ALL of the versions back and make this an official PR.

@austin3dickey take a look whenever you have time.

@jameslamb jameslamb marked this pull request as ready for review Sep 4, 2019
response to a ``POST /_search`` request, return the total
number of docs matching the query
"""
return response_json['hits']['total']['value']
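
For context on the line above: in ES 7.x, `hits.total` in a search response is an object with `value` and `relation` keys, whereas earlier versions return a plain integer. Below is a minimal sketch of a helper that accepts either shape. The name `get_total_hits` is hypothetical and this is not the function from the PR, which may well be dispatched by version instead.

```python
def get_total_hits(response_json):
    """Return the total hit count from a ``POST /_search`` response.

    ES 7.x returns hits.total as an object like {"value": 10, "relation": "eq"};
    earlier versions return a plain integer.
    """
    total = response_json["hits"]["total"]
    if isinstance(total, dict):
        return total["value"]
    return total


print(get_total_hits({"hits": {"total": {"value": 10, "relation": "eq"}}}))
print(get_total_hits({"hits": {"total": 10}}))
```

One caveat worth keeping in mind: when `relation` is `"gte"`, the `value` is a lower bound rather than an exact count, so callers that need the exact total may have to request it explicitly.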

Member

@austin3dickey austin3dickey left a comment

LGTM! Just that small NEWS tweak

@jameslamb jameslamb requested a review from austin3dickey Sep 4, 2019
@jameslamb jameslamb force-pushed the jameslamb:es7 branch from 19ae702 to e3abdee Sep 4, 2019
Member Author

@jameslamb jameslamb commented Sep 4, 2019

Ok one more review por favor (I also miss the bitbucket "leave tasks plus approve" thing)

Member

@austin3dickey austin3dickey left a comment

nice!!

@jameslamb jameslamb merged commit 3283bca into uptake:master Sep 4, 2019
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
@jameslamb jameslamb deleted the jameslamb:es7 branch Sep 4, 2019
This was referenced Sep 4, 2019