Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for ES 7 (fixes #155) #161

Merged
merged 15 commits into from
Sep 4, 2019
Merged

support for ES 7 (fixes #155) #161

merged 15 commits into from
Sep 4, 2019

Conversation

jameslamb
Copy link
Collaborator

This pull requests aims to add support in uptasticsearch for Elasticsearch 7.x. See the linked issue and changes to NEWS.md for details on what has changed in ES7.x.

This PR's scope is limited to "get uptasticsearch code working with Es7.x". It does not include taking advantage of any ES7-specific features like new types of aggregations.

Opening this as a draft PR as it currently only addresses the R side. The Python side needs to be updated n this PR as well.

@@ -0,0 +1,96 @@
{"index":{"_index":"shakespeare","_id":2}}
{"line_id":3,"play_name":"Henry IV","speech_number":"","line_number":"","speaker":"","text_entry":"Enter KING HENRY, LORD JOHN OF LANCASTER, the EARL of WESTMORELAND, SIR WALTER BLUNT, and others"}
{"index":{"_index":"shakespeare","_id":3}}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was necessary to make this new file because _type was removed in ES7. Having documents with multiple types within one index is now not allowed and attempts to set _type will raise errors complaining about multiple document types (unless you explicitly include a default document type in your mapping).

It's possible this file could be irrelevant and that es7_mapping.json could just have some argument added to it that specifics the default document type to line, but I couldn't figure out how to do that in 5 minutes and this worked. Definitely would like to come back to it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, want to make an issue for that after this is merged?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created #167

NEWS.md Outdated

## Features

### Full support for ES7.x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just "Support" without "Full"? Since new features aren't accounted for yet?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah good call

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still have to do this I think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh you're right. I want Bitbucket tasks back :(

expect_true(data.table::is.data.table(outDT))
expect_true(nrow(outDT) == 4)
expect_true(nrow(outDT) == 3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is failing checks?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha yeah I kind of thought it might! This is a weird inconsistency between ES <7 and Es 7. I think that what's happening is that previously you were able to have multiple document types in one index and that + something weird in the way we write test data was causing duplicate entries for some indices. I'll have to figure that out to get this merged.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope I was totally wrong. It's a different thing. In the ES7.x one, I'm using the keyword type (first introduced in ES6.x) for field speaker. That tells Elasticsearch "don't pass the values through a tokenizer, treat the full text as a single level of a categorical".

For ES6 I'm using the text type and for ES5 I'm using the string type. Those default to breaking down their inputs into tokens with a whitespace tokenizer.

So basically for the ES7 test it thinks there are three unique speakers: henry iv, king, and westmoreland. But for all earlier versions, the same terms agg gives you four levels:

          thing doc_count
1:        henry        34
2:           iv        34
3:         king        34
4: westmoreland        13

I chose the keyword type for this field in the ES7 mapping because Elasticsearch removed the use of fielddata = true to say "make it possible to do a terms agg".

For now, I'm going to go with the approach of added an explicit check on the version around this test. It's gross so I'll open a "come back and make this less gross" bug, but I feel like it's a thing that will:

  1. Give us confidence that this PR doesn't break backwards compatibility of our library with all earlier Elasticsearch versions
  2. Give us confidence that uptasticsearch can process the result of a terms agg from ES7.x correctly

@austin3dickey
Copy link
Collaborator

I couldn't find Travis checking 7.3.0; is that expected?

@austin3dickey
Copy link
Collaborator

TIL the "work in progress" feature! Haven't seen that before

@jameslamb
Copy link
Collaborator Author

TIL the "work in progress" feature! Haven't seen that before

yep it's a new-ish feature! I of course appreciate the review, but I did open it in WIP so you'd know you didn't have to review yet

@jameslamb
Copy link
Collaborator Author

I couldn't find Travis checking 7.3.0; is that expected?

Nope that's an omission, thank you!

@jameslamb
Copy link
Collaborator Author

Ok I rebased to catch the changes in #163 , so the diff for .travis.yml looks a lot different now. I also commented out all but one 6.2.x build and one 7.3.x build for each language, to speed up the feedback cycle on this PR. I'll uncomment them all whenever I feel ready to move this from draft to official open PR

@codecov-io
Copy link

codecov-io commented Sep 4, 2019

Codecov Report

Merging #161 into master will increase coverage by 0.3%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff            @@
##           master    #161     +/-   ##
========================================
+ Coverage    92.8%   93.1%   +0.3%     
========================================
  Files           8       8             
  Lines         556     595     +39     
========================================
+ Hits          516     554     +38     
- Misses         40      41      +1
Impacted Files Coverage Δ
R/es_search.R 87.87% <0%> (-0.25%) ⬇️
R/get_fields.R 94.69% <0%> (+2.48%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a6b39b...19ae702. Read the comment docs.

@jameslamb
Copy link
Collaborator Author

Ok I rebased to catch the changes in #163 , so the diff for .travis.yml looks a lot different now. I also commented out all but one 6.2.x build and one 7.3.x build for each language, to speed up the feedback cycle on this PR. I'll uncomment them all whenever I feel ready to move this from draft to official open PR

This is working! Going to add ALL of the versions back and make this an official PR.

image

@austin3dickey take a look whenever you have time.

@jameslamb jameslamb marked this pull request as ready for review September 4, 2019 03:22
respose to a ``POST /_search`` request, return the total
number of docs matching the query
"""
return response_json['hits']['total']['value']
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice

Copy link
Collaborator

@austin3dickey austin3dickey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just that small NEWS tweak

@jameslamb
Copy link
Collaborator Author

jameslamb commented Sep 4, 2019

Ok one more review por favor (I also miss the bitbucket "leave tasks plus approve" thing)

Copy link
Collaborator

@austin3dickey austin3dickey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!!

@jameslamb jameslamb merged commit 3283bca into uptake:master Sep 4, 2019
@jameslamb jameslamb deleted the es7 branch September 4, 2019 21:52
This was referenced Sep 4, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants