Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

es_search() : aggregation query crashes if empty bins #58

Closed
jvondollen opened this issue May 9, 2018 · 7 comments
Closed

es_search() : aggregation query crashes if empty bins #58

jvondollen opened this issue May 9, 2018 · 7 comments

Comments

@jvondollen
Copy link

I'm having issues getting aggregate searches working. I use the exact aggregate query within the elastic site search and get results fine, but when I run it via the es_search() function, I get an error about missing data:

Error in log_fatal(msg) :
The column given to unpack_nested_data had no data in it.

I understand that there are empty buckets in the aggregation that are affecting the "unpack_nested_data" function. However, this isn't a problem when the aggregation results are written to a file and the read back in and parsed with "chomp_aggs".

@jameslamb jameslamb added the bug label May 9, 2018
@jameslamb
Copy link
Collaborator

Thanks for opening this up @jvondollen . I have one followup question...what type of aggs are using?

Each aggregation result has a slightly different format, so if you can provide that it would help us narrow down the issue.

The fact that you got log_fatal means that this is an error we intended to throw, so we may also need to revisit why unpack_nested_data breaks with an empty column instead of warning and returning NULL.

cc @wdearden @austin3dickey

@jvondollen
Copy link
Author

jvondollen commented May 9, 2018

This involves some multilevel nested sums aggregated over different terms. It also involves a reverse_nested aggregation. I'm not sure what exactly you men by "type" of aggs, but the query structure is something of the form:

aggs
  nested
    aggs
       nested
          aggs
             terms
                aggs
                   reverse_nested
                     aggs
                        sum

@jameslamb
Copy link
Collaborator

Oh sorry. By type I mean e.g. stats, terms, extended_stats, etc.

Also I'm sure that we don't have a test for reverse_nested, so that is something we will have to explore.

One other question...what version of ES are you on? We test uptasticsearch against all major versions going back to 1.0, but having the version number will help us to nail down where we need to look for additional documentation on Elastic's site.

@jvondollen
Copy link
Author

Ah, this makes sense since most of my aggregations involve reverse_nested ( stats and terms).
I'm currently using ES version 5.6.9

@jameslamb
Copy link
Collaborator

Ok cool, that helps!

@jvondollen and I talked over email and it revealed something valuable. Sounds like his query was returning 0 results, and the error he got was because of that.

I propose that in that situation, it would be more natural to throw a warning and return an empty data.table. I think that's more natural. Like imagine the case where you're running some pipeline like this:

resultList <- lapply(vector_of_stuff, function(thing){
    es_search("myhost.com:9200", "my_index", sprintf(query_template, thing))
})
resultDT <- data.table::rbindlist(
    resultList
    , fill = TRUE
)

It would be real annoying if you had 100 queries and couldn't get your result because one of them returned no results.

I prefer an empty data.table to a straight-up NULL because it will also prevent code that expects one data.table from breaking.

This thread revealed another problem...we have no tests for reverse_nested aggregations! So thanks to you @jvondollen for exposing a couple of weaknesses in the package!

@jameslamb
Copy link
Collaborator

To close this issue:

  1. Change aggs handling code to warn and return data.table::data.table() on a null result
  2. Add test confirming that empty results from aggs queries return that empty data.table and throw a warning

@jameslamb
Copy link
Collaborator

Going to close this one, as I think it is addressed by the changes in #70

#59 is still open and we don't currently have support for reverse-nested stuff. @jvondollen if you have the time we'd love your help in addressing that issue! I'm not really familiar with reverse_nested stuff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants