Indexing too many documents in one commit fails. #2

Open
fulmicoton opened this issue Dec 23, 2019 · 5 comments

Comments

@fulmicoton

fulmicoton commented Dec 23, 2019

Context: I am adding rucene to https://github.com/tantivy-search/search-benchmark-game.

It is a search benchmark comparing Lucene, Tantivy, Bleve, and now Rucene.
Indexing works but I have to periodically commit to avoid getting a panic.

See the following two lines of code and comment.
https://github.com/tantivy-search/search-benchmark-game/blob/master/engines/rucene-0.1/src/bin/build_index.rs#L103-L104

(I suspect a u32 overflow)

@fulmicoton
Author

FYI, here is the backtrace.

doc 2420000
doc 2430000
doc 2440000
doc 2450000
doc 2460000
doc 2470000
doc 2480000
doc 2490000
doc 2500000
doc 2510000
doc 2520000
doc 2530000
thread 'main' panicked at 'index out of bounds: the len is 65537 but the index is 562949953355776', /rustc/c8ea4ace9213ae045123fdfeb59d1ac887656d31/src/libcore/slice/mod.rs:2806:10
stack backtrace:
   0: backtrace::backtrace::libunwind::trace
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/libunwind.rs:88
   1: backtrace::backtrace::trace_unsynchronized
             at /cargo/registry/src/github.com-1ecc6299db9ec823/backtrace-0.3.40/src/backtrace/mod.rs:66
   2: std::sys_common::backtrace::_print_fmt
             at src/libstd/sys_common/backtrace.rs:84
   3: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
             at src/libstd/sys_common/backtrace.rs:61
   4: core::fmt::write
             at src/libcore/fmt/mod.rs:1025
   5: std::io::Write::write_fmt
             at src/libstd/io/mod.rs:1426
   6: std::sys_common::backtrace::_print
             at src/libstd/sys_common/backtrace.rs:65
   7: std::sys_common::backtrace::print
             at src/libstd/sys_common/backtrace.rs:50
   8: std::panicking::default_hook::{{closure}}
             at src/libstd/panicking.rs:193
   9: std::panicking::default_hook
             at src/libstd/panicking.rs:210
  10: std::panicking::rust_panic_with_hook
             at src/libstd/panicking.rs:471
  11: rust_begin_unwind
             at src/libstd/panicking.rs:375
  12: core::panicking::panic_fmt
             at src/libcore/panicking.rs:84
  13: core::panicking::panic_bounds_check
             at src/libcore/panicking.rs:62
  14: rucene::core::codec::postings::terms_hash_per_field::TermsHashPerFieldBase<T>::write_byte
  15: rucene::core::codec::postings::terms_hash_per_field::TermsHashPerField::add
  16: rucene::core::index::writer::doc_consumer::DocConsumer<D,C,MS,MP>::process_document
  17: rucene::core::index::writer::doc_writer::DocumentsWriter<D,C,MS,MP>::update_document
  18: build_index::main
  19: std::rt::lang_start::{{closure}}
  20: main
  21: __libc_start_main
  22: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@sunxiaoguang
Contributor

Can you reproduce the panic with RUST_BACKTRACE=full enabled? There are multiple array accesses in TermsHashPerFieldBase::write_byte. Line numbers would make it easier to find which access caused the overflow. Thanks.

@fulmicoton
Author

fulmicoton commented Dec 24, 2019

I don't have time for this, but you can reproduce it on your own by running

ENGINES=rucene-0.1 make index

in the search benchmark project...
https://github.com/tantivy-search/search-benchmark-game

@sunxiaoguang
Contributor

Sure, let me try it out

@jtong11
Collaborator

jtong11 commented Dec 27, 2019

@fulmicoton, it is a 2GB limit caused by using i32. We will fix it soon.
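
The index in the panic message is consistent with a signed 32-bit byte offset wrapping at 2 GiB. The following is a minimal sketch of that arithmetic, not rucene's actual code. It assumes a Lucene-style byte block pool with 32 KiB blocks (a shift of 15), and `block_index` is a hypothetical helper introduced only for illustration.

```rust
// Assumption: 32 KiB blocks, as in Lucene's ByteBlockPool (shift of 15).
// Whether rucene uses the same constant is not confirmed in this thread.
const BYTE_BLOCK_SHIFT: u32 = 15;

/// Hypothetical helper: maps a signed 32-bit global byte offset to a block
/// index by sign-extending it to 64 bits and shifting, as a plain
/// `i32 -> usize` cast would do on a 64-bit target.
fn block_index(offset: i32) -> u64 {
    ((offset as i64) as u64) >> BYTE_BLOCK_SHIFT
}

fn main() {
    // The last offset below 2 GiB still maps to a valid block.
    assert_eq!(block_index(i32::MAX), 65_535);

    // Writing one more byte wraps the i32 offset to i32::MIN.
    let wrapped = i32::MAX.wrapping_add(1);
    assert_eq!(wrapped, i32::MIN);

    // Sign-extended and shifted, the wrapped offset gives 2^49 - 2^16,
    // which is the out-of-bounds index reported in the backtrace.
    assert_eq!(block_index(wrapped), 562_949_953_355_776);

    // 65,536 blocks of 32 KiB add up to exactly 2 GiB.
    assert_eq!(65_536u64 << BYTE_BLOCK_SHIFT, 1u64 << 31);

    println!("block index after wrap: {}", block_index(wrapped));
}
```

Under these assumptions, the reported slice length of 65537 corresponds to just over 2 GiB of buffered bytes, which would also explain why committing periodically avoids the panic.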

jtong11 closed this as completed Dec 27, 2019
sunxiaoguang reopened this Dec 27, 2019