Crash on classification with config file #98

Closed
lnicola opened this issue Feb 21, 2022 · 11 comments
Labels
bug Something isn't working

Comments

@lnicola

lnicola commented Feb 21, 2022

✅ Inferring train table columns. 2s
✅ Loading train table. 2s
✅ Loading test table. 5s
✅ Shuffling. 0s 628ms
✅ Computing train stats. 9s
✅ Computing test stats. 27s
✅ Finalizing stats. 16s
🏁 Computing baseline metrics. 212389 / 230150 92% 0s 15ms elapsed 0ms remaining
[=======================================================================>      ]
[Thread 0x7ffff7c7e640 (LWP 419555) exited]
thread panicked while panicking. aborting.

Thread 1 "tangram" received signal SIGILL, Illegal instruction.

#0  std::panicking::rust_panic_with_hook () at library/std/src/panicking.rs:621
#1  0x00005555572d34a0 in std::panicking::begin_panic_handler::{closure#0} () at library/std/src/panicking.rs:502
#2  0x00005555572d1944 in std::sys_common::backtrace::__rust_end_short_backtrace<std::panicking::begin_panic_handler::{closure#0}, !> () at library/std/src/sys_common/backtrace.rs:139
#3  0x00005555572d3409 in std::panicking::begin_panic_handler () at library/std/src/panicking.rs:498
#4  0x0000555555893a51 in core::panicking::panic_fmt () at library/core/src/panicking.rs:107
#5  0x0000555555893b43 in core::result::unwrap_failed () at library/core/src/result.rs:1613
#6  0x00005555558d6764 in core::result::Result::unwrap<(), std::sync::mpsc::SendError<core::option::Option<tangram_core::progress::ProgressEvent>>> () at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1295
#7  tangram::train::{impl#1}::drop () at crates/cli/train.rs:169
#8  0x00005555559319a9 in core::ptr::drop_in_place<tangram::train::ProgressThread> () at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ptr/mod.rs:188
#9  core::ptr::drop_in_place<core::option::Option<tangram::train::ProgressThread>> () at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ptr/mod.rs:188
#10 0x000055555593a651 in tangram::train::train::{closure#1} () at crates/cli/train.rs:117
#11 0x00005555558d5c27 in std::panicking::try::do_call<tangram::train::train::{closure#1}, core::result::Result<(tangram_core::model::Model, std::path::PathBuf), anyhow::Error>> () at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:406
#12 std::panicking::try<core::result::Result<(tangram_core::model::Model, std::path::PathBuf), anyhow::Error>, tangram::train::train::{closure#1}> () at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:370
#13 std::panic::catch_unwind<tangram::train::train::{closure#1}, core::result::Result<(tangram_core::model::Model, std::path::PathBuf), anyhow::Error>> () at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panic.rs:133
#14 tangram::train::train () at crates/cli/train.rs:37
#15 0x000055555590c001 in tangram::main () at crates/cli/main.rs:170

I commented out the stuff in drop and got:

✅ Inferring train table columns. 0s 9ms
✅ Loading train table. 0s 11ms
✅ Loading test table. 0s 35ms
✅ Shuffling. 0s 3ms
✅ Computing train stats. 0s 24ms
✅ Computing test stats. 0s 77ms
✅ Finalizing stats. 0s 50ms
🏁 Computing baseline metrics. 218421 / 230150 95% 0s 15ms elapsed 0ms remaining
[==========================================================================>   ]
error: panicked at 'called `Result::unwrap()` on an `Err` value: SendError { .. }', crates/cli/train.rs:163:14
   0: tangram::train::train::{{closure}}
             at /home/grayshade/tangram/crates/cli/train.rs:34:40
   1: std::panicking::rust_panic_with_hook
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:610:17
   2: std::panicking::begin_panic_handler::{{closure}}
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:502:13
   3: std::sys_common::backtrace::__rust_end_short_backtrace
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:139:18
   4: rust_begin_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:498:5
   5: core::panicking::panic_fmt
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/panicking.rs:107:14
   6: core::result::unwrap_failed
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1613:5
   7: core::result::Result<T,E>::unwrap
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/result.rs:1295:23
      tangram::train::ProgressThread::send_progress_event
             at /home/grayshade/tangram/crates/cli/train.rs:159:3
      tangram::train::train::{{closure}}::{{closure}}
             at /home/grayshade/tangram/crates/cli/train.rs:96:5
   8: tangram_core::train::train_grid_item::{{closure}}
             at /home/grayshade/tangram/crates/core/train.rs:1031:3
   9: tangram_core::train::train_linear_regressor::{{closure}}
             at /home/grayshade/tangram/crates/core/train.rs:1284:3
  10: tangram_linear::multiclass_classifier::MulticlassClassifier::train
             at /home/grayshade/tangram/crates/linear/multiclass_classifier.rs:132:3
  11: tangram_core::train::train_linear_multiclass_classifier
             at /home/grayshade/tangram/crates/core/train.rs:1484:21
      tangram_core::train::train_model
             at /home/grayshade/tangram/crates/core/train.rs:1233:8
      tangram_core::train::train_grid_item
             at /home/grayshade/tangram/crates/core/train.rs:1030:27
      tangram_core::train::Trainer::train_grid::{{closure}}
             at /home/grayshade/tangram/crates/core/train.rs:252:5
      core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ops/function.rs:280:13
  12: core::option::Option<T>::map
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/option.rs:846:29
      <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/iter/adapters/map.rs:103:9
      <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/vec/spec_from_iter_nested.rs:23:32
      <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/vec/spec_from_iter.rs:33:9
  13: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/alloc/src/vec/mod.rs:2549:9
      core::iter::traits::iterator::Iterator::collect
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/iter/traits/iterator.rs:1745:9
      tangram_core::train::Trainer::train_grid
             at /home/grayshade/tangram/crates/core/train.rs:246:33
  14: tangram::train::train::{{closure}}
             at /home/grayshade/tangram/crates/cli/train.rs:100:33
  15: std::panicking::try::do_call
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:406:40
      std::panicking::try
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:370:19
      std::panic::catch_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panic.rs:133:14
      tangram::train::train
             at /home/grayshade/tangram/crates/cli/train.rs:37:15
  16: tangram::main
             at /home/grayshade/tangram/crates/cli/main.rs:170:30
  17: core::ops::function::FnOnce::call_once
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ops/function.rs:227:5
      std::sys_common::backtrace::__rust_begin_short_backtrace
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/sys_common/backtrace.rs:123:18
  18: std::rt::lang_start::{{closure}}
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/rt.rs:145:18
  19: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/core/src/ops/function.rs:259:13
      std::panicking::try::do_call
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:406:40
      std::panicking::try
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:370:19
      std::panic::catch_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panic.rs:133:14
      std::rt::lang_start_internal::{{closure}}
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/rt.rs:128:48
      std::panicking::try::do_call
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:406:40
      std::panicking::try
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panicking.rs:370:19
      std::panic::catch_unwind
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/panic.rs:133:14
      std::rt::lang_start_internal
             at /rustc/db9d1b20bba1968c1ec1fc49616d4742c1725b4b/library/std/src/rt.rs:128:20
  20: main
  21: __libc_start_call_main
  22: __libc_start_main@GLIBC_2.2.5
  23: _start
@deciduously added the bug label Feb 21, 2022
@lnicola
Author

lnicola commented Feb 21, 2022

At first glance, I don't see how the sender can be None on drop, but I'm probably missing something. Anyway, this happens on one dataset I have, if I specify both --file-train and --file-test along with a config file.
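One reading of the traces above is that the sender is still present and it is the receiving end that has gone away: `std::sync::mpsc::Sender::send` returns a `SendError` only when the receiver has been dropped. The sketch below reproduces that behaviour in isolation. The function `send_after_receiver_exits` and the `u32` payload are made up for illustration and are not taken from tangram.

```rust
use std::sync::mpsc::{channel, SendError};
use std::thread;

// Hypothetical helper, not taken from tangram: sends two messages to a worker
// that stops listening after the first one.
fn send_after_receiver_exits() -> Result<(), SendError<u32>> {
    let (sender, receiver) = channel::<u32>();
    // The worker owns the receiver. When the worker returns (or panics),
    // the receiver is dropped and the channel becomes disconnected.
    let worker = thread::spawn(move || {
        let _first = receiver.recv();
    });
    // The receiver is still alive here, so this send succeeds.
    sender.send(1)?;
    // Wait until the worker has finished and dropped the receiver.
    worker.join().expect("worker thread panicked");
    // The sender itself is still valid, but nobody is left to receive.
    sender.send(2)
}
```

Calling `.unwrap()` on the second result is what would turn the disconnected channel into a panic like the one in the backtrace.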

@isabella
Contributor

And it doesn't happen when you pass just --file-train and --file-test, without a config file?

@lnicola
Author

lnicola commented Feb 21, 2022

Yeah, but then it trains the wrong thing (as per my previous question from today). I suppose there's something wrong with my config file. I have the same classes in both the training and validation files, at least.

@isabella
Contributor

It's just bizarre that it's a SIGILL. Which version of tangram are you using?

@lnicola
Author

lnicola commented Feb 21, 2022

Both the latest release and a git build fail in the same way. SIGILL is an abort, because a thread panicked while panicking (I updated my original comment).

EDIT: I removed my comment below. The test command line was tangram train --file-train training_ss.csv --file-test validation_ss.csv -t CTnumL4A -o t.tangram --config config.json.
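The double panic can be avoided if the `Drop` implementation never unwraps the result of its final send. The following is a minimal sketch of that pattern, not the project's actual code: `ProgressHandle` and the local `ProgressEvent` enum are hypothetical stand-ins for the CLI's `ProgressThread` and `tangram_core::progress::ProgressEvent`.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical stand-in for tangram_core::progress::ProgressEvent.
#[derive(Debug, PartialEq)]
enum ProgressEvent {
    Tick(u64),
}

// Hypothetical stand-in for the CLI's ProgressThread.
struct ProgressHandle {
    sender: Option<Sender<Option<ProgressEvent>>>,
}

impl ProgressHandle {
    fn new() -> (ProgressHandle, Receiver<Option<ProgressEvent>>) {
        let (sender, receiver) = channel();
        (ProgressHandle { sender: Some(sender) }, receiver)
    }

    // Returns false instead of panicking when the drawing side is gone.
    fn send(&self, event: ProgressEvent) -> bool {
        match &self.sender {
            Some(sender) => sender.send(Some(event)).is_ok(),
            None => false,
        }
    }
}

impl Drop for ProgressHandle {
    fn drop(&mut self) {
        if let Some(sender) = self.sender.take() {
            // `None` is the shutdown signal. The error is ignored on purpose:
            // drop may run while another panic is unwinding, and a second
            // panic at that point aborts the process.
            let _ = sender.send(None);
        }
    }
}
```

With this shape, dropping the handle after the receiver has disappeared is a no-op rather than a crash, so the original panic message stays visible.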

@isabella
Contributor

I meant that the format of your config file shouldn't cause it. We can try to debug this over a video call. Can you join our Discord? https://discord.gg/fqyvVMsJ

@isabella
Contributor

No problem, I am able to reproduce this on my end. I'll start digging in and let you know what I find.

@isabella
Contributor

There is definitely an issue with the progress bar. I'm looking into this more. In the meantime, you can train a model by passing the --no-progress flag.

@isabella
Contributor

Hi @lnicola, I fixed the issue. The problem was that we were using train_row_count as the total for the progress bar when the total was in fact test_row_count. As a result, a value that we assumed to be positive was negative: the ETA came out negative, and the following line caused the panic: https://github.com/tangramdotdev/tangram/blob/47340b8de905399912dfb4d181e9a45025c403c8/crates/cli/train.rs#L605

The issue is fixed on the main branch. The same bug should have been hit with regression, but because progress draws on a timer and the regression code path was faster, the progress bar didn't get a chance to draw, so that code path was not hit.
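Independently of using the right total, an ETA computation can be made robust to a counter that overshoots its total. The sketch below is an assumption about how such a guard could look, not the code at the linked line. `estimate_remaining` is a hypothetical helper, and it relies on the documented behaviour that `Duration::from_secs_f64` panics on negative input.

```rust
use std::time::Duration;

// Hypothetical helper: estimate the time remaining for a progress bar.
// Returns None when no progress has been made yet, since no rate is known.
fn estimate_remaining(current: u64, total: u64, elapsed: Duration) -> Option<Duration> {
    if current == 0 {
        return None;
    }
    // saturating_sub clamps to zero when `current` overshoots `total`,
    // so the value passed to from_secs_f64 below is never negative.
    let remaining_items = total.saturating_sub(current);
    let secs_per_item = elapsed.as_secs_f64() / current as f64;
    Some(Duration::from_secs_f64(secs_per_item * remaining_items as f64))
}
```

If the counter passes the total, as it would when the total comes from the wrong table, this reports zero time remaining instead of panicking.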

@lnicola
Author

lnicola commented Feb 21, 2022

Thanks, it's working now. I just had to make a small change for it to build:

diff --git i/crates/cli/train.rs w/crates/cli/train.rs
index 874d77c..cc7836b 100644
--- i/crates/cli/train.rs
+++ w/crates/cli/train.rs
@@ -97,8 +97,7 @@ pub fn train(args: TrainArgs) -> Result<()> {
                        }
                };
                let kill_chip = unsafe { ctrl_c::register_ctrl_c_handler()? };
-               let train_grid_item_outputs =
-                       trainer.train_grid(Some(kill_chip), &mut handle_progress_event)?;
+               let train_grid_item_outputs = trainer.train_grid(kill_chip, &mut handle_progress_event)?;
                unsafe { ctrl_c::unregister_ctrl_c_handler()? };
                if kill_chip.is_activated() {
                        if let Some(progress_thread) = progress_thread.as_mut() {

@lnicola closed this as completed Feb 21, 2022
@isabella
Contributor

Yes, my bad! Thank you :)
