Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditional jump or move depends on uninitialised value #23

Closed
jwijffels opened this issue Sep 1, 2023 · 5 comments
Closed

Conditional jump or move depends on uninitialised value #23

jwijffels opened this issue Sep 1, 2023 · 5 comments

Comments

@jwijffels
Copy link

jwijffels commented Sep 1, 2023

Hello foxik.

In 2020 I've created an R wrapper around the nametag library. It's on CRAN since June 2020 (https://cran.r-project.org/package=nametagger).
Recently the CRAN build system has changed and the build log at CRAN shows that a valgrind issue has appeared at ufal::nametag::utils::lzma::MatchFinder_Create(

==1458363== Memcheck, a memory error detector
==1458363== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==1458363== Using Valgrind-3.21.0 and LibVEX; rerun with -h for copyright info
==1458363== Command: /data/blackswan/ripley/R/R-devel-vg/bin/exec/R --vanilla
==1458363== 
...
> data(europeananews)
> x <- subset(europeananews, doc_id %in% "enp_NL.kb.bio")
> traindata <- subset(x, sentence_id >  100)
> testdata  <- subset(x, sentence_id <= 100)
> path <- "nametagger-nl.ner" 
> ## Don't show: 
> path <- tempfile("nametagger-nl_", fileext = ".ner")
> traindata <- subset(x, sentence_id >  100 & sentence_id < 300)
> testdata  <- subset(x, sentence_id <= 100)
> ## End(Don't show) 
> opts <- nametagger_options(file = path,
+                            token = list(window = 2),
+                            token_normalisedsuffix = list(window = 0, from = 1, to = 4),
+                            ner_previous = list(window = 2),
+                            time = list(use = TRUE),
+                            url_email = list(url = "URL", email = "EMAIL"))
> ## Don't show: 
> model <- nametagger(x.train = traindata, x.test = testdata,
+                     iter = 1, lambda = 0.5, control = opts)
==1458363== Conditional jump or move depends on uninitialised value(s)
==1458363==    at 0x17E945C7: ufal::nametag::utils::lzma::MatchFinder_Create(ufal::nametag::utils::lzma::CMatchFinder*, unsigned int, unsigned int, unsigned int, unsigned int, ufal::nametag::utils::lzma::ISzAlloc*) (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:540)
==1458363==    by 0x17E96768: LzmaEnc_Alloc (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:2984)
==1458363==    by 0x17E96768: ufal::nametag::utils::lzma::LzmaEnc_AllocAndInit(ufal::nametag::utils::lzma::CLzmaEnc*, unsigned int, ufal::nametag::utils::lzma::ISzAlloc*, ufal::nametag::utils::lzma::ISzAlloc*) [clone .constprop.0] (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:3075)
==1458363==    by 0x17E96C1C: ufal::nametag::utils::lzma::LzmaEnc_MemEncode(void*, unsigned char*, unsigned long*, unsigned char const*, unsigned long, int, ufal::nametag::utils::lzma::ICompressProgress*, ufal::nametag::utils::lzma::ISzAlloc*, ufal::nametag::utils::lzma::ISzAlloc*) (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:3269)
==1458363==    by 0x17E96D00: ufal::nametag::utils::lzma::LzmaEncode(unsigned char*, unsigned long*, unsigned char const*, unsigned long, ufal::nametag::utils::lzma::CLzmaEncProps const*, unsigned char*, unsigned long*, int, ufal::nametag::utils::lzma::ICompressProgress*, ufal::nametag::utils::lzma::ISzAlloc*, ufal::nametag::utils::lzma::ISzAlloc*) (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:3293)
==1458363==    by 0x17E96DC3: ufal::nametag::utils::compressor::save(std::ostream&, ufal::nametag::utils::binary_encoder const&) (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:3320)
==1458363==    by 0x17E87DC6: ufal::nametag::entity_map::save(std::ostream&) const (packages/tests-vg/nametagger/src/nametag/src/ner/entity_map_encoder.cpp:24)
==1458363==    by 0x17E85846: ufal::nametag::bilou_ner_trainer::train(ufal::nametag::ner_ids::ner_id, int, ufal::nametag::network_parameters const&, ufal::nametag::tagger const&, std::istream&, std::istream&, std::istream&, std::ostream&) (packages/tests-vg/nametagger/src/nametag/src/ner/bilou_ner_trainer.cpp:71)
==1458363==    by 0x17E99241: nametag_train(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, int, double, double, double, double, int, bool, char const*) (packages/tests-vg/nametagger/src/rcpp_nametag.cpp:189)
==1458363==    by 0x17EA3C06: _nametagger_nametag_train (packages/tests-vg/nametagger/src/RcppExports.cpp:63)
==1458363==    by 0x4A3B59: R_doDotCall (svn/R-devel/src/main/dotcode.c:927)
==1458363==    by 0x4A4203: do_dotcall (svn/R-devel/src/main/dotcode.c:1551)
==1458363==    by 0x4DD026: bcEval (svn/R-devel/src/main/eval.c:7567)
==1458363==  Uninitialised value was created by a heap allocation
==1458363==    at 0x48432F3: operator new[](unsigned long) (/builddir/build/BUILD/valgrind-3.21.0/coregrind/m_replacemalloc/vg_replace_malloc.c:714)
==1458363==    by 0x17E961C7: ufal::nametag::utils::lzma::LzmaEnc_Create(ufal::nametag::utils::lzma::ISzAlloc*) (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:2769)
==1458363==    by 0x17E96C7B: ufal::nametag::utils::lzma::LzmaEncode(unsigned char*, unsigned long*, unsigned char const*, unsigned long, ufal::nametag::utils::lzma::CLzmaEncProps const*, unsigned char*, unsigned long*, int, ufal::nametag::utils::lzma::ICompressProgress*, ufal::nametag::utils::lzma::ISzAlloc*, ufal::nametag::utils::lzma::ISzAlloc*) (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:3283)
==1458363==    by 0x17E96DC3: ufal::nametag::utils::compressor::save(std::ostream&, ufal::nametag::utils::binary_encoder const&) (packages/tests-vg/nametagger/src/nametag/src/utils/compressor_save.cpp:3320)
==1458363==    by 0x17E87DC6: ufal::nametag::entity_map::save(std::ostream&) const (packages/tests-vg/nametagger/src/nametag/src/ner/entity_map_encoder.cpp:24)
==1458363==    by 0x17E85846: ufal::nametag::bilou_ner_trainer::train(ufal::nametag::ner_ids::ner_id, int, ufal::nametag::network_parameters const&, ufal::nametag::tagger const&, std::istream&, std::istream&, std::istream&, std::ostream&) (packages/tests-vg/nametagger/src/nametag/src/ner/bilou_ner_trainer.cpp:71)
==1458363==    by 0x17E99241: nametag_train(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, int, double, double, double, double, int, bool, char const*) (packages/tests-vg/nametagger/src/rcpp_nametag.cpp:189)
==1458363==    by 0x17EA3C06: _nametagger_nametag_train (packages/tests-vg/nametagger/src/RcppExports.cpp:63)
==1458363==    by 0x4A3B59: R_doDotCall (svn/R-devel/src/main/dotcode.c:927)
==1458363==    by 0x4A4203: do_dotcall (svn/R-devel/src/main/dotcode.c:1551)
==1458363==    by 0x4DD026: bcEval (svn/R-devel/src/main/eval.c:7567)
==1458363==    by 0x4F595F: Rf_eval (svn/R-devel/src/main/eval.c:1146)
==1458363== 
> ## End(Don't show)

Although in the R package (at https://github.com/bnosac/nametagger) I use commit 598666b of nametag, I think the issue is still there in the current version of nametag.
Not sure how difficult it is to fix this.

@foxik
Copy link
Member

foxik commented Sep 3, 2023

Hi,

the issue is in the public-domain code performing LZMA (de)compression (we use it to compress the parts of the models where it make sense). The same code is also in UDPipe 1, MorphoDiTa, ... .

As you say, I am not sure how difficult will be to find the cause (given that it is someone else's code), but I will try to do it next week (and do minor releases with the fix).

@jwijffels
Copy link
Author

That would be great! Thanks for the effort.

@foxik
Copy link
Member

foxik commented Sep 8, 2023

Hi,

after investigation, I believe it is just a false positive report by valgrind (the code actually does read an uninitialized value, but the conditional jump is guarded by other conditions, so the program behavior cannot be influenced by the unitialized value). I tried to fix the problem by ufal/cpp_utils@fc215c8 (on g++-10 and clang-11 the valgrind report disappeared) and then updated cpp_utils in nametag in the lastest commit 294466f.

Could you please try building the nametag R wrapper to verify the problem is really solved? Thanks!

@jwijffels
Copy link
Author

Many thanks for the fixes and your time. I've incorporated the changes and uploaded to CRAN. It's currently under CRAN review.
I tried to reproduce the valgrind issue myself but it depends clearly on valgrind 3.21.0 and I don't have an instrumented build of R-devel compiled with valgrind level 2 instrumentation alongside valgrind 3.21.0 to be able to reproduce it.

What I did already got as feedback from CRAN is that I had to replace std::interator as it is deprecated in C++17 and future compilers will not support C++11. I'll make a new issue for this if you don't mind.

I'll keep you updated here if I have more feedback from CRAN.

Found the following significant warnings:
  ./nametag/src/unilib/utf8.h:148:52: warning: 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator' is deprecated [-Wdeprecated-declarations]
  ./nametag/src/unilib/utf8.h:180:52: warning: 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator' is deprecated [-Wdeprecated-declarations]
  nametag/src/unilib/utf16.h:116:53: warning: 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator' is deprecated [-Wdeprecated-declarations]
  nametag/src/unilib/utf16.h:148:53: warning: 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator' is deprecated [-Wdeprecated-declarations]
  nametag/src/unilib/utf8.h:148:52: warning: 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator' is deprecated [-Wdeprecated-declarations]
  nametag/src/unilib/utf8.h:180:52: warning: 'template<class _Category, class _Tp, class _Distance, class _Pointer, class _Reference> struct std::iterator' is deprecated [-Wdeprecated-declarations]

@jwijffels
Copy link
Author

The R package was accepted on CRAN yesterday including the fix for this issue which incorporated your changes of 294466f.
Yesterday CRAN tested as well with valgrind and the fix solved the problem.
Many thanks again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants