Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenNMT-py #108

Open
boegel opened this issue Mar 13, 2023 · 3 comments
Open

OpenNMT-py #108

boegel opened this issue Mar 13, 2023 · 3 comments
Assignees
Labels
difficulty: easy software that should be easy to support GPU new new software priority: high Python site:ugent Software installation request for UGent Tier-2 sources-only Only sources available (no conda, binaries, container)

Comments

@boegel
Copy link
Contributor

boegel commented Mar 13, 2023

@boegel boegel added difficulty: easy software that should be easy to support new new software priority: high Python site:ugent Software installation request for UGent Tier-2 GPU labels Mar 13, 2023
@boegel boegel self-assigned this Mar 14, 2023
@boegel
Copy link
Contributor Author

boegel commented Mar 14, 2023

Currently stuck on:

/tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc: In member function virtual void onmt::SentencePiece::set_vocabulary(const std::vector<std::__cxx11::basic_string<char> >&, const onmt::Tokenizer::Options*):
/tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc:57:45: error: cannot convert const std::vector<std::__cxx11::basic_string<char> > to const std::vector<std::basic_string_view<char> >&
   57 |     auto status = _processor->SetVocabulary(vocabulary);
      |                                             ^~~~~~~~~~
      |                                             |
      |                                             const std::vector<std::__cxx11::basic_string<char> >
In file included from /tmp/vsc40023/easybuild_build/Tokenizer/1.37.1/foss-2022a/Tokenizer-1.37.1/src/SentencePiece.cc:3:
/user/gent/400/vsc40023/eb_arcaninescratch/RHEL8/skylake-ib/software/SentencePiece/0.1.97-GCC-11.3.0/include/sentencepiece_processor.h:279:45: note:   initializing argument 1 of virtual sentencepiece::util::Status sentencepiece::SentencePieceProcessor::SetVocabulary(const std::vector<std::basic_string_view<char> >&)
  279 |       const std::vector<absl::string_view> &valid_vocab);
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
make[2]: *** [CMakeFiles/OpenNMTTokenizer.dir/build.make:135: CMakeFiles/OpenNMTTokenizer.dir/src/SentencePiece.cc.o] Error 1

@boegel
Copy link
Contributor Author

boegel commented Mar 16, 2023

compiler problem for Tokenizer fixed with patch included in 08dcce9

see also OpenNMT/Tokenizer#323

@Micket
Copy link

Micket commented Mar 16, 2023

I think you can simplify the patch to just do SetVocabulary(ToPieceArray(vocabulary)); since that's what SentencePiece themselves do in this situation google/sentencepiece@631420b#diff-77e6a3b3bfda73d84fe1fef8205f2a2ec1d46b8f232100041f7135505f8adcefR217
Function should be defined in the same header that's already included.

It's a bit better since it the vector allocations up front.

@PetrKralCZ PetrKralCZ added the sources-only Only sources available (no conda, binaries, container) label Jun 12, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficulty: easy software that should be easy to support GPU new new software priority: high Python site:ugent Software installation request for UGent Tier-2 sources-only Only sources available (no conda, binaries, container)
Projects
None yet
Development

No branches or pull requests

3 participants