Welcome to Seq2SeqSharp Discussions! #38
-
Hi. How about tokenizing a dataset on the fly, like OpenNMT does? That way new text wouldn't have to be tokenized separately beforehand.
-
Hi @SileNTViP, Seq2SeqSharp already supports this. You can pass the "-SrcSentencePieceModelPath" and "-TgtSentencePieceModelPath" parameters on the command line, each with the path to a SentencePiece model file. Here is an example for translation from English to Chinese:

.\bin\Seq2SeqConsole\Seq2SeqConsole.exe -Task Test -ModelFilePath .\model\seq2seq_mt_enu_chs.model -InputTestFile .\data\test\test_enu_raw.txt -OutputFile out_chs.txt -MaxTestSrcSentLength 110 -MaxTestTgtSentLength 110 -ProcessorType CPU -SrcSentencePieceModelPath .\spm\enuSpm.model -TgtSentencePieceModelPath .\spm\chsSpm.model -BeamSearchSize 1 -BatchSize 2 -DeviceIds 0,1,2,3 -ShuffleType Random

Thanks
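For readers who want to see what on-the-fly tokenization amounts to, here is a minimal Python sketch using the standalone sentencepiece package (not Seq2SeqSharp's own code): raw source text is encoded into subword pieces before inference, and the output pieces are detokenized back into plain target text. The model file names and the translate_pieces placeholder are assumptions for illustration only.

```python
# Conceptual sketch of on-the-fly SentencePiece tokenization around a translation step.
# Uses the standalone `sentencepiece` Python package; the model paths below mirror the
# files in the command above and are assumed to exist.
import sentencepiece as spm

src_sp = spm.SentencePieceProcessor(model_file="spm/enuSpm.model")
tgt_sp = spm.SentencePieceProcessor(model_file="spm/chsSpm.model")

def translate_raw(line, translate_pieces):
    """Encode raw source text, hand the subword pieces to a translator
    callback (a placeholder here), and detokenize the target pieces."""
    src_pieces = src_sp.encode(line, out_type=str)   # raw text -> subword pieces
    tgt_pieces = translate_pieces(src_pieces)        # model inference (placeholder)
    return tgt_sp.decode(tgt_pieces)                 # subword pieces -> plain text

# Identity "translator" just to show the tokenize/detokenize round trip:
print(translate_raw("This is a raw, untokenized sentence.", lambda pieces: pieces))
```

When the two SentencePiece model paths are supplied, Seq2SeqConsole performs the equivalent step internally, which is why the example above can point -InputTestFile at a raw, untokenized text file.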
-
I'll try training on a big dataset and get this:
-
👋 Welcome!
We're using Discussions as a place to connect with other members of our community. We hope that you:
- Ask questions you're wondering about.
- Share ideas.
- Engage with other community members.
- Welcome others and are open-minded. Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.