Combining 2 input sequences with sequence combiner #729
Your task is exactly the reason why the sequence combiner exists :) It looks like you did everything correctly apart from one detail. Let me know if this fixes your issue. (This also suggests I should provide a mechanism to notify the user when wrong keys are used in the model definition; being notified would have prevented your error.) I am really glad that people are starting to use these more advanced features in Ludwig!
@w4nderlust Thank you for your fast reply; the promise of an easy way to use those advanced features is what drew me to Ludwig in the first place (after seeing you present at Interspeech 2019). Unfortunately your recommendation does not solve the problem (the same issue persists). As to your other points: I think that ensuring correct user input is an essential addition, as there are few things more frustrating than chasing a configuration typo for half a day ;) You could use something like jsonschema for it: it is pretty easy to write and convenient to use for validation. Alternatively, Data Classes from Python 3.7's PEP 557 could be of use, but maybe it's too early for that. I guess there is also a bug in the documentation, as this was a copy-paste from the tagger definition.
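As a rough illustration of the jsonschema idea (the key names below are hypothetical placeholders, not Ludwig's actual schema), even a minimal key whitelist catches a typoed parameter before training starts:

```python
# Minimal sketch of configuration validation: the kind of check a
# jsonschema schema would express declaratively. Key names here are
# hypothetical, not Ludwig's real schema.
ALLOWED_FEATURE_KEYS = {"name", "type", "encoder", "reduce_output"}

def find_unknown_keys(features):
    """Return (feature_name, unknown_key) pairs for any misspelled keys."""
    problems = []
    for feature in features:
        for key in feature:
            if key not in ALLOWED_FEATURE_KEYS:
                problems.append((feature.get("name", "?"), key))
    return problems

features = [
    {"name": "text", "type": "sequence", "encoder": "rnn"},
    {"name": "lexicon", "type": "sequence", "reduce_input": None},  # typo
]
print(find_unknown_keys(features))  # → [('lexicon', 'reduce_input')]
```

A real validator would also check value types and nesting, but even this flat check would have surfaced the configuration typo discussed above instead of letting it fail silently.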
Also, it would be awesome to have a clear example for such a use case in Examples :)
Nice :)
Let me try to fix it. Could you please provide a complete reproducible example with data, configuration and command? The data can be synthetic data generated using the script in
These are good suggestions, thank you. Ludwig's configuration file is highly dynamic, in the sense that it maps the parameters you put in the configuration directly to parameters of the classes and functions it uses. For that reason, and because the classes themselves are often changing and new ones are added, so far I have preferred not to have a static parser for the configuration (it's still v0.x, so I know things will change quite a lot). Moreover, if a user adds their own additional encoder, for instance, they would have to modify the parser too, and I want to avoid that. What I plan to do instead is find a simple mechanism that notifies about all the unrecognized keywords passed to each function / object; that could be a temporary solution that preserves the dynamic behavior and avoids failing silently like it happens right now.
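The "notify instead of failing silently" mechanism can be sketched with plain keyword-argument capture (function and parameter names below are hypothetical, not Ludwig internals):

```python
import warnings

def build_encoder(cell_type="rnn", num_layers=1, **unrecognized):
    # Hypothetical sketch: any keyword the function does not recognize
    # lands in `unrecognized` and triggers a warning, while construction
    # still proceeds with the defaults.
    for key in unrecognized:
        warnings.warn("unrecognized parameter {!r} was ignored".format(key))
    return {"cell_type": cell_type, "num_layers": num_layers}

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    encoder = build_encoder(cell_type="lstm", num_layer=2)  # typo: num_layer

print(encoder["num_layers"])  # → 1 (the typoed value was never applied)
print(len(caught))            # → 1 warning emitted
```

Because the check lives at the call site rather than in a static parser, a user-supplied encoder gets the same treatment with no parser changes, which matches the dynamicity goal described above.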
Fixed it. Although the default value for sequence features is
If you agree I can use yours as an example :)
this will generate the data
here is the
and here is a command to reproduce
Thank you, I found the bug and fixed it. Can you please confirm it now works for you? Also, if you confirm your approval, I will add this to both examples and tests.
I will do that first thing after the Wed meeting spree ;) Thank you @w4nderlust ! :) You have my approval to use the examples I provided in any way you like, as they are purely synthetic :)
@w4nderlust now I get the following error (both on the synthetic example and the real dataset).
I also tried changing that setting, without success. It did work on the sample dataset when I edited "ludwig/models/modules/reduction_modules.py" locally to adjust the mapping. When I run it with this fix on the real dataset I get the following:
which is the same problem I was running into earlier. I think it is unlikely that Ludwig is applying some other tokenization on the text than splitting on spaces (which is what I want here), so could it be that some other vector is wrongfully concatenated to the end of the text or tag feature?
Pushed a new commit; it should fix the first issue, sorry about the additional step ;) As for the second issue, that may be data dependent. The logic is that vectors are concatenated along the third (hidden) dimension, so the first two dimensions have to match: batch size and sequence length.
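The matching rule just described can be sketched with nested lists standing in for [batch, sequence, hidden] tensors (a simplified illustration, not Ludwig's actual concatenation code):

```python
def concat_along_hidden(a, b):
    """Concatenate two [batch][sequence][hidden] nested lists along the
    hidden (third) dimension: batch size and sequence length must match,
    only the hidden size may differ."""
    if len(a) != len(b):
        raise ValueError("batch sizes differ: {} vs {}".format(len(a), len(b)))
    combined = []
    for seq_a, seq_b in zip(a, b):
        if len(seq_a) != len(seq_b):
            raise ValueError(
                "sequence lengths differ: {} vs {}".format(len(seq_a), len(seq_b)))
        # per-timestep list concatenation plays the role of hidden-dim concat
        combined.append([va + vb for va, vb in zip(seq_a, seq_b)])
    return combined

text_hidden = [[[0.1, 0.2], [0.3, 0.4]]]  # batch=1, seq=2, hidden=2
lex_hidden = [[[1.0], [2.0]]]             # batch=1, seq=2, hidden=1
print(concat_along_hidden(text_hidden, lex_hidden))
# → [[[0.1, 0.2, 1.0], [0.3, 0.4, 2.0]]]
```

If the two input sequences tokenize to different lengths for the same row, the second check fires, which is consistent with the shape-mismatch error being data dependent.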
If even this does not fix the issue, I can try taking a look at it myself (but I would need at least a couple of datapoints for which the problem originates).
@w4nderlust I guess we found another bug in the docs then :) as it does say otherwise there. Also, adding these lines under the right section made it work. I think this is something to underline in the documentation :) If anyone has the same problem, the following config could be of help:
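The actual config snippet from this comment was not preserved in this capture. Purely as an illustrative sketch (the feature names, encoder choices, and the main_sequence_feature value are assumptions, not the poster's real setup), a two-sequence tagger definition passed to Ludwig's Python API might look roughly like this:

```python
# Hypothetical model definition (Ludwig 0.x dict format); verify every key
# against the current Ludwig documentation before reusing.
model_definition = {
    "input_features": [
        {"name": "text", "type": "sequence", "encoder": "rnn",
         "reduce_output": None},          # keep per-timestep outputs
        {"name": "lexicon", "type": "sequence", "encoder": "embed",
         "reduce_output": None},
    ],
    "combiner": {
        "type": "sequence",
        "main_sequence_feature": "text",  # assumed parameter name
    },
    "output_features": [
        {"name": "tag", "type": "sequence", "decoder": "tagger"},
    ],
}
```

The point to underline from the thread is that both inputs keep their per-timestep outputs (no reduction) so the sequence combiner receives matching batch and sequence dimensions.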
Good catch, yes. Glad you were able to train the model!
Is your feature request related to a problem? Please describe.
I am trying to figure out how to combine 2 input sequences with the sequence combiner and I am not sure if this is possible. It would be great if it were ;) Right now I am getting an error if I try to put two sequences as inputs to the sequence combiner.
Describe the use case
Consider the following (simplified) features, where tag is the output feature.
Describe the solution you'd like
I would like to embed text and lexicon series and predict tag with tagger.
So I would like to set up the combiner in a way that the first row that goes into the encoder of the sequence combiner would look like this:
Describe alternatives you've considered
I was hoping for this to work out of the box, but now I am not sure if this is supported.
Additional context