CRF layer continued #1363
@howl-anderson in TensorFlow 2.2, it's possible to override the `train_step` method. Here is an example with GANs: https://twitter.com/fchollet/status/1250622989541838848 Maybe we could keep the CRF layer as implemented here and let the user call the CRF loss themselves in a custom `train_step`. The implementation would then be the CRF layer implemented here + a tutorial showing how to override `train_step`.
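A minimal sketch of what that could look like, assuming the loss comes from `tfa.text.crf_log_likelihood`; `CrfModel` and its internals are illustrative, not this PR's API:

```python
# Hedged sketch of overriding train_step (TF >= 2.2); CrfModel is a
# hypothetical name, not part of this PR.
import tensorflow as tf
import tensorflow_addons as tfa

class CrfModel(tf.keras.Model):
    """Computes the CRF negative log-likelihood inside train_step."""
    def __init__(self, num_tags, **kwargs):
        super().__init__(**kwargs)
        self.dense = tf.keras.layers.Dense(num_tags)
        self.transitions = tf.Variable(
            tf.random.uniform((num_tags, num_tags)), name="transitions")

    def call(self, inputs):
        return self.dense(inputs)  # per-tag potentials, [batch, len, tags]

    def train_step(self, data):
        x, y = data  # y: gold tag indices, [batch, max_len], int32
        seq_lens = tf.fill([tf.shape(x)[0]], tf.shape(x)[1])
        with tf.GradientTape() as tape:
            potentials = self(x, training=True)
            log_likelihood, _ = tfa.text.crf_log_likelihood(
                potentials, y, seq_lens, self.transitions)
            loss = -tf.reduce_mean(log_likelihood)
        grads = tape.gradient(loss, self.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.trainable_variables))
        return {"loss": loss}

# model = CrfModel(num_tags=5)
# model.compile(optimizer="adam")  # no loss needed, train_step computes it
# model.fit(x, y)
```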
Superseded by #1733
This is a new take on #377.
I started from the implementation of @howl-anderson and tried to make it as Keras-friendly as possible.
Disclaimer: I don't know anything about CRFs or the math behind them, so I won't be able to answer any questions about the math. This PR is just #377 with the code moved around.
Review process
I suggest that, if we agree on the general idea, we merge this and then add a docstring and tutorials. I didn't add the layer to the public API, so it's fine to merge even if there are some rough edges. We can merge this PR as-is and then let @howl-anderson polish the CRF in follow-up pull requests, since 90% of this pull request is his work.
With this new architecture, you need two models: one for training and one for inference. Both can be saved and loaded normally in the Keras format. There seems to be a minor bug when using the `tf` format; I believe it's a TensorFlow bug, I'll open an issue and link it here. A minimal example is sketched after the description below.
There are two models, one for training and one for inference, and they share some layers. One layer is in charge of computing the loss; the loss is returned as a tensor in the network. We then give fake targets (zeros) and ask `.compile()` to use the MAE so that the loss tensor is left untouched. I took the numerical accuracy test from @howl-anderson and the loss is exactly the same :)