
syntaxnet: Enable building Parsey McParseface in one model to be served in TF serving #250

Closed · wants to merge 16 commits

Conversation

dmansfield
Contributor

At a high level, this PR enables the entire Parsey McParseface pipeline to be built as a single model, exported, and served with TF Serving.

This required two main changes:

  1. The ability to use tensor input instead of stdin/stdout/files for feeding "text" input. This is really two changes. The first is the ability to pass serialized Sentence protos into BeamParseReader (instead of having it read and parse text); this is necessary to connect the "output" of brain-tagger to the "input" of brain-parser. The second is the ability to feed "text" input to DocumentSource; this is necessary when the sentences to be parsed arrive another way, for example via gRPC in TF Serving.
  2. An actual script to build and export the Parsey model all in one session. To accomplish this, a new script, parsey_mcparseface.py, is created. It "replaces" demo.sh by building both halves of the model in a single session, using name_scope/variable_scope to separate the tagger and parser (see the sketch after this list). It exports the model three ways: as a session bundle that TF Serving can import, as a summary viewable in TensorBoard, and as a "pbtxt" file for manual inspection.
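
A minimal sketch of the scope layout described in point 2, assuming plain TF 1.x-era APIs. build_half is a trivial stand-in for the real tagger/parser builders, not the actual code in parsey_mcparseface.py; only the brain_tagger/brain_parser naming follows the thread above:

```python
import tensorflow as tf

# Two sub-models chained in one graph, separated by variable scopes.
# build_half is an illustrative stand-in for the real builders.
def build_half(inputs, dim=8):
    w = tf.get_variable('weights', [dim, dim])
    return tf.nn.relu(tf.matmul(inputs, w))

features = tf.placeholder(tf.float32, [None, 8], name='features')
with tf.variable_scope('brain_tagger'):
    tagged = build_half(features)    # first half: POS tagging
with tf.variable_scope('brain_parser'):
    parsed = build_half(tagged)      # second half: dependency parsing

# Variables now live under distinct prefixes ('brain_tagger/weights',
# 'brain_parser/weights'), so each half can be restored from its own
# checkpoint into the one shared session.
```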

This PR has the following issues/limitations:

  1. The interface to BeamParseReader and DocumentSource has changed in an incompatible way. The input tensor (which may be used to pass text/sentence protos) is required even when not used; whether it is used is controlled by an attribute.
  2. The model, once loaded in TF Serving, does not seem to be thread-safe.
  3. The script, parsey_mcparseface.py, currently has an option specifying where to export to, but does nothing meaningful if the option is omitted (it just parses a sentence hard-coded in the source).

@flaviovdf

This is great work, it will help me out on some projects. Congratulating and subscribing 👍

@arvindsg

@dmansfield Hi, I was trying to test your fork, but I'm getting the error: no such package '@protobuf//': error loading package 'external'

Replacing it with //tensorflow/google/protobuf on line 16 of syntaxnet.bzl resolves this error, but then I get a similar error at line 36 of the BUILD file.
Can you please help?

@apodolny

I've successfully built your code, but can't run the parsey_mcparseface.py script from bazel-bin because it cannot import tensorflow_serving. Does tensorflow_serving have to be added to the build file as a dependency?

from tensorflow_serving.session_bundle import exporter
ImportError: No module named tensorflow_serving.session_bundle

@dmansfield
Contributor Author

@apodolny Oops, yes. In fact I have TF Serving built in a parallel directory and use the PYTHONPATH environment variable to pull in the Python dependencies (very bad and not "bazel"-ish). That's a TODO, I guess.

I'm not really happy with the complex dependencies between these three TF projects (TF itself, models and serving), and spent quite a lot of time sorting it out and getting the various projects in sync in terms of git checkout level (submodules etc).

I'll think a bit more about this and see if there's something better. Maybe "Parsey for export" should be a standalone project.

@arvindsg

@dmansfield I was able to build parsey_mcparseface and export the model properly. Can you also shed some light on the serving code that loads the model? I am getting a segmentation fault even after adding parser_ops.so to the build path, with this backtrace:
```
0x00007ffff6255c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff6255c37 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff6259028 in __GI_abort () at abort.c:89
#2  0x00007ffff780d514 in tensorflow::internal::LogMessageFatal::~LogMessageFatal() ()
    from bazel-out/local-fastbuild/bin/_solib_k8/_U_S_Stensorflow_Userving_Sexample_Cmnist_Uinference___Uexternal_Shacky_Usyntaxnet/parser_ops.so
#3  0x000000000047e0f6 in main ()
```

@arvindsg

@apodolny Yes, you need to add tensorflow_serving to the dependencies. I did it by setting the PYTHONPATH environment variable to include the tensorflow_serving path before running the parsey_mcparseface script.
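
For anyone hitting the same ImportError, the workaround amounts to making the TF Serving Python sources importable before the exporter import runs. A sketch; the path below is an assumption about your checkout layout:

```python
import sys

# Equivalent to: export PYTHONPATH=/path/to/serving:$PYTHONPATH
# Point this at the directory that contains the tensorflow_serving/
# package (the path here is a placeholder, not a real location).
sys.path.insert(0, '/path/to/serving')

from tensorflow_serving.session_bundle import exporter  # now resolves
```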

@dmansfield
Contributor Author

@arvindsg I'm going to create a new repository with a sample client/server that works with an exported model. Stay tuned...

@abscondment

Subscribing.

SentenceBatch accepts a vector of Sentence protos to use instead
of reading from the configured corpus. When using a feed, the corpus
will be ignored and only the fed sentences will be used. Supports
"rewind" on the feed.

Add an input Tensor to BeamParseReader to provide serialised sentence protos to parse.
Documents may come from the input tensor (e.g. read by a DocumentSource op)
or read directly from the corpus as defined in the task context. The choice
between the two approaches is made explicit via attributes set during graph configuration.
If constructed with a vector of strings, TextReader will use a
new "RandomAccessFile" implementation, VectorIn (cloned from the
StdIn implementation) to read the strings from the vector. The
format of the strings is still determined by the corpus configured
in the task context.
The DocumentSource requires a text input tensor to be provided, even when
the text is to be read from the corpus, because tensors cannot be
optional. The value of the tensor is unused.
If a document source is provided, the document protos will be consumed
from that op and the configured corpus will be ignored; the parser will
not read from the corpus.
This new script replaces demo.sh for running the Parsey McParseface model.
It takes a different approach from demo.sh: it builds the entire operation
chain (tagging AND parsing) in a single model, complete with a document
source and document sink. The script also exports the model in several
ways, including via the TF Serving exporter, so that the model can be
served using TF Serving.
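
Taken together, the commit messages above imply usage roughly like the following sketch. Only document_source itself and the task-context plumbing exist in upstream syntaxnet; the text input and the feed_documents attribute are hypothetical names for the interface this PR adds:

```python
import tensorflow as tf
from syntaxnet.ops import gen_parser_ops

# 'text' and 'feed_documents' are illustrative names for the new input
# tensor and the attribute that selects tensor input over the corpus.
text_input = tf.placeholder(tf.string, shape=[None], name='text_input')
documents, last = gen_parser_ops.document_source(
    text=text_input,          # required even when reading from the corpus
    feed_documents=True,      # attribute: take input from the tensor
    task_context='models/parsey_mcparseface/context.pbtxt',
    batch_size=32)

with tf.Session() as sess:
    docs = sess.run(documents,
                    feed_dict={text_input: ['John is eating pizza .']})
```

When the attribute selects the corpus instead, the same tensor must still be wired in and its value is ignored, which is the interface incompatibility noted in the PR description.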
@dmansfield
Contributor Author

dmansfield commented Jul 22, 2016

I rebased against master to fix the conflicts, and also included the "assets" in the export. The "assets" are the files the graph needs at runtime, which in this case is the entire models/syntaxnet/syntaxnet/models/parsey_mcparseface directory.

The missing assets are probably the cause of the segfault experienced by @arvindsg (even just a symlink directly into the syntaxnet tree will fix this).

I've just rebuilt everything, re-exported, and tested, and it works for me. Note: I'm building on top of master of tensorflow, serving, and models (models + this PR) in my environment.

I've created this project:

https://github.com/dmansfield/parsey-mcparseface-api

It includes a working TF Serving server and a test client (Node.js). As of now I've committed the exported model, so it's in the checkout. There's still a lot to do, though! Please let me know if you get any further.
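
For context, a minimal sketch of what a session_bundle export with assets looked like in that era, patterned on the old mnist_export example. The tensor names, the asset path, and the assets_collection usage are assumptions, not the exact code in this PR:

```python
import tensorflow as tf
from tensorflow_serving.session_bundle import exporter

# Toy graph standing in for the real tagger+parser pipeline.
text_in = tf.placeholder(tf.string, [None], name='documents')
text_out = tf.identity(text_in, name='parsed_documents')
step = tf.Variable(0, name='global_step')  # Saver needs >= 1 variable

# A runtime file (e.g. the task context) registered as an asset so the
# exporter copies it alongside the exported model.
asset = tf.constant('models/parsey_mcparseface/context.pbtxt')

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    saver = tf.train.Saver()
    model_exporter = exporter.Exporter(saver)
    model_exporter.init(
        sess.graph.as_graph_def(),
        named_graph_signatures={
            'inputs': exporter.generic_signature({'documents': text_in}),
            'outputs': exporter.generic_signature({'parsed': text_out})},
        assets_collection=[asset])
    model_exporter.export('/tmp/parsey_export', tf.constant(1), sess)
```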

@arvindsg

arvindsg commented Jul 24, 2016

@dmansfield Thank you, I was able to get everything working after a fresh clone. The segmentation fault was due to a TensorFlow version mismatch between when I exported and when I tried to serve.

@dmansfield
Contributor Author

Yes, I had that segfault too.


@drpngx
Contributor

drpngx commented Oct 19, 2016

Is this still relevant? If so, could you pull and rebase?

drpngx added the stat:awaiting response (Waiting on input from the contributor) label Oct 19, 2016
@dmansfield
Contributor Author

@drpngx I will rebase.

@xiamx

xiamx commented Oct 25, 2016

@dmansfield can't wait to see this merged 👍

@johnb30

johnb30 commented Feb 7, 2017

Any updates on this? Would love to see it merged!

@darrengarvey

To try this out, I rebased this on master, fixed a couple of conflicts and added a couple of tiny bits on top. For reference, it's here.

This next bit is a bit off topic, but is the reason I found this PR:

I tried to use Parsey without needing all of the sources or running scripts from within the source tree. It's been a while since this PR was updated, and in general I'm not sure whether this approach or using DRAGNN is the best way to go here. I spent some time and found it quite tricky to use Parsey as if it were a library; for now I've given up trying to hack Parsey into DRAGNN, even though the example code might be very close to what's needed...

This PR was the best approach I actually got working without running from the source tree (thanks @dmansfield!), but it seems like it still needs a bit of work to make parsey available from other applications.

@skerit

skerit commented Mar 29, 2017

I'm not able to get this to work. Is this still compatible with @dmansfield's parsey-mcparseface-api repository? Would this still build under the latest Bazel? Or is there another way to keep the program running?

@drpngx
Contributor

drpngx commented Jun 14, 2017

@skerit could you pull/rebase?

@nealwu is there an active owner that could look into this?

@nealwu
Contributor

nealwu commented Jun 14, 2017

@drpngx thanks for checking in. The active owners would be @bogatyy and @calberti for the syntaxnet model.

karmel requested a review from calberti February 22, 2018 18:02
@karmel
Contributor

karmel commented Feb 22, 2018

@calberti Can you take a look at this PR and the related issue #687 ?

@tensorflow-jenkins
Collaborator

Can one of the admins verify this patch?

@tfboyd
Member

tfboyd commented Jun 17, 2019

Closing very old PRs. I do not like seeing work lost. I and others are working to accept changes to /official, but changes to other directories require the original owner to take action. I am sorry this change died/stalled.

I am copying and pasting this message into each PR I am closing (yes, kind of like an automated response), but I do feel this way about each PR and wish each had ended with a merge or some other more positive finality.

Thank you for your contribution and time.
