implement autotranslation with arrow #7577
Looking forward to it.
Hello, my name is Murray Davis. I am a developer with AdTheorent (http://adtheorent.com/). I have just built a POC that creates a Pandas DataFrame in a Jupyter Notebook, converts it to an ArrowRecordBatch with PyArrow, and stores the ArrowRecordBatch in a Plasma Object Store, i.e. shared memory. It then invokes a Scala/Java application through Py4J, which fetches the ArrowRecordBatch from the Plasma Object Store with the Java Plasma Client, converts it into our own DataFrame format, and trains our own real-time Java machine learner on the data. We tested it successfully with multiple Python threads after synchronizing the code segment that uses the Python Plasma Client, since that client is not thread-safe. (We are using Arrow 0.9.0.)

Our motivation was to improve the performance of a Python Jupyter Notebook integrated with our Java machine learner. We had found that transmitting serialized Pandas DataFrames through Py4J was a significant bottleneck, and our POC has significantly improved the speed of that integration.

We only recently became aware of BeakerX, and I started trying it out this week. BeakerX looks like it could host our Python/Java integration with Arrow Plasma. Does it sound like our work could contribute to the implementation of autotranslation with Arrow?
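The comment mentions that the Python Plasma Client had to be synchronized because it is not thread-safe. The following is a minimal sketch of that pattern, assuming a single `threading.Lock` guarding every call. `FakePlasmaClient`, `SynchronizedClient`, and `run_workers` are hypothetical names for illustration only; this is not the real Plasma API or the POC's actual code.

```python
import threading


class FakePlasmaClient:
    """Hypothetical stand-in for a client that is NOT thread-safe."""

    def __init__(self):
        self._objects = {}
        self._next_id = 0

    def put(self, value):
        # Unprotected read-modify-write of _next_id: two threads could
        # hand out the same id if this ran concurrently without a lock.
        object_id = self._next_id
        self._next_id = object_id + 1
        self._objects[object_id] = value
        return object_id

    def get(self, object_id):
        return self._objects[object_id]


class SynchronizedClient:
    """Wraps a non-thread-safe client and serializes every call with one lock."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def put(self, value):
        with self._lock:
            return self._client.put(value)

    def get(self, object_id):
        with self._lock:
            return self._client.get(object_id)


def run_workers(n_threads, n_items):
    """Start n_threads workers that each store n_items values; return all ids."""
    client = SynchronizedClient(FakePlasmaClient())
    ids = []
    ids_lock = threading.Lock()

    def worker():
        for value in range(n_items):
            object_id = client.put(value)
            with ids_lock:
                ids.append(object_id)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids
```

Because every `put` holds the lock, each call receives a distinct id, so `run_workers(4, 50)` returns 200 unique ids. A single coarse lock is the simplest choice. It trades some parallelism for safety, which is acceptable when the client calls are short compared with the rest of each thread's work.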
So, again: use in-memory integration, with no remoting and no copying of large data as with …
Follow-up to #5039.
Use Arrow and shared memory.
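To illustrate the shared-memory idea behind this issue (one side publishes bytes into a named segment, and the other side attaches by name and reads them in place), here is a minimal sketch using Python's standard-library `multiprocessing.shared_memory` (Python 3.8+). It does not use Arrow or Plasma. In a real integration the payload would be Arrow-formatted data. The helper names `publish` and `consume` are hypothetical.

```python
from multiprocessing import shared_memory


def publish(payload: bytes) -> shared_memory.SharedMemory:
    """Create a new named shared-memory segment and copy payload into it.

    The caller owns the segment and must eventually call close() and unlink().
    """
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    return shm


def consume(name: str, length: int, fn):
    """Attach to an existing segment by name and apply fn to a zero-copy view.

    fn receives a memoryview over the first `length` bytes; it must not keep
    a reference to that view, because the view is released before returning.
    """
    shm = shared_memory.SharedMemory(name=name)
    try:
        view = shm.buf[:length]
        try:
            return fn(view)
        finally:
            view.release()  # required before close(), or close() raises BufferError
    finally:
        shm.close()  # detach only; the publisher is responsible for unlink()
```

Usage: `shm = publish(b"arrow")`, then `consume(shm.name, 5, bytes)` returns `b"arrow"` from another handle (or another process) that knows only the segment name and length. Finish with `shm.close()` and `shm.unlink()`. The length is passed separately because the OS may round the segment size up to a page boundary.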