implement autotranslation with arrow #7577

scottdraves · 2018-06-26T03:15:28Z

Follow-up to #5039.

Use Arrow and shared memory.

dclong · 2018-06-26T04:18:53Z

Looking forward to it.

MurrayAdth · 2018-08-17T17:26:29Z

Hello, my name is Murray Davis. I am a developer with AdTheorent (http://adtheorent.com/). I have just made a POC that creates a Pandas DataFrame in a Jupyter Notebook, converts it to an ArrowRecordBatch with PyArrow, and stores the ArrowRecordBatch in a Plasma Object Store, i.e. shared memory. Then, it invokes through Py4J a Scala/Java application to fetch the ArrowRecordBatch from the Plasma Object Store with the Java Plasma Client, convert it into our own DataFrame format, and learn the data with our own real-time Java machine learner.
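For readers unfamiliar with the shared-memory handoff described above, here is a minimal sketch of the idea only. It does not use Arrow or the Plasma client: Python's standard `multiprocessing.shared_memory` stands in for the object store, and the function name `roundtrip` and the length-prefix layout are assumptions made for illustration. A producer writes serialized bytes into a named segment, and a consumer attaches to the segment by name instead of receiving the bytes over a socket.

```python
import struct
from multiprocessing.shared_memory import SharedMemory

HEADER = struct.Struct("<Q")  # 8-byte little-endian payload length


def roundtrip(payload: bytes) -> bytes:
    """Write payload to a shared segment, then read it back by segment name.

    Hypothetical helper: the second SharedMemory handle plays the role of a
    separate consumer that only knows the segment's name.
    """
    producer = SharedMemory(create=True, size=HEADER.size + len(payload))
    try:
        # Store the length first, since the OS may round the segment size up.
        HEADER.pack_into(producer.buf, 0, len(payload))
        producer.buf[HEADER.size:HEADER.size + len(payload)] = payload

        # Consumer side: attach by name; nothing is sent through a pipe.
        consumer = SharedMemory(name=producer.name)
        try:
            (length,) = HEADER.unpack_from(consumer.buf, 0)
            return bytes(consumer.buf[HEADER.size:HEADER.size + length])
        finally:
            consumer.close()
    finally:
        producer.close()
        producer.unlink()  # free the segment once both sides are done
```

In the POC described above, the payload would be an Arrow record batch and the consumer would be the Java Plasma client; this sketch only shows why handing over a segment name is cheaper than pushing the serialized data through Py4J.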

We successfully tested it with multiple Python threads after synchronizing the code segment that uses the Python Plasma Client, since it is not thread-safe. (We are using Arrow 0.9.0.)
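The synchronization mentioned above can be sketched as follows. This is an illustration under assumptions, not the POC's code: `FakeStoreClient` is a hypothetical stand-in for a client that is not thread-safe, and `SynchronizedClient` and `run_workers` are names invented here. The point is that every call into the shared client goes through one lock.

```python
import threading


class FakeStoreClient:
    """Hypothetical stand-in for an object-store client that is not thread-safe."""

    def __init__(self):
        self.objects = {}

    def put(self, key, value):
        self.objects[key] = value


class SynchronizedClient:
    """Wraps a client so that only one thread at a time can call into it."""

    def __init__(self, client):
        self._client = client
        self._lock = threading.Lock()

    def put(self, key, value):
        with self._lock:  # the synchronized code segment
            self._client.put(key, value)

    def count(self):
        with self._lock:
            return len(self._client.objects)


def run_workers(n_threads=8, puts_per_thread=100):
    """Have several threads store objects through one shared, locked client."""
    shared = SynchronizedClient(FakeStoreClient())

    def worker(thread_id):
        for i in range(puts_per_thread):
            shared.put((thread_id, i), i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared.count()
```

An alternative design would give each thread its own client instance; the lock is the simpler option when a single connection has to be shared.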

Our motivation was to improve the performance of a Python Jupyter Notebook when integrated with our Java machine learner. We had found that transmitting serialized Pandas DataFrames through Py4J is a significant bottleneck.

Our POC has been successful in significantly improving the speed of our integration.

We have very recently become aware of BeakerX, and I have only started to try it out this week. We can see that BeakerX may have the potential to host our Python/Java integration with Arrow Plasma.

Does it sound like our work could contribute to the implementation of autotranslation with Arrow?

SemanticBeeng · 2018-11-09T14:07:11Z

"found that transmitting serialized Pandas DataFrames through Py4J is a significant bottleneck."

(Context outside BeakerX.) I used jep (https://github.com/ninia/jep) NDArray to share zero-copy Java NIO buffers (off-heap memory) between the JVM, which gets the data from Spark, and the Python code it calls, which does no I/O.

So, again, this used in-memory integration, with no remoting and no copying of large data as there is with Py4J.
