Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Apache Phoenix connector #672

Merged
merged 2 commits into from
May 20, 2019
Merged

Add Apache Phoenix connector #672

merged 2 commits into from
May 20, 2019

Conversation

vincentpoon
Copy link
Member

@vincentpoon
Copy link
Member Author

vincentpoon commented Apr 25, 2019

This is continuing from #76 since the GitHub thread there got too large

The deprecated ConnectorTableLayout usage is still there because from discussion in slack, it was determined that there isn't currently a way to get the desiredColumns during split generation.

I also added support for ARRAY type, and added a bunch of tests in TestPhoenixSqlTypeMapping

@dain dain requested review from kokosing and electrum April 25, 2019 21:55
Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still reviewing, but overall it's looking good. Here's the first set of comments.

presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-docs/src/main/sphinx/connector/phoenix.rst Outdated Show resolved Hide resolved
presto-phoenix/pom.xml Show resolved Hide resolved
presto-phoenix/pom.xml Outdated Show resolved Hide resolved
presto-phoenix/pom.xml Outdated Show resolved Hide resolved
presto-phoenix/pom.xml Outdated Show resolved Hide resolved
Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing. Adding another round of mostly minor comments. I still need to review PhoenixMetadata and most of the test code. Overall looking great and should be ready to merge soon.

@vincentpoon
Copy link
Member Author

Is there a known issue with the builds? I'm not able to repro the test failure on my local

Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have completed the review. This is ready to merge after the comments are addressed. Thanks for your patience and all your work on this.

@electrum
Copy link
Member

We have been seeing spurious test failures for presto-tests and PRODUCT_TESTS_ lately. These are not related to your change.

@electrum
Copy link
Member

electrum commented May 16, 2019

The ConcurrentModificationException failure for the Phoenix tests is legitimate. Here is the problem:

  • PhoenixSplit is deserialized on a worker.
    • It contains PhoenixInputSplit via WrappedPhoenixInputSplit.
  • PhoenixInputSplit contains a mutable Scan object.
    • This object is modified via the PhoenixClient.getResultSet() which calls Scan.setAttribute().
  • Before processing the table scan using the split, the engine calls ConnectorSplit.getInfo(), and holds onto it for later usage in logging/stats.
    • The implementation of getInfo() in JdbcSplit returns this, meaning the PhoenixSplit object.
  • At some later point, in a different thread, the engine then JSON serializes the returned info object. In other words, it serializes the PhoenixSplit object.
    • The serialization code in WrappedPhoenixInputSplit calls the Hadoop Writable serialization code in PhoenixInputSplit.
    • This code ends up hitting ConcurrentModificationException, since the attributes map was modified concurrently.

I think the easiest fix is to have PhoenixClient.getResultSet() make a copy of Scan using the copy constructor before modifying it.

@vincentpoon
Copy link
Member Author

Thanks for the review @electrum, I've pushed a new revision that addresses the comments, and the tests are passing now. Really appreciate all your help, I've learned a lot through the whole process.

@electrum electrum merged commit 06a3778 into trinodb:master May 20, 2019
@electrum
Copy link
Member

Merged, thanks!

@electrum electrum added this to the 312 milestone May 20, 2019
@martint
Copy link
Member

martint commented May 20, 2019

Woohoo! Thanks @vincentpoon for your contribution!

@findepi
Copy link
Member

findepi commented May 20, 2019

🎉 🚀

@electrum
Copy link
Member

@vincentpoon Thanks for all your hard work on this. It was a pleasure to work with you and I look forward to seeing future contributions.

@vincentpoon
Copy link
Member Author

Thank you @electrum, and thanks again @findepi and @kokosing for prior reviews. The quality of the codebase speaks to the level of attention to detail in the code review process here. As a result, it's been edifying and a pleasure to work on this, and I eagerly look forward to more contributions.

FYI @lhofhansl @JamesRTaylor @aertoria

@vincentpoon
Copy link
Member Author

Coincidentally, I am giving a talk at NoSql day tomorrow on this connector. Cool that I'll be able to announce that it's been merged!
https://dataworkssummit.com/nosql-day-2019/session/integrating-apache-phoenix-with-distributed-query-engines/

@electrum
Copy link
Member

electrum commented May 21, 2019

@vincentpoon Very cool! You can mention it will be available in the upcoming 312 release at https://prestosql.io/ (within the next week)

@lhofhansl
Copy link
Member

Yeah! Thanks all!

@electrum electrum mentioned this pull request May 29, 2019
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

5 participants