Spring data hdfs #242
Conversation
The build issue
Some of the dependency stuff can be tricky. Is everything up to date on your branch? I can try to make some suggestions by looking at it locally.
Yes, it's up to date; no changes have been made since the PR! Please have a look!
Here's an initial refinement: shawkins@3385a02 If possible, dependencies should be managed in the root pom - this makes it easy to track all versions in use and to share dependencies. @rareddy just aligned things to Guava 20, so I've reverted back to that - if hadoop-common actually requires version 27, then we need to make sure that's good across everything. Finally, I think we're good without the managed version of commons-net, but if something like that is needed we should be able to just upgrade the Teiid version.
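Managing the Guava version in the root pom, as suggested above, might look roughly like this (a sketch only; the exact placement and surrounding pom content are assumptions, not taken from the actual build):

```xml
<!-- root pom.xml: pin Guava once so every module inherits the same version -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>20.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Child modules then declare the dependency without a version, so there is a single place to change when aligning everything to a different Guava release.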
Force-pushed from 3385a02 to 3dfcebe
Thanks for the help!
I used the latest version of Hadoop Common, i.e. 3.2.1, but versions 3.1.2 and below use Guava 11.0.2. Would downgrading help?
To prevent any possible mix-up later, add explicit dependencies in the teiid spring data hdfs module to the actual jars we want to use:
Any of these we don't actually directly use in our code should be marked with a runtime scope. The configuration we can simplify to just the uri and an optional configuration file resource. That config file will need to be added to the Configuration you construct as a resource - most likely as a classpath resource for Spring Boot, but in WildFly it may make sense to also have it on the filesystem. For testing, see if you can use one of the mini fs options to create a cluster and test out your basic operations. You'll probably need to make the createfs method protected so that you can inject the local/mock filesystem instead of trying to create a real instance.
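The injection idea above can be sketched as follows. All names here are hypothetical stand-ins for illustration: `HdfsConnectionFactory`, `createFs`, and the tiny `Fs` interface are not the actual classes in the PR, and `Fs` stands in for Hadoop's `org.apache.hadoop.fs.FileSystem`:

```java
import java.net.URI;

// Hypothetical stand-in for Hadoop's FileSystem, kept minimal for the sketch.
interface Fs {
    boolean exists(String path);
}

// Hypothetical stand-in for the connection factory under test.
class HdfsConnectionFactory {
    private final URI uri;

    HdfsConnectionFactory(URI uri) {
        this.uri = uri;
    }

    // protected rather than private, so a test subclass can inject
    // a local/mock file system instead of contacting a real cluster
    protected Fs createFs() {
        throw new UnsupportedOperationException(
                "production code would call something like FileSystem.get(uri, conf)");
    }

    boolean fileExists(String path) {
        return createFs().exists(path);
    }
}

public class CreateFsDemo {
    public static void main(String[] args) {
        // A test overrides createFs() to return an in-memory stub
        HdfsConnectionFactory testFactory =
                new HdfsConnectionFactory(URI.create("hdfs://localhost:9000")) {
                    @Override
                    protected Fs createFs() {
                        return path -> path.equals("/data/in.txt"); // mock fs
                    }
                };
        System.out.println(testFactory.fileExists("/data/in.txt"));  // true
        System.out.println(testFactory.fileExists("/data/missing")); // false
    }
}
```

The same shape works with a mini-cluster: the test subclass returns the mini cluster's file system instead of a stub.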
modifying dependencies making code checks pass
Force-pushed from 50e406f to 23d2b7b
@shawkins please have a look and suggest changes! Look at the glob search part carefully; the rest seems fine to me! I wasn't able to do a recursive glob search - "**" is not working!
This seems to resolve all the dependency issues for me: shawkins@006b1f7 It also moves the test class into a package and removes the exception handling - generally in tests you won't need to wrap exceptions.
Can you work around it with multiple explicit directory searches - ///*.txt? If that's the case, we can just call this a known issue.
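For reference, the `**` vs `*` distinction being discussed can be checked without a cluster using the JDK's glob matcher (note this is `java.nio`'s glob syntax, not Hadoop's own glob matching, which may handle `**` differently - possibly why it isn't working here):

```java
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    public static void main(String[] args) {
        // '*' matches within a single path segment only
        PathMatcher single = FileSystems.getDefault()
                .getPathMatcher("glob:/data/*.txt");
        // '**' crosses directory boundaries
        PathMatcher recursive = FileSystems.getDefault()
                .getPathMatcher("glob:/data/**.txt");

        Path nested = Paths.get("/data/2020/01/log.txt");
        System.out.println(single.matches(nested));    // false: * stops at '/'
        System.out.println(recursive.matches(nested)); // true: ** crosses '/'
    }
}
```

If the Hadoop-side matcher only supports single-segment wildcards, the multiple-explicit-directory workaround above amounts to enumerating each depth by hand.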
@shawkins I've made the changes as suggested, though the log4j warning still pops up. I'll add the .mustache file tomorrow and carry on with the S3 source.
LGTM. Good work Aditya.
There are a couple of follow-on tasks. One is to get this functionality into Teiid WildFly - https://issues.redhat.com/browse/TEIID-3647 - I will likely handle that by pulling the necessary code from here over to Teiid and having Aditya review the changes. We'll also need a .mustache file and sample project.
I'll do this part. What would this sample look like?
Some time ago I wrote up the tasks to be done for every source we add to Teiid Spring Boot: https://github.com/teiid/teiid-spring-boot/blob/master/docs/CustomSource.adoc#house-keeping-tasks
Thanks @rareddy, the doc explains things well. |
The way I did it for the other sources is a written example, but here there may be steps the user needs to do, like installing HDFS, so we won't be able to test it using JUnit. If someone wants to, though, they can follow it and set things up.
1) I did not find the FileSystem class to be thread-safe. As mentioned in the doc: "The implementations of FileSystem shipped with Apache Hadoop do not make any attempt to synchronize access to the working directory field." This was also mentioned in a Stack Overflow answer.
2) I did not understand the wildcard part of FileConnectionImpl (the backward compatibility) - a little more detail there would help - so I just did simple glob matching.
3) When listing files, following how things are done in FileConnectionImpl, I defaulted the recursive boolean to false.
4) The build will possibly fail, as I was getting duplicate classes on the classpath during the build. I'll see how to resolve this, or will ask for your help in case I'm not able to.
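On point 4, duplicate classes usually mean two artifacts ship the same packages; `mvn dependency:tree` shows which dependency drags the duplicate in, and an exclusion is one common remedy. A sketch (the excluded artifact below is a placeholder, not taken from the actual build):

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <exclusions>
    <!-- placeholder: exclude whichever artifact duplicates classes
         already provided by another dependency on the classpath -->
    <exclusion>
      <groupId>duplicated.group</groupId>
      <artifactId>duplicated-artifact</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```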
Submitting the first draft so that we can have an idea of things ahead!
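On point 1 above, one common way to cope with an object whose state (here, the working directory) is not synchronized is to give each thread its own instance. A sketch with a stand-in class, since the real pattern would apply to Hadoop's FileSystem (`WorkingDir` is purely illustrative, not part of the PR):

```java
// Stand-in for a class whose working-directory field is not synchronized,
// as described for Hadoop's FileSystem implementations.
class WorkingDir {
    private String dir = "/";
    void setDir(String d) { dir = d; }
    String getDir() { return dir; }
}

public class PerThreadFsDemo {
    // Each thread lazily gets its own instance, so concurrent setDir
    // calls cannot clobber another thread's working directory.
    private static final ThreadLocal<WorkingDir> FS =
            ThreadLocal.withInitial(WorkingDir::new);

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> {
            FS.get().setDir("/user/a");
            System.out.println("t1 sees " + FS.get().getDir());
        });
        Thread t2 = new Thread(() -> {
            FS.get().setDir("/user/b");
            System.out.println("t2 sees " + FS.get().getDir());
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
    }
}
```

The trade-off is one FileSystem instance (and its connections) per thread; sharing a single instance would instead require external synchronization around working-directory-sensitive calls.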