Read and writable partitioned sources #969

Merged
merged 3 commits into from Jul 28, 2014

stephanh
Contributor

This is an initial implementation of partitioned versions of the TypeDelimited and TextLine sources.

I'll add the tests next.

}
}

// Create the underlying scrooge-parquet scheme and explicitly set the sink fields to be only the
Collaborator

looks like these comments are out of sync.

Contributor Author

Fixed.

@johnynek
Collaborator

This looks great! Thanks for sharing this. Once we have tests and address some minor issues we will merge.

@stephanh
Contributor Author

I have added some tests for writing. I'm not sure how to set up tests for reading, though. I don't really know how JobTest works for reading. Does it actually create files on disk?

@johnynek
Collaborator

I think you will want to use the hadoop-platform test:

https://github.com/twitter/scalding/blob/develop/scalding-hadoop-test/src/test/scala/com/twitter/scalding/platform/PlatformTest.scala

This actually spins up a minicluster and behaves very much like Hadoop.

There are methods in HadoopPlatformJobTest to initialize the sources (and feel free to add one or two if you think it is needed).

By default, JobTest mocks the sources and sinks, so it only tests the job logic.
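
To make the mocking concrete, here is a minimal sketch of a JobTest run, assuming the Fields API and a Scalding version from around the time of this pull request. `WordCountJob`, `WordCountJobCheck`, and the `inputFile`/`outputFile` names are hypothetical and not part of this change. The strings passed to `TextLine` and `Tsv` only serve as keys for matching the mocked source and sink, so nothing is read from or written to disk.

```scala
import com.twitter.scalding._

// Hypothetical job used only for illustration; it is not part of this pull request.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("""\s+""").toList }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}

object WordCountJobCheck {
  // Runs the job against in-memory data and returns the counts it wrote to the mocked sink.
  def run(): Map[String, Long] = {
    var counts = Map.empty[String, Long]
    JobTest(new WordCountJob(_))
      .arg("input", "inputFile")
      .arg("output", "outputFile")
      // TextLine tuples are (offset, line); the path is only a lookup key here.
      .source(TextLine("inputFile"), List((0, "hack hack hack and hack")))
      // The sink callback receives whatever the job wrote, as an in-memory buffer.
      .sink[(String, Long)](Tsv("outputFile")) { buffer => counts = buffer.toMap }
      .run
      .finish
    counts
  }
}
```

Because the source and sink are replaced by in-memory buffers, a test like this exercises the job's logic but not the file layout a partitioned source produces, which is why the minicluster-based platform test is suggested for the read path.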

@johnynek
Collaborator

Actually, your test looks great. Merging this.

Thanks a ton!

johnynek added a commit that referenced this pull request Jul 28, 2014
Read and writable partitioned sources
@johnynek johnynek merged commit c4e6b82 into twitter:develop Jul 28, 2014