@morazow
Contributor

@morazow morazow commented Nov 12, 2014

Hello all,

I have a data source partitioned as /base/path/year/month/day/state/, and I want to run a Scalding job over one week of data for a single state only. Reading the whole week with DailySuffixTsv and then filtering would be a waste of resources, so it would be great to have a DailyPrefixSuffixTsv.
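As a rough sketch of the saving, the plain-Scala helper below (hypothetical: `StatePaths` and `dailyStateGlobs` are not part of Scalding) expands a date range into one glob per day for a single state, so only that state's directories would be read:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, not a Scalding API: builds one glob per day under
// <base>/yyyy/MM/dd/<state>/ instead of globbing every state for each day.
object StatePaths {
  private val ymd = DateTimeFormatter.ofPattern("yyyy/MM/dd")

  def dailyStateGlobs(base: String, state: String, start: LocalDate, days: Int): Seq[String] =
    (0 until days).map { offset =>
      // plusDays handles month and year boundaries within the week
      val day = start.plusDays(offset.toLong)
      s"$base/${day.format(ymd)}/$state/*"
    }
}
```

For example, `StatePaths.dailyStateGlobs("/base/path", "CA", LocalDate.of(2014, 11, 10), 7)` yields seven paths from `/base/path/2014/11/10/CA/*` to `/base/path/2014/11/16/CA/*`.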

Moreover, there should be an extra "/" between TimePathedSource.YEAR_MONTH_DAY and suffixTemplate in DailyPrefixSuffixSource. Otherwise, it tries to read /base/path/year/month/daysuffixTemplate/, which is not the intended path.
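The concatenation problem can be reproduced without Scalding. The sketch below assumes the date fragment mirrors TimePathedSource.YEAR_MONTH_DAY as `"/%1$tY/%1$tm/%1$td"`; `SuffixConcat` and `expand` are hypothetical names, and the real source also appends `/*` and applies a time zone, which is omitted here:

```scala
import java.time.LocalDate

// Hypothetical reproduction of prefix + YEAR_MONTH_DAY + suffix.
// YearMonthDay is a local copy assumed to match TimePathedSource.YEAR_MONTH_DAY.
object SuffixConcat {
  val YearMonthDay = "/%1$tY/%1$tm/%1$td"

  // java.util.Formatter accepts java.time values for the %t conversions
  def expand(prefix: String, suffix: String, day: LocalDate): String =
    (prefix + YearMonthDay + suffix).format(day)
}
```

With the suffix `"CA"` the day and state run together (`/base/path/2014/11/12CA`); with `"/CA"` the expected `/base/path/2014/11/12/CA` comes out.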

* Add DailyPrefixSuffixTsv tap. It is useful for partitions such as /base/path/year/month/day/suffix/.
@ianoc
Collaborator

ianoc commented Nov 14, 2014

The suffixTemplate should just start with a "/"; then it will work fine (that's how the source is generally used). We should put a require statement into the constructor to enforce this, though, to avoid future issues.
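A minimal sketch of the constructor check, using a hypothetical `PrefixSuffixPattern` class that only builds the path pattern (it is not the Scalding source itself). In Scala, statements in the class body run as part of the primary constructor:

```scala
// Hypothetical stand-in: only builds the pattern string, to show where a
// constructor-time require would sit.
final class PrefixSuffixPattern(prefixTemplate: String, suffixTemplate: String) {
  // Runs on every instantiation: accepts "/CA", rejects "CA" with an
  // IllegalArgumentException carrying this message.
  require(suffixTemplate.startsWith("/"),
    s"suffixTemplate must start with '/', got: $suffixTemplate")

  val pattern: String = prefixTemplate + "/%1$tY/%1$tm/%1$td" + suffixTemplate + "/*"
}
```

Note that this check also rejects an empty suffix; that case is arguably better served by a suffix-only source anyway.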

The source addition itself looks fine to me, though.

@morazow
Contributor Author

morazow commented Nov 17, 2014

@ianoc Thanks for the feedback.

Yes, I agree. It makes perfect sense to start the suffix with "/". However, I am not quite sure where to put a require/assert for that. I saw that there is a check in TimePathedSource:

```scala
// Write to the path defined by the end time:
override def hdfsWritePath = {
  // TODO this should be required everywhere but works on read without it
  // maybe in 0.9.0 be more strict
  assert(pattern.takeRight(2) == "/*", "Pattern must end with /* " + pattern)
  ...
```

I was thinking of something like require(suffixTemplate.charAt(0) == '/', "suffixTemplate should start with /"), but I do not know where to put it, because the suffix is concatenated into the pattern that is passed to TimePathedSource.
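One small detail about the proposed condition: `charAt(0)` throws on an empty string, whereas `startsWith("/")` simply returns `false`, so `require` can still report its readable message. A quick comparison (`LeadingSlash` is a hypothetical name):

```scala
// Two ways to test for a leading '/'; they differ only on the empty string.
object LeadingSlash {
  // Throws StringIndexOutOfBoundsException when s is ""
  def viaCharAt(s: String): Boolean = s.charAt(0) == '/'

  // Returns false when s is "", so require(...) reports its own message
  def viaStartsWith(s: String): Boolean = s.startsWith("/")
}
```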

@ianoc
Collaborator

ianoc commented Dec 2, 2014

You could add the require to the body of those classes. I think it will then do the check as part of instantiating them.
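This addresses the concern that the suffix is already concatenated before it reaches TimePathedSource. A sketch with hypothetical classes shaped like that hierarchy (the base receives the finished pattern) shows that a `require` in the intermediate class body still runs on every instantiation, just after the superclass constructor completes. The `log` buffer exists only to make the order visible:

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical classes mirroring the shape discussed in the thread;
// none of these are Scalding types.
object InitOrder {
  val log = ListBuffer.empty[String]

  // Stands in for TimePathedSource: receives the already-built pattern.
  abstract class PathedBase(val pattern: String) {
    log += s"base constructed with $pattern"
  }

  // Stands in for DailyPrefixSuffixSource: concatenates in the extends clause.
  abstract class PrefixSuffixBase(prefix: String, suffix: String)
      extends PathedBase(prefix + "/%1$tY/%1$tm/%1$td" + suffix + "/*") {
    // Class-body statements are part of the primary constructor.
    require(suffix.startsWith("/"), s"suffix must start with '/': $suffix")
    log += "suffix validated"
  }

  // Stands in for a concrete source such as DailyPrefixSuffixTsv.
  case class PrefixSuffixTsv(prefixTemplate: String, suffixTemplate: String)
      extends PrefixSuffixBase(prefixTemplate, suffixTemplate)
}
```

Constructing `InitOrder.PrefixSuffixTsv("/base/path", "CA")` builds the base with the malformed pattern and then fails the `require`, so the object is never handed back to the caller.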

@CLAassistant

CLAassistant commented Jul 18, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.