uncomment the "register" line in the pig script #4

Open
jendap opened this Issue Aug 31, 2012 · 5 comments


jendap commented Aug 31, 2012

Otherwise the ExtractSizes UDF cannot be found or resolved.

Contributor

traviscrawford commented Aug 31, 2012

Hey jendap, thanks for taking a look at HDFS-DU!

The commented-out register line is actually intentional, because it lets the unit test use that script directly instead of keeping a copy somewhere that could get out of date. Also, I don't know where users will have copied the UDF jar when they run this, so the path likely needs to be set per environment.

Can you just uncomment the line and set to whatever is an appropriate path for your environment?

Hm. We could have a parametrized register with a default value. The unit test would be able to reset that value, and we could tell users to set it if they move the jar. That way the script doesn't need modification.
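
A rough sketch of what such a parametrized register might look like, using Pig's parameter substitution (`%default` in the script, `-param` on the command line). The parameter name UDF_JAR, the default path, and the script name in the comment are placeholders, not taken from the hdfs-du script, and this has not been tested against the project:

```pig
-- UDF_JAR and its default value are hypothetical; adjust to wherever the UDF jar lives.
%default UDF_JAR '/path/to/udf.jar'

-- Pig substitutes $UDF_JAR before parsing, so the script itself never needs editing.
register $UDF_JAR;

-- Override per environment from the command line, for example:
--   pig -param UDF_JAR=/some/other/udf.jar script.pig
```

The default still has to point at a file Pig can read, so the unit test would need to pass a value that exists in its environment.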


Contributor

traviscrawford commented Aug 31, 2012

Since the unit test already has the UDF class on the classpath, the register is not needed there.

Any clue how Pig behaves if registering either a fake path, or no path at all?

dvryaboy commented Sep 4, 2012

You can register '/dev/null', seems to work ok :).

D


Contributor

traviscrawford commented Sep 5, 2012

This fails with ERROR 4002: Can't read file: /doesnotexist.pig

register /doesnotexist.pig;
a = load '/etc/hosts' using PigStorage();
dump a;

For now I'd like to keep this as-is, and ask users to uncomment that line, setting it to wherever they put the UDF jar.

What would be an awesome pull request is removing the need for this Pig script entirely, and instead adding an OfflineImageViewer-based tool that generates the dataset directly. The Pig script was super useful in development, when we didn't know what data we needed. Now that we know what dataset to produce, we could simply dump it directly when parsing the fsimage.

Thoughts?
