HDFS Filesystem Dump in CSV Output #30

Closed
ldbiz opened this issue Jan 12, 2018 · 2 comments
@ldbiz
Contributor

ldbiz commented Jan 12, 2018

Currently the filesystem dump is written as JSONL. We need CSV for some uses, so either the original output should be reworked to produce CSV, or an alternative CSV output should be added alongside it.
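For the alternative-output route, here is a minimal sketch of converting the existing JSONL dump to CSV with Python's standard `json` and `csv` modules. The function name `jsonl_to_csv` and the column names in the usage example are hypothetical and do not reflect the actual dump schema. Passing `fieldnames` explicitly keeps the column order stable even when individual records omit a key.

```python
import csv
import io
import json


def jsonl_to_csv(jsonl_lines, out, fieldnames):
    """Convert JSONL records (one JSON object per line) to CSV on `out`.

    Hypothetical helper: `fieldnames` fixes the column order. Keys missing
    from a record become empty cells; keys not in `fieldnames` are ignored.
    Nested JSON values (lists/objects) would be written as their Python str().
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames,
                            extrasaction="ignore", restval="")
    writer.writeheader()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        writer.writerow(json.loads(line))


# Usage with made-up records and hypothetical column names.
src = io.StringIO('{"filename": "/data/a.warc.gz", "filesize": 1024}\n'
                  '{"filename": "/data/b.warc.gz", "filesize": 2048}\n')
dst = io.StringIO()
jsonl_to_csv(src, dst, ["filename", "filesize", "modified_at"])
print(dst.getvalue())
```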

Any processes outside the target analysis procedure that consume the current format, e.g. the Turing load, will have to be updated if the format changes.
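One way to soften the switch for downstream consumers is a reader that accepts either format during the transition. This is a sketch only: `read_file_listing` is a hypothetical helper, and it assumes that a JSONL listing begins with `{` while a CSV listing begins with a header row. CSV returns every value as a string, so a consumer that relied on JSON numbers would need explicit conversions.

```python
import csv
import io
import json


def read_file_listing(f):
    """Yield one dict per record from a JSONL or CSV file listing.

    Assumption (not from the original tooling): JSONL input starts with '{',
    CSV input starts with a header row naming the columns.
    """
    first = f.readline()
    while first and not first.strip():
        first = f.readline()  # skip leading blank lines
    if not first:
        return  # empty input yields nothing
    if first.lstrip().startswith("{"):
        # JSONL: one JSON object per non-blank line.
        yield json.loads(first)
        for line in f:
            if line.strip():
                yield json.loads(line)
    else:
        # CSV: parse the already-consumed header line, then read the rest.
        header = next(csv.reader([first]))
        yield from csv.DictReader(f, fieldnames=header)


listing = io.StringIO("filename,filesize\r\n/data/a.warc.gz,1024\r\n")
for record in read_file_listing(listing):
    print(record["filename"], int(record["filesize"]))
```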

@anjackson
Contributor

On this point, I switched some of my file-list generator tasks to emit CSV using the built-in Python libraries: https://github.com/ukwa/ukwa-manage/blob/ingest-ng-phase-2/ukwa/tasks/hadoop/hdfs_tasks.py#L109-L132
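The linked tasks are not reproduced here. As a hedged illustration of the same approach, the sketch below assumes the file list comes from `hdfs dfs -ls -R`-style text (eight whitespace-separated columns, path last) and writes it out with the standard `csv.writer`. The function name `ls_to_csv` and the column names in `FIELDS` are hypothetical.

```python
import csv
import io

# Hypothetical column names for an 8-column `hdfs dfs -ls -R`-style line.
FIELDS = ["permissions", "replication", "owner", "group",
          "size", "modified_date", "modified_time", "path"]


def ls_to_csv(ls_lines, out):
    """Write `hdfs dfs -ls -R`-style lines to `out` as CSV; return row count.

    Splitting at most 7 times keeps paths that contain spaces intact.
    Lines without 8 columns (e.g. a "Found N items" header) are skipped.
    """
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    count = 0
    for line in ls_lines:
        parts = line.rstrip("\n").split(None, len(FIELDS) - 1)
        if len(parts) != len(FIELDS):
            continue
        writer.writerow(parts)  # quotes fields containing commas or quotes
        count += 1
    return count


out = io.StringIO()
ls_to_csv(["-rw-r--r--   3 user group  1024 2018-01-12 10:01 /data/a.warc.gz\n"], out)
print(out.getvalue())
```

Using `csv.writer` rather than joining fields with commas matters here: HDFS paths can contain commas or quotes, and the writer quotes such fields automatically.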

@anjackson
Contributor

Done. ListAllFilesOnHDFSToLocalFile now emits CSV.
