Compare: Home

Showing with 2 additions and 2 deletions.
  1. +2 −2 Home.md
4 Home.md
@@ -107,11 +107,11 @@ This is a hacky tool that I wrote and was adopted before I could make it lot mor
### 1. Avoid the second hadoop job which generates gz from bz2 splits.
-As this was more of a hack, I just concatenated the smaller bz2 files (rec...) to generate a larger one and the metadata of the smaller file has to be set appropriately. The Unix tools (bunzip2 etc) can read the entire bz2 file though and not Hadoop. A quicker workaround is to bunzip and gzip and the second job does it. The ideal way is to construct the split file like a proper bz2 file. Thats in works.
+As this was more of a hack, I just concatenated the smaller bz2 files (rec...) to generate the split file; the metadata should have been generated appropriately for the split file, which I didn't do. Unix tools (bunzip2 etc.) can read the entire bz2 file, but Hadoop cannot. A quicker workaround is to bunzip and gzip, which is what the second job does. The ideal way is to construct the split file like a proper bz2 file, which would eliminate the second job and also make the first job faster. That's in the works.
### 2. Write it entirely in Java.
-Currently lots of files are being produced on the local file system. Writing it entirely in Java would ensure smaller files with appropriate bz2 metadata (see #1) can be generated without writing those temporary files to disk. From conceptualization to implementation took about 2 hours and shell makes it damn easy to prototype and its just been running with that. Writing in Java will also run this on Windows.
+Currently lots of files are being produced on the local file system. Writing it entirely in Java would ensure smaller files with appropriate bz2 metadata (see #1) can be generated without writing those temporary files to disk. From conceptualization to implementation, this tool took about 2 hours; shell scripting makes it damn easy to prototype, and the tool has been running as a shell script ever since. Writing it in Java would also let it run on Windows.
## FAQ:
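
The workaround described under item 1 in the diff above (concatenate small bz2 pieces, then bunzip and gzip) can be illustrated with a minimal Python sketch. This is not the tool's code, which is a shell script plus Hadoop jobs; the function names `concat_bz2` and `bz2_to_gzip` and the sample records are hypothetical, and the sketch works purely in memory. It only shows that a stream-aware decompressor reads the concatenation as one file, as `bunzip2` does; it does not reproduce Hadoop's behavior.

```python
import bz2
import gzip


def concat_bz2(chunks):
    """Compress each chunk as its own bz2 stream and join the streams.

    Hypothetical helper: mimics concatenating several small .bz2 files
    into one larger file without rewriting any bz2 metadata.
    """
    return b"".join(bz2.compress(chunk) for chunk in chunks)


def bz2_to_gzip(bz2_bytes):
    """Decompress (possibly multi-stream) bz2 data and recompress it as gzip.

    bz2.decompress handles concatenated streams, so the whole input
    is read even though it consists of several independent bz2 streams.
    """
    raw = bz2.decompress(bz2_bytes)
    return gzip.compress(raw)


if __name__ == "__main__":
    # Sample records are made up for illustration.
    records = [b"rec-1\n", b"rec-2\n", b"rec-3\n"]
    split = concat_bz2(records)
    gz = bz2_to_gzip(split)
    # The gzip output round-trips to the original concatenated records.
    assert gzip.decompress(gz) == b"".join(records)
```

Because everything stays in memory here, the sketch also hints at the point of item 2: the same steps can be done without writing temporary files to disk.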