Compare: Home

Showing with 2 additions and 2 deletions.
  1. +2 −2 Home.md
4 Home.md
@@ -107,11 +107,11 @@ This is a hacky tool that I wrote and was adopted before I could make it lot mor
### 1. Avoid the second hadoop job which generates gz from bz2 splits.
-As this was more of a hack, I just concatenated the smaller bz2 files (rec...) to generate a larger one and didnt fix the header of the split file. Without that the Unix tools (bunzip2 etc) can read the entire bz2 file and not Hadoop. Another hack is to bunzip and gzip. The ideal way is to generate the bz2 header for split file and append smaller files, but this was just a start.
+As this was more of a hack, I just concatenated the smaller bz2 files (rec...) to generate the split file, and I did not generate the metadata that the split file needs. Unix tools (bunzip2 etc.) can still read the entire bz2 file, but Hadoop cannot. A quicker workaround is to bunzip and gzip, which is what the second job does. The ideal way is to construct the split file as a proper bz2 file, which would eliminate the second job and also make the first job faster. That's in the works.
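The concatenation described in this item can be illustrated with a minimal sketch. It is written in Python rather than the tool's shell script, and the `chunks` data is hypothetical, standing in for the smaller bz2 files. It only shows that independently compressed bz2 streams, joined byte for byte, still decompress as a whole with a multi-stream-aware reader, which matches the behaviour reported for bunzip2. It does not reproduce the Hadoop side.

```python
import bz2

# Hypothetical record chunks standing in for the smaller bz2 files.
chunks = [b"record one\n", b"record two\n", b"record three\n"]

# Compress each chunk independently and in memory (no temporary files).
streams = [bz2.compress(chunk) for chunk in chunks]

# Plain byte concatenation, the same thing `cat a.bz2 b.bz2 > split.bz2` does.
split_file = b"".join(streams)

# bz2.decompress accepts multiple concatenated streams (Python 3.3+),
# so the whole split file can be read back in one call.
restored = bz2.decompress(split_file)

print(restored.decode())
```

Each element of `streams` is a complete bz2 stream with its own header, so the joined result is a multi-stream file rather than a single stream.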
### 2. Write it entirely in Java.
-Currently lots of files are being produced on the local file system. Writing it entirely in Java would ensure smaller files with bz2 header (see #1) can be generated without writing those temporary files to disk. From conceptualization to implementation took about 2 hours and shell makes it damn easy to prototype and its just been running with that. Writing in Java will also run this on Windows.
+Currently lots of files are being produced on the local file system. Writing it entirely in Java would ensure that the smaller files with appropriate bz2 metadata (see #1) can be generated without writing those temporary files to disk. This tool took about 2 hours from conceptualization to implementation; shell script makes it damn easy to prototype, and the tool has been running as a shell script since. Writing it in Java would also let it run on Windows.
## FAQ: