GitHub - yjk01/FileCompression: Repository of two compression methods (Huffman, LZW) added with the ability to decompress and archive files.

I. Design

A. Huffman Encoding

Algorithm Theory:
- Algorithm used to compress data without losing information (i.e., lossless compression).
- Begin by counting the frequency of each character in the input data.
- Build a binary tree bottom-up by repeatedly merging the two lowest-frequency nodes. Each leaf node represents a character, and the path from the root to the leaf corresponds to the binary code of that character, so frequent characters get shorter codes.
- Once the tree is complete, traverse it and assign a binary code to each character. Traversing left appends '0', and traversing right appends '1'.
- Encode the input data by replacing each character with its corresponding Huffman code.
- In the compressed output, the Huffman tree is stored first so that the decoder can rebuild it, followed by the encoded data.
Trade-offs:
- Compression process can be computationally intensive for large datasets.
- Requires knowledge of the entire input data before constructing the tree.
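
The steps above can be sketched in Java as follows. This is a minimal illustration, not the code in SchubsH.java: the class name HuffmanSketch and its methods are hypothetical, it uses java.util.PriorityQueue where the project lists MinPQ.java, and it returns the codes as '0'/'1' strings instead of writing a bit stream or storing the tree.

```java
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

// Hypothetical sketch class; not part of the repository.
class HuffmanSketch {
    /** Tree node: a leaf holds a character, an internal node holds two children. */
    static final class Node implements Comparable<Node> {
        final char ch;
        final int freq;
        final Node left;
        final Node right;

        Node(char ch, int freq, Node left, Node right) {
            this.ch = ch;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(freq, other.freq);
        }
    }

    /** Step 1: count how often each character occurs. */
    static Map<Character, Integer> countFrequencies(String input) {
        Map<Character, Integer> freq = new TreeMap<>();
        for (char c : input.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }
        return freq;
    }

    /** Step 2: merge the two least frequent nodes until one root remains. */
    static Node buildTree(Map<Character, Integer> freq) {
        PriorityQueue<Node> queue = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            queue.add(new Node(e.getKey(), e.getValue(), null, null));
        }
        while (queue.size() > 1) {
            Node a = queue.poll();
            Node b = queue.poll();
            queue.add(new Node('\0', a.freq + b.freq, a, b));
        }
        return queue.poll(); // null when the input was empty
    }

    /** Step 3: a left edge appends '0', a right edge appends '1'. */
    static void assignCodes(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            // A one-symbol input has a root-only tree; give it the code "0".
            codes.put(node.ch, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        assignCodes(node.left, prefix + "0", codes);
        assignCodes(node.right, prefix + "1", codes);
    }

    static Map<Character, String> buildCodes(String input) {
        Map<Character, String> codes = new TreeMap<>();
        Node root = buildTree(countFrequencies(input));
        if (root != null) {
            assignCodes(root, "", codes);
        }
        return codes;
    }

    /** Step 4: replace every character with its code (as a '0'/'1' string). */
    static String encode(String input, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : input.toCharArray()) {
            bits.append(codes.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String input = "abracadabra";
        Map<Character, String> codes = buildCodes(input);
        System.out.println(codes);
        System.out.println(encode(input, codes).length() + " bits instead of " + input.length() * 8);
    }
}
```

The exact codes depend on how ties between equal frequencies are broken, but the total encoded length does not. The sketch also shows the second trade-off: countFrequencies must read the whole input before any code can be assigned.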

B. LZW Encoding

Algorithm Theory:
- Algorithm used to compress data without losing information (i.e., lossless compression).
- Build a dictionary of strings encountered in the input data and replace recurring strings with shorter codes.
- The dictionary begins with a single-character entry for every possible character (i.e., all ASCII characters).
- Scan the input data from left to right, extending the current substring one character at a time and checking whether it is already present in the dictionary.
- When the extended substring is not in the dictionary, output the code of the longest substring that was found, add the extended substring to the dictionary as a new entry, and continue from the current character.
Trade-off:
- Compression is dependent on the dictionary size. A larger dictionary can capture more patterns but may use more memory for storage.
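
A compact Java sketch of the compression loop described above is shown below. It is illustrative only and is not the code in SchubsL.java: the class LzwSketch, the constant ALPHABET_SIZE (assumed to be 256, one entry per 8-bit character), and the maxCodes parameter are all assumptions, and the codes are returned as a list of integers instead of being packed into a bit stream. The project lists TST.java, which suggests a ternary search trie for the dictionary; a HashMap is used here to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch class; not part of the repository.
class LzwSketch {
    // Assumption: the initial dictionary covers 8-bit characters 0..255.
    static final int ALPHABET_SIZE = 256;

    /**
     * Compresses the input into a list of dictionary codes.
     * maxCodes caps the dictionary size; once it is full, no entries are added.
     */
    static List<Integer> compress(String input, int maxCodes) {
        Map<String, Integer> dictionary = new HashMap<>();
        for (int i = 0; i < ALPHABET_SIZE; i++) {
            dictionary.put(String.valueOf((char) i), i);
        }
        int nextCode = ALPHABET_SIZE;

        List<Integer> output = new ArrayList<>();
        String current = "";
        for (char c : input.toCharArray()) {
            if (c >= ALPHABET_SIZE) {
                throw new IllegalArgumentException("character outside the 8-bit alphabet: " + (int) c);
            }
            String extended = current + c;
            if (dictionary.containsKey(extended)) {
                // Keep growing the match while it is still a known string.
                current = extended;
            } else {
                // Emit the longest known match, then remember the new string.
                output.add(dictionary.get(current));
                if (nextCode < maxCodes) {
                    dictionary.put(extended, nextCode++);
                }
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) {
            output.add(dictionary.get(current)); // flush the final match
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(compress("ABABABA", 4096)); // prints [65, 66, 256, 258]
        System.out.println(compress("ABABABA", 256).size()); // prints 7
    }
}
```

The second call in main illustrates the trade-off: with maxCodes set to 256 the dictionary can never grow, so every character is emitted as its own code and nothing is saved.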

C. Tar Archive

Algorithm Theory:
- Bundling multiple files into a single file.
- Archived file contains information about the bundled files in a structured format.
- The archive file contains the following fields, in order:
  1. A 4-byte integer holding the length of the filename (for example, a value ending in binary 10011, decimal 19).
  2. A separator byte, "11111111".
  3. The filename, as a string.
  4. Another separator byte, "11111111".
  5. A 64-bit number (Java long) holding the length of the file (for example, a value ending in binary 00001100, decimal 12).
  6. Another separator byte, "11111111".
  7. The contents of the file.
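
The field order listed above could be written with java.io.DataOutputStream roughly as follows. This is a sketch under stated assumptions, not the code in SchubsArc.java: the class ArchiveRecordSketch and its methods are hypothetical, the separator is assumed to be a single 0xFF byte, the filename is assumed to be stored one byte per character (ISO-8859-1), and one such record is assumed per bundled file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not part of the repository.
class ArchiveRecordSketch {
    // Assumption: the "11111111" separator is one byte with all bits set.
    static final int SEPARATOR = 0xFF;

    /** Writes one record: name length, name, content length, contents. */
    static void writeRecord(DataOutputStream out, String fileName, byte[] contents)
            throws IOException {
        // Assumption: one byte per filename character.
        byte[] nameBytes = fileName.getBytes(StandardCharsets.ISO_8859_1);
        out.writeInt(nameBytes.length);   // 1. 4-byte filename length
        out.writeByte(SEPARATOR);         // 2. separator
        out.write(nameBytes);             // 3. filename
        out.writeByte(SEPARATOR);         // 4. separator
        out.writeLong(contents.length);   // 5. 8-byte file length
        out.writeByte(SEPARATOR);         // 6. separator
        out.write(contents);              // 7. file contents
    }

    /** Convenience wrapper that returns the record as a byte array. */
    static byte[] recordBytes(String fileName, byte[] contents) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            writeRecord(out, fileName, contents);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // A 19-character filename and 12 bytes of content, matching the
        // example values (decimal 19 and decimal 12) in the list above.
        byte[] contents = "Hello world!".getBytes(StandardCharsets.ISO_8859_1);
        byte[] record = recordBytes("compression-log.txt", contents);
        System.out.println(record.length); // 4 + 1 + 19 + 1 + 8 + 1 + 12 = 46
    }
}
```

Because both lengths are stored before the data they describe, a reader can extract each file by byte count and does not have to scan the contents for the separator value.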

II. Installation

Clone Repository: git clone <repository_url>
Navigate into project directory: cd Project_File_Compression_yxk19a
Ensure you are in the root project directory: pwd
- "pwd" should result in something like /Users/username/Desktop/Project_File_Compression
Ensure the following supporting classes are in the same directory as the main files (SchubsL.java, SchubsH.java, SchubsArc.java, Deschubs.java):
  1. BinaryOut.java, BinaryIn.java
  2. BinaryStdIn.java, BinaryStdOut.java
  3. StdIn.java, StdOut.java
  4. Queue.java, MinPQ.java
  5. TST.java

III. Test Instructions

Ensure you are in the same directory as the "pom.xml" file.
Run: mvn compile, then mvn test

IV. Run Examples

Ensure you are in the root project directory.
Ensure all Java programs are compiled, e.g.: javac <ProgramName>.java
Run Programs:
1. Huffman Compression: java SchubsH <filename>
2. LZW Compression: java SchubsL <filename>
3. Archive using Tar: java SchubsArc <archive-name> <file1name> <file2name> ...
4. Decompress files: java Deschubs <filename.ll|hh|hz>
