
Repository implementing two compression methods (Huffman and LZW), with the ability to decompress and archive files.


yjk01/FileCompression


I. Design

A. Huffman Encoding

  1. Algorithm Theory:

    • Algorithm used to compress data without losing information (i.e., lossless compression).
    • Begin by counting the frequency of characters in the input data.
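    • Build a binary tree by repeatedly merging the two lowest-frequency nodes into a new parent node until a single tree remains. Each leaf represents a character, and the path from the root to a leaf determines that character's binary code, so frequent characters get shorter codes.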
    • Once the tree is completed, traverse the tree and assign a binary code to each character. Traversing left is '0', and right is '1'.
    • Encode input data by replacing each character with its corresponding Huffman code.
    • The compressed file stores the Huffman tree first, followed by the encoded data, so that the decompressor can rebuild the tree before decoding.
  2. Trade-offs:

    • Compression needs two passes over the input (one to count frequencies, one to encode), which can be costly for large datasets.
    • The entire input must be known before the tree can be built, so data cannot be compressed as a single-pass stream.
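The Huffman steps above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the repository's SchubsH: the class `HuffmanSketch` and its methods are invented for this example, it uses `java.util.PriorityQueue` where the repository lists `MinPQ`, it keeps the encoded bits as a `'0'`/`'1'` string instead of writing them with `BinaryStdOut`, it does not serialize the tree, and it assumes non-empty input.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch of Huffman coding: count frequencies, build the tree,
// derive codes (left = '0', right = '1'), encode, and decode back.
// Assumes a non-empty input string.
class HuffmanSketch {
    // Leaves hold a character; internal nodes hold two children.
    static final class Node implements Comparable<Node> {
        final char ch;
        final int freq;
        final Node left, right;

        Node(char ch, int freq, Node left, Node right) {
            this.ch = ch;
            this.freq = freq;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() { return left == null && right == null; }

        @Override
        public int compareTo(Node other) { return Integer.compare(freq, other.freq); }
    }

    // Step 1: count how often each character occurs.
    static Map<Character, Integer> countFrequencies(String input) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : input.toCharArray()) freq.merge(c, 1, Integer::sum);
        return freq;
    }

    // Step 2: repeatedly merge the two least frequent nodes until one tree remains.
    static Node buildTree(Map<Character, Integer> freq) {
        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet())
            pq.add(new Node(e.getKey(), e.getValue(), null, null));
        // With only one distinct character, add a dummy sibling so it still gets a 1-bit code.
        if (pq.size() == 1) {
            char only = pq.peek().ch;
            pq.add(new Node(only == '\0' ? '\1' : '\0', 0, null, null));
        }
        while (pq.size() > 1) {
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node('\0', a.freq + b.freq, a, b));
        }
        return pq.poll();
    }

    // Step 3: walk the tree; going left appends '0', going right appends '1'.
    static void buildCodes(Node node, String prefix, Map<Character, String> codes) {
        if (node.isLeaf()) {
            codes.put(node.ch, prefix);
            return;
        }
        buildCodes(node.left, prefix + '0', codes);
        buildCodes(node.right, prefix + '1', codes);
    }

    // Step 4: replace every character with its code.
    static String encode(String input, Map<Character, String> codes) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toCharArray()) sb.append(codes.get(c));
        return sb.toString();
    }

    // Decoding: follow bits from the root; emit a character at each leaf.
    static String decode(String bits, Node root) {
        StringBuilder sb = new StringBuilder();
        Node node = root;
        for (char bit : bits.toCharArray()) {
            node = (bit == '0') ? node.left : node.right;
            if (node.isLeaf()) {
                sb.append(node.ch);
                node = root;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "ABRACADABRA!";
        Node root = buildTree(countFrequencies(text));
        Map<Character, String> codes = new HashMap<>();
        buildCodes(root, "", codes);
        String bits = encode(text, codes);
        System.out.println("codes: " + codes);
        System.out.println(bits.length() + " bits vs " + 8 * text.length() + " bits as 8-bit ASCII");
        System.out.println("decoded: " + decode(bits, root));
    }
}
```

For `ABRACADABRA!` the encoded body is 28 bits instead of 96, before the stored tree is added. The exact codes can differ when frequencies tie, but the total encoded length is the same for every valid Huffman tree.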

B. LZW Encoding

  1. Algorithm Theory:

    • Algorithm used to compress data without losing information (i.e., lossless compression).
    • Build a dictionary of strings encountered in the input data and replace recurring strings with shorter codes.
    • The dictionary begins with single-character data for all possible characters (i.e., all ASCII characters).
    • Scan the input data from left to right, extending the current substring for as long as it is still present in the dictionary.
    • When the substring can no longer be extended, output the code of the longest match, add that match plus the next character to the dictionary as a new entry, and continue from the next character.
  2. Trade-off:

    • The compression ratio depends on the dictionary size. A larger dictionary can capture more patterns, but it requires wider codes and more memory.
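The LZW steps above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the repository's SchubsL: the class `LzwSketch` is invented for this example, it uses a `HashMap` where the repository lists `TST`, it returns codes as a list of ints instead of packing them into fixed-width bit fields, and it assumes a cap of 4096 entries (12-bit codes) and input characters in the range 0-255.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical LZW sketch. Assumptions: 256 initial single-character entries,
// a 4096-entry cap (12-bit codes), and input characters in the range 0-255.
class LzwSketch {
    static final int ALPHABET = 256;
    static final int MAX_CODES = 4096;

    static List<Integer> compress(String input) {
        // The dictionary starts with one entry per possible single character.
        Map<String, Integer> dict = new HashMap<>();
        for (int i = 0; i < ALPHABET; i++) dict.put(String.valueOf((char) i), i);
        int nextCode = ALPHABET;

        List<Integer> out = new ArrayList<>();
        String current = "";
        for (char c : input.toCharArray()) {
            String extended = current + c;
            if (dict.containsKey(extended)) {
                current = extended; // keep growing the match
            } else {
                out.add(dict.get(current)); // emit the longest match
                if (nextCode < MAX_CODES) dict.put(extended, nextCode++); // learn a new string
                current = String.valueOf(c);
            }
        }
        if (!current.isEmpty()) out.add(dict.get(current));
        return out;
    }

    static String decompress(List<Integer> codes) {
        if (codes.isEmpty()) return "";
        List<String> dict = new ArrayList<>();
        for (int i = 0; i < ALPHABET; i++) dict.add(String.valueOf((char) i));

        String previous = dict.get(codes.get(0));
        StringBuilder out = new StringBuilder(previous);
        for (int k = 1; k < codes.size(); k++) {
            int code = codes.get(k);
            // A code equal to dict.size() was created by the compressor in the
            // step being decoded, so it must be previous + previous[0].
            String entry = (code < dict.size()) ? dict.get(code) : previous + previous.charAt(0);
            out.append(entry);
            // Rebuild the same entry the compressor added one step earlier.
            if (dict.size() < MAX_CODES) dict.add(previous + entry.charAt(0));
            previous = entry;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "ABRACADABRABRABRA";
        List<Integer> codes = compress(text);
        System.out.println(codes);
        System.out.println(decompress(codes));
    }
}
```

For `ABRACADABRABRABRA` (17 characters) this yields 12 codes: 65 66 82 65 67 65 68 256 258 257 263 65, where codes from 256 upward stand for learned substrings such as `AB` (256) and `ABR` (263). The decompressor never needs the dictionary to be stored, because it rebuilds the same entries in the same order.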

C. Tar Archive

  1. Algorithm Theory:
    • Bundles multiple files into a single archive file.
    • The archive stores information about each bundled file in a structured format.
    • For each bundled file, the archive contains the following, in order:
      1. A 4-byte integer (Java int) holding the length of the filename (for example, a value whose binary form ends in 10011, decimal 19, for a 19-character filename).
      2. A separator byte, "11111111".
      3. The filename, as a string.
      4. Another separator byte, "11111111".
      5. A 64-bit integer (Java long) holding the length of the file (for example, a value whose binary form ends in 00001100, decimal 12).
      6. Another separator byte, "11111111".
      7. The contents of the file.
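The archive layout above can be sketched in Java as follows. This is a minimal, hypothetical illustration, not the repository's SchubsArc: the class `ArchiveSketch` is invented for this example and works on in-memory "files" (a map from name to contents). It assumes big-endian integers, a single 0xFF byte for each separator, one byte per filename character, and a file length measured in bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the archive layout. Assumptions: big-endian ints/longs,
// one 0xFF byte per separator, one byte per filename character, length in bytes.
class ArchiveSketch {
    static final int SEPARATOR = 0xFF; // "11111111"

    static byte[] archive(Map<String, byte[]> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                String name = file.getKey();
                byte[] contents = file.getValue();
                out.writeInt(name.length());    // 1. filename length (4 bytes)
                out.writeByte(SEPARATOR);       // 2. separator
                out.writeBytes(name);           // 3. filename (low byte of each char)
                out.writeByte(SEPARATOR);       // 4. separator
                out.writeLong(contents.length); // 5. file length (8 bytes)
                out.writeByte(SEPARATOR);       // 6. separator
                out.write(contents);            // 7. file contents
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Map<String, byte[]> extract(byte[] archive) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(archive))) {
            while (in.available() > 0) {
                byte[] name = new byte[in.readInt()];
                expectSeparator(in);
                in.readFully(name);
                expectSeparator(in);
                long length = in.readLong();
                expectSeparator(in);
                byte[] contents = new byte[(int) length]; // sketch: files under 2 GB
                in.readFully(contents);
                files.put(new String(name, StandardCharsets.ISO_8859_1), contents);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    static void expectSeparator(DataInputStream in) throws IOException {
        if (in.readUnsignedByte() != SEPARATOR) throw new IOException("missing 0xFF separator");
    }

    public static void main(String[] args) {
        Map<String, byte[]> files = new LinkedHashMap<>();
        files.put("a.txt", "hello".getBytes(StandardCharsets.US_ASCII));
        files.put("b.txt", "world!".getBytes(StandardCharsets.US_ASCII));
        byte[] packed = archive(files);
        System.out.println("archive size: " + packed.length + " bytes");
        extract(packed).forEach((name, data) ->
                System.out.println(name + ": " + new String(data, StandardCharsets.US_ASCII)));
    }
}
```

Under these assumptions each entry costs 15 bytes of header and separators (4 + 1 + 1 + 8 + 1) plus the filename and contents. The length fields are what allow extraction to locate each filename and file body without scanning for the separators.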

II. Installation

  1. Clone the repository: git clone <repository_url>
  2. Navigate into the project directory: cd Project_File_Compression_yxk19a
  3. Confirm you are in the root project directory by running: pwd
    • The output of "pwd" should look something like /Users/username/Desktop/Project_File_Compression
  4. Ensure the main programs and their helper classes are all in the same directory:
    • Main programs: SchubsL.java, SchubsH.java, SchubsArc.java, Deschubs.java
    • Helper classes:
      1. BinaryOut.java, BinaryIn.java
      2. BinaryStdIn.java, BinaryStdOut.java
      3. StdIn.java, StdOut.java
      4. Queue.java, MinPQ.java
      5. TST.java

III. Test Instructions

  1. Ensure you are in the same directory as the "pom.xml" file.
  2. Build the project: mvn compile
  3. Run the tests: mvn test

IV. Run Examples

  1. Ensure you are in the root project directory.
  2. Ensure all Java programs are compiled: javac *.java
  3. Run Programs:
    1. Huffman Compression: java SchubsH <filename>
    2. LZW Compression: java SchubsL <filename>
    3. Archive using Tar: java SchubsArc archive-name <file1name> <file2name> ...
    4. Decompress files: java Deschubs <filename.ll|hh|hz>
