yonatanlou/Decrypting-Classical-Cipher-Text-Using-MCMC

Decrypting Ciphertext Using Markov Chain Monte Carlo

Installation

cd ../repo_path  # after cloning this repo
pyenv virtualenv mcmc_decryption
pyenv activate mcmc_decryption
pip install -r requirements.txt

Data

  • To build the unigram and bigram transition matrices, we used a chunk of text from Hebrew Wikipedia. You can "train" the MCMC process on any text.
    There is not much Hebrew text on the Internet, but there are some great resources in NNLP-IL.
    We used the MILA resources (Knowledge Center for Processing Hebrew).

  • The program reads the given text file and also saves the transition matrices to a pickle file, so you can reuse them later.

  • Here is a quick guide to using any of the MILA corpora:

    • Navigate to your project directory:
       cd .../project_directory 
      
    • Download the file that contains the text data of the relevant corpus (some of the files are .tar rather than .zip), then extract it:
       wget -c https://yeda.cs.technion.ac.il:8443/corpus/software/corpora/HeWiki_2013/plain/plain.zip && unzip plain.zip -d ../project_directory
      
    • Concatenate the whole corpus into one big text file:
       find ../project_directory/zip_folder_name -type f -name '*.txt' -exec cat {} + > corpus_merged.txt
      
    • Most of the corpora are very large, so if you only want to use a chunk of the text:
       head -n 1000000 corpus_merged.txt > ../project_directory/text_files/corpus_merged_1000000.txt
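The unigram and bigram matrices described above can be estimated from plain character counts. The following is a minimal sketch, not the repository's code: the alphabet (lowercase English letters plus space), the add-one smoothing, and the helper names `clean`, `build_matrices`, `save_matrices` and `load_matrices` are all hypothetical. A Hebrew corpus would need the Hebrew letters in `ALPHABET` instead.

```python
import pickle

# Hypothetical alphabet: lowercase English letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def clean(text: str) -> str:
    """Lower-case, map characters outside ALPHABET to spaces, collapse repeated spaces."""
    kept = "".join(ch if ch in INDEX else " " for ch in text.lower())
    return " ".join(kept.split())


def build_matrices(text: str):
    """Estimate unigram probabilities and a row-stochastic bigram transition matrix."""
    cleaned = clean(text)
    n = len(ALPHABET)
    # Add-one smoothing so that unseen characters/pairs never get probability 0.
    unigram = [1.0] * n
    bigram = [[1.0] * n for _ in range(n)]
    for ch in cleaned:
        unigram[INDEX[ch]] += 1
    for a, b in zip(cleaned, cleaned[1:]):
        bigram[INDEX[a]][INDEX[b]] += 1

    total = sum(unigram)
    unigram = [c / total for c in unigram]
    transitions = []
    for row in bigram:
        row_total = sum(row)
        transitions.append([c / row_total for c in row])  # P(next | current)
    return unigram, transitions


def save_matrices(path: str, unigram, bigram) -> None:
    """Pickle both matrices so they can be reused without re-reading the corpus."""
    with open(path, "wb") as f:
        pickle.dump({"unigram": unigram, "bigram": bigram}, f)


def load_matrices(path: str):
    """Load matrices previously written by save_matrices."""
    with open(path, "rb") as f:
        data = pickle.load(f)
    return data["unigram"], data["bigram"]
```

For example, you could pass the contents of corpus_merged_1000000.txt to `build_matrices` once and then reload the pickle on later runs instead of re-reading the corpus.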
      

This script uses a Markov chain Monte Carlo (MCMC) algorithm to decrypt a message.
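A minimal sketch of the idea, assuming a simple substitution cipher over lowercase English letters (the cipher type, the scoring function and every name below are assumptions, not taken from the repository): the state of the Markov chain is a substitution key, a proposal swaps two letters of the key, and the Metropolis rule accepts the proposal with probability min(1, exp(new_score - old_score)), where the score is the bigram log-likelihood of the decrypted text under a reference corpus.

```python
import math
import random

LETTERS = "abcdefghijklmnopqrstuvwxyz"
ALPHABET = LETTERS + " "


def clean(text: str) -> str:
    """Lower-case, map non-alphabet characters to spaces, collapse runs of spaces."""
    kept = "".join(c if c in ALPHABET else " " for c in text.lower())
    return " ".join(kept.split())


def bigram_log_probs(reference: str) -> dict:
    """Add-one-smoothed log P(b | a) for every pair of characters in ALPHABET."""
    text = clean(reference)
    counts = {(a, b): 1.0 for a in ALPHABET for b in ALPHABET}
    for a, b in zip(text, text[1:]):
        counts[(a, b)] += 1
    totals = {a: sum(counts[(a, b)] for b in ALPHABET) for a in ALPHABET}
    return {(a, b): math.log(c / totals[a]) for (a, b), c in counts.items()}


def decrypt(text: str, key: dict) -> str:
    """Apply a substitution key letter by letter; spaces pass through unchanged."""
    return "".join(key.get(c, c) for c in text)


def score(text: str, log_probs: dict) -> float:
    """Bigram log-likelihood of `text` under the reference model."""
    return sum(log_probs[(a, b)] for a, b in zip(text, text[1:]))


def mcmc_decrypt(ciphertext: str, log_probs: dict, iterations: int = 10_000, seed: int = 0) -> str:
    rng = random.Random(seed)
    ciphertext = clean(ciphertext)

    # Start from a random substitution key (a permutation of LETTERS).
    shuffled = list(LETTERS)
    rng.shuffle(shuffled)
    key = dict(zip(LETTERS, shuffled))
    current = score(decrypt(ciphertext, key), log_probs)
    best_key, best_score = dict(key), current

    for _ in range(iterations):
        # Symmetric proposal: swap what two ciphertext letters decrypt to.
        a, b = rng.sample(LETTERS, 2)
        proposal = dict(key)
        proposal[a], proposal[b] = key[b], key[a]
        candidate = score(decrypt(ciphertext, proposal), log_probs)

        # Metropolis rule: accept with probability min(1, exp(candidate - current)).
        if candidate >= current or rng.random() < math.exp(candidate - current):
            key, current = proposal, candidate
            if current > best_score:
                best_key, best_score = dict(key), current

    return decrypt(ciphertext, best_key)
```

Because the swap proposal is symmetric, the plain Metropolis acceptance ratio is enough. The sketch also keeps the best-scoring key seen so far, which is a common practical choice rather than part of the sampler itself. With a realistic reference corpus and enough iterations, the returned text should drift towards readable plaintext.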

Usage

python decrypt_message.py --seed=<seed> --language=<language> --text_file=<text_file> --message=<message> --plot=<plot> --iterations=<iterations>

The following are the arguments that you can pass to the script:

  • --seed: The random seed.
  • --language: The language of the message (either "hebrew" or "english").
  • --text_file: The path to the reference text file used to build the unigram and bigram transition matrices (war-and-peace.txt in the example below).
  • --message: The encrypted message.
  • --plot: Whether to plot the results.
  • --iterations: The number of iterations for the MCMC algorithm.
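For reference, the documented flags could be parsed with the standard library's `argparse` roughly as follows. This is a hypothetical sketch of the interface, not the script's actual parser; the argument types (and treating `--plot` as 0 or 1, as in the example command) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical re-creation of the documented command-line flags.
    parser = argparse.ArgumentParser(description="Decrypt a message with MCMC.")
    parser.add_argument("--seed", type=int, help="random seed")
    parser.add_argument("--language", choices=["hebrew", "english"], help="language of the message")
    parser.add_argument("--text_file", help="reference text used to build the transition matrices")
    parser.add_argument("--message", help="the message to decrypt")
    parser.add_argument("--plot", type=int, choices=[0, 1], help="1 to plot the MCMC results, 0 otherwise")
    parser.add_argument("--iterations", type=int, help="number of MCMC iterations")
    return parser
```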

For example, to decrypt an English message using war-and-peace.txt as the reference text, with a seed of 42 and 25,000 iterations, you would use the following command:

python main.py --seed=42 --language=english --text_file=war-and-peace.txt --message='They told me that when i will be old the war will over' --plot=0 --iterations=25000
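The example above passes readable English as `--message`. If you want to produce your own ciphertext to test with, here is a minimal sketch, assuming a simple substitution cipher over lowercase English letters (the cipher type is an assumption, and the helpers `random_substitution_key`, `encrypt` and `invert` are hypothetical):

```python
import random
import string


def random_substitution_key(seed: int) -> dict:
    """Map each lowercase letter to a distinct letter, reproducibly for a given seed."""
    rng = random.Random(seed)
    shuffled = list(string.ascii_lowercase)
    rng.shuffle(shuffled)
    return dict(zip(string.ascii_lowercase, shuffled))


def encrypt(message: str, key: dict) -> str:
    """Lower-case the message and substitute letters; spaces and punctuation pass through."""
    return "".join(key.get(c, c) for c in message.lower())


def invert(key: dict) -> dict:
    """Return the decryption key for a substitution key."""
    return {v: k for k, v in key.items()}
```

Applying `encrypt` with the inverted key recovers the lower-cased message, which is a quick way to check a decryption result.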

Output

The script will output the decrypted message to the console. If you have enabled plotting, the script will also create a plot of the MCMC results.
