
phraug2

A new version of phraug (pronounced "frog") with improved command-line argument parsing, thanks to jofusa.

This is a set of simple Python scripts for pre-processing large files, such as splitting and format conversion. The name phraug comes from a great book, Made to Stick, by Chip and Dan Heath.

See http://fastml.com/processing-large-files-line-by-line/ for the basic idea.

There is always at least one input file and usually one or more output files. The input file is never modified.
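The line-by-line idea from the linked post can be sketched as follows. This is a minimal illustration, not code taken from phraug; `process_lines` and its `transform` callback are hypothetical names. The input is only read, and only the current line is held in memory.

```python
from typing import Callable


def process_lines(input_path: str, output_path: str,
                  transform: Callable[[str], str]) -> int:
    """Stream input_path to output_path, applying transform to each line.

    Returns the number of lines written.
    """
    count = 0
    # The input is opened read-only, so it stays unchanged.
    with open(input_path, "r") as infile, open(output_path, "w") as outfile:
        # Iterating over a file object yields one line at a time, so memory
        # use does not grow with the size of the file.
        for line in infile:
            outfile.write(transform(line))
            count += 1
    return count
```

For example, `process_lines("train.csv", "train.tsv", lambda line: line.replace(",", "\t"))` (hypothetical file names) converts a simple CSV file to TSV; note that this naive replacement does not handle quoted fields containing commas.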

For documentation, run any script with the `-h` flag; running it with no arguments prints only the usage line.

Example:

```
>python split.py
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2
split.py: error: too few arguments

>python split.py -h
usage: split.py [-h] [-p PROBABILITY] [-r RANDOM_SEED] [-s] [-c]
                input_file output_file1 output_file2

split a file into two randomly, line by line.

positional arguments:
  input_file            path to an input file
  output_file1          path to the first output file
  output_file2          path to the second output file

optional arguments:
  -h, --help            show this help message and exit
  -p PROBABILITY, --probability PROBABILITY
                        probability of writing to the first file (default 0.9)
  -r RANDOM_SEED, --random_seed RANDOM_SEED
                        random seed
  -s, --skip_headers    skip the header line
  -c, --copy_headers    copy the header line to both output files
```
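The behavior described by the help text above can be sketched in a few dozen lines. This is a minimal reconstruction from the usage message only, not phraug's actual `split.py`. The names `split_file` and `main(argv)` are hypothetical. It also assumes that when `-s` or `-c` is given, the first line is consumed as a header, and that `-c` takes effect if both flags are set.

```python
import argparse
import random


def split_file(input_file, output_file1, output_file2, probability=0.9,
               random_seed=None, skip_headers=False, copy_headers=False):
    """Randomly send each line of input_file to one of two output files.

    Returns (lines written to file 1, lines written to file 2),
    not counting a copied header.
    """
    rng = random.Random(random_seed)  # a fixed seed makes the split reproducible
    counts = [0, 0]
    with open(input_file) as infile, open(output_file1, "w") as out1, \
            open(output_file2, "w") as out2:
        # Assumption: either flag means the first line is a header.
        if skip_headers or copy_headers:
            header = infile.readline()
            if copy_headers:
                out1.write(header)
                out2.write(header)
        for line in infile:
            # rng.random() is in [0, 1), so each line goes to file 1
            # with the given probability, independently of other lines.
            if rng.random() < probability:
                out1.write(line)
                counts[0] += 1
            else:
                out2.write(line)
                counts[1] += 1
    return counts[0], counts[1]


def main(argv=None):
    """Parse arguments matching the documented interface; argv=None reads sys.argv."""
    parser = argparse.ArgumentParser(
        description="split a file into two randomly, line by line.")
    parser.add_argument("input_file", help="path to an input file")
    parser.add_argument("output_file1", help="path to the first output file")
    parser.add_argument("output_file2", help="path to the second output file")
    parser.add_argument("-p", "--probability", type=float, default=0.9,
                        help="probability of writing to the first file (default 0.9)")
    parser.add_argument("-r", "--random_seed", type=int, default=None,
                        help="random seed")
    parser.add_argument("-s", "--skip_headers", action="store_true",
                        help="skip the header line")
    parser.add_argument("-c", "--copy_headers", action="store_true",
                        help="copy the header line to both output files")
    args = parser.parse_args(argv)
    return split_file(args.input_file, args.output_file1, args.output_file2,
                      args.probability, args.random_seed,
                      args.skip_headers, args.copy_headers)
```

To use it as a script, call `main()` under an `if __name__ == "__main__":` guard. Because each line is routed independently, the resulting split is only approximately `probability` to `1 - probability`; passing `-r` makes a given split repeatable.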