wanghalan/dspg22_pyarrow-example

dspg22_pyarrow-example

Demonstrates how to store a dataset larger than 50 MB on GitHub by splitting it into smaller partitions with PyArrow.

Usage

usage: divider.py [-h] -i INPUT [-s SIZE] -o OUTPUT [-v | --verbose | --no-verbose]

Take a file and divide it into partitions of specific sizes

options:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        The large file to be partitioned
  -s SIZE, --size SIZE  Maximum size of the partitioned file in MB
  -o OUTPUT, --output OUTPUT
  -v, --verbose, --no-verbose
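
For readers who want to see how a command line like this can be declared, here is a minimal `argparse` sketch that reproduces the options listed above. It is not the actual source of `divider.py`: the `build_parser` name, the `float` type for `--size`, and the default of 50 MB are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage text above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="divider.py",
        description="Take a file and divide it into partitions of specific sizes",
    )
    parser.add_argument(
        "-i", "--input", required=True,
        help="The large file to be partitioned",
    )
    parser.add_argument(
        "-s", "--size", type=float, default=50,  # assumed default, not confirmed
        help="Maximum size of the partitioned file in MB",
    )
    parser.add_argument("-o", "--output", required=True)
    # BooleanOptionalAction (Python 3.9+) generates the paired --no-verbose flag.
    parser.add_argument(
        "-v", "--verbose",
        action=argparse.BooleanOptionalAction, default=False,
    )
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```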

Example

To generate the files in this repository, I ran:

python divider.py -i output_2019_q1.parquet -o ookla-dataset
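
The partitioning step itself could look roughly like the sketch below. This is one possible approach under stated assumptions, not necessarily what `divider.py` does: the function names and the `part-0000.parquet` naming scheme are hypothetical, and the rows-per-partition estimate uses the input file's average on-disk bytes per row, so the resulting files are only approximately under the limit.

```python
import os


def rows_per_partition(total_rows: int, total_bytes: int, max_mb: float) -> int:
    """Estimate how many rows fit in one partition of at most max_mb megabytes."""
    if total_rows <= 0 or total_bytes <= 0:
        return 1
    max_bytes = int(max_mb * 1024 * 1024)
    # Integer arithmetic avoids floating-point rounding surprises.
    return max(1, max_bytes * total_rows // total_bytes)


def partition_parquet(input_path: str, output_dir: str, max_mb: float = 50) -> list:
    """Split one Parquet file into several smaller Parquet files."""
    # Imported here so the sizing helper above works without PyArrow installed.
    import pyarrow.parquet as pq

    table = pq.read_table(input_path)
    chunk = rows_per_partition(
        table.num_rows, os.path.getsize(input_path), max_mb
    )
    os.makedirs(output_dir, exist_ok=True)

    written = []
    for index, offset in enumerate(range(0, table.num_rows, chunk)):
        # Hypothetical file naming; the real tool may name partitions differently.
        part_path = os.path.join(output_dir, f"part-{index:04d}.parquet")
        pq.write_table(table.slice(offset, chunk), part_path)
        written.append(part_path)
    return written
```

Because compression ratios vary between row ranges, a stricter tool would check each written file's size and split again when a partition still exceeds the limit.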

Acknowledgement

This project was built as part of the 2022 Data Science for the Public Good (DSPG) internship program.