s3 helpers for reading and writing pandas dataframes, moving files between buckets, and persisting scikit-learn classifiers, all in s3.


S3 helper

This module is helpful both in development notebooks and in deployed production pipelines that work with unstructured s3 files.

The main use of this module is to programmatically preview, process, and move files on s3 by:

listing contents of s3 buckets using glob-like RegEx patterns.
moving or copying files between buckets (filedrop -> archives).
streaming csv and json files into Pandas dataframes on your local machine, without manually downloading them to disk.
writing Pandas dataframes to csv and json files on s3.
loading and unloading scikit-learn models from s3.
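
As a rough illustration of the first item, glob-like matching over key names can be sketched with the standard library alone. This is not the module's own implementation: `filter_keys` and the key names below are hypothetical, and the list stands in for a real bucket listing.

```python
from fnmatch import fnmatchcase


def filter_keys(keys, pattern):
    """Return the keys that match a glob-like pattern, in their original order."""
    # fnmatchcase is case-sensitive on every platform; "*" also matches "/".
    return [key for key in keys if fnmatchcase(key, pattern)]


# Hypothetical key names, standing in for the contents of a bucket.
keys = [
    "filedrop/2017/users.csv",
    "filedrop/2017/tweets.json",
    "archives/2016/users.csv",
]

matches = filter_keys(keys, "filedrop/*.csv")
print(matches)
```

Because `*` in `fnmatch` patterns also matches `/`, a single pattern can reach into nested key prefixes.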

Pandas and Scikit-Learn are useful tools in the Python data ecosystem.
Check out the tutorial (tutorial.ipynb) to see the module in action.

Installation

Configure s3 credentials as you would for boto3; see the boto3 configuration documentation.
TL;DR: setting environment variables or configuring the AWS CLI works best.
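
If you go the environment-variable route, a quick check like the one below can confirm that the variables are visible to Python. It is only a sketch: `missing_credentials` is a hypothetical helper, and the two names checked are the standard variables read by boto3 and the AWS CLI.

```python
import os

# Standard variable names read by boto3 and the AWS CLI.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_credentials(environ=None):
    """Return the names of credential variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_credentials()
if missing:
    print("Not set in environment: " + ", ".join(missing))
else:
    print("AWS credential variables are set.")
```

An empty result only covers this one method. boto3 can also pick up credentials from the files written by the AWS CLI, so missing variables do not necessarily mean a broken setup.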

Usage

Install the package and its requirements: pip install s34me

Note that this only works with Pandas 0.19.1 and below.
See: https://github.com/boto/botocore/pull/1195
See: https://github.com/pandas-dev/pandas/issues/17135

When either of these is resolved, this will work with the latest distribution of Pandas.

import s3

df = s3.read_csv('s3://bucket_name/key_name/file_name.tsv.gz', 
                 sep='\t', compression='gzip')
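
To make the pieces of that call concrete, here is a standard-library sketch of what the URL and the `sep` and `compression` arguments describe. It is an illustration only, not how the module works internally: `split_s3_url` and `read_gzipped_tsv` are hypothetical helpers, and the payload is built in memory rather than fetched from s3.

```python
import csv
import gzip
import io
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/key' into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError("expected an s3://bucket/key URL, got %r" % url)
    return parsed.netloc, parsed.path.lstrip("/")


def read_gzipped_tsv(raw_bytes):
    """Decompress gzip bytes and parse them as tab-separated rows."""
    text = gzip.decompress(raw_bytes).decode("utf-8")
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


bucket, key = split_s3_url("s3://bucket_name/key_name/file_name.tsv.gz")

# In-memory stand-in for the bytes of an s3 object body.
payload = gzip.compress(b"id\tname\n1\tada\n")
rows = read_gzipped_tsv(payload)
print(bucket, key, rows)
```

The bucket is the host part of the URL and the key is everything after it, which is why the file never needs a local path.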

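For continued use from a local copy, add the path to the module in an IPython startup script:

cd ~/.ipython/profile_default/startup
vim first.py

Then add these lines to first.py:

import sys
sys.path.append("PATH")  # replace PATH with the directory that contains the s3 module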

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

Credits

Written by Leon Yin

License

MIT