
S3 helper

This module is helpful both in development notebooks and in deployed production pipelines that work with unstructured files on s3.

The main use of this module is to programmatically preview, process, and edit files across s3 by:

- listing the contents of s3 buckets using glob-like RegEx patterns.
- moving or copying files between buckets (filedrop -> archives).
- streaming csv and json files into Pandas dataframes on your local machine, without manually downloading them to disk.
- writing Pandas dataframes to csv and json files on s3.
- loading and unloading scikit-learn models from s3.
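The listing, copying, and moving features above map onto a handful of boto3 S3 client calls. This is a minimal sketch, not the module's implementation: every helper name here is hypothetical, and patterns are assumed to be shell-style globs (`*`, `?`, `[...]`) matched with Python's `fnmatch`, which may differ from the module's own pattern syntax. Only the literal prefix before the first wildcard is sent to S3; the rest is filtered locally.

```python
# Hypothetical helpers sketching glob listing and copy/move with boto3;
# not the s3 module's own implementation.
import fnmatch


def split_s3_path(path):
    """Split 's3://bucket/some/key' into ('bucket', 'some/key')."""
    if not path.startswith("s3://"):
        raise ValueError(f"not an s3:// path: {path!r}")
    bucket, _, key = path[len("s3://"):].partition("/")
    return bucket, key


def glob_prefix(key_pattern):
    """Literal part of a key pattern before its first wildcard character."""
    for i, ch in enumerate(key_pattern):
        if ch in "*?[":
            return key_pattern[:i]
    return key_pattern


def filter_keys(keys, key_pattern):
    """Keep keys matching a shell-style glob (note: '*' also matches '/')."""
    return [k for k in keys if fnmatch.fnmatchcase(k, key_pattern)]


def _client(client):
    if client is None:
        import boto3  # lazy import so the pure helpers above run without boto3
        client = boto3.client("s3")
    return client


def list_matching(pattern, client=None):
    """List s3:// paths matching e.g. 's3://bucket/filedrop/2017-*.csv'."""
    bucket, key_pattern = split_s3_path(pattern)
    paginator = _client(client).get_paginator("list_objects_v2")
    keys = []
    # Ask S3 only for keys under the literal prefix, then glob-filter locally.
    for page in paginator.paginate(Bucket=bucket, Prefix=glob_prefix(key_pattern)):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return [f"s3://{bucket}/{k}" for k in filter_keys(keys, key_pattern)]


def copy_key(src, dest, client=None):
    """Server-side copy of one object (a single copy_object request)."""
    src_bucket, src_key = split_s3_path(src)
    dest_bucket, dest_key = split_s3_path(dest)
    _client(client).copy_object(
        Bucket=dest_bucket,
        Key=dest_key,
        CopySource={"Bucket": src_bucket, "Key": src_key},
    )


def move_key(src, dest, client=None):
    """S3 has no rename: copy to the destination, then delete the source."""
    client = _client(client)
    copy_key(src, dest, client)
    bucket, key = split_s3_path(src)
    client.delete_object(Bucket=bucket, Key=key)
```

With hypothetical buckets named `filedrop` and `archives`, `move_key("s3://filedrop/a.csv", "s3://archives/a.csv")` mirrors the filedrop -> archives flow. A single `copy_object` request handles objects up to 5 GB; for larger files, boto3's managed `client.copy(...)` transfer is the usual choice.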

Pandas and scikit-learn are useful tools in the Python data ecosystem.
Check out the tutorial and see the module in action.
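Loading and unloading scikit-learn models comes down to serializing the fitted estimator to bytes and storing them as an S3 object. A minimal sketch, assuming `pickle` serialization (the module may use a different format, such as joblib) and hypothetical helper names `save_model` and `load_model`:

```python
# Hypothetical helpers: pickle a fitted estimator to S3 and back.
import pickle


def _split(path):
    """Split 's3://bucket/key' into (bucket, key)."""
    if not path.startswith("s3://"):
        raise ValueError(f"not an s3:// path: {path!r}")
    bucket, _, key = path[len("s3://"):].partition("/")
    return bucket, key


def _client(client):
    if client is None:
        import boto3  # lazy import: only needed when talking to real S3
        client = boto3.client("s3")
    return client


def save_model(model, path, client=None):
    """Serialize any picklable object (e.g. a fitted classifier) to an S3 key."""
    bucket, key = _split(path)
    body = pickle.dumps(model, protocol=pickle.HIGHEST_PROTOCOL)
    _client(client).put_object(Bucket=bucket, Key=key, Body=body)


def load_model(path, client=None):
    """Download and unpickle; only load objects written by a trusted source."""
    bucket, key = _split(path)
    body = _client(client).get_object(Bucket=bucket, Key=key)["Body"].read()
    return pickle.loads(body)


# Usage (requires scikit-learn, boto3 and AWS credentials):
# clf = LogisticRegression().fit(X, y)
# save_model(clf, "s3://bucket_name/models/clf.pkl")
# clf = load_model("s3://bucket_name/models/clf.pkl")
```

Unpickling can run arbitrary code, so only load models from buckets you control, and load them with the same scikit-learn version that saved them.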

Installation

Configure s3 access as you would for boto3 (read here).
TL;DR: environment variables or a configured AWS CLI work best.
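As a rough illustration of the two TL;DR options, the hypothetical `credential_source` helper reports whether the standard `AWS_ACCESS_KEY_ID`/`AWS_SECRET_ACCESS_KEY` environment variables are set or whether `aws configure` has written `~/.aws/credentials`. It is a simplification: boto3 also checks other sources (for example `AWS_SHARED_CREDENTIALS_FILE`, config files, instance roles), and `boto3.Session().get_credentials()` returning `None` is the authoritative "nothing found" signal.

```python
# Hypothetical check of the two TL;DR options; boto3 itself looks in more
# places, so treat None as "not found by this simplified check".
import os
from pathlib import Path

ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def credential_source(environ=None, home=None):
    """Return 'environment', 'aws-cli', or None."""
    environ = os.environ if environ is None else environ
    if all(environ.get(name) for name in ENV_VARS):
        return "environment"
    # `aws configure` writes keys to ~/.aws/credentials by default.
    home = Path.home() if home is None else Path(home)
    if (home / ".aws" / "credentials").is_file():
        return "aws-cli"
    return None
```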

Usage

Install requirements: pip install s34me

Note that this only works with Pandas 0.19.1 and below.
See: https://github.com/boto/botocore/pull/1195
See: https://github.com/pandas-dev/pandas/issues/17135

When either of these issues is resolved, this will work with the latest distribution of Pandas.

import s3

df = s3.read_csv('s3://bucket_name/key_name/file_name.tsv.gz', 
                 sep='\t', compression='gzip')

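For continued use, add the directory that contains the module to sys.path in an IPython startup script:

cd ~/.ipython/profile_default/startup
vim first.py

where first.py contains (replace PATH with that directory):

import sys
sys.path.append("PATH")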

Contributing

  1. Fork it!
  2. Create your feature branch: git checkout -b my-new-feature
  3. Commit your changes: git commit -am 'Add some feature'
  4. Push to the branch: git push origin my-new-feature
  5. Submit a pull request :D

Credits

Written by Leon Yin

License

MIT
