This module is helpful both in development notebooks and in deployed production pipelines that work with unstructured S3 files.
The main use of this module is to programmatically preview, process, and edit files on S3 by:
listing the contents of S3 buckets using glob-style wildcard patterns.
moving or copying files between buckets (e.g. filedrop -> archives).
streaming CSV and JSON files into Pandas DataFrames on your local machine, without manually downloading them to disk.
writing Pandas DataFrames to CSV and JSON files on S3.
saving and loading scikit-learn models to and from S3.
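The glob-style listing above can be sketched with the standard library's fnmatch module. In a real pipeline the key names would come from a boto3 `list_objects_v2` response; a static list of placeholder keys stands in here so the sketch runs without network access:

```python
# Sketch of glob-style key filtering, assuming keys were already fetched
# from S3 (e.g. via boto3's list_objects_v2). The key names are placeholders.
from fnmatch import fnmatch

def filter_keys(keys, pattern):
    """Return the keys matching a Unix-style wildcard pattern."""
    return [k for k in keys if fnmatch(k, pattern)]

keys = [
    "filedrop/2021-01-01.csv.gz",
    "filedrop/2021-01-02.csv.gz",
    "archives/2020-12-31.csv.gz",
]

print(filter_keys(keys, "filedrop/*.csv.gz"))
# -> ['filedrop/2021-01-01.csv.gz', 'filedrop/2021-01-02.csv.gz']
```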
Pandas and scikit-learn are useful tools in the Python data ecosystem.
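Saving and loading models to S3 boils down to serializing the fitted object to bytes before upload and deserializing after download. A minimal sketch of that round trip, using pickle and a plain dict as a stand-in for a scikit-learn estimator so it runs anywhere (the real upload/download would go through boto3's `put_object`/`get_object`):

```python
# Sketch of model (de)serialization for S3. A dict stands in for a
# fitted scikit-learn estimator; the bytes would normally be passed to
# boto3's put_object(Body=blob) and read back via get_object.
import pickle

model = {"coef": [0.5, -1.2], "intercept": 0.1}  # placeholder "model"

blob = pickle.dumps(model)      # bytes ready for upload
restored = pickle.loads(blob)   # bytes as returned by a download

print(restored == model)
# -> True
```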
Check out the tutorial and see the module in action.
Configure S3 access as you would for boto3.
TL;DR: environment variables or configuring the AWS CLI work best.
pip install s34me
Once either of these issues is resolved, this module will work with the latest distribution of Pandas.
import s3
df = s3.read_csv('s3://bucket_name/key_name/file_name.tsv.gz', sep='\t', compression='gzip')
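Conceptually, `s3.read_csv` fetches the object's bytes and hands an in-memory buffer straight to Pandas, so nothing touches disk. A sketch of that idea, building the gzipped payload locally instead of fetching it from S3 so it runs without credentials:

```python
# Sketch of the streaming idea behind s3.read_csv: the object body
# (built locally here, fetched via boto3 in the real module) is wrapped
# in an in-memory buffer and passed to pandas directly.
import gzip
import io

import pandas as pd

raw = b"a\tb\n1\t2\n3\t4\n"    # stand-in for an S3 object's contents
body = gzip.compress(raw)      # a .tsv.gz payload, as stored on S3

df = pd.read_csv(io.BytesIO(body), sep="\t", compression="gzip")
print(df.shape)
# -> (2, 2)
```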
For continued use, the module's path should be appended to sys.path in an IPython startup script:

cd ~/.ipython/profile_default/startup
vim first.py

Then add the following to first.py:

import sys
sys.path.append("PATH")
- Fork it!
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Add some feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request :D
Written by Leon Yin