A self-contained Python workbench for scientific programming, data mining, maths, stats and visualization

Snake Charmer

A portable Python workbench for data science, built with Vagrant, VirtualBox and Salt.

Introduction

Wouldn't it be great if you could magic up a local IPython Notebook server, complete with SciPy, Pandas, Matplotlib, PyMC, scikit-learn, R and Octave integration, and all the usual goodness, and running the latest version of Python, just by typing one line?

vagrant up charmed34

And wouldn't it be great if you could do that from pretty much any Windows, Mac or Linux machine, and know that you'd get the exact same environment every time?

Well, read on.

What is included

Snake Charmer provides an out-of-the-box workbench for data analysis, statistical modelling, machine learning, mathematical programming and visualization.

It is designed to be used primarily via IPython Notebook.

The environment is based on Ubuntu 12.04 and Python 3.4, with the following data science tools included. You are of course free to install any other Python or Ubuntu packages -- or anything else that fits your needs.

Packages marked 'alpha' or 'dev' should be considered experimental, although in many cases they are largely problem-free. We will endeavour to discover and document any known issues here.

† Non-Python tools

‡ Non-Python tools usable via Python wrapper packages

You are, of course, free to remove or upgrade these packages via pip or apt-get as usual, or experiment with additional ones. Please feel free to send pull requests when you get another package working.

Coming soon: Other Python versions. Ubuntu 14.04 LTS.

Potential future additions include: Parakeet, pattern, CrossCat, BayesDB, ggplot, Bokeh, Blaze, numdifftools, PuLP, CVXPY, SysCorr, bayesian, PEBL, libpgm, BayesPy, BayesOpt, mpld3, Pylearn2, nimfa, py-earth, Orange, NeuroLab, PyBrain, annoy, Zipline, Quandl, BNFinder, Alchemy API, xlrd/xlwt, NetworkX, OpenCV, boto, gbq, SQLite, PyMongo, mpi4py, Jubatus, and one or more Hadoop clients.

If you have suggestions for any other packages to add, please submit them by raising an issue.

Requirements

Snake Charmer runs IPython and all the associated tools in a sandboxed virtual machine (VM). It relies on Vagrant for creating and managing VMs, and on VirtualBox for running them -- so please go and install those now.

Experienced users of other virtualization platforms can edit the Vagrantfile to use one of those instead, if they prefer.

Everything else is installed automatically.

Installation

Check out this git repository:

git clone git@github.com:andrewclegg/snake-charmer.git
cd snake-charmer

Start the VM:

vagrant up charmed34

If you're already a Vagrant user, be aware that Snake Charmer's Vagrantfile will attempt to install the vagrant-vbguest plugin automatically.

This command currently takes around an hour to download and install all the necessary software. When this completes, it will run some tests and then display a message like this:

Your VM is up and running: http://localhost:8834/tree

Testing your installation

The link above will take you to a fully-kitted-out IPython Notebook server. Open the "Hello World" notebook to see a full list of installed packages and other system information. N.B. The notebook server is started with inline graphics enabled for matplotlib, but not the --pylab option, as this is considered harmful.
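If you would rather confirm from a script that the notebook server is answering, a minimal sketch along the following lines may help. It assumes the default address shown in the startup message (localhost, port 8834); the function names `notebook_url` and `server_is_up` are illustrative and not part of Snake Charmer.

```python
import urllib.error
import urllib.request


def notebook_url(port=8834, host="localhost"):
    """Build the notebook tree URL printed at the end of 'vagrant up'."""
    return "http://{}:{}/tree".format(host, port)


def server_is_up(url, timeout=5.0):
    """Return True if the URL answers with HTTP 200, False if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or an HTTP error status.
        return False
```

Run on the host, `server_is_up(notebook_url())` should return True once the VM has finished starting. If you have changed the port through the customization options, pass it to `notebook_url`.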

There is also a "Snake Charmer QA" notebook supplied. This allows you to run the test suites of the major components, but don't run this now! It's a slow process and only needs to be performed if you've customized your VM. See the customization guide for more information.

Vagrant essentials

On a VM that's already been fully configured, vagrant up will just restart it, without going through the full install process.

You can log into the server via

vagrant ssh charmed34

from the same directory, for full command-line control. It's an Ubuntu 12.04 box, under the covers. But you can do most things through the IPython Notebook anyway, so this is rarely essential.

Some more useful commands:

vagrant reload charmed34  # reboot the VM (same as "vagrant up" if it's not running)
vagrant halt charmed34    # shut down the VM, reclaim the memory it used
vagrant destroy charmed34 # wipe it completely, reclaiming disk space too
vagrant suspend charmed34 # 'hibernate' the machine, saving current state
vagrant resume charmed34  # 'unhibernate' the machine

See the Vagrant docs for more details.

Folder structure

The notebook server runs from within the notebooks subdirectory of the current snake-charmer directory, and initially contains a single "Hello World" notebook.

Snake Charmer uses IPython 2, so any subdirectories of notebooks are visible and navigable as folders in the IPython web interface. However, you can't create directories from the web interface yet, so you need to log in via ssh, or run a shell command from within IPython by prefixing it with !.
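A notebook folder can also be created from a notebook cell in plain Python, without a shell command. This is only a sketch: the helper name `make_notebook_folder` and the example folder name are made up, and the default root is the VM-side notebooks path listed in the synced folder table below.

```python
import os


def make_notebook_folder(name, root="/home/vagrant/notebooks"):
    """Create a subfolder under the notebooks directory and return its path.

    Existing folders are left untouched, so the call is safe to repeat.
    """
    path = os.path.join(root, name)
    os.makedirs(path, exist_ok=True)
    return path
```

For example, `make_notebook_folder("experiments")` run inside the VM should make an "experiments" folder appear in the IPython web interface after a page refresh.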

Vagrant sets up a number of synced folders, which are directories shared between the VM and the host (your computer). Files placed in them are visible on both sides, so this is a good way to make data available to the VM. If you create more than one VM (feature coming soon!), files in synced folders will be visible to all of them -- apart from /srv/log, which is specific to one VM only.

The paths in the left-hand column are relative to the snake-charmer install directory -- your local copy of this repo.

Folder on your computer   Folder within VM         Contents
------------------------  -----------------------  --------
notebooks                 /home/vagrant/notebooks  Any notebooks
data                      /home/vagrant/data       Data you wish to share (initially empty)
.cache                    /srv/cache               Cache for downloaded files
log/charmed34             /srv/log                 Certain setup logs, useful for debugging only
salt/roots/salt           /srv/salt                Config management information (ignore this)
salt/roots/pillar         /srv/pillar              Config management information (ignore this)

These are all configurable via environment variables -- see the customization guide.
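To make the mapping concrete, here is a small sketch that translates a path relative to the snake-charmer directory on the host into the path the VM sees. It hard-codes the default folders from the table above, so it will be wrong if you have overridden them; `SYNCED_FOLDERS` and `to_vm_path` are illustrative names, not part of Snake Charmer.

```python
import posixpath

# Default host folder -> VM folder mapping, copied from the table above.
SYNCED_FOLDERS = {
    "notebooks": "/home/vagrant/notebooks",
    "data": "/home/vagrant/data",
    ".cache": "/srv/cache",
    "log/charmed34": "/srv/log",
    "salt/roots/salt": "/srv/salt",
    "salt/roots/pillar": "/srv/pillar",
}


def to_vm_path(host_relative_path):
    """Map a host path (relative to the repo) to its VM path, or None.

    None means the path is not inside any synced folder, so the VM
    cannot see it.
    """
    # Accept Windows-style separators and tidy up the path.
    normalized = posixpath.normpath(host_relative_path.replace("\\", "/"))
    for host_dir, vm_dir in SYNCED_FOLDERS.items():
        if normalized == host_dir:
            return vm_dir
        if normalized.startswith(host_dir + "/"):
            return posixpath.join(vm_dir, normalized[len(host_dir) + 1:])
    return None
```

With the defaults, a file saved on the host as data/iris.csv (a made-up file name) would be read inside a notebook from /home/vagrant/data/iris.csv.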

Data persistence

If you get your VM into a mess somehow, you can just type

vagrant destroy charmed34
vagrant up charmed34

to build a new one. Files in synced folders will not be affected if you do this, so you won't lose any data or notebooks. However, any data stored on the VM but outside these synced folders will be lost.
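Before destroying a VM, it can be worth checking where your output files actually live. The sketch below tests whether an absolute path inside the VM falls under one of the default synced folders listed in the folder structure section; anything else would be lost. The names `SYNCED_VM_FOLDERS` and `survives_destroy` are hypothetical, and the list must be adjusted if you have customized the folders.

```python
import posixpath

# VM-side synced folders, assuming the default configuration.
SYNCED_VM_FOLDERS = (
    "/home/vagrant/notebooks",
    "/home/vagrant/data",
    "/srv/cache",
    "/srv/log",
    "/srv/salt",
    "/srv/pillar",
)


def survives_destroy(vm_path):
    """Return True if an absolute VM path lies inside a synced folder."""
    # normpath resolves ".." segments, so a path cannot escape a folder.
    normalized = posixpath.normpath(vm_path)
    return any(
        normalized == folder or normalized.startswith(folder + "/")
        for folder in SYNCED_VM_FOLDERS
    )
```

For instance, a results file written to /home/vagrant/data would be kept, while one written to /tmp or directly to /home/vagrant would not.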

The virtual disk on each VM is configured with an 80GB limit -- it grows to take up real disk space on the host up to this limit, and then stops. But data stored in synced folders does not count towards this. So you will likely never reach the 80GB limit.

If you want to make another folder available to the VM, for example if your datasets are stored on another disk, see the customization guide.

Troubleshooting

If a VM starts behaving strangely, the golden rule is: Don't waste time fixing it.

This may sound strange, but the advantage of Snake Charmer is that you can create a factory-fresh VM with almost no effort at all.

The first thing to try is to reboot the VM:

vagrant reload charmed34

Option two is reprovisioning the machine. This runs through the install process and ensures all required packages are installed. First, delete the package cache in case anything in there is messed up:

# N.B. Make sure you're in the snake-charmer directory first!

# On OS X or Linux:
rm -rf .cache

# Or on Windows:
rd /s /q .cache

Then reboot and reprovision:

vagrant reload --provision charmed34

If this doesn't fix the problem, delete the VM completely and recreate it:

vagrant destroy charmed34
vagrant up charmed34

If this still doesn't fix it, you may have found a bug. Please open a GitHub issue describing it in as much detail as possible, preferably with instructions on how to reproduce it.
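The escalation described above (reboot, then reprovision with a clean cache, then destroy and rebuild) can be written down as data and driven from Python. This is a sketch only, assuming it is run from the snake-charmer directory with vagrant on the PATH; `recovery_steps` and `run_step` are illustrative names, and the vagrant commands are the ones already shown in this section.

```python
import shutil
import subprocess


def recovery_steps(machine="charmed34"):
    """Return the recovery steps in order, mildest first.

    Each step is (label, clear_cache, commands), where commands is a
    list of argument lists suitable for subprocess.
    """
    return [
        ("reboot", False, [["vagrant", "reload", machine]]),
        ("reprovision", True, [["vagrant", "reload", "--provision", machine]]),
        ("rebuild", False, [["vagrant", "destroy", machine],
                            ["vagrant", "up", machine]]),
    ]


def run_step(step, cache_dir=".cache"):
    """Run one step, stopping at the first command that fails."""
    label, clear_cache, commands = step
    if clear_cache:
        # Same effect as 'rm -rf .cache' or 'rd /s /q .cache'.
        shutil.rmtree(cache_dir, ignore_errors=True)
    for command in commands:
        # 'vagrant destroy' asks for confirmation on the terminal.
        subprocess.check_call(command)
```

A reasonable way to use this is to call `run_step` on one step at a time, checking whether the VM behaves before moving on to the next, rather than running all three in a loop.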

The VirtualBox admin GUI can of course be used to check on the status of VMs, inspect their hardware and network configuration, manually start or stop them, attach via the console, etc.

Important reminder

Only use the host filesystem to store data, notebooks etc. -- that is, the data and notebooks folders which are synced to the VM. If you store files in other places on a VM, they will be lost forever when you destroy it.

Sharing your VMs

Snake Charmer VMs are Vagrant VMs, and Vagrant VMs can be published, shared and remotely accessed via various mechanisms. This is discussed in the Snake Charmer F.A.Q.

Customizing your VMs

Even if you don't know much about VirtualBox, Vagrant or Salt, you can customize your VMs in several ways -- and if you want to tinker with the configuration for these programs directly, the sky's the limit. See the separate customization guide.

F.A.Q.

See the separate Snake Charmer F.A.Q.

Credits

Developed by Andrew Clegg (Twitter: @andrew_clegg), tested at Pearson.

Thanks to the authors and contributors of all the world-class open source components included, whose hard work has made this possible.

License

Snake Charmer does not include bundled distributions of its components (Python, Ubuntu, Python libraries, other libraries and packages etc.). Rather, it provides a set of machine-readable instructions for obtaining these components from third-party open-source repositories. Please refer to each individual component's documentation for license details.

Snake Charmer itself is distributed under the Apache License:

Copyright 2014 Andrew Clegg

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.