- create an empty project with a base structure
.
+-- data
| +-- raw
| +-- processed
|
+-- src
| +-- PythonModules
| +-- tests
|
+-- notebooks
| +-- exploratory
| +-- expositionary
|
+-- references
| +-- papers
| +-- tutorials
|
+-- results
+-- README.md
+-- LICENSE.txt
Comprehensive Project Templates:
- Data Science Cookiecutter
- Shablona - Python Package Template
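If a full template is more than you need, the base structure above can be scaffolded with a few lines of Python. This is only a sketch: the folder names are copied from the tree, while `create_project` and its `root` argument are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Folder layout copied from the tree above.
FOLDERS = [
    "data/raw",
    "data/processed",
    "src/PythonModules",
    "src/tests",
    "notebooks/exploratory",
    "notebooks/expositionary",
    "references/papers",
    "references/tutorials",
    "results",
]
FILES = ["README.md", "LICENSE.txt"]


def create_project(root):
    """Create the base folders and empty top-level files under `root`."""
    root = Path(root)
    for folder in FOLDERS:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file and leaves an existing one untouched
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_project("my_project")` from the folder where the project should live creates the layout; running it again does not overwrite anything.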
- Make paths independent of platform and all relative to directory structure
import os
# current path
current_path = os.getcwd()
# join paths for Windows and Unix
code_path = os.path.join(current_path, "src")
# make sure paths/files exist before reading
os.path.exists(code_path)  # True if the file or folder exists
os.path.isfile(code_path)  # True only if the path is an existing file
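The path steps above can be wrapped into two small helpers. This is a minimal sketch, assuming scripts are started from the project root; `project_path` and `read_text_if_present` are hypothetical names, not part of any library.

```python
import os


def project_path(*parts, root=None):
    """Join `parts` onto the project root using the platform's separator."""
    if root is None:
        root = os.getcwd()  # assumption: the script is started from the project root
    return os.path.join(root, *parts)


def read_text_if_present(path):
    """Return the file's text, or None when the path is not an existing file."""
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as handle:
        return handle.read()
```

For example, `read_text_if_present(project_path("data", "raw", "notes.txt"))` works unchanged on Windows and Unix, and returns `None` instead of raising when the file is missing.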
- Locally:
pip install nose
For each function write a test function:
+-- src
| +-- function1.py
| +-- function2.py
| +-- tests
|     +-- test_function1.py
|     +-- test_function2.py
Use the numpy.testing module. Example:
ArraySum.py:
def ArraySumFunction(array1, array2):
    # function which sums two arrays
    return array1 + array2
testArraySum.py:
import numpy as np
from numpy import testing as npt
import ArraySum

def test_ArraySumFunction():
    # testing ArraySum function
    array1 = 2 * np.ones(100)
    array2 = np.ones(100)
    res = ArraySum.ArraySumFunction(array1, array2)
    npt.assert_equal(res, 3 * np.ones(100))
Run the tests:
nosetests
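The ArraySum test above compares arrays exactly, which works because the values are small whole numbers. For results that involve floating-point rounding, numpy.testing also offers `assert_allclose`, which compares within a tolerance. The sketch below uses a hypothetical function, `array_mean_function`, invented for illustration; the tolerance is an arbitrary choice.

```python
import numpy as np
from numpy import testing as npt


def array_mean_function(array1, array2):
    """Hypothetical example function: element-wise mean of two arrays."""
    return (array1 + array2) / 2.0


def test_array_mean_function():
    array1 = 0.1 * np.ones(100)
    array2 = 0.2 * np.ones(100)
    res = array_mean_function(array1, array2)
    # 0.1 + 0.2 is not exactly 0.3 in floating point, so compare within a tolerance
    npt.assert_allclose(res, 0.15 * np.ones(100), rtol=1e-12)
```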
- Remotely:
- Travis-CI (free for public repos)
- configured through a .travis.yml file
- AppVeyor (for Windows)
- CircleCI
- Wercker (based on Docker containers)
- Jenkins - need to configure it
Types of tests:
- unit testing
- integration testing
- regression testing
- functional testing
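To make the first three test types above concrete, here is a small sketch in plain Python. The helpers `normalize` and `cumulative` and the stored value `KNOWN_GOOD` are hypothetical, and the comments give one common reading of each term; functional testing (checking the whole program against its requirements) is left out because it depends on the project.

```python
def normalize(values):
    """Scale values so they sum to 1 (hypothetical helper)."""
    total = sum(values)
    return [v / total for v in values]


def cumulative(values):
    """Running sum of values (hypothetical helper)."""
    out, running = [], 0.0
    for v in values:
        running += v
        out.append(running)
    return out


def test_unit_normalize():
    # unit test: one function checked in isolation
    assert normalize([1, 1, 2]) == [0.25, 0.25, 0.5]


def test_integration_pipeline():
    # integration test: two functions checked working together
    assert cumulative(normalize([1, 1, 2])) == [0.25, 0.5, 1.0]


# regression test: compare against a result recorded from a trusted earlier run
KNOWN_GOOD = [0.5, 1.0]


def test_regression_pipeline():
    assert cumulative(normalize([3, 3])) == KNOWN_GOOD
```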
Test Coverage - Coveralls
- Testing for Data Scientists (PyData talk)
Conda vs pip
What is Conda?
- Anaconda is a Python distribution that differs slightly from the default Python distribution and comes with its own package manager (conda).
- Conda packages are precompiled binary archives (.tar.bz2 or .conda files), built separately for each operating system, so they are fast to install. (Building NumPy from source means compiling C code, which takes a long time.) Miniconda is even faster to install because it is bare-bones, which makes it better for deployment: you have only what you need.
What is pip?
The package manager for Python. It installs packages from PyPI. Some packages on PyPI are not available through conda.
pip install vs conda install
pip freeze vs conda list
What is a virtual environment?
A folder containing the Python executables and libraries (or links to them) that a project uses. Virtual environments take up disk space!
Pure Python: virtualenv
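Because an environment is just a folder with its own interpreter, it is easy to lose track of which one is active. The sketch below is one way to check from inside Python. It assumes a venv or recent virtualenv environment for the `sys.prefix` comparison, and assumes conda sets the `CONDA_DEFAULT_ENV` variable on activation; both function names are hypothetical.

```python
import os
import sys


def in_virtualenv():
    """True when running inside a venv or recent virtualenv environment."""
    # inside such an environment sys.prefix points at the environment folder,
    # while sys.base_prefix keeps pointing at the base installation
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def active_conda_env():
    """Name of the activated conda environment, or None if there is none."""
    # assumption: conda sets CONDA_DEFAULT_ENV when an environment is activated
    return os.environ.get("CONDA_DEFAULT_ENV")


print("interpreter:", sys.executable)
print("virtualenv active:", in_virtualenv())
print("conda env:", active_conda_env())
```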
If using anaconda distribution create envs by:
conda create --name newEnv python=2 extra_packages
View environments:
conda env list
On Unix:
source activate newEnv
do stuff
conda install more_packages
source deactivate
On Windows:
activate newEnv
do stuff
conda install more_packages
deactivate
Saving environments:
conda env export -f exported_env.yml
Load an environment from .yml file:
conda env create -f exported_env.yml
You can do the same thing with pip:
pip freeze > requirements.txt
pip install -r requirements.txt
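To see roughly what ends up in requirements.txt, the installed packages can also be listed from Python itself. This is a sketch using the standard-library `importlib.metadata` module (Python 3.8+); it approximates the output of pip freeze and is not a replacement for it. `installed_requirements` is a hypothetical name.

```python
from importlib import metadata


def installed_requirements():
    """Return sorted 'name==version' strings for the current environment."""
    pins = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata is missing or broken
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


for line in installed_requirements():
    print(line)
```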
- Make sure to instal Jupyter within virtual environment
- move functions from notebooks to a module
- paths for modules
- reloading modules
- python 2:
reload(module_name)
- python 3:
from importlib import reload
reload(module_name)
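The effect of reloading is easiest to see end to end. The sketch below writes a throwaway module to a temporary folder, imports it, edits it on disk, and reloads it. The module name `scratch_module` is hypothetical; in a notebook you would only need the `importlib.reload` call.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "scratch_module.py"  # hypothetical module name
    module_file.write_text("VALUE = 1\n")

    sys.path.insert(0, tmp)  # make the temporary folder importable
    importlib.invalidate_caches()
    import scratch_module

    first = scratch_module.VALUE

    # edit the module on disk, as you would in an editor
    module_file.write_text("VALUE = 'edited'\n")
    importlib.reload(scratch_module)  # re-executes the file in the same module object
    second = scratch_module.VALUE

    # clean up so the demo leaves no trace in the running session
    sys.path.remove(tmp)
    del sys.modules["scratch_module"]

print(first, second)  # prints: 1 edited
```

Without the reload, `scratch_module.VALUE` would still be `1`, because Python caches imported modules for the lifetime of the session.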
- install module as a package
- create a setup.py file
- run the setup.py file from the folder that contains it:
python setup.py install
and you will be able to import the package from anywhere!
- submodules
- put __init__.py in every folder
- git submodules - add external github repos to your github project
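The module steps in the list above (making the module's folder importable and marking folders as packages with `__init__.py`) can be tried out in isolation. This sketch builds a throwaway package in a temporary folder; the names `demo_pkg_example`, `helpers`, and `double` are hypothetical. In a real project the package would already exist under src and only the `sys.path` line would be needed in the notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # hypothetical layout: src/demo_pkg_example/__init__.py and helpers.py
    package_dir = Path(tmp) / "src" / "demo_pkg_example"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")  # marks the folder as a package
    (package_dir / "helpers.py").write_text("def double(x):\n    return 2 * x\n")

    # a notebook would do this once, pointing at its own project's src folder
    src_path = str(Path(tmp) / "src")
    sys.path.insert(0, src_path)
    importlib.invalidate_caches()

    from demo_pkg_example.helpers import double

    result = double(21)

    # clean up so the demo leaves no trace in the running session
    sys.path.remove(src_path)
    for name in ("demo_pkg_example.helpers", "demo_pkg_example"):
        sys.modules.pop(name, None)

print(result)  # prints: 42
```

Editing `sys.path` is a quick fix for notebooks; installing the module as a package removes the need for it.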
- PyCharm - integration with GitHub
- Atom - coloring in GitHub (extra packages)
- JupyterLab (web-based, so it can run on a server)
- Spyder - MATLAB-like IDE
Linters
Plugins exist for most editors: e.g. atom flake8 linter.
- Nbconvert - to pdf, to html
- Reveal.js: Jupyter notebook -> slides (Instructions)
- css styles for notebook
- Sphinx, readthedocs, ... (automatically generate documentation, integrate with CI)
- gh-pages - project website based on Jekyll
- Binder (free sharing of Jupyter notebooks hosted on GitHub)
- JupyterHub + Kubernetes - sharing reliably with many people
- SageMathCloud - CoCalc