uwescience/ReproduciblePythonTutorial

Tips and Tools for Reproducible Python

Directory Structure


  • Create an empty project with a base structure:
.
+-- data
|   +-- raw
|   +-- processed
|
+-- src
|   +-- PythonModules
|   +-- tests
|   
+-- notebooks
|   +-- exploratory
|   +-- expositionary
|
+-- references
|   +-- papers
|   +-- tutorials
|
+-- results 
+-- README.md
+-- LICENSE.txt
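
The layout above can also be generated by a short script. The following is a minimal sketch, not part of the original tutorial: the helper name `create_project` and the constants `PROJECT_DIRS` and `PROJECT_FILES` are hypothetical, and the folder names are copied from the tree above.

```python
from pathlib import Path

# Sub-directories from the layout above, relative to the project root.
PROJECT_DIRS = [
    "data/raw",
    "data/processed",
    "src/PythonModules",
    "src/tests",
    "notebooks/exploratory",
    "notebooks/expositionary",
    "references/papers",
    "references/tutorials",
    "results",
]
PROJECT_FILES = ["README.md", "LICENSE.txt"]


def create_project(root):
    """Create the base directory structure under `root` (hypothetical helper)."""
    root = Path(root)
    for sub in PROJECT_DIRS:
        # parents=True creates intermediate folders; exist_ok=True makes reruns safe
        (root / sub).mkdir(parents=True, exist_ok=True)
    for name in PROJECT_FILES:
        # touch() creates an empty file and does not change the content of an existing one
        (root / name).touch(exist_ok=True)
    return root
```

Calling `create_project("my_project")` would create the folders below the current directory; running it again on an existing project leaves existing files in place.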

Comprehensive Project Templates:

Cross-platform Directory Paths


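  • Make paths platform-independent and relative to the project directory structure
	import os
	
	# current working directory
	current_path = os.getcwd()
	
	# join path components portably (works on Windows and Unix)
	code_path = os.path.join(current_path, "src")
	
	# make sure paths/files exist before reading
	os.path.exists(code_path)   # True if the path exists (file or directory)
	os.path.isfile(code_path)   # True only if the path is a regular file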
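
As a small illustration of checking a path before reading it, the sketch below builds a path under the `data` folder of the project layout and fails early if the file is missing. The helper name `find_data_file` and the file name in the usage note are hypothetical.

```python
import os


def find_data_file(project_root, *parts):
    """Return the path of a file under <project_root>/data, or raise if it is missing.

    Hypothetical helper: `parts` are path components below the data folder.
    """
    # os.path.join inserts the right separator for the current platform
    path = os.path.join(project_root, "data", *parts)
    if not os.path.isfile(path):
        raise FileNotFoundError("expected a data file at " + path)
    return path
```

For example, `find_data_file(os.getcwd(), "raw", "measurements.csv")` would return the full path only if that (hypothetical) file exists.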

Testing


  • Locally

     	pip install nose
    

    For each function write a test function:

     +-- src
     |   +-- function1.py
     |   +-- function2.py
     |   +-- tests
     |       +-- test_function1.py
     |       +-- test_function2.py
    

    Use the numpy.testing module to compare arrays.

    Example:

    ArraySum.py:

     	def ArraySumFunction(array1, array2):
     	    # function which sums two arrays
     	    return array1 + array2
    

    testArraySum.py:

     import numpy as np
     from numpy import testing as npt
     import ArraySum
    
     def test_ArraySumFunction():
     	# testing ArraySum function
     	array1 = 2*np.ones(100)
     	array2 = np.ones(100)
     	res = ArraySum.ArraySumFunction(array1,array2)
     	npt.assert_equal(res, 3*np.ones(100))
    

    Run the tests:

     	nosetests
  • Remotely:

    Types of tests:

    • unit testing
    • integration testing
    • regression testing
    • functional testing

    Test Coverage - Coveralls

  • Testing for Data Scientists - (PyData talk)
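
The `ArraySumFunction` test above compares integer-valued arrays exactly. With non-integer floats an exact comparison can fail because of rounding, so a tolerance-based check is usually a better fit. The sketch below is an illustration under that assumption; the test name `test_ArraySumFunction_floats` is hypothetical, and the function is repeated only to keep the block self-contained.

```python
import numpy as np
from numpy import testing as npt


def ArraySumFunction(array1, array2):
    # same function as in ArraySum.py above, repeated so the sketch is self-contained
    return array1 + array2


def test_ArraySumFunction_floats():
    array1 = 0.1 * np.ones(100)
    array2 = 0.2 * np.ones(100)
    res = ArraySumFunction(array1, array2)
    # 0.1 + 0.2 is not exactly 0.3 in binary floating point,
    # so compare within a relative tolerance instead of exactly
    npt.assert_allclose(res, 0.3 * np.ones(100), rtol=1e-12)
```

The tolerance `rtol=1e-12` is an arbitrary choice for this example; pick one that matches the precision your computation can actually guarantee.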

Distributions & Package Managers


Conda vs pip

What is Conda?

  • Anaconda is a Python distribution that bundles many scientific packages with the Python interpreter and comes with its own package manager (conda).

  • Conda packages are precompiled binary archives (.tar.bz2 or .conda files), built for each specific operating system, so they are fast to install. Building NumPy from source, by contrast, means a lengthy compilation of C code. (Wheel files, .whl, are the analogous precompiled format used by pip.) Miniconda is even faster to install because it is bare-bones, which makes it better for deployment: you have only what you need.

What is pip?

pip is the standard package manager for Python. It installs packages from PyPI (the Python Package Index). Some packages on PyPI are not available through conda.

pip install vs conda install

pip freeze

conda list
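
Both commands print the packages installed in the current environment. A similar list can be produced from inside Python, for instance to record package versions next to your results. This is a rough sketch assuming Python 3.8 or later (for `importlib.metadata`); the helper name `installed_packages` is hypothetical, and the output is not guaranteed to match `pip freeze` line for line.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_packages():
    """Return sorted 'name==version' strings for the current interpreter."""
    entries = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries whose metadata has no name
            entries.add(f"{name}=={dist.version}")
    # sort case-insensitively so the listing is stable across runs
    return sorted(entries, key=str.lower)
```

Writing `"\n".join(installed_packages())` to a text file in the `results` folder is one way to keep a record of the environment that produced a given output.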

Virtual Environments


What is a virtual environment?

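A virtual environment is a self-contained folder holding a Python executable and its installed libraries (or links to them), isolated from other environments. Because each environment keeps its own copies of packages, virtual environments take up disk space!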

Pure Python: virtualenv

If you use the Anaconda distribution, create environments with:

	conda create --name newEnv python=2 extra_packages

View environments:

	conda env list

On Unix:

source activate newEnv        # conda 4.4 and later: conda activate newEnv
# ... do stuff ...
conda install more_packages
source deactivate             # conda 4.4 and later: conda deactivate

On Windows:

activate newEnv
do stuff
conda install more_packages
deactivate

Saving environments:

	conda env export -f exported_env.yml

Load an environment from .yml file:

	conda env create -f exported_env.yml

You can do the same thing with pip:

	pip freeze > requirements.txt
	pip install -r requirements.txt
  • Make sure to install Jupyter within the virtual environment
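
To confirm that a script or notebook is really running inside the intended environment, you can inspect the interpreter from Python. The sketch below is one possible check, not a complete one: the helper name `describe_environment` is hypothetical, and it relies on the assumption that conda sets the `CONDA_DEFAULT_ENV` variable when an environment is activated.

```python
import os
import sys


def describe_environment():
    """Report which interpreter and environment the current process uses."""
    return {
        # full path of the running Python executable
        "executable": sys.executable,
        # inside a venv (or recent virtualenv) sys.prefix points at the environment
        # folder while sys.base_prefix keeps the original installation
        "in_virtualenv": sys.prefix != getattr(sys, "base_prefix", sys.prefix),
        # assumed to be set by conda on activation; None otherwise
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }
```

Running `describe_environment()` in a notebook cell shows whether the kernel uses the environment's interpreter or a system-wide one.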

More Virtualization


Modules & Packages


  • move functions from notebooks to a module
  • paths for modules
  • reloading modules
    • python 2:

       	reload(module_name)
      
    • python 3:

       	from importlib import reload
       	reload(module_name)
      
  • install module as a package
    • create a setup.py file

    • run the setup.py file

       	python setup.py install
      

      and you will be able to import the package from anywhere!

  • submodules
    • put __init__.py in every folder
  • git submodules - add external github repos to your github project
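
The "paths for modules" and "reloading modules" items can be sketched together for Python 3. The example below is an illustration only: it writes a throwaway module named `demo_module` (a hypothetical name) into a temporary folder, makes that folder importable through `sys.path`, and reloads the module after its source changes. In the project layout above the folder would be `src/PythonModules` instead.

```python
import importlib
import os
import sys
import tempfile

# Do not cache bytecode, so the edited source is always recompiled on reload.
# Note that this changes a process-wide setting.
sys.dont_write_bytecode = True

# Hypothetical module folder standing in for src/PythonModules.
module_dir = tempfile.mkdtemp()
module_file = os.path.join(module_dir, "demo_module.py")

with open(module_file, "w") as f:
    f.write("def answer():\n    return 1\n")

# Make the folder importable, then import the module by name.
sys.path.insert(0, module_dir)
importlib.invalidate_caches()
demo_module = importlib.import_module("demo_module")
first = demo_module.answer()

# Edit the module on disk, as you would in an editor while a notebook is running.
with open(module_file, "w") as f:
    f.write("def answer():\n    return 22\n")

# reload() re-executes the source and updates the existing module object.
importlib.reload(demo_module)
second = demo_module.answer()
```

Here `first` holds the value from the original source and `second` the value after the edit. Objects created before the reload keep referring to the old code, so re-create them after reloading.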

Editors


Linter plugins exist for most editors, e.g. the flake8 linter for Atom.

Documentation


Extra Resources


Assessing Reproducibility

R Reproducible Curriculum

Hitchhikers Guide for packaging
