This repository contains
- installation instructions for a minimal python environment
- Jupyter notebooks to introduce the basic concepts of python
- Quizzes to test the understanding of said concepts
Prerequisites
- access to a computing environment with installation rights
- programming experience, i.e. familiarity with basic concepts like control flow and data structures
Outline for a possible course
+--Motivation, Installation Instructions, Programming Environments
+--Data Types, Control Flow, Modules
+--package NumPy
+--package SciPy
+--package scikit-learn
+--package matplotlib (Python plotting, object-oriented)
+--package pandas (Python Data Analysis Library)
Choosing the proper installation candidate
There are currently two major versions of Python: the older Python 2 and the newer Python 3. We use the latter; its latest stable release is 3.6.5 (as of 28 Mar 2018).
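To confirm which major version a script or notebook actually runs on, a minimal check could look like this. It is only a sketch; the helper name `is_python3` is made up for illustration.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True if the given version tuple belongs to Python 3."""
    return version_info[0] == 3


# sys.version_info behaves like a tuple, e.g. (3, 6, 5, 'final', 0)
print("Running Python %d.%d.%d" % sys.version_info[:3])
print("Python 3:", is_python3())
```

Passing an explicit tuple, e.g. `is_python3((2, 7, 15))`, lets you try the check without switching interpreters.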
Download the latest (64-bit) Anaconda3 installer from http://continuum.io/download and launch it with
$ bash Anaconda3-1.9.1-Linux-x86_64.sh
You need to agree to the license agreement and may (optionally)
specify a target directory (default is
~/anaconda3, my choice is
The installer then suggests prepending its path in .bashrc. (You may already have this, e.g. via .profile.)
Next, we'll add three channels to the default one (in this order):
$ conda config --add channels conda-forge
$ conda config --add channels defaults
$ conda config --add channels r
$ conda config --add channels bioconda
bioconda is for bioinformatics (what's your requirement?) and will receive the highest priority, because it is added last.
r is required for bioinformatics and contains modules for the GNU R programming language. The
defaults channel already contains plenty of packages (?TODO list?). Finally,
conda-forge contains several community-built packages that are not already in the
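The priority order described above follows from how `conda config --add` works: each call puts the channel at the front of the list, and moves it there if it is already present. The following pure-Python simulation does not call conda at all; the helper `add_channel` is hypothetical and only mimics that behaviour.

```python
def add_channel(channels, name):
    """Mimic `conda config --add channels NAME`: put NAME at the front."""
    return [name] + [c for c in channels if c != name]


# Assumption: a fresh installation starts with the defaults channel only.
channels = ["defaults"]
for name in ["conda-forge", "defaults", "r", "bioconda"]:
    channels = add_channel(channels, name)

# Highest priority first
print(channels)  # ['bioconda', 'r', 'defaults', 'conda-forge']
```

On a real installation, `conda config --show channels` prints the actual list.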
The installer ships the full set of packages that belong to the anaconda
meta-package. Let's update it with:
$ conda update anaconda
Here's the full package list.
Anaconda is not only the name of the Python distribution, but also the name of its largest meta-package. To maximize compatibility (and minimize maintenance effort), we prefer packages in the following order:
- packages from the standard library (see below)
- packages from the anaconda meta-package (?packages marked as "In Installer" [here](https://docs.anaconda.com/anaconda/packages/py3.6_linux-64/)?)
- packages from conda's
keras (but not in the default installer's full list of packages?)
- ONLY IF NECESSARY packages from selected additional conda channels
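Before reaching for an additional channel, it can help to check whether a package is already importable in the current environment. A minimal sketch, assuming top-level module names; the helper `is_available` and the example names are illustrative only.

```python
import importlib.util


def is_available(module_name):
    """Return True if the import system can locate the top-level module."""
    return importlib.util.find_spec(module_name) is not None


for name in ["math", "csv", "numpy", "keras"]:
    status = "available" if is_available(name) else "missing"
    print(name, status)
```

This only tells you that a module can be found, not which channel it was installed from; `conda list` shows the channel for each installed package.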
Minimal Package List
(for reference, when you have to reproduce your environment outside
of anaconda, e.g. in
favorites from the standard library
non-standard packages in conda's default installation
- data analysis
non-standard packages in the default channel
$ conda update conda
$ conda update anaconda
This confuses me (and others), so the current recommendation in Anaconda's blog for "What 95% of People Want" is:
$ conda update --all
$ conda
$ conda remove anaconda
And if things break :
$ conda clean --all
- (default) python interactive shell
- MYCHOICE ipython shell
web-application for interactive python worksheets
Integrated Development Environment
- MYCHOICE spyder, a quick introduction by Joey Bernard
- Eclipse with PyDev plugin
- Emacs with ...
Note: Spyder is already available in the standard installation, but if we want/need more advanced profiling, there's
$ conda config --add channels spyder-ide
$ conda install -c spyder-ide spyder-line-profiler
$ conda install -c spyder-ide spyder-memory-profiler
are called modules in python
Favorite Non-Standard Packages
fast numerical computing, in particular with large arrays and matrices; it is part of the SciPy ecosystem, but is installed and imported as a package of its own
- Nicolas P. Rougier, From Python to Numpy
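As a small taste of what "fast numerical computing with arrays" means in practice, here is a sketch of element-wise operations and broadcasting. It assumes NumPy is installed (it is part of the anaconda meta-package); the variable names are illustrative.

```python
import numpy as np

values = np.arange(5)                 # array([0, 1, 2, 3, 4])
squares = values ** 2                 # element-wise, no explicit Python loop

matrix = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
row_offsets = np.array([10, 20, 30])

# Broadcasting: the length-3 array is added to every row of the 2x3 matrix.
shifted = matrix + row_offsets

print(squares.tolist())   # [0, 1, 4, 9, 16]
print(shifted.tolist())   # [[10, 21, 32], [13, 24, 35]]
```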
(large) scientific computing library, built on NumPy arrays (NumPy is a required dependency)
machine learning, built to work well with NumPy and SciPy
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Here's a short introduction.
Keras is a high-level neural networks API, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
Further Non-Standard Libraries
to simulate quantum systems. Very short introduction by Joey Bernard.
Standard Library (Batteries Included)
see the complete list at https://docs.python.org/3/library/
This module defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained.
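Assuming the description above refers to the standard library's `array` module, the following sketch shows what "the type of objects stored is constrained" means. The variable names are illustrative.

```python
from array import array

numbers = array("d", [1.0, 2.5, 4.0])  # type code "d": C double
numbers.append(3)                      # the int 3 is stored as the float 3.0
print(numbers.tolist())

try:
    numbers.append("text")             # a string does not fit the type code
except TypeError as err:
    print("TypeError:", err)
```

A plain list would have accepted the string without complaint.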
This module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.
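Assuming the description above refers to the `collections` module, here is a short sketch of four of its containers. The example data is made up.

```python
from collections import Counter, defaultdict, deque, namedtuple

# Counter: count hashable items
letter_counts = Counter("abracadabra")

# defaultdict: missing keys get a default value (here: an empty list)
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)

# namedtuple: a tuple whose fields can be accessed by name
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

# deque with maxlen: keeps only the most recent items
recent = deque(maxlen=2)
for item in [1, 2, 3]:
    recent.append(item)

print(letter_counts.most_common(1), dict(groups), p, list(recent))
```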
import/export of csv-files
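Assuming this entry refers to the standard library's `csv` module, a minimal round trip might look as follows. To keep the sketch self-contained it writes to an in-memory buffer instead of a file; the column names are made up.

```python
import csv
import io

rows = [["name", "value"], ["alpha", "1"], ["beta", "2"]]

# Export: write the rows as CSV text
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# Import: read the text back, using the first row as field names
records = list(csv.DictReader(io.StringIO(text)))
for record in records:
    print(record["name"], record["value"])
```

Note that all values come back as strings; converting them to numbers is left to the caller.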
This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python.
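Assuming the description above refers to the `itertools` module, the following sketch shows a few of its building blocks. The inputs are arbitrary.

```python
import itertools

# All 2-element combinations, in input order
pairs = list(itertools.combinations("ABC", 2))

# Running totals
running_total = list(itertools.accumulate([1, 2, 3, 4]))

# count() is infinite; islice() takes only the first four values
first_evens = list(itertools.islice(itertools.count(start=0, step=2), 4))

# chain() concatenates several iterables lazily
flat = list(itertools.chain([1, 2], [3], [4, 5]))

print(pairs, running_total, first_evens, flat)
```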
It provides access to the mathematical functions defined by the C standard.
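Assuming this entry refers to the `math` module, a few of its functions in action (a sketch with arbitrary numbers):

```python
import math

hypotenuse = math.hypot(3, 4)             # length of the vector (3, 4)
angle = math.degrees(math.atan2(1, 1))    # 45 degrees, up to rounding

print(hypotenuse)
print(math.isclose(angle, 45.0))          # compare floats with a tolerance
print(math.floor(2.7), math.ceil(2.2))
```

These functions work on single numbers; for whole arrays, NumPy offers element-wise counterparts.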
multiprocessing is a package that supports spawning processes using an API similar to the threading module.
This module provides a portable way of using operating system dependent functionality.
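Assuming the description above refers to the `os` module, here is a sketch of typical portable calls. The directory names are placeholders.

```python
import os

# Build a path with the separator of the current operating system
data_dir = os.path.join("data", "raw")

# Read an environment variable, with a fallback if it is not set
home = os.environ.get("HOME", "(not set)")

# List the entries of the current working directory
entries = sorted(os.listdir(os.getcwd()))

print(data_dir, home, len(entries))
```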
This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter. It is always available.
This module provides various time-related functions.
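Assuming this entry refers to the `time` module, one common use is timing a piece of code. A minimal sketch; the helper `timed` is hypothetical.

```python
import time


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


total, seconds = timed(sum, range(100000))
print("sum = %d, took %.6f s" % (total, seconds))
print(time.strftime("%Y-%m-%d %H:%M:%S"))  # current local time, formatted
```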
This module provides a standard interface to extract, format and print stack traces of Python programs. It exactly mimics the behavior of the Python interpreter when it prints a stack trace.
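Assuming the description above refers to the `traceback` module, the following sketch captures a stack trace as a string instead of letting the program stop. The function `risky_division` is made up for the example.

```python
import traceback


def risky_division(a, b):
    return a / b


try:
    risky_division(1, 0)
except ZeroDivisionError:
    # format_exc() returns the same text the interpreter would print
    report = traceback.format_exc()
    print(report)
```

This is handy for logging errors in long-running notebooks or scripts.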
- Scipy Lecture Notes for Python, NumPy, Matplotlib, Scipy (with exercises)
- Code Challenge
- Python Tutorial at python.org (quite complete and extensive; probably to read selectively)
- basic Python Class by Google
- DataCamp has two Python Courses (for Data Science): Intro and Intermediate
- After Hours Programming has an interactive Python Tutorial
- LearnPython.org has an interactive Python Tutorial
- [Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found](https://unsupervisedmethods.com/over-150-of-the-best-machine-learning-nlp-and-python-tutorials-ive-found-ffce2939bd78)
- Swaroop, A Byte of Python, CC-BY-SA. (Free pdf and epub download, audience: programming beginners)
- Allen B. Downey, Think Python, 2nd edition, 2015. (Caveat: the 1st edition uses Python 2. Free pdf and html download, sample code available on webpage and github, audience: python beginners with programming experience)
- Codecademy's Python track
- Udacity's Intro to Computer Science
- MIT's 6.001 Introduction to Computer Science and Programming in Python by Ana Bell, Eric Grimson, and John Guttag; "for students with little or no programming experience"; available on edX
- MIT's 6.002 Introduction to Computational Thinking and Data Science by Ana Bell, Eric Grimson, and John Guttag; continuation of MIT 6.001; archived on edX
- 15 single-choice questions by TripleByte
- Mega Project List by Karan Goel
- 100 days of algorithms by Tomáš Bouda with github repository
- check the setup part for anaconda & friends (let's say as number 0)
- split every notebook into mandatory/optional part
- for numpy: improve the didactic structure. Parts are redundant (e.g. operations), and parts don't follow a logical order (broadcasting)