# DNDS6013 Scientific Python
## Central European University, Winter 2019/2020

Instructor: Márton Pósfai, TA: Luis Natera Orozco

Emails: posfaim@ceu.edu, natera_luis@phd.ceu.edu

## Goal of the course

* To familiarize you with some of the most common scientific Python tools
* To enable you to conduct your own data-intensive research

## Assessment

### 1 Attendance
* 30% of the final grade
* You can miss up to 3 sessions
* If you have to miss a class, homework deadlines still apply

### 2 Assignments
* 30% of the final grade
* 5 homework assignments; you will have one week to submit each

### 3 Final project
* 40% of the final grade
* Your project should perform a self-contained analysis of some empirical dataset, making use of the Python tools we have learned in this course.
* You are free to choose the origin and nature of the data you will use
* The final deadline is April 15th

You are allowed to drop the course until the second session.

1. First task: open (or download) Anaconda (https://www.anaconda.com/download/). Make sure it comes with Python 3.7
2. Download this notebook from Moodle
3. Open it

# Short story of python

* Started as a small hobby project of Guido van Rossum; the first interpreter was written over a Christmas holiday
* Aim: a scripting language with a minimal core, highly extensible with modules, and "batteries included"
* Each module has its own small developer and support group
* Python is object-oriented and has dynamic typing and automatic memory management
* The focus is on readability rather than optimization
* Two main versions: 2 and 3. Many people use both, and on many systems the default is 2. The last Python 2 release series is 2.7, while Python 3 is still actively developed (this course uses 3.7). The two are mostly compatible (2.7 evolved to be more compatible with 3), but there are some differences. Support for Python 2 is guaranteed only until <a href="http://python3statement.org/">January 1, 2020</a>. We will use Python 3.
* Check which version you are running:

In [None]:
from platform import python_version
print(python_version())

Our tools:
* **Anaconda**: cross-platform package and environment manager
* **python**: interpreter that takes code as input and runs it
* **IPython**: interactive Python, i.e. Python plus some user-friendly features
* **Jupyter notebook**: combines code and formatted text, runs IPython

### Markdown
1. This is a <i>markdown</i> cell; you can change the cell type in the menu
2. It is rich text and understands both HTML and Markdown markup
3. Find out more about markdown [here](https://en.wikipedia.org/wiki/Markdown) and [here](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Working%20With%20Markdown%20Cells.html).
4. You can also typeset equations as in Wikipedia or LaTeX, e.g. $\sqrt{x^2}=|x|$, or
$$\sum_{i=1}^\infty \frac{1}{2^i}=1$$
5. Double click to edit

### Help!
The most helpful IPython functions:

|Command|Description
|:---|:---
|?|Introduction and overview of IPython's features.
|%quickref|Quick reference.
|help|Python's own help system.
|object?|Details about the object

In [None]:
?

In [None]:
%quickref

In [None]:
help(print)

In [None]:
c=2
c?

### Most useful way to get help:
* [google.com](https://lmgtfy.com/?q=why+doesn%27t+my+python+code+work)
* [stackoverflow.com](https://stackoverflow.com/questions/27156381/python-creating-a-2d-histogram-from-a-numpy-matrix)

### Types
|Type|Name|Example
|---|---|---
|int|Integer|30,-4
|float|Floating point|1.5,1e10
|str|String|"alma","c"
|bool|Boolean|True,False
|list|List|\[1,"alma",2\]
|dict|Dictionary|\{'course': "SciPy", 'teacher': "JT"\}

In [None]:
a = 7
b = 3.14
c = True
d = "alma"
e = [1, 2]
f = {'course': "SciPy", 'teacher': "JT", }
type(a),type(b),type(c),type(d),type(e),type(f)

### Integer numbers
Division (true division, integer division, remainder, rounding):

In [None]:
print(14 / 4)
print(14 // 4)
print(14 % 4)
print(round(14 / 4))
print(int(14 / 4))

Note that in Python 2, dividing two integers returns an integer, as in many other languages.
In Python 2: <br>
2 / 3 = 0 <br>
float(2 / 3) = 0.0<br>
2 / 3. = 0.66666<br>
1.0 * 2 / 3 = 0.666666<br>

The order of the operators matters. Precedence, from highest to lowest:
1. ()
2. **
3. *, /, //, %
4. +, -

In [None]:
print(1 + 2 * 3 + 2 / 2)
print(2 / 2 * 3)
print(2 / (2 * 3))

print()
print(2 / 2**3)
print( (2 / 2)**3 )
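
Two further cases follow from the precedence list above and are worth checking by hand. The sketch below uses only built-in operators; the variable names are illustrative. `**` binds tighter than a leading minus sign, and a chain of `**` is evaluated from right to left, while operators of equal precedence such as `*` and `/` are evaluated from left to right.

```python
# ** binds tighter than the unary minus: this is -(2**2), not (-2)**2
neg_power = -2**2
paren_power = (-2)**2
print(neg_power, paren_power)      # -4 4

# ** is right-associative: 2**3**2 means 2**(3**2) = 2**9
right_assoc = 2**3**2
left_grouped = (2**3)**2
print(right_assoc, left_grouped)   # 512 64

# *, /, // and % share one level and are evaluated left to right
left_to_right = 8 / 4 * 2
print(left_to_right)               # 4.0
```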

### Floating point number
$$1.2345 = \underbrace{12345}_\text{significand} \times \underbrace{10}_\text{base}\!\!\!\!\!\!^{\overbrace{-4}^\text{exponent}}$$

In [None]:
import sys # this module allows us to extract information about the system you are using 
sys.float_info

In [None]:
print(1)
print(float(1))
print(1.5)
print(1e4)
print(1e-4)
print(15e2)
print(1.5e3)

In [None]:
print(1 + 1e-10 - 1)
print(1 + 1e-20 - 1)  # too small to add
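
Why is `1e-20` lost in the sum? A float keeps only about 16 significant decimal digits, so a term that is too small relative to `1` disappears when the result is rounded. As a minimal sketch, `sys.float_info.epsilon` (the gap between 1.0 and the next larger float) shows where the limit lies; the variable names are only illustrative.

```python
import sys

eps = sys.float_info.epsilon   # gap between 1.0 and the next larger float
print(eps)

still_visible = (1 + eps) - 1  # 1 + eps is representable, so eps survives
lost = (1 + eps / 4) - 1       # 1 + eps/4 is rounded back to exactly 1.0
print(still_visible == eps, lost == 0.0)
```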

Note that the type of a variable may change at runtime:

In [None]:
a = 2
print(a,type(a))
a /= 2
print(a,type(a))

### Functions

Most of the functionality in Python is provided by *modules*, such as access to the operating system, file I/O, string management, network communication, and much more. We will use some of these modules throughout the course.

<b>To use a module</b> in a Python program it first has <b>to be imported</b>. A module can be imported using the `import` statement. For example, to import the module `math`, which contains many standard mathematical functions, we can do:

In [None]:
import math

print(math.sqrt(4))
print(math.log10(1000))
print(math.log(math.e**5))
print(math.pi)
print(math.cos(math.pi * 0.5))
print(math.pow(3,10))

In [None]:
from math import sqrt
#from math import *

print(sqrt(9))

### Boolean
A Boolean value can only be True or False

In [None]:
a = True
b = (0 == 0)
c = (2**3 == 8.0)
d = bool(0)
e = bool(3.14)
print(a,b,c,d,e)
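
`bool()` also accepts other types. As a short illustration of standard Python behaviour (the list names are made up for this example): zero and empty containers convert to `False`, and the other values shown here convert to `True`, including the non-empty string `"False"`.

```python
empty_values = [0, 0.0, "", [], {}]
non_empty_values = [1, -0.5, "False", [0], {"a": 1}]

as_bool_empty = [bool(v) for v in empty_values]
as_bool_non_empty = [bool(v) for v in non_empty_values]

print(as_bool_empty)       # all False
print(as_bool_non_empty)   # all True
```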

Comparison operators:

In [None]:
print(1 == 1) # equal
print(1 < 1) # less
print(1 <= 1) # less or equal
print(1 < 1 or 1 == 1) # less OR equal
print(1 != 1 and 1 == 1) # not equal AND equal
print((5 == 5)*2 + 3, (5 == 4)*2 + 3)

### Strings
Escape sequences, operations, conversions

In [None]:
s = "apple"
print(s)
s= 'apple'
print(s)

#### Escape sequences

In [None]:
print("\talma\n\"körte\"\\'")

In [None]:
print("'",'"',"\"",'\'')

<table border align="center" style="border-collapse: collapse">
  <thead>
    <tr class="tableheader">
      <th align="left"><b>Escape Sequence</b>&nbsp;</th>
      <th align="left"><b>Meaning</b>&nbsp;</th>
    </thead>
  <tbody valign='baseline'>
    <tr><td align="left" valign="baseline"><code>&#92;<var>newline</var></code></td>
        <td align="left">Ignored</td>
    <tr><td align="left" valign="baseline"><code>&#92;&#92;</code></td>
        <td align="left">Backslash (<code>&#92;</code>)</td>
    <tr><td align="left" valign="baseline"><code>&#92;'</code></td>
        <td align="left">Single quote (<code>'</code>)</td>
    <tr><td align="left" valign="baseline"><code>&#92;"</code></td>
        <td align="left">Double quote (<code>"</code>)</td>
    <tr><td align="left" valign="baseline"><code>&#92;a</code></td>
        <td align="left">ASCII Bell (BEL)</td>
    <tr><td align="left" valign="baseline"><code>&#92;b</code></td>
        <td align="left">ASCII Backspace (BS)</td>
    <tr><td align="left" valign="baseline"><code>&#92;f</code></td>
        <td align="left">ASCII Formfeed (FF)</td>
    <tr><td align="left" valign="baseline"><code>&#92;n</code></td>
        <td align="left">ASCII Linefeed (LF)</td>
    <tr><td align="left" valign="baseline"><code>&#92;r</code></td>
        <td align="left">ASCII Carriage Return (CR)</td>
    <tr><td align="left" valign="baseline"><code>&#92;t</code></td>
        <td align="left">ASCII Horizontal Tab (TAB)</td>
    <tr><td align="left" valign="baseline"><code>&#92;v</code></td>
        <td align="left">ASCII Vertical Tab (VT)</td>
    <tr><td align="left" valign="baseline"><code>&#92;<var>ooo</var></code></td>
        <td align="left">ASCII character with octal value <i>ooo</i></td>
    <tr><td align="left" valign="baseline"><code>&#92;x<var>hh...</var></code></td>
        <td align="left">ASCII character with hex value <i>hh...</i></td></tbody>
</table>

In [None]:
print("a\blma",r"a\blma") # the r prefix makes a raw string: backslashes are kept literally
print(r'C:\some\name')

#### Operators, Conversions

In [None]:
"al" + "ma"

In [None]:
a = eval("1+1")
b = a + 2
print(b)
str(9),eval("1+1"),float("4.5")

In [None]:
N = 1500

print("The value of N (=" + str(N) + ") is greater than allowed (1000)")

print("The value of N (=%d) is greater than allowed (1000)" % (N))

In [None]:
d = 3
c = 1
b = 7
N = b + c + d
print("Peter has %d dogs, %d cats and %d birds" % (d, c, b))
print("%2.f%%, %2.f%%, %2.f%% of the animals are dogs, cats and birds respectively." % \
     (100.0 * d / N, 100.0 * c / N, 100.0 * b / N))
print("{:2.1f}%, {:2.1f}%, {:2.1f}% of the animals are dogs, cats and birds respectively."\
      .format(100.0 * d / N, 100.0 * c / N, 100.0 * b / N))

In [None]:
import math

"{1:.1f} {1:.4f} {0:.2f}".format(math.pi, 2*math.pi)

Note that with `%` formatting a literal percent sign has to be escaped as `%%`, while `str.format` needs no escape.<br>
Both formats can be used

Since Python 3.6 string formatting can also be done with "f-strings" (a good howto is here: http://zetcode.com/python/fstring/)

In [None]:
a = "foo"
b = "bar"
c = 3.14159
print(f"{a} {b} {c} {2 * c} {c:.2f} }}")
print(rf"\n {a}") # raw strings and f-strings can be combined

In [None]:
import sys
print("This program was called with name: %s" % (sys.argv[0]))

In [None]:
print(len('alma'))

print()
print("alma\n","a\blma")
print(len("alma\n"),len("a\blma"))

#### Encoding strings

In [None]:
s = 'apple'  # text string (Unicode characters)
b = b'apple' # bytes object: a sequence of raw bytes
print(type(s),type(b))

In [None]:
s = "körte"
b = b"körte"  # SyntaxError: bytes literals can only contain ASCII characters

In [None]:
s = "Körte"
print(len(s))
print(s.encode("utf-8"))
print(s.encode("latin-1"))
print(str(s.encode("utf-8"),"utf-8"))
print(str(s.encode("utf-8"),"latin-1"))

print()
print(type(s.encode("utf-8")))
print(type(s))

print()
import sys
print(sys.getsizeof("korte".encode('utf-8')),sys.getsizeof("körte"),sys.getsizeof("körte".encode('utf-8')))
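
Not every encoding can represent every character. For example, `ő` does not exist in Latin-1, so `"őzláb".encode("latin-1")` raises a `UnicodeEncodeError`. If a lossy conversion is acceptable, the `errors` argument of `str.encode` controls what happens to such characters. The following sketch shows two of the standard handlers; the variable names are illustrative.

```python
s = "őzláb"

# 'replace' substitutes a question mark for characters that cannot be encoded
replaced = s.encode("latin-1", errors="replace")
print(replaced)    # b'?zl\xe1b'

# 'ignore' silently drops them
ignored = s.encode("latin-1", errors="ignore")
print(ignored)     # b'zl\xe1b'

# UTF-8 can encode every character, so the round trip is lossless
round_trip = s.encode("utf-8").decode("utf-8")
print(round_trip == s)
```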

In [None]:
"őzláb".encode("latin1")

In [None]:
s = "Hello world!"
print(len(s))
print(s[0],s[1],s[-1],s[-2],s[-12])
print(s[0:4],s[:4])
print(s[0:-1:1])
print(s[0:-1:2],s[::2])

#### Useful string methods

In [None]:
s = "Helló világ!\n"
print(s[6:].capitalize() + "X")
print(s.rstrip() + "X")
print(s.count("l"))
print(s.index("l"))
print("123".isdigit(),"1e3".isdigit(),s.isprintable(),"Körte".isprintable())
print(s.split("l"))
print(s.strip("\n! l"))
print(s.upper())
help("12".isprintable)

#### Lists (more next week)

In [None]:
a = [ 7, 3 ,8, 10, 7, 1, 9, 1, 5, "foo"]
print(a[0], a[0:4], a[-1])
print()

b = []
print(b)
b.append("a")
b.append(5)
b.append(a)
print(b)
b.remove("a")
print(b)
print("Pop",b.pop(-1))
print(b)
del b[0]
print(b)

In [None]:
print(list(range(10)))
print(list(range(3,10,2)))

### Basic python instructions

Let's start learning how to read the documentation, and recap Python's control flow statements along the way.
From the <a href="https://docs.python.org/3/reference/compound_stmts.html">reference guide</a>:

#### The while statement

The while statement is used for repeated execution as long as an expression is true:

<code>
while_stmt ::=  "while" expression ":" suite<br>
                ["else" ":" suite]
</code> 
<code>
suite         ::=  stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT<br>
statement     ::=  stmt_list NEWLINE | compound_stmt<br>
stmt_list     ::=  simple_stmt (";" simple_stmt)* [";"]
</code>

This repeatedly tests the expression and, if it is true, executes the first suite; if the expression is false (which may be the first time it is tested) the suite of the else clause, if present, is executed and the loop terminates.

A break statement executed in the first suite terminates the loop without executing the else clause’s suite. A continue statement executed in the first suite skips the rest of the suite and goes back to testing the expression.
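
A small sketch of these rules in action (the function name `smallest_divisor` is made up for illustration): the `else` suite runs only when the loop ends because the expression became false, not when the loop is left with `break`.

```python
def smallest_divisor(n):
    """Return the smallest d >= 2 with d*d <= n that divides n, or None."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            break      # found a divisor: the else suite is skipped
        d += 1
    else:
        d = None       # the expression became false without a break
    return d

print(smallest_divisor(91))   # 7
print(smallest_divisor(97))   # None
```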

#### The if statement

<code>
if_stmt ::=  "if" expression ":" suite<br>
                ("elif" expression ":" suite)*<br>
                ["else" ":" suite]
</code>
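
Read as code, the grammar says: one `if`, any number of `elif` branches, and at most one `else`. In the minimal sketch below (the function `sign` is purely illustrative), only the first branch whose expression is true is executed.

```python
def sign(x):
    if x > 0:
        result = "positive"
    elif x < 0:
        result = "negative"
    else:
        result = "zero"   # reached only if no earlier expression was true
    return result

print(sign(3), sign(-2.5), sign(0))   # positive negative zero
```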

#### The for statement

<code>
for_stmt ::=  "for" target_list "in" expression_list ":" suite<br>
                 ["else" ":" suite]
</code>
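
As with `while`, the optional `else` suite of a `for` loop runs only if the loop was not left with `break`. A short sketch with a hypothetical helper `first_negative`:

```python
def first_negative(numbers):
    """Return the first negative element of numbers, or None if there is none."""
    for x in numbers:
        if x < 0:
            found = x
            break          # leave the loop: the else suite is skipped
    else:
        found = None       # the loop ran to the end without a break
    return found

print(first_negative([4, 2, -7, 1, -3]))   # -7
print(first_negative([4, 2, 1]))           # None
```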

In [None]:
for a in range(10):
    if a < 5:
        print(a)
    else:
        print(10-a)

In [None]:
f = 1.0
v = 1.0
while f * v < 1e10:
    f *= v
    v += 1.0
print("The largest factorial less than 1e10 is: %d! = %g" % (v - 1, f))

### Exercise 1
Write code that reverses a string, e.g.<br>
"A Santa snaps pans at NASA" -> "ASAN ta snap spans atnaS A"

In [None]:
s1="A Santa snaps pans at NASA"

### Exercise 2
a = \[ 7, 3 ,8 , 10, 7, 1, 9, 1, 5\]<br>
1. Write code that returns the minimum of the above list
2. Write code that returns the positions of the minima of the above list as a list <br>
Try writing it yourself first, then search online for other solutions.

In [None]:
a = [ 7, 3 ,8 , 10, 7, 1, 9, 1, 5]

### Exercise 3
Write code that prints the second character after each period in a string, but only if it is an uppercase letter.


In [None]:
lorem = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, \
sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim \
ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex \
ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse \
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non \
proident. sunt in culpa qui officia deserunt mollit anim id est laborum."


### Exercise 4
Write a short program that checks for divisibility by 3 in the traditional way: first it converts the integer to a string, then it sums up the digits and checks whether the sum is divisible by 3. This time you can use the remainder operator (%).

# In case you're faster... Some more advanced details

Python is a dynamically typed language. This means that you do not have to tell the computer the type of the variable you are going to create.

Normally, in a statically typed programming language, working with variables has two steps:
* instantiation
* initialization

To instantiate a variable means that the programming language reserves space in memory according to the type of the variable. To initialize a variable means to use that space and write into it the value that you want to give to the variable.
In Python the language itself works out the type of the value you assign, and the two steps happen at the same time.
To work with variables and types efficiently, you need to understand how Python manages them.

When we assign a value to a variable, Python stores some information in a given portion of memory. To do so it needs to write the information. Memory looks like a long list of 1s and 0s (bits), and writing information means replacing some of those 1s and 0s with another series of 1s and 0s. To retrieve the information later, Python needs to know where we wrote our list of 1s and 0s and how long it was. Basically, we need to know an address and a size.

Let's play a little to understand how python does that.

In [None]:
import sys
gso = sys.getsizeof

What we have just done is bind the name `gso` to the same function object as `sys.getsizeof`. `sys.getsizeof` is a function (`getsizeof`) defined within the module `sys`; it returns the size of an object in bytes. We will talk more about modules later in the course. Calling `gso` or `sys.getsizeof` is now the same thing.

In [None]:
?gso

In [None]:
gso(3)

In [None]:
sys.getsizeof(3)

A byte is a collection of 8 bits and is a unit of digital information. A bit represents a single binary digit (0 or 1). To learn more, ask Wikipedia ;)
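
`sys.getsizeof` reports sizes in bytes. The exact numbers depend on the Python version and platform, so the sketch below only compares sizes instead of quoting them. It assumes CPython, where an integer takes more space as it gets larger; the variable names are illustrative.

```python
import sys

small = sys.getsizeof(3)
huge = sys.getsizeof(2**100)      # a big integer needs more bytes
print(small, huge, huge > small)

empty_list = sys.getsizeof([])
longer_list = sys.getsizeof([1, 2, 3, 4])
# getsizeof counts only the list object itself (its table of references),
# not the objects the list refers to
print(empty_list, longer_list, longer_list > empty_list)
```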

An address, in turn, looks like this:

In [None]:
def addrof(variable):
    return hex(id(variable))

In [None]:
?id

In [None]:
a = 3
addrof(a)

In [None]:
addrof(3)

In [None]:
b = 3
addrof(b)

In [None]:
b = b+1
addrof(b)

In [None]:
addrof(a)

What happened? What did Python do with the addresses?
"a" and "b" are two different variables, yet their address is the same. In Python a variable is a name that refers to an object, and several names can refer to the same object. To save memory, CPython (the standard interpreter) creates small integers such as 3 only once and reuses them, so "a" and "b" both point to the same object. Integers cannot be modified in place, so `b = b+1` does not change that object: it makes "b" refer to a different object (4) at a new address, while "a" still refers to 3.
We don't have to worry about freeing the memory used by values that we no longer use. Python does that automatically. A technical way of saying this is: Python has a "garbage collector".
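
The same observation can be made without printing addresses: the `is` operator tests whether two names refer to the same object. The sketch assumes CPython, where the reuse of small integers is an implementation detail and not a guarantee of the language; `same_before` and `same_after` are illustrative names.

```python
a = 3
b = 3
same_before = a is b    # True in CPython: both names refer to the cached 3

b = b + 1               # rebinds b to another object; the object 3 is untouched
same_after = a is b     # False: a refers to 3, b refers to 4

print(same_before, same_after)
print(a, b)             # 3 4
```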

In [None]:
var = 5
addrof(var)

In [None]:
var += 10
addrof(var)

`var` now refers to a different object at a different address. Once no name refers to an object any more, Python can make its memory available for other uses.

## Something on the lists now!

In [None]:
a = [1,2,3,4]

In [None]:
gso(a)

In [None]:
addrof(a)

In [None]:
a.append(5)
a

In [None]:
gso(a)

In [None]:
addrof(a)

The address of the list hasn't changed even though we modified its contents. This is really important! We can modify a list, but its address stays the same.
This happens because a list is a more complex, mutable object. In fact it is a collection of addresses, each of which points to an element of the list.

In [None]:
addrof(a[0])

In [None]:
a[0] = 2

In [None]:
addrof(a[0])

In [None]:
addrof(a)

The address of `a` hasn't changed, but its first element now refers to a different object, so the address of `a[0]` has changed.
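
One practical consequence, sketched here as an illustration: assignment never copies a list. Two names bound to the same list see each other's changes, while a copy made with `list()` is an independent object. The names `alias` and `copy` are only for this example.

```python
a = [1, 2, 3, 4]
alias = a          # a second name for the same list object
copy = list(a)     # a new list with the same elements

alias.append(5)    # modifies the one list that a and alias share

print(a)                       # [1, 2, 3, 4, 5]
print(copy)                    # [1, 2, 3, 4]
print(alias is a, copy is a)   # True False
```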