# Python Workshop - Day 1

This workshop will review Python fundamentals and prepare you for Galvanize's DSI.

### Topics

* Functions
* Modules
* Types
* String Formatting
* File I/O

### Before we start:

1. Open a new `Terminal`

2. Clone the git repository for week 0:
```
git clone https://github.com/zipfian/python-workshop.git
```

3. Obtain this notebook:
```
git clone https://github.com/zipfian/DSI_Lectures.git
cd DSI_Lectures/python-workshop/jfomhover/day1
```

4. Launch jupyter notebook (in a new terminal):
```
jupyter notebook
```

5. Launch atom:
```
atom
```


<span style="color:#888; font-size:150%; font-weight:bold">[Morning Lecture]</span>

## 1. Functions

A function is a reusable bit of code, defined by:
- a **name**,
- **inputs**, also called **parameters** or **arguments**,
- an output or **return value** (several values can be returned at once, packed in one tuple).

In [None]:
def is_palindrome(word):
    """Returns whether the word is a palindrome (the same forwards and backwards).
    
    INPUT: str
    OUTPUT: bool
    """
    # it is more fun if we strip all spaces and put in lowercase
    word = word.replace(" ", "").lower()

    # loop on indexes between 0 and the middle of the word
    for i in xrange(len(word) // 2):
        # if the character at this index is different
        # from the one on the mirror opposite side
        if word[i] != word[-i - 1]:
            # then it's not a palindrome
            return False
    
    # if this loop is exited, it means it has never returned False
    # so it should return True
    return True

Now let's have fun with it...

In [None]:
print "test the word 'rever' (french)"
print is_palindrome('rever')

print "test 'dream'"
print is_palindrome('dream')

print "test the sentence 'et la marine va venir a Malte' (french)"
# translates as 'and the marine will come to Malta'
print is_palindrome('et la marine va venir a malte')
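
As noted in the definition of a function, several values can be returned at once by packing them in a tuple. Here is a minimal sketch of that idea; the function `min_and_max` is a made-up example, not part of the workshop material, and `print` is called with parentheses so that the snippet also runs under Python 3.

```python
def min_and_max(values):
    """Returns the smallest and the largest element of a non-empty list.

    INPUT: list
    OUTPUT: tuple (smallest, largest)
    """
    smallest = values[0]
    largest = values[0]
    for value in values[1:]:
        if value < smallest:
            smallest = value
        if value > largest:
            largest = value
    # the two values are packed into a single tuple
    return smallest, largest

# the returned tuple can be unpacked into two variables in one assignment
low, high = min_and_max([3, 8, -2, 5])
print('low = {}, high = {}'.format(low, high))
```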

**Functions are "first class" objects in Python**

* They can be passed as arguments to other functions
* They can be returned as values from other functions
* They can be assigned to variables
* They can be stored in other data structures
* Their type is "function"
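
The properties listed above can be sketched in a few lines. All the names below (`square`, `make_adder`, `apply_twice`, `operations`) are invented for this illustration.

```python
def square(x):
    return x * x

def make_adder(n):
    """Returns a new function that adds n to its argument."""
    def adder(x):
        return x + n
    return adder

def apply_twice(func, x):
    """Calls func on x, then calls func again on the result."""
    return func(func(x))

alias = square                                          # assigned to a variable
operations = {'square': square, 'add3': make_adder(3)}  # stored in a dict

passed = apply_twice(alias, 3)       # passed as an argument: square(square(3))
returned = operations['add3'](10)    # function returned by make_adder, then called
type_name = type(square).__name__    # name of the type of a function

print(passed)
print(returned)
print(type_name)
```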

**Follow the D.R.Y principle: Don't Repeat Yourself!**

Use functions to avoid repeated code.

In [None]:
def fibonacci(a1, a2, n):
    """Prints the first n elements of a Fibonacci-style sequence,
    given values a1 and a2 for the first two ranks."""

    # eliminating easy cases first
    if (n <= 0):
        return
    if (n >= 1):
        print a1
    if (n >= 2):
        print a2

    # looping on n
    anm2 = a1 # value at rank n-2
    anm1 = a2 # value at rank n-1
    for i in range(2, n):
        an = anm2 + anm1 # value at rank n
        print an
        anm2 = anm1      # updating rank n-2
        anm1 = an        # updating rank n-1
    return

print "first 5 values with 1,1"
fibonacci(1,1,5)

print "first 5 values with 1,4"
fibonacci(1,4,5)

In [None]:
def fibonacci_v2(a1, a2, n, func):
    """Prints the first n elements of a Fibonacci-style sequence,
    given values a1 and a2 for the first two ranks,
    and func the operation for computing rank n."""
    
    # eliminating easy cases first
    if (n <= 0):
        return
    if (n >= 1):
        print a1
    if (n >= 2):
        print a2

    # looping on n
    anm2 = a1 # value at rank n-2
    anm1 = a2 # value at rank n-1
    for i in range(2, n):
        # NOTICE THE CHANGE HERE: func replaces the hard-coded addition
        an = func(anm2,anm1) # value at rank n
        print an
        anm2 = anm1      # updating rank n-2
        anm1 = an        # updating rank n-1
    return

def fibo_operation_add(anm2, anm1):
    return(anm2 + anm1)

def fibo_operation_prod(anm2, anm1):
    return(anm2 * anm1)

print "first 5 values with 1,1 and addition"
fibonacci_v2(1,1,5,fibo_operation_add)

print "first 5 values with 1,4 and addition"
fibonacci_v2(1,4,5,fibo_operation_add)

print "first 5 values with 1,1 and product"
fibonacci_v2(1,1,5,fibo_operation_prod)

print "first 5 values with 1,4 and product"
fibonacci_v2(1,4,5,fibo_operation_prod)
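
Because functions are ordinary objects, the two helper functions could also be replaced by ready-made ones. The sketch below is a variation, not the workshop's version: it assumes the standard `operator` module (whose `add` and `mul` functions behave like `+` and `*`), and a hypothetical `sequence()` function that returns a list instead of printing, which makes the result easier to check.

```python
import operator

def sequence(a1, a2, n, func):
    """Returns the first n elements of the sequence as a list,
    where func computes rank n from ranks n-2 and n-1."""
    # keep at most the first two values (none at all if n <= 0)
    values = [a1, a2][:max(n, 0)]
    while len(values) < n:
        values.append(func(values[-2], values[-1]))
    return values

with_addition = sequence(1, 1, 5, operator.add)
with_product = sequence(1, 4, 5, operator.mul)

print(with_addition)
print(with_product)
```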

## 2. Modules

* design and use a module to store functions that you want to reuse
* implement a single "main" module that imports and runs your code
* run your code inside a "main" block

### 2.1. Importing

The file `my_module.py` contains:

```python
def foo(x, y):
    return x+y

def bar(x, y, z):
    return x - y + z
```

We'll use 2 ways for importing:

In [None]:
from my_module import foo, bar

print foo(1, 2)
print bar(3, 4, 5)

In [None]:
import my_module

print my_module.foo(1, 2)
print my_module.bar(3, 4, 5)
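
The same two forms work for any module, and a third form, `import ... as ...`, gives the module a shorter alias. Since `my_module.py` may not be in your current directory, this sketch uses the standard `math` module instead.

```python
import math              # access through the module name
from math import sqrt    # import one name directly
import math as m         # same module, under a shorter alias

results = [math.sqrt(16), sqrt(16), m.sqrt(16)]
print(results)

# all three spellings refer to the very same function object
same_function = (math.sqrt is sqrt) and (sqrt is m.sqrt)
print(same_function)
```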

### 2.2. Setting up a main block

File `script.py` contains:
```python
def foo():
    print 'foofoo'

def bar():
    print 'barbar'

foo()
bar()
```

Here's what will happen if you `import` it:

In [None]:
# importing from script.py executes the whole file
from script import foo

Instead, in file `script2.py`, we have defined a "main block". This block is executed only when the file is run directly as a script (for example with `python script2.py`). When the file is imported, the block is skipped.

```python
def foo():
    print 'foofoo'

def bar():
    print 'barbar'

if __name__ == '__main__':
    foo()
    bar()
```

In [None]:
# script2.py is still executed, but foo() and bar() aren't called,
# because __name__ != '__main__'
from script2 import foo

# but still I can access those functions...
foo()
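
A common refinement, which is an optional convention and not something the workshop files do, is to gather the script's work in one function and call only that function from the main block. In this sketch the name `main()` is an assumption, and `foo()` and `bar()` return their text instead of printing it.

```python
def foo():
    return 'foofoo'

def bar():
    return 'barbar'

def main():
    """Does the script's actual work."""
    for text in (foo(), bar()):
        print(text)

# __name__ is '__main__' when the file is run directly,
# and the module's own name when the file is imported
if __name__ == '__main__':
    main()
```

Importing such a file gives access to `foo()`, `bar()` and `main()` without running anything.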

## 3. Types

The basic built-in types you will use to write your programs:

| TYPE | DESCRIPTION | EXAMPLE VALUE(S) |
|:--|:--|:--|
| `int` | integers | `1, 2, -3` ... |
| `float` | real numbers, floating values | `1.0, 2.5, 102342.32423` ... |
| `str` | strings | `'abc'` |
| `tuple` | an immutable sequence of values, each with its own type | `(1, 'a', 5.0)` |
| `list` | a mutable, indexed sequence of elements | `[1, 3, 5, 7]` |
| `dict` | a dictionary that maps keys to values | `{'a' : 1, 'b' : 2}` |
| `set` | a set of distinct values | `{1, 2, 3}` |

### 3.1. Testing types of objects

**You can check the type of objects** using:

* `type(some_object)` : returns the type itself
* `isinstance(some_object, some_type)` : returns `True` if the object is an instance of the given type (or of one of its subclasses)

In [None]:
my_string = 'abc'

print 'object:', my_string
print type(my_string)

print 'Is string?', isinstance(my_string, str)

In [None]:
my_integer = 123

print 'object:', my_integer
print type(my_integer)

print 'Is integer?', isinstance(my_integer, int)
print 'Is float?', isinstance(my_integer, float)
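
Two details of `isinstance()` are worth a quick sketch: it accepts a tuple of types, and unlike an exact `type()` comparison it also accepts subclasses. The example relies on the fact that `bool` is a subclass of `int`.

```python
value = 2.5

# isinstance() accepts a tuple of types: True if any of them matches
is_number = isinstance(value, (int, float))

# type() compares exact types, while isinstance() also accepts subclasses;
# bool is a subclass of int, so True counts as an int for isinstance()
exact_match = type(True) == int
subclass_match = isinstance(True, int)

print(is_number)
print(exact_match)
print(subclass_match)
```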

### 3.2. Immutable vs Mutable Types

**Immutable** - these can't be changed in place; you can only rebind the variable to a new value (replacing, overwriting...)
* int 1, 2, -3
* float 1.0, 2.5, 102342.32423
* str 'abc'
* tuple (1, 'a', 5.0)

**Mutable** - these can be changed in place; they are usually containers whose elements can be modified, replaced, or exchanged.
* list [1, 3, 5, 7]
* dict {'a' : 1, 'b' : 2}
* set {1, 2, 3}


In [None]:
example_list = [1, 2, 3]
example_list[0] = 100
print example_list

In [None]:
example_tuple =  (1, 2, 3)
example_tuple[0] = 100   # raises TypeError: tuples are immutable
print example_tuple

Another way of seeing this is to look at the memory block allocated for a variable's value.
- **Immutable** types will need a new block for their value to change (you never modify anything inside that block).
- **Mutable** types will let you modify what's inside that block.

_(Put another way: you don't have permission to write into the memory occupied by immutable objects.)_

Let's test that with the function `id(some_object)`, which returns the identity of the object (in CPython, its address in memory).

In [None]:
number = 1
number += 2
print number

In [None]:
number = 1
print id(number)

number += 2
print id(number)

In [None]:
example_list2 = [1, 2, 3]
print id(example_list2)

example_list2[0] = 100
print id(example_list2)
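
A practical consequence of mutability is aliasing: two names can point to the same list, so a change made through one name is visible through the other. A minimal sketch, with variable names chosen for this illustration:

```python
original = [1, 2, 3]
alias = original             # a second name for the SAME list object
duplicate = list(original)   # a new list holding the same elements

alias[0] = 100               # modifies the shared object in place

print(original)              # the change is visible through both names
print(duplicate)             # the copy is unaffected

same_object = id(original) == id(alias)
copied_object = id(original) == id(duplicate)
print(same_object)
print(copied_object)
```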


## 4. String Formatting

How do you build strings of text from the contents of variables?

ex: "My name is ___ " ==> "My name is Mike"

### 4.1. Using `str.format()` [recommended]

This method is recommended, and it carries over unchanged to Python 3. You can find the [documentation on string formatting in python 2](https://docs.python.org/2/library/string.html#formatstrings), or a [tutorial](https://pyformat.info).

The idea is to create a string that will define how the content will be formatted, then to _fill up_ that string with some values, then `print` it.

In [None]:
my_formatstring = 'My name is {name} and my favorite color is {color}.'

my_returnstring = my_formatstring.format(name='Mike', color='Blue')

print my_returnstring

In practice, those three steps are condensed in one line:

In [None]:
print('My name is {name} and my favorite color is {color}.'.format(name='Mike', color='Blue'))

You can use positional indexes (or empty braces, filled in order) instead of names.

In [None]:
print('I live in {state} near {city}'.format(state='WA', city='Seattle'))
print('I live in {0} near {1}'.format('WA', 'Seattle'))
print('I live in {} near {}'.format('WA', 'Seattle'))

**Typical use for formatting floats** (you'll use that a lot).

In [None]:
mse = 126.159320642998

# unspecified type, converted to string then injected
print('Mean Square Error: {}'.format(mse))

# specifying it's a float (defaults to 6 digits after the decimal point)
print('Mean Square Error: {:f}'.format(mse))

# specifying how many digits to keep after the decimal point
print('Mean Square Error: {:.2f}'.format(mse))
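
A few other format specifications tend to be handy when printing numbers. This is only a sketch of some options from the format mini-language; the values other than `mse` are made up for the illustration.

```python
mse = 126.159320642998

padded = '{:>10.2f}'.format(mse)       # right-aligned in a field 10 characters wide
percent = '{:.1%}'.format(0.256)       # multiplied by 100 and shown with a % sign
thousands = '{:,}'.format(1234567)     # comma as thousands separator
scientific = '{:.3e}'.format(mse)      # scientific notation, 3 digits after the point

print([padded, percent, thousands, scientific])
```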

### 4.2. Using '%s' - String Formatting Operator

In [None]:
my_name = 'Mike'

print 'Hi ! My name is %s' % my_name

In [None]:
print 'My name is %s and my favorite color is %s.' % ('Mike', 'Blue')

In [None]:
mse = 126.159320642998

# unspecified, converted to string then injected
print('Mean Square Error: %s' % mse)

# specifying it's a float (defaults to 6 digits after the decimal point)
print('Mean Square Error: %f' % mse)

# specifying how many digits to keep after the decimal point
print('Mean Square Error: %.2f' % mse)

See other variants here: https://docs.python.org/2/library/stdtypes.html#string-formatting
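
The `%` operator also accepts a dictionary on its right-hand side, in which case the placeholders refer to keys by name. A minimal sketch; the dictionary `person` is an example value:

```python
person = {'name': 'Mike', 'color': 'Blue'}

# %(key)s looks up 'key' in the dictionary given after the % operator
message = 'My name is %(name)s and my favorite color is %(color)s.' % person
print(message)

# a width can be combined with the precision, as with str.format()
padded = '%10.2f' % 126.159320642998
print([padded])
```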

<span style="color:#888; font-size:150%; font-weight:bold">[Morning Assignment]</span>

Demonstration:
- workflow
- programming environment
- unit testing your functions

<span style="color:#888; font-size:150%; font-weight:bold">[Afternoon Lecture]</span>

## 5. File I/O

We often want to use the contents of a file in our code.

For example, data is sometimes stored in files (.csv, .txt, etc.)

This section explains how to work with files.

Here's a dump of the file `sample_file.txt`:
```
1234

5678

ABCD

EFGH

```

### 5.1. Opening and reading files

Use the built-in function `open()`. In Python 2, it returns an object of type `file`.

In [None]:
my_file = open('sample_file.txt')

print("object has type: {}".format(type(my_file)))

By default, `open()` opens a file for reading (we'll see how to write later). You can make this explicit by passing `'r'` as the second argument.

You then call methods on this object. For instance, `.read()` returns the whole content of the file as a string.

In [None]:
my_file = open('sample_file.txt', 'r')

my_content = my_file.read()

print("object has type: {}".format(type(my_content)))

print [my_content]
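
Besides `.read()`, file objects also offer `.readline()` (one line per call) and `.readlines()` (a list of all the remaining lines). The sketch below shows both. To stay self-contained it first writes a small scratch file; the name `demo.txt` and its temporary location are assumptions of this example, not workshop files.

```python
import os
import tempfile

# scratch file created only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as scratch:
    scratch.write('1234\n5678\nABCD\n')

my_file = open(path, 'r')
first_line = my_file.readline()     # reads up to and including the first '\n'
other_lines = my_file.readlines()   # reads what is left, as a list of strings
my_file.close()

print([first_line])
print(other_lines)
```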

### 5.2. Reading a file line by line using generators [recommended]

In practice, we prefer to read files line by line. It is more memory-efficient because it avoids loading the full content of the file into memory. In data science, you may encounter files (log files, transactions, etc.) that do not fit into your computer's memory at all. Reading them line by line may then be the only way to process them.

In [None]:
my_file = open('sample_file.txt', 'r')

for line in my_file:
    print 'Current line:', line
    # 'line' is discarded from memory
    # after each iteration of loop

**Note**: the object returned by `open()` is an **iterator** (it behaves like a "generator"). Once a line has been read, the iterator has moved past it, and the line cannot be read again without reopening the file or rewinding it with `.seek(0)`.

In [None]:
my_file = open('sample_file.txt', 'r')

print 'first iteration:'
print [line for line in my_file]

print

print 'second iteration'
print [line for line in my_file]
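
Reopening the file is not the only way to read it again: the file method `.seek(0)` moves the reading position back to the start. A minimal sketch of that behavior, using a scratch file (name and location are assumptions) so that it runs on its own:

```python
import os
import tempfile

# scratch file created only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as scratch:
    scratch.write('1234\n5678\n')

my_file = open(path, 'r')
first_pass = [line for line in my_file]
second_pass = [line for line in my_file]   # already at the end: nothing left
my_file.seek(0)                            # go back to the start of the file
third_pass = [line for line in my_file]
my_file.close()

print(first_pass)
print(second_pass)
print(third_pass)
```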

### 5.3. Closing files

You have to take care of closing a file you don't use anymore. This can be done either explicitly using `.close()`, or by putting the I/O operations inside a `with` statement.

Using **`.close()`** explicitly closes the file:

In [None]:
my_file = open('sample_file.txt', 'r')
contents = my_file.read()
my_file.close()

print my_file

Placing operations inside a `with` statement implicitly closes the file once the block is exited.

In [None]:
with open('sample_file.txt', 'r') as my_file:
    my_file.read()
    
print my_file
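
Rather than printing the file object, you can check its `.closed` attribute. The sketch below does that for both approaches, and writes the explicit version with `try`/`finally`, which is roughly what the `with` statement does for you: the file gets closed even if an error occurs while reading. The scratch file is an assumption made to keep the example self-contained.

```python
import os
import tempfile

# scratch file created only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w') as scratch:
    scratch.write('1234\n5678\n')

my_file = open(path, 'r')
try:
    contents = my_file.read()
finally:
    # runs whether or not read() raised an exception
    my_file.close()
closed_explicitly = my_file.closed

with open(path, 'r') as other_file:
    open_inside_block = not other_file.closed
closed_by_with = other_file.closed

print(closed_explicitly)
print(open_inside_block)
print(closed_by_with)
```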

### 5.4. Reading a file in practice

To summarize, here is the recommended pattern.

In [None]:
count = 0    # just for the sake of the example below

with open('sample_file.txt', 'r') as my_file:
    for line in my_file:
        # do something with line...
        # below we count the characters left after stripping surrounding whitespace
        count += len(line.strip())

print count

### 5.5. Reading CSV files

A special case of text files is comma-separated values (CSV) files.

Let's have a look at this file:

In [None]:
with open('sample_csv_easy.csv') as my_file:
    for line in my_file:
        print line.strip()   # remove \n at the end

You might be tempted to do just that...

In [None]:
with open('sample_csv_easy.csv') as my_file:
    for line in my_file:
        line = line.strip()   # remove \n at the end
        print line.split(',') # returns a list by splitting by ','

**<span style="color:red">JUST DON'T</span>** (please `^^;`). Use module `csv` instead, it has everything you need to face the easy and the hard scenarios.
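
To see why naive splitting is fragile, consider a field that itself contains a comma and is therefore quoted. The line below is a made-up example, not taken from the workshop files. `csv.reader` accepts any iterable of strings, so a one-element list is enough for the sketch.

```python
import csv

# made-up CSV line: the second field contains a comma, so it is quoted
line = '1,"Seattle, WA",98101'

naive = line.split(',')             # cuts the quoted field in two
parsed = next(csv.reader([line]))   # understands the quotes

print(naive)
print(parsed)
```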

In [None]:
import csv

with open('sample_csv_easy.csv') as my_file:
    reader = csv.reader(my_file)
    for line in reader:
        print line

In some (real life) cases, those files may contain anything.

In [None]:
with open('sample_csv_hard.csv') as my_file:
    for line in my_file:
        print line.strip()   # remove \n at the end

Even the best can fail...

In [None]:
import csv
with open('sample_csv_hard.csv') as my_file:
    reader = csv.reader(my_file)
    for line in reader:
        print line

You can pass several parameters to `csv.reader` to handle special cases.

In [None]:
import csv

with open('sample_csv_hard.csv') as csvfile:
    my_reader = csv.reader(csvfile,
                           delimiter=',',         # specifies the delimiter
                           quotechar='"',         # character that encloses quoted fields
                           skipinitialspace=True) # ignores leading spaces in fields
    for row in my_reader:
        print row
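
The effect of `skipinitialspace` can be isolated on a single line. The line below is made up for the illustration and is not necessarily what `sample_csv_hard.csv` contains: each comma is followed by a space, which by default prevents the reader from recognizing the opening quote.

```python
import csv

# made-up line: every delimiter is followed by a space
line = 'a, "b, c", d'

default_row = next(csv.reader([line]))
clean_row = next(csv.reader([line], skipinitialspace=True))

print(default_row)   # the quoted field is split, spaces and quotes are kept
print(clean_row)     # spaces after delimiters are skipped, quotes are honored
```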

### 5.6. Writing files

To write, pass `'w'` as the second argument of `open()`: `open(filename, 'w')`. Note that this overwrites the file if it already exists.

See details: https://docs.python.org/2/library/functions.html#open  
Review the difference between `'r'`, `'rb'`, `'w'`, and `'wb'` in `open()`.

In [None]:
lines_to_write = ['abcd\n', 'blabla\n', '1234\n']

with open('file_to_write.txt', 'w') as my_file:
    for line in lines_to_write:
        my_file.write(line)
    
with open('file_to_write.txt') as my_file:
    print my_file.read()

In the block above, we had to add `'\n'` at the end of our strings before calling `.write()`, because it does not add line breaks itself. A simple way to cope with that is to write the newline separately:

In [None]:
lines_to_write = ['abcd', 'blabla', '1234']

with open('file_to_write.txt', 'w') as my_file:
    for line in lines_to_write:
        my_file.write(line)
        my_file.write('\n')
    
with open('file_to_write.txt') as my_file:
    print my_file.read()
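
Another option, shown here only as a sketch, is to build the whole text with `str.join()` and write it in a single call. The scratch file location is an assumption made so that the example does not touch your working directory.

```python
import os
import tempfile

# scratch file created only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'file_to_write.txt')

lines_to_write = ['abcd', 'blabla', '1234']

with open(path, 'w') as my_file:
    # join the lines with '\n' and terminate the last line as well
    my_file.write('\n'.join(lines_to_write) + '\n')

with open(path) as my_file:
    content = my_file.read()

print([content])
```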

You can also append to the end of a file using `open(filename, 'a')` instead.

In [None]:
lines_to_append = ['1357', '2468']

with open('file_to_write.txt', 'a') as my_file:
    for line in lines_to_append:
        my_file.write(line)
        my_file.write('\n')
    
with open('file_to_write.txt') as my_file:
    print my_file.read()
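
The difference between the `'w'` and `'a'` modes can be checked side by side: `'a'` keeps the existing content, while `'w'` starts again from an empty file. A minimal sketch on a scratch file (its name and location are assumptions):

```python
import os
import tempfile

# scratch file created only for this illustration
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'w') as my_file:
    my_file.write('first\n')

with open(path, 'a') as my_file:     # 'a' adds after the existing content
    my_file.write('second\n')

with open(path) as my_file:
    after_append = my_file.read()

with open(path, 'w') as my_file:     # 'w' discards the existing content
    my_file.write('third\n')

with open(path) as my_file:
    after_rewrite = my_file.read()

print([after_append])
print([after_rewrite])
```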