# Introduction to Data Science - Lecture 3: Basic Python II

*COMP 5360 / MATH 4100, University of Utah, http://datasciencecourse.net/*


In this lecture we'll continue to see what Python can do and learn more about data types, operators, conditions, basic data structures, and loops. 

Here is a nice [Python Cheat Sheet](https://drive.google.com/open?id=0ByIrJAE4KMTtWGZmQXBPai1NQWM) that is also printable. 

## 1. More on Data Types and Operators

We've already covered the basic data types and operators. Now we'll recap and go into some more details. 

Also, make sure to check out the [complete documentation of standard types and operations](https://docs.python.org/3/library/stdtypes.html).

### Boolean

Boolean values represent truth values `True` and `False`. Booleans can be used as any other variable:

In [1]:
my_true_var = True
print (my_true_var)
my_false_var = False
print (my_false_var)

True
False


`True` and `False` are reserved keywords in their capitalized form. 

There are three operations defined on booleans: and, or, and not. 

| Operation | Result | 
|------|------|
| `x or y`	| if x is false, then y, else x  |
| `x and y`	| if x is false, then x, else y  |
| `not x`	    | if x is false, then True, else False  |




In [2]:
True or False

True

In [3]:
True and False

False

In [4]:
not True

False

In [5]:
not False

True
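
As the table above indicates, `or` and `and` return one of their operands rather than always returning `True` or `False`. The following is a small sketch of that behavior; it assumes that an empty string and `0` count as false, and the variable names are only illustrative.

```python
# `or` returns the first operand unless it counts as false, then the second
name = "" or "anonymous"      # "" counts as false, so we get the second operand

# `and` returns the first operand if it counts as false, otherwise the second
count = 0 and 42              # 0 counts as false, so we get the first operand
both = True and "second"      # the first operand is true, so we get the second

print(name)   # anonymous
print(count)  # 0
print(both)   # second
```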

#### Comparisons

Comparisons are very important in programming: they let us decide on conditional flows, which we will discuss later. To compare two entities, Python provides eight comparison operators: 


| Operation | Meaning |
| - | - |
| `<` | strictly less than |
| `<=` | less than or equal |
| `>` | strictly greater than |
| `>=` | greater than or equal |
| `==` | equal |
| `!=` | not equal |
| `is` | object identity |
| `is not` | negated object identity |

These operators take two operands and return a boolean. We'll skip the last two for now, but here are some examples of the others:

In [6]:
1 < 2 

True

In [7]:
1 <= 1

True

In [8]:
14 == 14

True

In [9]:
14 != 14 

False

In [10]:
"my text" == "my text"

True

In [11]:
"my text" == "my other text"

False

In [12]:
"a" > "b"

False

In [13]:
"a" < "b"

True

In [14]:
"aa" < "aba"

True

In [15]:
"aaa" < "aa"

False

We see that the operations work on numbers just as we would expect. 

Strings are also compared as we'd expect. The greater and less than operators use lexicographic ordering. 
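
To make lexicographic ordering a bit more concrete, here is a short sketch. It uses two built-in functions that we haven't introduced: `ord()`, which returns the numeric code of a character, and `sorted()`, which returns a sorted copy of a list (lists are covered later in this lecture). The word list is made up for illustration.

```python
# ord() returns the numeric code point that Python uses to order characters
code_a = ord("a")
code_b = ord("b")
print(code_a, code_b)  # 97 98

# Strings are compared position by position; a string that is a prefix of a
# longer string sorts before the longer one.
words = ["aba", "b", "aa", "aaa", "a"]
ordered = sorted(words)
print(ordered)  # ['a', 'aa', 'aaa', 'aba', 'b']
```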

### Numerical Data Types

Python supports three built-in numerical data types: `int`, `float`, and `complex`. Since Python is dynamically typed, we don't have to declare the data types explicitly!

The **int** data type is used to represent integers $\mathbb{Z}$. Python is special in the way it handles integers: it allows arbitrarily large integers, while most other programming languages reserve a fixed chunk of memory for integers, which can lead to a number "overflowing". This, for example, would not work properly with the fixed-size integer types of C or Java:

In [16]:
2 ** 200

1606938044258990275541962092341162602522202993782792835301376

However, we can still experience overflows in Python if we work with pandas, a library we will extensively use.

Integers can be **positive, zero, or negative**, as you would expect. 

The **float** data type is used to represent real numbers $\mathbb{R}$. Most real numbers, however, cannot be represented precisely by a computer. Take the example of $1/3$. Representing $1/3$ accurately would require the computer to store an infinite number of digits, $0.33333333333333333333....$ (if a computer used a decimal number system). 

Since computers use binary numbers, even seemingly simple numbers such as 0.1 cannot be represented accurately. Check out this example: 

In [17]:
.1 + .1 + .1 == .3

False

Instead, computers store an approximation of the number in a limited chunk of memory. At the same time, Python rounds numbers when it displays them:


In [18]:
1 / 10

0.1

This number is in fact not 0.1 but is stored in the computer as: 

`0.1000000000000000055511151231257827021181583404541015625`

This representation, however, is rarely useful, hence the number is rounded. 
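
If you want to see the stored value yourself, one way is the `decimal` module from the standard library. This is a sketch for inspection only, not something the lecture relies on later; the variable names are illustrative.

```python
from decimal import Decimal

# Converting the float 0.1 to a Decimal exposes the exact binary value stored
stored = Decimal(0.1)
print(stored)

# Creating the Decimal from a string keeps the decimal value 0.1 exactly
exact = Decimal("0.1")
print(exact)            # 0.1
print(stored == exact)  # False
```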

The lesson that you should remember is that **you CANNOT rely on the `==` operator to compare two float numbers**. 

In [19]:
a = .1 + .1 + .1 
b = .3
a == b

False

Instead, you can do something like this: 

In [20]:
# Compare for equality up to a constant value
a < b + 0.00001 and a > b - 0.00001

True

This, of course, only compares up to the 5th digit after the decimal point. 

A better way to do this is the [isclose](https://docs.python.org/3/library/math.html#math.isclose) function from the math package. 

In [21]:
# this is how we import a package
import math 
# here we call the isclose function that comes with the math package. 
math.isclose(a, b, rel_tol=0.00001)

True

Here we've also used our first package, the package `math`! 

Packages extend the basic functionality of Python. We'll work a lot with packages in the future; details will follow.
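
One detail of `math.isclose` is worth knowing: `rel_tol` is a relative tolerance, so it scales with the size of the numbers being compared. The sketch below, with illustrative variable names, shows why comparisons against exactly `0.0` need the separate `abs_tol` parameter.

```python
import math

a = .1 + .1 + .1
b = .3

# The default tolerances (rel_tol=1e-09, abs_tol=0.0) are enough for this case
close_default = math.isclose(a, b)

# A relative tolerance scales with the magnitude of the inputs, so a tiny
# nonzero number is never considered close to exactly 0.0 ...
tiny = 1e-12
close_to_zero_rel = math.isclose(tiny, 0.0)
# ... unless an absolute tolerance is given as well
close_to_zero_abs = math.isclose(tiny, 0.0, abs_tol=1e-9)

print(close_default, close_to_zero_rel, close_to_zero_abs)  # True False True
```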

#### Numerical Operators

Here is a selection of operators that work on numerical data types. 

| Operation | Result |
| - | - |
| `x + y` | sum of x and y |
| `x - y` | difference of x and y |
| `x * y` | product of x and y |
| `x / y` | quotient of x and y |
| `x % y` | remainder of x / y |
| `-x` | x negated |
| `abs(x)` | absolute value or magnitude of x |
| `int(x)` | x converted to integer |
| `float(x)` | x converted to floating point |
| `pow(x, y)` | x to the power y |
| `x ** y` | x to the power y |

Most of these should be rather straightforward.

You might not have heard of the "modulo operator" `%`, which returns the remainder of a division x / y. Here is an example:

In [22]:
7 % 2

1
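
The remainder is easier to understand together with the whole-number part of the division. The sketch below uses the floor division operator `//` and the built-in `divmod()`, neither of which is listed in the table above; the variable names are illustrative.

```python
x = 7
y = 2

quotient = x // y      # floor division: how often y fits into x
remainder = x % y      # what is left over
pair = divmod(x, y)    # both values at once, as (quotient, remainder)

# quotient and remainder always satisfy this identity
identity_holds = (y * quotient + remainder == x)
print(quotient, remainder, pair, identity_holds)  # 3 1 (3, 1) True

# For a negative x, Python rounds the quotient down, so the remainder
# takes the sign of the divisor y
negative_quotient = -7 // 2
negative_remainder = -7 % 2
print(negative_quotient, negative_remainder)  # -4 1
```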

Also remember that many operations have a shorthand assignment version, i.e., instead of:

In [23]:
x = 2
y = 3
x = x+y
x

5

you can also write: 

In [24]:
x = 2
y = 3
x += y
x

5

This works equally for other operators: 

In [25]:
x = 2
y = 3
x -= y
x

-1

In [26]:
x = 2
y = 3
x /= y
x

0.6666666666666666

In [27]:
x = 2
y = 3
x **= y
x

8

### Exercise 1:

**Task 1.1:** Try how capitalization affects string comparison, e.g., compare "datascience" to "Datascience".

**Task 1.2:** Try to compare floats that are defined as expressions of integers using the `==` operator, e.g., check whether 1/3 is equal to 2/6. Does that work?

**Task 1.3:** Write an expression that compares the "floor" value of a float to an integer, e.g., compare the floor of 1/3 to 0. There are two ways to calculate a floor value: using `int()` and using `math.floor()`. Are they equal? What is the data type of the returned values?

In [28]:
"datascience" == "Datascience"

False

In [32]:
.1 + .1 == .2

True

In [35]:
int(1/3) == int(0)

True

## 2. Functions Recap

Functions have a name, take parameters, and can (but need not) provide a return value.

In [36]:
def add(x, y):
    result = x + y
    return result

add(1,9)

10

Also, remember that variables defined inside of a function are not accessible outside of the function:

In [37]:
def scope_test():
    function_scope = "only readable in here"
    # Within the function, we can use the variable we have defined
    print("Within function: " + function_scope)

# calling the function, which will print     
scope_test()

# If we try to use the function_scope variable outside of the function, we will find that it is not defined. 
# This will throw a NameError, because Python doesn't know about that variable here
print("Outside function: " + function_scope)

Within function: only readable in here


NameError: name 'function_scope' is not defined
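
If a value computed inside a function is needed outside of it, the usual way is to return it. Here is a minimal sketch; the function name `scope_test_with_return` and the variable `outside` are made up for this example.

```python
def scope_test_with_return():
    function_scope = "only readable in here"
    # hand the value back to the caller instead of relying on the variable name
    return function_scope

# the returned value is stored under a new name that exists outside the function
outside = scope_test_with_return()
print("Outside function: " + outside)
```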

## 3. Conditions: if-elif-else statements

We've learned how to make comparisons between items and do boolean operations. The result of these operations was usually a boolean value. 

We can now make use of these boolean values to **steer the program flow using conditions**. 

We can do that using if statements. An if statement evaluates an arbitrary expression for its boolean value and executes one branch of code if it is true, and another branch if it is false:

In [38]:
def isOdd(x):
    # the statement within the brackets is evaluated for truth
    if (x % 2 == 1):
        # body, executed if true
        print(str(x) + " is in fact an odd number")
    else:
        # executed if false
        print(str(x) + " is an even number")

isOdd(144)
isOdd(13)

144 is an even number
13 is in fact an odd number


Notice that the **code blocks that are indented form the "body"** of the if statement, just as they did for functions.

In addition to the explicit boolean values that we can use to test for truth, most **programming languages define a range of things to be true or false**. 

By definition, 0 of any numeric type, empty sequences such as empty lists or strings, `None`, etc., are considered false. Everything else is considered true.

In [None]:
if (0):
    print("This should never happen")
else:
    print("0 is false")

undefined_var = None
if (undefined_var):
    print("This should never happen")
else:
    print("An undefined variable is false")
    
if ([]):
    print("This should never happen")
else:
    print("An empty list is false")
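
A quick way to check how Python treats a value in a condition is the built-in `bool()` function, which converts any value to `True` or `False`. The values below are just a few illustrative cases.

```python
# values that count as false
zero_int = bool(0)
zero_float = bool(0.0)
empty_string = bool("")
empty_list = bool([])
none_value = bool(None)
print(zero_int, zero_float, empty_string, empty_list, none_value)
# False False False False False

# values that count as true, some of them perhaps surprisingly
negative_number = bool(-1)   # any nonzero number is true
zero_as_text = bool("0")     # a non-empty string, even though it looks like zero
list_with_zero = bool([0])   # a non-empty list, even though its element is false
print(negative_number, zero_as_text, list_with_zero)
# True True True
```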


You can also **chain conditions using the `elif` statement**, which is short for else if:

In [39]:
def smallest_factors(x):
    # notice the use of the negation and the use of 0 as false
    if(not x % 2):
        print("2 is a factor of " + str(x))  
    elif(not x % 3):     # only evaluated when if was false
        print("3 is a factor of " + str(x))
    else: # only evaluated when both if and elif were false
        print("Neither 2 nor 3 are factors of " + str(x))

smallest_factors(4)
smallest_factors(9)
smallest_factors(12)

2 is a factor of 4
3 is a factor of 9
2 is a factor of 12


Notice that the elif (or the else) branch is not evaluated when the if branch matches. A function that checks 2 and 3 independently, and therefore reports both when both are factors, could be written like this: 

In [None]:
def factors(x):
    # notice the use of the negation and the use of 0 as false
    if(not x % 2):
        print("2 is a factor of " + str(x))  
    if(not x % 3):     
        print("3 is a factor of " + str(x))
    if (x % 2) and (x % 3):
        print("Neither 2 nor 3 are factors of " + str(x))

factors(4)
factors(9)
factors(12)
factors(13)

### Exercise 3: If statement

Write a function that takes two integers. If either of the numbers can be divided by the other without a remainder, print the result of the division. If neither number divides the other, print an error message.

In [42]:
def ex3(x,y):
    if (x % y) == 0:
        print (x/y)
    elif(y % x) == 0:
        print(y/x)
    else:
        print("error")
ex3(2,4)

2.0


## 4. Lists

Up to now we've worked only with basic data types such as booleans, numbers and strings. Now we'll take a look at a compound data type: lists.

**A list is a collection of items.** Another word commonly used for a list in other programming languages is an array (though there are differences between lists and arrays in many languages). 

**Lists are created with square brackets `[]` and can be accessed via an index:**

In [43]:
beatles = ["Paul", "John", "George", "Ringo"]
# printing the whole array
print(beatles)
# printing the first element of that array, at index 0
print(beatles[0])
# third element, at index 2
print(beatles[2])
# access the last element
print(beatles[-1])
# access the one-but-last element
print(beatles[-2])

['Paul', 'John', 'George', 'Ringo']
Paul
George
Ringo
George


If we try to address an index outside of the range of an array, we get an error: 

In [44]:
beatles[5]

IndexError: list index out of range

Sometimes, it makes sense to pre-initialize an array of a certain size:

In [45]:
[0] * 10

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

There is also a handy shortcut for quickly initializing lists. This uses the [range()](https://docs.python.org/3/library/functions.html#func-range) function, which we'll explore in more detail later.
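
As a brief sketch of both initialization styles: `list(range(n))` produces the numbers from 0 to n-1, and the `*` shortcut repeats a list. One caveat with `*` shows up for nested lists (covered further below): the inner list is not copied. The variable names are illustrative.

```python
# list() turns the numbers produced by range() into a list
first_five = list(range(5))
print(first_five)  # [0, 1, 2, 3, 4]

# Caution with the * shortcut for nested lists: the inner list is not copied,
# so both rows refer to the same list object
grid = [[0] * 3] * 2
grid[0][0] = 1
print(grid)  # [[1, 0, 0], [1, 0, 0]]
```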

We can also create **slices of an array with the slice operator `:`**

```python
a[start:end] # items start through end-1
a[start:]    # items start through the rest of the array
a[:end]      # items from the beginning through end-1
a[:]         # a copy of the whole array
```

There is also the step value, which can be used with any of the above:

```python
a[start:end:step] # start through not past end, by step
```

See [this post](http://stackoverflow.com/questions/509211/explain-pythons-slice-notation) for a good explanation on slicing.
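
Here is a short sketch of the step value in action, using the same list of Beatles as above; the names on the left are illustrative. A negative step walks through the list backwards.

```python
beatles = ["Paul", "John", "George", "Ringo"]

every_second = beatles[::2]     # start at index 0, take every second item
from_second = beatles[1::2]     # start at index 1, take every second item
reversed_copy = beatles[::-1]   # a step of -1 returns a reversed copy

print(every_second)   # ['Paul', 'George']
print(from_second)    # ['John', 'Ringo']
print(reversed_copy)  # ['Ringo', 'George', 'John', 'Paul']
```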

In [46]:
# Get the slice from 0 (included) to 2 (excluded)
beatles[:2] # this can also be written as [0:2]

['Paul', 'John']

In [47]:
# Slice from index 2 (3rd element) to end
beatles[2:]

['George', 'Ringo']

In [48]:
# A copy of the array 
beatles[:]

['Paul', 'John', 'George', 'Ringo']

The slice operations return a new array; the original array is untouched: 

In [49]:
beatles

['Paul', 'John', 'George', 'Ringo']

Slicing outside of a defined range returns an empty list:

In [50]:
beatles[4:9]

[]

Strings can be treated similarly to arrays with respect to indexing and slicing:

In [51]:
paul = "Paul McCartney"
paul[0:4]

'Paul'

Lists (in contrast to strings) are mutable. 

That means **we can change the elements that are contained in a list**: 

In [52]:
beatles[1] = "JohnYoko"
beatles

['Paul', 'JohnYoko', 'George', 'Ringo']
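
Because lists are mutable, it matters whether two variables refer to the same list or to a copy created with the slice `[:]`. The following sketch uses a separate list called `original` (a name chosen for this example) so that it doesn't interfere with `beatles`.

```python
original = ["Paul", "John", "George", "Ringo"]
alias = original          # a second name for the same list object
copy = original[:]        # a new list with the same elements

alias[0] = "Pete"         # changes the one list that both names refer to

print(original)  # ['Pete', 'John', 'George', 'Ringo']
print(copy)      # ['Paul', 'John', 'George', 'Ringo']
```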

This does not work with strings, because strings are immutable: 

In [53]:
# This will raise an error
paul[1] = "o"

TypeError: 'str' object does not support item assignment

Arrays can also be **extended with the `append()` method**:

In [54]:
beatles.append("George Martin")
beatles

['Paul', 'JohnYoko', 'George', 'Ringo', 'George Martin']

Lists can be **concatenated**: 

In [55]:
zeppelin = ["Jimmy", "Robert", "John", "John"]
beatles += zeppelin
beatles

['Paul',
 'JohnYoko',
 'George',
 'Ringo',
 'George Martin',
 'Jimmy',
 'Robert',
 'John',
 'John']
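
Appending and concatenating are easy to mix up when the thing being added is itself a list. Here is a small sketch of the difference, using made-up variable names:

```python
appended = ["Paul", "John"]
appended.append(["George", "Ringo"])   # adds ONE element, which is itself a list
print(appended)  # ['Paul', 'John', ['George', 'Ringo']]

concatenated = ["Paul", "John"]
concatenated += ["George", "Ringo"]    # adds each element of the right-hand list
print(concatenated)  # ['Paul', 'John', 'George', 'Ringo']

# the + operator creates a new list and leaves both operands unchanged
first = ["Paul", "John"]
second = ["George", "Ringo"]
combined = first + second
print(first)     # ['Paul', 'John']
print(combined)  # ['Paul', 'John', 'George', 'Ringo']
```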

We can **check the length** of a list using the built-in [`len()`](https://docs.python.org/3.3/library/functions.html#len) function:

In [56]:
len(zeppelin)

4

Lists can also be **nested**: 

In [57]:
# let's reset the beatles first
beatles = ["Paul", "John", "George", "Ringo"]
bands = [beatles, zeppelin]
bands

[['Paul', 'John', 'George', 'Ringo'], ['Jimmy', 'Robert', 'John', 'John']]

In fact, lists can contain items of mixed data types, which, however, is something that you typically don't want to and shouldn't do:

In [58]:
bad_bands = bands + [1, 0.3, 17, "This is bad"]
# this list contains lists, integers, floats and strings
bad_bands

[['Paul', 'John', 'George', 'Ringo'],
 ['Jimmy', 'Robert', 'John', 'John'],
 1,
 0.3,
 17,
 'This is bad']

### Exercise 4: Lists

* Create a list for the Rolling Stones: Mick, Keith, Charlie, Ronnie.
* Create a slice of that list that contains only members of the original lineup (Mick, Keith, Charlie). 
* Add the stones list to the bands list.

In [65]:
stones = ['Mick','Keith','Charlie','Ronnie']
stones = stones[:3]
bands = bands[:2]
bands.append(stones)

In [67]:
bands = bands[:]
bands

[['Paul', 'John', 'George', 'Ringo'],
 ['Jimmy', 'Robert', 'John', 'John'],
 ['Mick', 'Keith', 'Charlie']]

## 5. Loops

So far we have learned about two ways to control the flow of a program: functions and if statements. Now we'll look at another important control structure: loops. A loop has a condition, and as long as that condition is true, it will continue to re-execute its body. 

There are two types of loops: for loops and while loops.

### While loops

While loops use the `while` keyword, a condition, and the loop body:

In [68]:
a = 1

# print numbers 1-100
while (a <= 100):
    print(a, end=", ") 
    # end is a parameter of print that defines how the string to be printed ends. 
    # By default, a newline \n is appended, which we override here
    a += 1

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 

What happens here? The `while` keyword indicates that this is a loop; it is followed by the **terminating condition `a <= 100`**. As long as that condition is true, the loop's body will be executed again and again and again ...

Once the terminating condition evaluates to false, the code in the loop body will be skipped and the flow of execution continues below the loop. 

You might rightly guess that it's easy to write loops that don't terminate. Here is one example:
```python
while True:
    print("Stuck")
```

This program is stuck in the loop forever (or until you interrupt your kernel, your computer goes off, etc.). It is hence important to make sure that loops actually reach a terminating condition, and it's not always as obvious as in the previous example when they don't. 

But we could also **use the `break` statement to terminate a loop**:

In [69]:
a = 1
while (True):
    print(a, end=", ") 
    a += 1
    if (a > 100):
        break

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 

Here, we've moved the check of the condition into an if statement inside the loop body, and we break out of the loop as soon as that condition is true. 

Similar to the `break` statement, there is also a `continue` statement, which ends the current pass through the loop body and goes back to the start of the loop for the next cycle:

In [70]:
a = 0
while (a < 100):
    a += 1
    # throw brackets around all numbers divisible by 3
    if (not a % 3):
        print("[" + str(a) + "]", end=", ")
        continue # the next line isn't executed because the flow goes back to the beginning of the loop
    print(a, end=", ")
   
   

1, 2, [3], 4, 5, [6], 7, 8, [9], 10, 11, [12], 13, 14, [15], 16, 17, [18], 19, 20, [21], 22, 23, [24], 25, 26, [27], 28, 29, [30], 31, 32, [33], 34, 35, [36], 37, 38, [39], 40, 41, [42], 43, 44, [45], 46, 47, [48], 49, 50, [51], 52, 53, [54], 55, 56, [57], 58, 59, [60], 61, 62, [63], 64, 65, [66], 67, 68, [69], 70, 71, [72], 73, 74, [75], 76, 77, [78], 79, 80, [81], 82, 83, [84], 85, 86, [87], 88, 89, [90], 91, 92, [93], 94, 95, [96], 97, 98, [99], 100, 

### Exercise 5.1: While

Write a while loop that computes the sum of the first 100 positive integers, i.e., calculate

$1+2+3+4+5+...+n$   
$n=100$

### For loops

In contrast to most other programming languages, Python uses for loops mainly to iterate over items of a sequence. 

It uses the following syntax:
```python
for variable in sequence:
    #body
```

The variable is then accessible within the body of the loop.

Here is an example:

In [71]:
for member in zeppelin: 
    print(member)

Jimmy
Robert
John
John


Of course, that works with arbitrary **slices of lists**: 

In [72]:
for member in zeppelin[:2]:
    print(member)

Jimmy
Robert


We can iterate over **nested lists** with nested for loops: 

In [73]:
for band in bands:
    print("Band Members: ")
    print("-------------")
    for member in band:
        print(member)
    print()

Band Members: 
-------------
Paul
John
George
Ringo

Band Members: 
-------------
Jimmy
Robert
John
John

Band Members: 
-------------
Mick
Keith
Charlie



When you want to iterate over a sequence of numbers, use the [`range()`](https://docs.python.org/3/library/stdtypes.html#range) function. Range generates a sequence of numbers:

In [74]:
# we create a new list with the output of the range function
list(range(5))

[0, 1, 2, 3, 4]

In [75]:
# start at 0, stop before 10, step size 2
list(range(0, 10, 2))

[0, 2, 4, 6, 8]

Using this range function, we can now iterate over a sequence of numbers:

In [76]:
for i in range(10): 
    print (i)

0
1
2
3
4
5
6
7
8
9


The range function also takes other parameters, specifically a "start", "stop" and a "step-size" parameter.

In [77]:
for i in range (0, -20, -3):
    print(i)

0
-3
-6
-9
-12
-15
-18


### Exercise 5.2: for loops

**5.2.1:** Use a for loop to create an array that contains all even numbers in the range 0-50, i.e., an array: [2, 4, 6, ..., 48, 50]  

**5.2.2:** Create a new array for the Beatles' main instruments: Ringo played drums, George played lead guitar, John played rhythm guitar and Paul played bass. Use a for loop to print:

```
Paul: Bass
John: Rhythm Guitar
George: Lead Guitar
Ringo: Drums
```



In [80]:
for i in range(2,51,2):
    print(i)

2
4
6
8
10
12
14
16
18
20
22
24
26
28
30
32
34
36
38
40
42
44
46
48
50


## 6. Recursion

Another way to control program flow is recursion. The basic idea of recursion is that a function is allowed to call itself. Here is an example for printing the numbers 0-10: 

In [81]:
def printNumber(current, limit):
    print(current)
    if current < limit:
        printNumber(current + 1, limit)

In [82]:
printNumber(0, 10)

0
1
2
3
4
5
6
7
8
9
10


Note that we have implemented looping / iteration behavior without actually using a loop! However, recursion can be used for more than just loops; it is very well suited, for example, to operate on trees and graphs.
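
To hint at why recursion suits tree-like data, here is a small sketch that counts the names in a list that may contain further lists at any depth. The function `count_members` and the list `nested` are made up for this example, and `isinstance()` is a built-in function (not covered so far) that checks the type of a value. The sketch assumes that every item is either a string or a list.

```python
def count_members(item):
    # base case: a string is a single member, no further recursive call
    if isinstance(item, str):
        return 1
    # otherwise we assume a list: add up the counts of everything inside it
    total = 0
    for element in item:
        total += count_members(element)
    return total

nested = [["Paul", "John", "George", "Ringo"],
          ["Jimmy", "Robert", "John", "John"],
          ["Mick", ["Keith", "Charlie"]]]

member_count = count_members(nested)
print(member_count)  # 11
```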

We can also use return values in recursive functions. In the following, the recursive call is in the return statement. Here, the call stack grows until `current` is one past the limit of 10; that call returns an empty string without making another recursive call, which terminates the recursion. Then all the functions return in the reverse of the order in which they were called and build the string:

In [83]:
def getNumberString(current, limit):
    if current <= limit: 
        return str(current) + "," + getNumberString(current+1, limit)
    
    return ""

In [84]:
getNumberString(0, 10)

'0,1,2,3,4,5,6,7,8,9,10,'

### Exercise 6: Recursion
Write a recursive function that calculates the factorial of a number.

In [96]:
def factorial(n):
    if n < 1:
        return 1
    else:
        return n * factorial(n-1)
factorial(7)

5040

## 7. Revisiting Lists: List Comprehension

Now that we know about loops, we can also take a look at [list comprehension](https://docs.python.org/3.5/tutorial/datastructures.html#list-comprehensions). List comprehension can be used to initialize and transform arrays. 



In [85]:
# _ is customary for a variable name if you don't need it
[0 for _ in range(10)]

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In [86]:
["John" for _ in range(10)]

['John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John',
 'John']

In [87]:
# we can also make  use of values we iterate over
[i for i in range(10)]

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We can, for example, use function calls in place of a variable. Here we initialize an array of random numbers in the unit interval:

In [91]:
import random
rands = [random.random() for _ in range(10)]
rands

[0.842085800082225,
 0.37725595625320285,
 0.9862459400435112,
 0.29470052000548497,
 0.9363976318305844,
 0.8794773086374968,
 0.11839352940495251,
 0.1341902456276418,
 0.38204783274904686,
 0.7973495407638815]

You can also use list comprehension to create a list based on another list:

In [92]:
[x*10 for x in rands]

[8.42085800082225,
 3.7725595625320283,
 9.862459400435112,
 2.94700520005485,
 9.363976318305845,
 8.794773086374969,
 1.1839352940495251,
 1.3419024562764181,
 3.8204783274904686,
 7.973495407638815]
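
A list comprehension can also filter items by adding an `if` condition at the end. The sketch below shows this next to an equivalent for loop so that the correspondence is visible; the variable names are illustrative.

```python
# keep only the odd numbers below 10 and square them
odd_squares = [i * i for i in range(10) if i % 2 == 1]
print(odd_squares)  # [1, 9, 25, 49, 81]

# the same result written as a for loop with an if statement
odd_squares_loop = []
for i in range(10):
    if i % 2 == 1:
        odd_squares_loop.append(i * i)
print(odd_squares_loop)  # [1, 9, 25, 49, 81]
```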