# INFO411 Lab 0 - Getting Started with IPython

## Part A. Review

This lab is for us to get familiar with our lab environment and revise some Python skills. 

We are using Jupyter Notebook, an interactive environment based on IPython - a high-performance command shell for Python. If you want to install Jupyter on your own computer, please choose the **Python 2.7** version from Anaconda: https://www.continuum.io/downloads. 

To proceed, read along and run the code cells along the way. If you want to add comments, use the plus sign button above and change the cell format to "Raw NBConvert", or "Markdown" if you want more fun :). 

Complete the scripts (for "tasks") and verify their outcome. This lab is not assessed, but you can still submit the completed notebook by 10pm TUE 18/7. 

### Basics

In [1]:
print "Hello World!"

Hello World!


Beware: Python is a dynamically typed language, and its arithmetic operators are quite permissive, so the outcome can be surprising. In Python 2, for instance, dividing one integer by another discards the fractional part. Run the following cells and compare the outcome:

In [2]:
1/89

0

In [3]:
1.0/89

0.011235955056179775

In data mining we often deal with floating-point calculations (but sometimes work with integer data), so take heed of these subtleties.
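
To make the difference explicit, here is a small sketch of two ways to control division. The variable names (floor_result, float_result, counts, mean_count) are made up for illustration. The code uses "//" for floor division and float(.) to force a floating-point result, and it calls print(.) with a single argument so that it runs the same way under Python 2.7 and Python 3.

```python
# Floor division: "//" always discards the fractional part.
floor_result = 1 // 89

# Converting one operand to float forces floating-point division.
float_result = float(1) / 89

# The same trap appears when averaging integer data:
# sum(counts) / len(counts) would give 3 in Python 2.
counts = [3, 4, 4]
mean_count = float(sum(counts)) / len(counts)

print(floor_result)   # 0
print(float_result)
print(mean_count)
```

In Python 2 you can also put "from __future__ import division" at the top of a script to make "/" behave as floating-point division everywhere.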

"a**2" raises $a$ to the power of 2, i.e., $a^2$.

In [5]:
# Change a to 11, 111, 1111, ...: each time, click back into this cell, edit the value, and run it again
a=11
print 'a=',a,'a^2=',a**2

a= 11 a^2= 121


This is tedious. We can use a "for" loop to save some time. Run the following code and see how the variable 'a' is updated:

In [6]:
a=1
for i in range(9):
    print a
    a=a*10+1

1
11
111
1111
11111
111111
1111111
11111111
111111111


In fact, the function range(.) returns a list: 

In [7]:
range(9)

[0, 1, 2, 3, 4, 5, 6, 7, 8]
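
range(.) also accepts a start value and a step. The sketch below shows the three call forms; the variable names are ours, and the list(.) wrapper is only there so the same code also shows a list under Python 3, where range(.) no longer returns one.

```python
# range(stop): 0, 1, ..., stop-1
first_nine = list(range(9))

# range(start, stop): start, ..., stop-1
one_to_nine = list(range(1, 10))

# range(start, stop, step): every step-th value
odds = list(range(1, 10, 2))

print(first_nine)    # [0, 1, 2, 3, 4, 5, 6, 7, 8]
print(one_to_nine)   # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(odds)          # [1, 3, 5, 7, 9]
```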

**<font color="red">TO-DO</font>**: complete the code below to get all $a^2$ printed for a=1,11,111,....

In [9]:
a=1
for i in range(9):
    print a**2
    a=a*10+1

1
121
12321
1234321
123454321
12345654321
1234567654321
123456787654321
12345678987654321


To automate things, it is often a good idea to wrap some code into a function for reuse. 

Here's an example - a function that calculates the hypotenuse (the side opposite the right angle) of a right-angled triangle, given the lengths of its two other sides. We import the "math" package for this purpose, and then define the function according to the Pythagorean theorem $c^2=a^2+b^2$:

In [10]:
import math
def pythagoras(a, b):
    return math.sqrt(a**2 + b**2)

pythagoras(3,4)

5.0

In [11]:
# Another version - with a careful check on parameter values
def pythagoras(a, b):
    if (a<0 or b<0):
        print('Wrong parameters: negative side length. ')
        return -1
    return math.sqrt(a**2 + b**2)

pythagoras(-3,4)

Wrong parameters: negative side length. 


-1
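
Returning -1 works, but the caller has to remember to check for it. A common alternative is to raise an exception instead. The variant below is only a sketch of that idea; the name pythagoras_strict and the error message are our own choices, not part of the lab code.

```python
import math

def pythagoras_strict(a, b):
    """Hypothetical variant: raise an error instead of returning -1."""
    if a < 0 or b < 0:
        raise ValueError('side lengths must be non-negative')
    return math.sqrt(a**2 + b**2)

print(pythagoras_strict(3, 4))   # 5.0

# A bad call now stops with an error unless we handle it explicitly.
try:
    pythagoras_strict(-3, 4)
except ValueError as err:
    print('Caught: %s' % err)
```

With this design a negative side length cannot be mistaken for a valid result further down in the calculation.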

### List

Or, we can use a "for ... in ..." loop to iterate through a list and generate the outcome:

In [12]:
# iterate through a list
alist=[1,11,111,1111,11111]
for a in alist:
    print a**2

1
121
12321
1234321
123454321


Interesting, isn't it? 

The list is a very useful data structure in Python. As in C/Java, the indices of an $N$-element list run from 0 to $N-1$. The last index, $N-1$, has a handy shorthand: -1. 

In [13]:
print 'Length of the list:', len(alist)
print 'First entry:', alist[0]
print 'Third entry:', alist[2]
print 'Last entry:', alist[-1]

Length of the list: 5
First entry: 1
Third entry: 111
Last entry: 11111
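
Besides single entries, we can take a slice of a list with the "start:stop" notation, where the entry at "stop" is excluded. A short sketch, reusing the list from above (the other variable names are ours):

```python
alist = [1, 11, 111, 1111, 11111]

first_two = alist[:2]      # entries 0 and 1
last_two = alist[-2:]      # the final two entries
middle = alist[1:4]        # entries 1, 2 and 3 (4 is excluded)
second_last = alist[-2]    # negative indices count from the end

print(first_two)     # [1, 11]
print(last_two)      # [1111, 11111]
print(middle)        # [11, 111, 1111]
print(second_last)   # 1111
```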


An *often-used* trick is to use an empty list to collect data progressively. The following snippet imports the 'random' package and generates 100 random floating-point numbers between 0 and 1.0:

In [14]:
# import the 'random' package 
import random
rlst=[]
for i in range(100):
    rlst.append(random.random())
len(rlst)

100
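
The same "collect into a list" pattern can be written more compactly as a list comprehension. This is a sketch of the equivalent code; the seed value is arbitrary and only there to make the run repeatable.

```python
import random

random.seed(411)   # arbitrary seed, so that reruns give the same numbers

# One line instead of the empty list plus append loop.
rlst = [random.random() for i in range(100)]

# The same idea applied to the squares from earlier.
squares = [a**2 for a in [1, 11, 111]]

print(len(rlst))   # 100
print(squares)     # [1, 121, 12321]
```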

### String

String manipulation in Python is flexible and easy. 

In [15]:
# a string can be indexed like a list
hello='hello world'
hello[1]

'e'

In [16]:
# split by a specified separator
hello.split(' ')

['hello', 'world']

In [18]:
# query / search etc.
print hello.startswith('he')
print hello.find('world')

True
6
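
A few more string operations that often come in handy, shown as a small sketch (the variable names are ours). All of them return new values and leave the original string unchanged.

```python
hello = 'hello world'

words = hello.split(' ')
rejoined = '-'.join(words)       # glue the pieces back together
shouted = hello.upper()          # upper-case copy
missing = hello.find('xyz')      # find() gives -1 when nothing matches
has_world = 'world' in hello     # membership test
tail = hello[6:]                 # slicing works as for lists

print(rejoined)    # hello-world
print(shouted)     # HELLO WORLD
print(missing)     # -1
print(has_world)   # True
print(tail)        # world
```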


### Dictionary

Another useful, generic data structure in Python is the dictionary, "dict". It is used to connect keys and values into pairs. 

In [20]:
week=dict([('Monday', 1), ('Tuesday', 2)])
week['Tuesday']     # Use key to query value, e.g. check out which day is Tuesday

2

In [21]:
# insert a new pair
week['Wednesday']=3
week

{'Monday': 1, 'Tuesday': 2, 'Wednesday': 3}

In [22]:
# remove a pair
del week['Tuesday']
week

{'Monday': 1, 'Wednesday': 3}

**<font color="red">TO-DO</font>**: insert all weekday-number pairs into 'week' and verify:

In [25]:
week=dict([('Monday', 1), ('Tuesday', 2)])
week['Wednesday']=3
week['Thursday']=4
week['Friday']=5
week['Saturday']=6
week['Sunday']=7
week

{'Friday': 5,
 'Monday': 1,
 'Saturday': 6,
 'Sunday': 7,
 'Thursday': 4,
 'Tuesday': 2,
 'Wednesday': 3}
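
Typing seven assignments works, but a loop does the same job. The sketch below is one possible alternative, assuming we start from a list of day names (the names days, number and ordered are ours). It also shows how to test for a key, supply a default for a missing key, and list the keys in order of their values.

```python
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
        'Friday', 'Saturday', 'Sunday']

week = {}
# enumerate(days, 1) pairs each day with 1, 2, 3, ...
for number, day in enumerate(days, 1):
    week[day] = number

print(week['Sunday'])            # 7
print('Friday' in week)          # True
print(week.get('Holiday', 0))    # 0: default for a missing key

# Keys sorted by their values, i.e. back in weekday order.
ordered = sorted(week, key=week.get)
print(ordered[0])                # Monday
```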

## Exercise

### Birthday problem
Have you ever been to a party and met a person with exactly the same birthday as yours? How likely is this to happen? Let's find out using a bit of Python...

We consider the opposite situation, i.e., everyone at the party has a unique birthday. For simplicity we assume that every day of the year is equally likely to be a birthday, i.e., the distribution of birthdays is uniform throughout the year. 

Suppose we have four people in the room. Person 1 can have any of the 365 days as her birthday (for the sake of simplicity, let's ignore leap years). Denote her chance as $p_1=\frac{365}{365}$. Person 2 can now only choose from 364 of the 365 days (to avoid the day chosen by person 1). Denote her chance as $p_2=\frac{364}{365}$. And so on for persons 3 and 4. 

So the chance of everybody having a unique birthday is 
$$P=\frac{365}{365}\times \frac{364}{365}\times \frac{363}{365}\times \frac{362}{365}\approx 0.98.$$
This means that in a 4-person party, the chance of at least one birthday clash is about 1-0.98=0.02, i.e., only 2 percent. 
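
The four-person arithmetic above can be checked directly. This is a minimal sketch with variable names of our own choosing (p_unique, p_clash); it multiplies the four fractions one by one and prints the results to four decimal places.

```python
# Probability that all four birthdays are different.
p_unique = 1.0
for i in range(4):
    p_unique *= (365 - i) / 365.0   # 365/365, 364/365, 363/365, 362/365

# Probability of at least one shared birthday.
p_clash = 1 - p_unique

print('%.4f' % p_unique)   # 0.9836
print('%.4f' % p_clash)    # 0.0164
```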

<font color="red">**TO-DO**:</font> Write a snippet of Python code to calculate the probability of at least two people sharing the same birthday in a 23-person party. (Tip: use a "for" loop. The answer is about 0.5.)

In [34]:
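def p(n):
    # numerators 365, 364, ..., 365-(n-1): days still free for each new person
    pool = []
    for i in range(n):
        pool.append(365-i)

    # probability that all n birthdays are different
    p_unique = 1
    for i in range(n):
        p_unique *= pool[i]/365.0

    return 1-p_unique   # probability of at least one clash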

In [35]:
round(p(23), 4)

0.5073

**Task B**: Define a function birthday_clash_prob(num_ppl) that works out the probability of a birthday clash given the number of people. Test with 4 and 23. 

In [38]:
# your code here
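def birthday_clash_prob(num_ppl):
    # days still free for each new person: 365, 364, ...
    pool = []
    for i in range(num_ppl):
        pool.append(365-i)

    # probability that all birthdays are different
    p = 1
    for i in range(num_ppl):
        p *= pool[i]/365.0

    return (1-p)

print birthday_clash_prob(4)     # about 0.016
print birthday_clash_prob(23)    # about 0.507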
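
As a sanity check, the calculated probability can be compared with a simulation: draw random birthdays for a group many times and count how often at least two coincide. The sketch below is one possible way to do this. The function name simulate_clash_prob, the number of trials and the seed are our own choices for illustration, and the estimate will only agree with the formula to about two decimal places.

```python
import random

def simulate_clash_prob(num_ppl, trials=20000, seed=0):
    """Estimate the clash probability by repeated random draws."""
    rng = random.Random(seed)   # fixed seed makes the run repeatable
    clashes = 0
    for _ in range(trials):
        # one random birthday (day 1..365) per person
        birthdays = [rng.randint(1, 365) for _ in range(num_ppl)]
        # a set drops duplicates, so a shorter set means a clash
        if len(set(birthdays)) < num_ppl:
            clashes += 1
    return float(clashes) / trials

estimate = simulate_clash_prob(23)
print('%.2f' % estimate)   # should land close to 0.5
```

Increasing the number of trials narrows the gap between the estimate and the calculated value, at the cost of a longer run.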

######  End of Lab 0.

*Congratulations!* This lab is not assessed, but you're welcome to submit the completed notebook through Blackboard. 