# Introduction to Python - Lecture 06 (22 October 2018)

### Agenda for today:
- String formatting primer
- Structuring larger programs using Functions
- Example

# String Formatting
+ Provides more control over creation of strings from templates.
+ Allows dynamic insertion of values while maintaining the given structure.
+ Normally used in conjunction with print() or when writing formatted strings to files
+ Works with both positional and keyword arguments

### Positional arguments

```python
# Example
print('{} - {}'.format('arg1', 'arg2'))
print('{0} - {1}'.format('arg1', 'arg2'))
print('{1} - {0}'.format('arg1', 'arg2'))
print('{0} - {1} - {0}'.format('arg1', 'arg2'))
print('----{1}-----{2}'.format('arg1', 'arg2', 'arg3'))
```

### Keyword arguments

```python
print('{name} - {age}'.format(name='Dave', age=24))
print('{name} - {age}'.format(age=24, name='Dave'))
```

### Accessing by key/index
```python
list_of_numbers = [1, 2]
print('{0[0]} {0[1]}'.format(list_of_numbers))
a = [5, 2]
b = [0, 3]
print('(x1: {vec1[0]}|x2 : {vec2[0]}) (y1: {vec1[1]}|y2: {vec2[1]})'.format(vec1=a, vec2=b))
person_dictionary = {'name': 'Dave', 'age': 24}
print('{0[name]} - {0[age]}'.format(person_dictionary))
```

### Formatting
+ limiting decimal places
```python
print('{:.2f}'.format(1/3))
print('{0:.2f} {0:.5f} {1:.3f} {1:.5f}'.format(1/3, 1/6))
```
+ exponent notation
```python
print('{:e}'.format(1/1000))
print('{:.2e}'.format(1/1000))
```
+ comma as thousands separator
```python
print('{:,}'.format(1234567890))
```
+ aligning text
    + left align
```python
print('|{:<30}|'.format('first'))
print('|{:<30}|'.format('second'))
```
    + center align
```python
print('|{:^30}|'.format('first'))
print('|{:^30}|'.format('second'))
```
    + right align
```python
print('|{:>30}|'.format('first'))
print('|{:>30}|'.format('second'))
```

    + numeric alignment
```python
a = '10000'
b = '100'
longest = len(a)
print(longest)
print('{:>{longest}}\n{:>{longest}}'.format(a, b, longest=longest))
```
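
The format specifiers above can be combined in a single template. The following is a minimal sketch, assuming some made-up sample rows (`rows` and `row_template` are illustrative names, not part of the lecture material), that mixes alignment, thousands separators and decimal places to print a small table:

```python
# Hypothetical sample data: (item name, units sold, unit price)
rows = [
    ('widget', 1234567, 1 / 3),
    ('gadget', 890, 12.5),
]

# {:<10}   left-aligns text in 10 columns
# {:>12,}  right-aligns an integer in 12 columns with thousands separators
# {:>8.2f} right-aligns a float in 8 columns, rounded to 2 decimal places
row_template = '|{:<10}|{:>12,}|{:>8.2f}|'

lines = [row_template.format(name, units, price) for name, units, price in rows]
for line in lines:
    print(line)
```

Because every column has a fixed width, the `|` separators line up regardless of the values being printed.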

### Unpacking (\*&lt;sequence_type&gt;) - not limited to string formatting, more on this at the end
```python
print('{0}, {1}, {2}'.format(*'ABC'))
print('{0}, {1}, {2}'.format(*['x', 'y', 'z']))
print('{0}, {1}, {2}'.format(*(1, 2, 3)))
```
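
A single `*` unpacks a sequence into positional arguments. As a small sketch of the related idea, a double `**` unpacks a dictionary into keyword arguments, which pairs naturally with named placeholders. The `person` dictionary below is only an illustrative value:

```python
person = {'name': 'Dave', 'age': 24}

# ** unpacks the dictionary into keyword arguments: name='Dave', age=24
by_keyword = '{name} - {age}'.format(**person)

# * unpacks the list into positional arguments: 1, 2
by_position = '{0} + {1}'.format(*[1, 2])

print(by_keyword)
print(by_position)
```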

# Functions

- As you start writing larger programs, you will benefit from structuring and organizing code into readable, maintainable chunks. Functions are a primary way of organizing your programs (in addition to modules and classes).
- We'll look at the problem of writing function definitions, discuss some coding practices and demonstrate key ideas using simple examples.

## Function Calls
<font color='#0000AA'>function_name</font>(<font color='#00AA00'>arguments</font>)



We have already encountered a number of functions such as:

```python
type(32)
'banana'.find('an')
len([1, 2, 3])
print('I am an argument')
```

## Built In Functions

Python has built-in functions which are always available

+ These are functions which solve common problems

```python
number_list = [5, 2, 1, 4, 0, 3]
max(number_list)
min(number_list)
len(number_list)
range(0, 10, 2)    # range(start, end, step)
```

+ There are many more of these than are listed above

## User Defined Functions

```python
# general syntax
def function_name(arguments):    # arguments <=> parameters <=> inputs
    statement_1
    ...               # Function body
    statement_n
    return <value>    # optional; default=None
```
+ <font color='#AA00AA'>**def**</font> is a reserved word; it lets Python know that a function is being defined.
You can create new functions using a function definition.
+ **<font color='#0000AA'>function_name</font>** can be any valid identifier that is not a reserved word
    + Some common naming guidelines:
        - use self-explanatory names
        - avoid overly long names
+ <font color='#00AA00'>arguments</font> are variables which are required by the function


### Example
```python
def say_hello(name):
    print("Hello", name)
    
say_hello("Dave")
```

## return

**return** is a keyword that ends the function and hands a value back to the caller
```python
def add(a, b):
    sum_ = a + b
    return sum_

result = add(2, 4) 
print(result)
```

The return keyword allows the results of functions to be used later in the program

If return is not specified the function will return None
```python
def no_return():
    print('I\'m inside the function')  # Notice the escaping mechanism. You may use double quotes for string.
print(no_return())
```

## Order Matters

+ Functions must be defined before they are used
+ The following code will produce a `NameError`
+ When `add2` is called in the first statement the function does not yet exist

```python
add2(1, 2)

def add2(a, b):
    total = a + b    # avoid naming a variable `sum`, which shadows the built-in
    return total
```

## Arguments by keyword vs position

```python
def print_full_name(name='John', surname='Doe'):
    print(name, surname)

print_full_name()
print_full_name('David', 'Smith')
print_full_name(surname='Jenkins')
print_full_name(surname='Mills', name='Adam')
```
+ Keywords at the call site document what each argument means
+ Parameters can be given default values
+ The order of the arguments does not matter if keywords are used
+ Arguments with default values can be omitted. This is useful when a function requires a lot of inputs and the defaults work well for most of them.

### Mixed positional and keyword arguments

```python
# Ex. Monte Carlo simulation to compute the value of pi
# https://en.wikipedia.org/wiki/Monte_Carlo_method
#
#         area of quarter circle    (1/4)*pi*r^2      pi*r^2
# ratio = ----------------------- = -------------- = --------
#              area of square            r^2          4r^2
#
#            pi
# ratio =   ---
#            4
# 4 * ratio = pi

import random

def get_random_point():
    # random.random() returns a float in [0, 1), so the point lies in the unit square
    x = random.random()
    y = random.random()
    return x, y

def in_circle(x, y, r=1):
    # A point is inside the circle when x^2 + y^2 <= r^2
    return x**2 + y**2 <= r**2

iterations = 100000
count_in_circle = 0
for idx in range(iterations):
    x, y = get_random_point()
    if in_circle(x, y):    # x, y passed by position; r falls back to its default
        count_in_circle += 1
print('After {} iterations, value of pi is: {}'.format(iterations, 4 * count_in_circle / iterations))
```
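
The simulation can itself be wrapped in a function that is called with a mix of positional and keyword arguments. This is a minimal sketch rather than part of the original example: `estimate_pi` and its `seed` parameter are hypothetical additions, used here so that a run can be repeated exactly.

```python
import random

def in_circle(x, y, r=1):
    # A point is inside the circle when x^2 + y^2 <= r^2
    return x**2 + y**2 <= r**2

def estimate_pi(iterations, seed=None):
    # seed is a hypothetical convenience parameter for repeatable runs
    rng = random.Random(seed)
    count_in_circle = 0
    for _ in range(iterations):
        if in_circle(rng.random(), rng.random()):
            count_in_circle += 1
    return 4 * count_in_circle / iterations

print(estimate_pi(100000))             # positional only; seed uses its default
print(estimate_pi(100000, seed=42))    # positional and keyword mixed
print(in_circle(0.5, 0.5, r=0.5))      # x and y by position, r by keyword
```

Positional arguments must come first in a call; keyword arguments follow and may appear in any order.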



## Scope

The scope of a variable refers to the region of the code where the variable is valid and can be used.

In this example the variable is defined inside a function. It will not be accessible from outside of the function, as the variable is out of scope there.

```python
def print_something():
    internal_var = 5
    print(internal_var)
print_something()
print(internal_var)
```
Is it possible for a variable that is declared outside of a function to be used inside the function body? Yes, but only if the variable is defined before the function is called.

```python
a = 10
def print_something():
    print('inside function', id(a), a)
print_something()
print('outside function', id(a), a)
```
If a new value is assigned to an external variable inside the function body, a **new local variable** will be created with the same name, which temporarily masks the external variable while we're within the function scope.

```python
a = 10
def print_something():
    a = 5
    print('inside function', id(a), a)
print_something()
print('outside function', id(a), a)
```
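Any attempt to modify (rebind) a variable declared outside of the function, without declaring it `global`, results in an error. The assignment in `a += 1` makes `a` local to the function, so Python raises an `UnboundLocalError` when it tries to read the local `a` before it has been given a value.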

```python
a = 10
def print_something():
    a += 1
    print('in function', id(a), a)
print_something()
print('out function', id(a), a)
```
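
To see which exception is raised without stopping the program, the call can be wrapped in `try`/`except`. This is a small sketch; the names `counter` and `increment_counter` are illustrative only.

```python
counter = 10

def increment_counter():
    # The assignment below makes `counter` local to the function, so reading
    # it on the right-hand side fails before any value has been bound.
    counter += 1
    return counter

try:
    increment_counter()
except UnboundLocalError as error:
    error_name = type(error).__name__
    print(error_name, '-', error)

print('outside function', counter)    # the global value is unchanged
```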

### global keyword
- this can be used to refer to external variables and modify them within a function's scope

```python
a = 10
def print_something():
    global a    # declare that we're going to use the global variable a defined outside the function
    a += 1
    print('inside function', id(a), a)
print_something()
print('outside function', id(a), a)
```


## Why Use Functions


+ Code organization: easier to understand and read later
+ Code reusability: supply different inputs to function
+ Easy to update/modify

## Some Design Considerations

+ Name functions appropriately
+ Keep them short and easy to understand
+ Should have a single purpose
    - One function doing 10 things is bad
    - Rather have 10 smaller functions
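
As a rough illustration of the single-purpose guideline, the sketch below splits a small task into three functions, each doing one thing. The function names and the sample input are hypothetical.

```python
def parse_ratings(text):
    # Turn a comma-separated string such as '3, 4.5, 5' into a list of floats
    return [float(part) for part in text.split(',')]

def average(values):
    # Arithmetic mean of a non-empty list of numbers
    return sum(values) / len(values)

def format_report(label, value):
    # Build the output string; presentation lives in one place
    return '{}: {:.2f}'.format(label, value)

ratings = parse_ratings('3, 4.5, 5')
report = format_report('average rating', average(ratings))
print(report)
```

Each piece can be tested and reused on its own, and a change to the output format touches only `format_report`.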

## Unpacking
\*&lt;sequential_type&gt;

+ Unpacking is the process of converting a sequence into individual values.
+ We have seen examples of this when we covered string formatting at the start of the lecture.
+ Another example was seen when a dictionary item is assigned to two variables in a for loop.

```python
d = {'a': 1, 'b': 2}
for key, value in d.items():
    print(key, value)

# tuple unpacking
key, value = ('a', 1)
```

+ It is possible to force Python to unpack a sequential type into individual values.
+ To do this we add a **<font color='blue'>\*</font>** before the sequential type

```python
def print_sum(a, b, c, d):
    print('a: {}, b: {}, c: {}, d: {}'.format(a, b, c, d))
    print('Sum: {}'.format(a + b + c + d))

li = [1, 2, 3, 4]

# Passing the list itself supplies only one argument (a), so this raises TypeError
try:
    print_sum(li)
except TypeError as error:
    print('TypeError:', error)

print_sum(*li)      # unpacks list into 4 values, that are assigned to individual arguments of function
```


## \*args, \*\*kwargs

Sometimes it is not possible to know in advance how many arguments are going to be passed to a function.
This is solved by using \*args for ordered arguments and \*\*kwargs for keyword arguments.

+ many of the set comparison methods allow you to pass a variable number of sets to the method
+ set.intersection(*[sets])
```python
set_a = set(['a', 'b', 'c', 'd'])
set_b = set(['a', 'z'])
set_c = set(['a', 'b', 'c'])
set_d = set(['a', 'c'])
print(set_a.intersection(set_b))
print(set_a.intersection(set_c, set_d))
print(set_a.intersection(set_b, set_c, set_d))
```
+ \*args
```python
def print_list(v1, v2, *args):
    print(v1, v2, args)
    for arg in args:
        print(arg)
print_list(*[1, 2, 3, 4, 5])
```
+ Let's create a function which adds n numbers together
    + add \*args to the argument list
    + \*args converts all additional arguments into a tuple

```python
def sum_numbers(a, b, *rest):
    sum_ = a+b
    for val in rest:
        print('sum: {} + value: {}'.format(sum_, val))
        sum_ += val
    return sum_

print(sum_numbers(1, 2, 3, 4))    # a=1, b=2, rest=(3, 4)
```

+ \*\*kwargs is identical except that it uses keyword arguments and stores them in a dictionary instead of a tuple.

```python
def print_student_grades(**kwargs):
    for key, value in kwargs.items():
        print('{} got {}'.format(key, value))
print_student_grades(amy=5, mark=3, john=4, jackie=4)

# function(positional, kw_args, *args, **kwargs)
```

#### <font color='blue'>Note</font>: while 'args' and 'kwargs' are the standard names for receiving variable positional / keyword arguments, you can use any domain-specific, meaningful name
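
All four kinds of parameters can appear in one definition, in the order positional, default-valued, `*args`, `**kwargs`. The following is a minimal sketch with domain-specific names instead of `args` and `kwargs`; `describe_order` and its parameters are hypothetical.

```python
def describe_order(customer, priority='normal', *items, **options):
    # customer: required positional argument
    # priority: argument with a default value
    # items:    tuple holding any extra positional arguments
    # options:  dictionary holding any extra keyword arguments
    return customer, priority, items, options

full_call = describe_order('amy', 'high', 'book', 'pen', gift_wrap=True)
minimal_call = describe_order('mark')

print(full_call)
print(minimal_call)
```

Note that extra positional values only reach `items` once `priority` has been filled positionally, which is one reason to keep such signatures small.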

## Example - Similarity Matching

While working with data problems, you will frequently encounter **<font color='blue'>many-to-many relationships</font>** between various domain entities involved. A couple of examples:
- genes <--\> diseases
- users <--\> products

One way to represent this relationship is using nested dictionaries (of course for large data, you need full-scale databases rather than in-memory data structures).
For example, for different users and items in our online business, pairwise user-item ratings may be represented as
```python
{ 'user1': {'item1': 1, 'item2': 3, …},
  'user2': {'item1': 2, 'item3': 4, …},
  'user3': {'item1': 4, 'item4': 5, …}, 
  …
}
```
So, in this dictionary, user1 gives a rating of 1 to item1, a rating of 3 to item2 etc. Note that not all users will have ratings for all items in the catalogue (**<font color='blue'>sparse representation</font>** -- there are other ways for sparse representation too).

In this problem, we’re given one such user-item-rating dictionary as above. The top-level keys of this dictionary are individual users. The value for each user is another dictionary (item-ratings) with individual items as keys and their ratings as values. **Our goal is to do the following:**
- **<font color='#AA00AA'>Using a given pairwise similarity / dis-similarity metric, compute most similar user-pair and item-pair.</font>**

These sorts of operations are commonly used for making item recommendations to users using a technique called **<font color='blue'>collaborative filtering</font>**. The idea is that if two users rate the same items (for ex. movies) similarly, then they must have similar tastes as exhibited by their ratings.

For now, we'll compute the dis-similarity as: **Average of Absolute Difference of ratings for common keys**. 
For example, with inputs `{'item1': 3, 'item2': 4, 'item3': 5}` and `{'item1': 4, 'item3': 2, 'item4': 1}`, the following assessments and computations are required:
- Common keys: 'item1' and 'item3'
- Difference in ratings for common keys: 
    - Key 'item1': (3 - 4) = -1
    - Key 'item3': (5 - 2) = 3
- Absolute difference in ratings for common keys:
    - Key 'item1': abs(-1) = 1
    - Key 'item3': abs(3) = 3
- Average of absolute difference in ratings for common keys:
    - final (dis)similarity = (1 + 3)/2 = 2.0

The (dis-)similarity metric is an arbitrary heuristic, and may not work well in practice. But one can envision more complex pairwise scoring functions that work well with real datasets. And they exist. 

**So how do we start?**

```python
user_item_rating = {
    'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0,
              'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
    'user2': {'item1': 3.0, 'item2': 3.5, 'item3': 1.5,
              'item4': 5.0, 'item5': 3.5, 'item6': 3.0},
    'user3': {'item1': 2.5, 'item2': 3.0, 'item4': 3.5,
              'item6': 4.0},
    'user4': {'item2': 3.5, 'item3': 3.0, 'item4': 4.0,
              'item5': 2.5, 'item6': 4.5},
    'user5': {'item1': 3.0, 'item2': 4.0, 'item3': 2.0,
              'item4': 3.0, 'item5': 2.0, 'item6': 3.0},
    'user6': {'item1': 3.0, 'item2': 4.0, 'item4': 5.0,
              'item5': 3.5, 'item6': 3.0},
    'user7': {'item2': 4.5, 'item4': 4.0, 'item5': 1.0}
}
```
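
One possible starting point is sketched below: a function for the dis-similarity metric described above, plus a helper that scans all user pairs for the lowest score. This is an illustrative sketch, not a reference solution. The names `dissimilarity` and `most_similar_pair` are assumptions, as is the choice to return `None` when two users share no items.

```python
from itertools import combinations

def dissimilarity(ratings_a, ratings_b):
    # Average absolute difference of ratings over the keys both dicts share.
    # Returns None when there are no common keys (an assumption of this sketch).
    common_keys = set(ratings_a) & set(ratings_b)
    if not common_keys:
        return None
    total = sum(abs(ratings_a[key] - ratings_b[key]) for key in common_keys)
    return total / len(common_keys)

def most_similar_pair(ratings):
    # Score every unordered pair of top-level keys; the lowest dis-similarity wins.
    best_pair, best_score = None, None
    for key_a, key_b in combinations(sorted(ratings), 2):
        score = dissimilarity(ratings[key_a], ratings[key_b])
        if score is not None and (best_score is None or score < best_score):
            best_pair, best_score = (key_a, key_b), score
    return best_pair, best_score

# Worked example from the text
example = dissimilarity({'item1': 3, 'item2': 4, 'item3': 5},
                        {'item1': 4, 'item3': 2, 'item4': 1})
print(example)

# First three users of user_item_rating, repeated to keep the sketch self-contained
sample = {
    'user1': {'item1': 2.5, 'item2': 3.5, 'item3': 3.0,
              'item4': 3.5, 'item5': 2.5, 'item6': 3.0},
    'user2': {'item1': 3.0, 'item2': 3.5, 'item3': 1.5,
              'item4': 5.0, 'item5': 3.5, 'item6': 3.0},
    'user3': {'item1': 2.5, 'item2': 3.0, 'item4': 3.5,
              'item6': 4.0},
}
pair, score = most_similar_pair(sample)
print(pair, score)
```

For item pairs, the same two functions could be reused after inverting the dictionary so that items become the top-level keys and users the nested keys.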

