# DS2500 Lesson5

Jan 27, 2023

Content:
- numpy & arrays (finishing up)
- designing & writing beautiful software

Admin:




### Computing stats on an array
- `.sum()`
- `.min()`
- `.max()`
- `.mean()`
- `.std()`
    - standard deviation
- `.var()`
    - variance
- `.argmin()`
    - index of item which is smallest
- `.argmax()`
    - index of item which is largest


In [1]:
import numpy as np
x = np.array([4, 3, 5, 4])
x


array([4, 3, 5, 4])

In [2]:
x.min()


3

In [3]:
# get index of smallest item
# (smallest item, 3, is at index 1)
x.argmin()


1

In [4]:
import numpy as np
y = np.arange(100, 112).reshape((3, 4))
y


array([[100, 101, 102, 103],
       [104, 105, 106, 107],
       [108, 109, 110, 111]])

In [5]:
y.sum(), y.min(), y.max(), y.mean(), y.std(), y.var()


(1266, 100, 111, 105.5, 3.452052529534663, 11.916666666666666)
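One detail that matters when comparing these numbers against other tools: by default, NumPy's `.std()` and `.var()` divide by `N`, the number of items (the "population" formulas). Passing `ddof=1` divides by `N - 1` instead (the "sample" formulas). A minimal sketch using the same `y` as above; the variable names are only for illustration:

```python
import numpy as np

y = np.arange(100, 112).reshape((3, 4))

# default (ddof=0): divide by N, the number of items
var_population = y.var()

# ddof=1: divide by N - 1 (sample variance / sample standard deviation)
var_sample = y.var(ddof=1)
std_sample = y.std(ddof=1)

print(var_population, var_sample, std_sample)
```

Which version is appropriate depends on whether the array holds the whole population or a sample drawn from it.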

In [6]:
# axis: which of the shape parameters should I operate on? 
# shape = (shape0, shape1)

# axis=0 averages across different rows to give the column average
y.mean(axis=0)

array([104., 105., 106., 107.])

In [7]:
# axis=1 averages across different columns to give the row average
y.mean(axis=1)


array([101.5, 105.5, 109.5])

In [8]:
# axis is an accepted keyword of all methods listed above
y.min(axis=1)


array([100, 104, 108])

In [9]:
y.min(axis=0)


array([100, 101, 102, 103])
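The same `axis` keyword works for `.argmin()` / `.argmax()` and `.std()` too. One thing to watch for: without `axis`, `.argmax()` on a 2d array returns an index into the flattened array. A small sketch, assuming the same `y` as above, that uses `np.unravel_index` to turn that flat index back into a (row, column) pair:

```python
import numpy as np

y = np.arange(100, 112).reshape((3, 4))

# row index of the largest item in each column
idx_max_per_col = y.argmax(axis=0)

# column index of the largest item in each row
idx_max_per_row = y.argmax(axis=1)

# standard deviation within each row
std_per_row = y.std(axis=1)

# no axis given: index counts through the array as if it were flattened
idx_flat = y.argmax()

# convert the flat index back to (row, column)
row_col = np.unravel_index(idx_flat, y.shape)

print(idx_max_per_col, idx_max_per_row, idx_flat, row_col)
```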

## Why did we do this again?

<img src="https://imgur.com/orZWHly.png" width=300 />

Data isn't just "information"; it's a story. When we can manipulate these Python objects gracefully, we can observe patterns across a whole population in a way that isn't possible when going sample-by-sample (or feature-by-feature) through a dataset.


In [10]:
import seaborn as sns

# data source: https://github.com/mwaskom/seaborn-data/blob/master/penguins.csv
df_penguin = sns.load_dataset('penguins')
df_penguin.head()


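|   | species | island | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g | sex |
|---|---------|--------|----------------|---------------|-------------------|-------------|-----|
| 0 | Adelie | Torgersen | 39.1 | 18.7 | 181.0 | 3750.0 | Male |
| 1 | Adelie | Torgersen | 39.5 | 17.4 | 186.0 | 3800.0 | Female |
| 2 | Adelie | Torgersen | 40.3 | 18.0 | 195.0 | 3250.0 | Female |
| 3 | Adelie | Torgersen | NaN | NaN | NaN | NaN | NaN |
| 4 | Adelie | Torgersen | 36.7 | 19.3 | 193.0 | 3450.0 | Female |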
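To connect this back to the array statistics above, here is a rough sketch that computes a per-feature average over the five penguins shown. The numbers are copied by hand from the rows above (so the block runs without downloading the dataset), and the array name `x_penguin` is made up for this example. Because one penguin has missing measurements, the plain `.mean()` gives `nan`; `np.nanmean` skips the missing values instead.

```python
import numpy as np

# numeric columns of the rows shown above, in order:
# bill_length_mm, bill_depth_mm, flipper_length_mm, body_mass_g
x_penguin = np.array([[39.1, 18.7, 181.0, 3750.0],
                      [39.5, 17.4, 186.0, 3800.0],
                      [40.3, 18.0, 195.0, 3250.0],
                      [np.nan, np.nan, np.nan, np.nan],
                      [36.7, 19.3, 193.0, 3450.0]])

# axis=0 averages across penguins (rows) to give one value per feature
# a single nan in a column makes that column's mean nan
mean_plain = x_penguin.mean(axis=0)

# nanmean ignores the missing entries when averaging
mean_feat = np.nanmean(x_penguin, axis=0)

print(mean_plain)
print(mean_feat)
```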


## Array Operations:
- array and a scalar: 
    - apply operation to every element of array
- array and array: 
    - apply operation to corresponding elements of arrays (requires arrays of the same shape, or shapes compatible under [broadcasting](https://numpy.org/doc/stable/user/basics.broadcasting.html))



In [11]:
y1 = np.arange(12).reshape((3, 4))
y1


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [12]:
y1 + 3


array([[ 3,  4,  5,  6],
       [ 7,  8,  9, 10],
       [11, 12, 13, 14]])

In [13]:
y1 * 10


array([[  0,  10,  20,  30],
       [ 40,  50,  60,  70],
       [ 80,  90, 100, 110]])

In [14]:
y1 ** 2


array([[  0,   1,   4,   9],
       [ 16,  25,  36,  49],
       [ 64,  81, 100, 121]])

In [15]:
# array and array arithmetic
y2 = np.arange(100, 112).reshape((3, 4))
y1, y2


(array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]),
 array([[100, 101, 102, 103],
        [104, 105, 106, 107],
        [108, 109, 110, 111]]))

In [16]:
# array and array arithmetic applies operation to corresponding items in arrays
y1 + y2


array([[100, 102, 104, 106],
       [108, 110, 112, 114],
       [116, 118, 120, 122]])

In [17]:
y1 * y2


array([[   0,  101,  204,  309],
       [ 416,  525,  636,  749],
       [ 864,  981, 1100, 1221]])

In [18]:
y1


array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [19]:
# (++) adding a constant row (x) to all rows of a matrix (y1)
# more details here: https://numpy.org/doc/stable/user/basics.broadcasting.html
x = np.array([1000, 2000, 3000, 4000])
x, y1


(array([1000, 2000, 3000, 4000]),
 array([[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]]))

In [20]:
y1 + x


array([[1000, 2001, 3002, 4003],
       [1004, 2005, 3006, 4007],
       [1008, 2009, 3010, 4011]])
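The example above adds one value per column. To add one value per row, the 1d array needs to be shaped as a column, `(3, 1)`, so that it lines up with the rows of `y1`. A short sketch of both the working case and the shape mismatch (the name `row_offset` is only for illustration):

```python
import numpy as np

y1 = np.arange(12).reshape((3, 4))

# one value per row, stored as a column of shape (3, 1)
row_offset = np.array([100, 200, 300]).reshape((3, 1))

# (3, 4) + (3, 1): the column is repeated across all 4 columns of y1
y_col = y1 + row_offset

# a plain shape (3,) array is compared against the last dimension of y1,
# which has length 4, so the shapes are incompatible
try:
    y1 + np.array([100, 200, 300])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(y_col)
print(mismatch_raised)
```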

## In Class Activity A
A horrible autograder assigns students arbitrary grades for the quizzes they take.
Let's assume that:
- `n_student=5` students take each quiz
- `n_quiz=3` quizzes are taken

1. Record all student grades in a two dimensional `np.array`
    - **together:** should a student be a row or a column?
    - make up all the grades, use simple values to allow for easy testing (e.g. 0, 1, 2, 3, ...)
1. Use the functions we just learned to answer:
    1. what's the mean score among all the students and quizzes?
    1. what's the mean score of the student who had the highest average?
    1. which quiz had the lowest mean score?
        - we don't want the score, we want to know which quiz it was, 0, 1 or 2


In [21]:
import numpy as np
n_student = 5  # students who took each quiz
n_quiz = 3  # quizzes taken

# 1. record all student grades in a 2d np.array
# (each row is a student, each column is a quiz)
x = np.arange(n_student * n_quiz).reshape((n_student, n_quiz))
x

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11],
       [12, 13, 14]])

In [22]:
# 2. A - what is the mean
x.mean()

7.0

In [23]:
# 2. B - what is the mean score of the student who had the highest average
# find the average of each student's score and then the highest average
x1 = x.mean(axis=1)
x1.max()

13.0

In [24]:
# 2. C - which quiz has the lowest mean score
# (quizzes are columns, so average over the students with axis=0)
quiz_mean = x.mean(axis=0)
quiz_mean.argmin()

0
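Because the made-up grades are so simple, each answer can be worked out by hand and written down as a check. A sketch under the assumption used in `x` above (students are rows, quizzes are columns); the names `student_mean` and `quiz_mean` are only for illustration:

```python
import numpy as np

n_student = 5
n_quiz = 3
x = np.arange(n_student * n_quiz).reshape((n_student, n_quiz))

# one mean per student: average over that student's quizzes (columns)
student_mean = x.mean(axis=1)

# one mean per quiz: average over all students (rows)
quiz_mean = x.mean(axis=0)

# 2. A - mean over everything
assert x.mean() == 7.0

# 2. B - the last student (12, 13, 14) has the highest average, 13
assert student_mean.argmax() == 4
assert student_mean.max() == 13.0

# 2. C - the first quiz (0, 3, 6, 9, 12) has the lowest mean, 6
assert quiz_mean.argmin() == 0
```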

# What is "beautiful" software?

- simple
    - bugs hide in complexity
        - not sure why your code doesn't work?  Try simplifying how it works
- unambiguous
    - every job is done in one place in the code
- well documented
    - DS: it's not enough to be correct, you must also be compelling
    - it'll help others understand your code
    - it'll help you write your code quicker


# ... where are my car keys?

- I should clean up the whole house (i.e. make it "simple, unambiguous & well-documented").
    - I'll want to find something else and organizing it all will help me for next time.


# ... Why doesn't my program work?

- I should make my whole program beautiful (i.e. make it "simple, unambiguous & well-documented").
    - I'll want to verify / extend its functionality so organizing it all will help me for next time.


# Interface vs Implementation

# Interface

<img src="https://www.statefarm.com/content/dam/sf-library/en-us/secure/legacy/simple-insights/quick-steps-to-take-if-your-gas-pedal-sticks.jpg" width=300>


An **interface** is the set of inputs and outputs of a part of a program:
- a function docstring describes an interface
- a set of test cases also describes an interface
    - ideally, a set of test cases completely describes the interface

# Implementation

<img src="https://cdn.carbuzz.com/gallery-images/840x560/591000/100/591155.jpg" width=300>

An **implementation** is the code which actually accomplishes the work of converting input to desired output


In [25]:
def evaluate_rps(user0_rps, user1_rps):
    """ determining winner in a round of rps
    
    Args:
        user0_rps (str): 'rock', 'paper', 'scissors'
        user1_rps (str): 'rock', 'paper', 'scissors'
        
    Returns:
        idx_win (int): 0 if user0 wins, 1 if user1 wins
            -1 if a tie
    """    
    # validate proper inputs given
    rps_tuple = 'rock', 'paper', 'scissors'
    assert user0_rps in rps_tuple, 'invalid user0 input'    
    assert user1_rps in rps_tuple, 'invalid user1 input'
    
    if user0_rps == user1_rps:
        # tie
        return -1
    
    # test if user0 won
    if (user0_rps, user1_rps) in [('rock', 'scissors'),
                                  ('scissors', 'paper'),
                                  ('paper', 'rock')]:
        return 0
    
    # users gave different inputs, and user0 didn't win
    return 1
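To make the interface / implementation split concrete, here is a sketch of a second implementation with the same interface. The name `evaluate_rps_mod` and the modular-arithmetic trick are not from the lesson; they are one possible alternative. Each option beats the one just before it in `('rock', 'paper', 'scissors')`, wrapping around, so the difference of their positions modulo 3 decides the round.

```python
def evaluate_rps_mod(user0_rps, user1_rps):
    """ determining winner in a round of rps (alternative implementation)

    Args:
        user0_rps (str): 'rock', 'paper', 'scissors'
        user1_rps (str): 'rock', 'paper', 'scissors'

    Returns:
        idx_win (int): 0 if user0 wins, 1 if user1 wins
            -1 if a tie
    """
    # validate proper inputs given
    rps_tuple = 'rock', 'paper', 'scissors'
    assert user0_rps in rps_tuple, 'invalid user0 input'
    assert user1_rps in rps_tuple, 'invalid user1 input'

    # position of each input in rps_tuple (0, 1 or 2)
    idx0 = rps_tuple.index(user0_rps)
    idx1 = rps_tuple.index(user1_rps)

    # 0: same input (tie)
    # 1: user0 is one step "ahead" of user1 (user0 wins)
    # 2: user1 is one step "ahead" of user0 (user1 wins)
    diff = (idx0 - idx1) % 3
    return {0: -1, 1: 0, 2: 1}[diff]


print(evaluate_rps_mod('paper', 'rock'))
```

A caller who only reads the docstring cannot tell which implementation is being used, and the same test cases apply to both. Whether this version is "simpler" than the explicit list of winning pairs is debatable; the list is arguably easier to read.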


# How to write beautiful software

1. Thoughtfully design your program
    - write notes, flowcharts, scribble on a whiteboard, pseudocode ...
    - common mistake: not spending enough time daydreaming about the whole anatomy of a program
        - it's easy to change the design now ... later on we'll have invested time in building it some particular way
        
1. Define your interface: 
    - A. Write your function headers & docstrings
    - B. Write your test cases
1. Complete your implementation
    - A. add comments to your functions
    - B. implement functions (per comments) until it passes all test cases


# Step 1. Thoughtfully design your program
Write notes, flowcharts, scribble on a whiteboard, pseudocode ...
<img src="https://i.ibb.co/qDXFDnw/rps-plan.png" width=700>


# Step 2A: Write your function headers & docstrings
    
```python
def evaluate_rps(user0_rps, user1_rps):
    """ determining winner in a round of rps
    
    Args:
        user0_rps (str): 'rock', 'paper', 'scissors'
        user1_rps (str): 'rock', 'paper', 'scissors'
        
    Returns:
        idx_win (int): 0 if user0 wins, 1 if user1 wins
            -1 if a tie
    """   
```


# Step 2B: Write your test cases
    
```python
# paper beats rock
assert evaluate_rps('paper', 'rock') == 0
assert evaluate_rps('rock', 'paper') == 1

# scissors beats paper
assert evaluate_rps('scissors', 'paper') == 0
assert evaluate_rps('paper', 'scissors') == 1

# rock beats scissors
assert evaluate_rps('rock', 'scissors') == 0
assert evaluate_rps('scissors', 'rock') == 1

# ties
assert evaluate_rps('scissors', 'scissors') == -1
assert evaluate_rps('rock', 'rock') == -1
assert evaluate_rps('paper', 'paper') == -1
```


# Step 3A: add comments to your functions
    
```python
def evaluate_rps(user0_rps, user1_rps):
    """ determining winner in a round of rps
    
    Args:
        user0_rps (str): 'rock', 'paper', 'scissors'
        user1_rps (str): 'rock', 'paper', 'scissors'
        
    Returns:
        idx_win (int): 0 if user0 wins, 1 if user1 wins
            -1 if a tie
    """    
    # validate proper inputs given
    
    # test if users tied (return -1 if so)
    
    # test if user0 won (return 0 if so)
    
    # otherwise, user1 must have won (return 1)
```


# Step 3B: implement (per comments) until passing all test cases
    
```python
def evaluate_rps(user0_rps, user1_rps):
    """ determining winner in a round of rps
    
    Args:
        user0_rps (str): 'rock', 'paper', 'scissors'
        user1_rps (str): 'rock', 'paper', 'scissors'
        
    Returns:
        idx_win (int): 0 if user0 wins, 1 if user1 wins
            -1 if a tie
    """    
    # validate proper inputs given
    rps_tuple = 'rock', 'paper', 'scissors'
    assert user0_rps in rps_tuple, 'invalid user0 input'    
    assert user1_rps in rps_tuple, 'invalid user1 input'
    
    if user0_rps == user1_rps:
        # tie: users gave same input
        return -1
    
    # user0_win_list is a list of all the ways user0 can win
    user0_win_list = [('rock', 'scissors'),
                      ('scissors', 'paper'),
                      ('paper', 'rock')]
    if (user0_rps, user1_rps) in user0_win_list:
        # user 0 won
        return 0
    
    # user1 won (not a tie and user0 didn't win)
    return 1
```
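The Step 2B test cases cover every valid pair of inputs, but not the input validation. A possible extra check is sketched below, with a condensed copy of the function so the block runs on its own. The helper `raises_assertion` is hypothetical, not part of the lesson. Note that `assert` statements are skipped entirely when Python is run with the `-O` flag, so this style of validation is best suited to notebooks and coursework.

```python
def evaluate_rps(user0_rps, user1_rps):
    """ condensed copy of the function above, so this block runs alone """
    rps_tuple = 'rock', 'paper', 'scissors'
    assert user0_rps in rps_tuple, 'invalid user0 input'
    assert user1_rps in rps_tuple, 'invalid user1 input'
    if user0_rps == user1_rps:
        return -1
    user0_win_list = [('rock', 'scissors'),
                      ('scissors', 'paper'),
                      ('paper', 'rock')]
    if (user0_rps, user1_rps) in user0_win_list:
        return 0
    return 1


def raises_assertion(user0_rps, user1_rps):
    """ True if evaluate_rps rejects the inputs (hypothetical test helper) """
    try:
        evaluate_rps(user0_rps, user1_rps)
    except AssertionError:
        return True
    return False


# invalid inputs are rejected (note: inputs are case sensitive)
assert raises_assertion('lizard', 'rock')
assert raises_assertion('rock', 'Rock')
assert not raises_assertion('rock', 'paper')

# swapping the users should swap the winner (or keep a tie a tie)
rps_tuple = 'rock', 'paper', 'scissors'
for rps_a in rps_tuple:
    for rps_b in rps_tuple:
        result = evaluate_rps(rps_a, rps_b)
        swapped = evaluate_rps(rps_b, rps_a)
        if result == -1:
            assert swapped == -1
        else:
            assert swapped == 1 - result
```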


# In Class Activity B

Complete steps 1 and 2 of the process above to build [tic-tac-toe](https://en.wikipedia.org/wiki/Tic-tac-toe), playable via user `input()`s in jupyter.

- know there isn't one "right" answer here, though some designs are certainly more complex than others
- as a guide, it may help to know my solution includes 3 functions but you're welcome to build differently
- test cases for user interfaces are tough to write, feel free to describe with a bit of text or show example output (as lab0: part 1 does)


# Step 1. Thoughtfully design your program

User interface:

play:
- create an array; 3 x 3
- for loop: repeat until a row, column, or diagonal is filled
- 2 users
    - one will be X and the other will be O (or any other 2 symbols)
    - ask user 1 to start first 
        - string & where to place
            - use replace
    - ask user 2 to follow after
        - string & where to place
            - use replace
- re-query if placed in a filled spot
    - show result of the board
    

evaluate_winner:
- evaluate the board
    - if row, column, or diagonal is filled
- win, lose, or tie
    - if user1 win: announce the winner
    - if user2 win: announce the winner
    - if tied, announce the result = no winner




In [26]:
# Step 2A: Write your function headers & docstrings


def play(user1, user2):
    """ plays tic-tac-toe, asking each user for their move in turn

    Args:
        user1 (str): symbol of the first player, 'X'
        user2 (str): symbol of the second player, 'O'

    Returns:
        board (np.array): 3x3 board showing each player's moves
    """   
    
    # create an array 3x3 with "-" as placeholders
    
    # use a for loop 
        # ask user1 to input "X" on a specific spot in the array
        # re-query if the spot is filled
        
        # ask user2 to input "O" on a specific spot in the array
        # re-query if the spot is filled
        
    # return a new board
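Since `input()` loops are hard to test, one option is to pull the board logic out into small functions that can be tested without a user. The sketch below is an assumption about how that might look; `new_board` and `place_mark` are hypothetical names, not part of the design above. The `play` loop would then call `place_mark` and re-query the user whenever it returns `False`.

```python
import numpy as np


def new_board():
    """ builds an empty 3x3 board, with "-" as the placeholder """
    return np.full((3, 3), '-')


def place_mark(board, row, col, mark):
    """ places mark on the board if the spot is empty

    Args:
        board (np.array): 3x3 board, modified in place
        row (int): row index, 0 to 2
        col (int): column index, 0 to 2
        mark (str): 'X' or 'O'

    Returns:
        placed (bool): True if the mark was placed, False if the
            spot was already filled (caller should re-query)
    """
    if board[row, col] != '-':
        # spot already filled, leave the board unchanged
        return False

    board[row, col] = mark
    return True


board = new_board()
print(place_mark(board, 1, 1, 'X'))
print(place_mark(board, 1, 1, 'O'))
print(board)
```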


In [27]:
def evaluate_winner(board):
    """ determines the result of a tic-tac-toe board

    Args:
        board (np.array): 3x3 board with each player's moves

    Returns:
        winner (str): "User1 wins!" if user1 wins,
            "User2 wins!" if user2 wins,
            "Tied!" if the users tied
    """   
    
    # conditional statement:
    # if row, column, diagonal on board = X, user1 wins 
    
    # if row, column, diagonal on board = O, user2 wins 
    
    # if row, column, diagonal on board != X and != O, it is a tie
    
    
    # return the winner
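The activity stops at step 2, but for reference, here is one rough sketch of how step 3 could go for this function. It assumes the board is a 3x3 array of `'X'`, `'O'` and `'-'`, reuses the result strings from the test cases, and returns `None` while the game is still in progress (that last case is an assumption, since the design above doesn't cover it).

```python
import numpy as np


def evaluate_winner(board):
    """ determines the result of a tic-tac-toe board (sketch)

    Args:
        board (np.array): 3x3 board of 'X', 'O' and '-' (empty)

    Returns:
        winner (str): "User1 wins!", "User2 wins!" or "Tied!"
            (None if the game is still in progress)
    """
    # collect every line which can win: 3 rows, 3 columns, 2 diagonals
    line_list = list(board) + list(board.T)
    line_list.append(board.diagonal())
    line_list.append(np.fliplr(board).diagonal())

    # a user wins if any line is filled entirely with their mark
    for mark, message in [('X', 'User1 wins!'), ('O', 'User2 wins!')]:
        for line in line_list:
            if (line == mark).all():
                return message

    # nobody won and no empty spots remain: tie
    if (board != '-').all():
        return 'Tied!'

    # nobody won yet, empty spots remain
    return None


board = np.array([['X', 'O', 'X'],
                  ['X', 'O', 'O'],
                  ['O', 'X', 'X']])
print(evaluate_winner(board))
```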
    

In [28]:
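# Step 2B: Write your test cases
# (these fail until evaluate_winner() is implemented in step 3)

# user 1 wins (top row of X)
board = np.array([['X', 'X', 'X'], ['O', 'O', '-'], ['-', '-', '-']])
assert evaluate_winner(board) == "User1 wins!"
# user 2 wins (diagonal of O)
board = np.array([['O', 'X', 'X'], ['X', 'O', '-'], ['-', '-', 'O']])
assert evaluate_winner(board) == "User2 wins!"
# users 1 & 2 tie (full board, no winner)
board = np.array([['X', 'O', 'X'], ['X', 'O', 'O'], ['O', 'X', 'X']])
assert evaluate_winner(board) == "Tied!"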