In [1]:
%load_ext jupyter_ai_magics

In [27]:
from datascience import *

## Conditional Statements
As you work with more data, you will see that in many situations, actions and results depend on a specific set of conditions being satisfied. You already use conditional statements on a daily basis! For example, _if_ I complete my assignments on time, I will go to the party.

A conditional statement is a multi-line statement that allows Python to choose among different alternatives based on the truth value of an expression. You can use conditional statements as part of functions (you'll learn about functions in a bit).

Suggested Reading: https://inferentialthinking.com/chapters/09/1/Conditional_Statements.html

In [9]:
x = 3
if x > 1: # begins with an `if` header, which is a single line followed by an indented body (body executed if condition is True)
    print(str(x)+" is greater than 1")

3 is greater than 1


In [8]:
# add additional conditions
y = 4
if y > 5: # skipped as it is not True
    print(str(y)+" is greater than 5") 
else: # code is executed as the condition above is false
    print(str(y)+" is less than or equal to 5")

4 is less than or equal to 5


In [7]:
# add additional conditions
z = 5
if z > 5: # condition 1, skipped
    print(str(z)+" is greater than 5")
elif z < 5: # condition 2, skipped
    print(str(z)+" is less than 5")
else: # code is executed as all conditions above are false
    print(str(z)+" is equal to 5")

5 is equal to 5
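
One detail the examples above imply but do not show directly: Python checks the conditions from top to bottom and runs only the first branch whose condition is true, even if a later condition would also be true. Here is a minimal sketch, where `score` and `grade` are hypothetical names chosen for illustration:

```python
score = 95  # hypothetical value, not taken from the examples above

if score >= 70:      # checked first; True, so this body runs
    grade = "pass"
elif score >= 90:    # also true for 95, but never checked
    grade = "distinction"
else:
    grade = "fail"

print(grade)
```

Because the first condition already matches, the `elif` branch is skipped. Putting the stricter test (`score >= 90`) first would change the result.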


##### ChatGPT time!
Now let us understand what conditional statements are using ChatGPT!

In [18]:
%%ai openai-chat:gpt-3.5-turbo
What are conditional statements in Python? Can you explain it like you would to a 5 year old with a simple example?



Sure! 

Conditional statements in Python are like giving the computer instructions based on a certain condition. It's like telling the computer what to do if something is true or false. Let's imagine you have a bowl of apples, and you want to eat an apple only if it's red.

Here's how it works:

```python
apple_color = "red"  # Let's say the apple in your hand is red

if apple_color == "red":  # If the apple is red,
    print("Eat the apple!")  # Say "Eat the apple!"
else:  # Otherwise, if the apple is not red,
    print("Don't eat the apple!")  # Say "Don't eat the apple!"
```

In this code, we have a variable called `apple_color` which is set to "red", indicating the color of the apple. The `if` statement checks whether the apple's color is "red" or not. If the condition is true (meaning the apple is red), it prints "Eat the apple!". If the condition is false (meaning the apple is not red), it prints "Don't eat the apple!".

So, when we run this code, the computer will say:

```
Eat the apple!
```

This is because the apple's color is red, and the condition is true. The computer knows you should eat the apple! Conditional statements help the computer make decisions based on different conditions, just like you do when deciding whether to eat an apple or not based on its color.

## Iteration
Many of the tasks that you will encounter will require repeating a process multiple times. We learned about call expressions (invoking functions), operations, conditional statements, etc. What if you want to perform a task, for example, some calculations, repeatedly? How can you avoid writing verbose code? Enter Iteration!

Python allows you to do that using a `for` statement to loop over the contents of a sequence. A for statement begins with the word `for`, followed by a name we want to give each item in the sequence, followed by the word `in`, and ending with an expression that evaluates to a sequence. The indented body of the for statement is executed once for each item in that sequence.

Suggested Reading: https://inferentialthinking.com/chapters/09/2/Iteration.html

In [15]:
# Say you have a box of candies and you want to eat each candy one by one and show which candy you are eating!
candies = make_array("chocolate", "gummy", "lollipop", "jelly bean")
for candy in candies:
    print("Eating", candy)

Eating chocolate
Eating gummy
Eating lollipop
Eating jelly bean


In [16]:
# without loops, it would be too tedious
print("Eating", candies.item(0)) # Recall .item(index) is used to get the element of an array at that index
print("Eating", candies.item(1))
print("Eating", candies.item(2))
print("Eating", candies.item(3))

Eating chocolate
Eating gummy
Eating lollipop
Eating jelly bean
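
A loop body can also update a name on each pass, which is a common way to build up a result. The sketch below uses a plain Python list rather than `make_array` so that it runs without the `datascience` module. The names `candy_counts` and `total` are hypothetical.

```python
candy_counts = [3, 5, 2, 4]  # hypothetical number of candies of each kind

total = 0
for count in candy_counts:
    total = total + count  # add the current item to the running total

print("Total candies:", total)
```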


##### ChatGPT time!
Now let us understand what for loops are using ChatGPT!

In [20]:
%%ai openai-chat:gpt-3.5-turbo
What are for loops in Python? Explain it like you would to a 5 year old with a simple example.



Sure!

A for loop in Python is like a magic spell that makes the computer repeat something over and over again. It's like telling the computer to do a task for each item in a list. Let's imagine you have a basket of toys, and you want to play with each toy one by one.

Here's how it works:

```python
toys = ["car", "doll", "ball", "robot"]

for toy in toys:
    print("Playing with", toy)
```

In this code, we have a list of toys: car, doll, ball, and robot. The `for` loop helps us go through each toy in the list. Inside the loop, we print "Playing with" followed by the name of the toy.

When we run this code, the computer will say:

```
Playing with car
Playing with doll
Playing with ball
Playing with robot
```

See, the computer goes through each toy and tells us that it is playing with each one. The for loop is like magic because it automatically plays with each toy without us having to do it one by one. It makes playing with toys (or doing other things) fun and easy!

## Functions 

You learnt a lot of useful techniques using call expressions and built-in functions like `abs`, `round` or `max`! You saw that by importing a module (using an `import` statement as shown above), we are able to use its code in our own notebook, so we don't have to re-invent the wheel and write code that someone else has already written!

We will now explore a core feature of the Python programming language: function definition. This will allow you to write your own functions that you can use in your work and that can be potentially used by others!

Suggested Reading: https://inferentialthinking.com/chapters/08/Functions_and_Tables.html

## Why functions? Motivation with an example (converting currency)

Suppose we are planning a trip to Europe. We are looking at prices of hotels, restaurants, travel, etc. in Euros, but we really only have a sense of what prices mean when they are in dollars.

That means that every time we see an amount in Euros, we have to do a conversion. We do a quick web search on "dollars to euros" and "euros to dollars" and we get these results (these are as of 1/27/2025):

* 1 United States Dollar equals 0.95 Euro
* 1 Euro equals 1.05 United States Dollars

We can use Python code to help us convert:


In [7]:
hotel_price_euros = 125.00
hotel_price_dollars = hotel_price_euros * 1.05

restaurant_price_euros = 15.00
restaurant_price_dollars = restaurant_price_euros * 1.05

airport_shuttle_euros = 25.00
airport_shuttle_dollars = airport_shuttle_euros * 1.05

print("hotel_price_dollars=", hotel_price_dollars)
print("restaurant_price_dollars=", restaurant_price_dollars)
print("airport_shuttle_dollars=", airport_shuttle_dollars)


hotel_price_dollars= 131.25
restaurant_price_dollars= 15.75
airport_shuttle_dollars= 26.25


Notice how we have this repeated operation `* 1.05` on three different lines of code.

The thing is, exchange rates change all the time.  If we needed to update that, we'd have to update it in several places in our notebook, which is not efficient.

In addition, it's not immediately clear where this `* 1.05` is coming from.

## Defining a function 
Python has a way for us to *define our own functions* for any computation that will be repeated over and over. *Learning to define new functions is one of the most important skills you can take away from this class*. It's a fundamental computing skill, and in many ways, one of the foundational concepts of *Computer Science* (as compared to *Data Science*).

Here's an example of defining a function to convert euros to dollars:

In [9]:
def euros_to_dollars(euros):
    return euros * 1.05

A function definition has several important parts:
* It always starts with `def`
* The name of the function comes next (in this case, `euros_to_dollars`)
* Then, there is a list of arguments to the function in parentheses.
  * In this case, there is one argument, `euros`.
  * There can be zero, 1, or more than one argument; when there is more than one, the arguments are separated by commas, e.g. `(height, length, depth)`
* A colon `:` comes after the close paren to indicate that everything indented under this first line is part of the function
* The function will typically have at least one `return` statement; the `return` statement is followed by an expression that computes the *value returned by the function*
  * In this case, `euros * 1.05` is the value returned.

To use the function, we make *function calls* like these:



In [10]:
euros_to_dollars(1)

1.05

In [11]:
euros_to_dollars(25.00)

26.25

In [12]:
hotel_price_euros = 125.00
euros_to_dollars(hotel_price_euros)

131.25
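
Since exchange rates change, one possible variation is to pass the rate in as a second argument instead of writing `1.05` inside the function. This is only a sketch: `convert`, `rate`, and `euros_to_dollars_rate` are hypothetical names, and the rate is the one quoted earlier.

```python
def convert(amount, rate):
    """Convert an amount of money using the given exchange rate."""
    return amount * rate

euros_to_dollars_rate = 1.05  # rate quoted earlier; update it in this one place

print(convert(125.00, euros_to_dollars_rate))
print(convert(25.00, euros_to_dollars_rate))
```

With this design, a new exchange rate means changing a single line rather than every conversion.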

Here's another function definition, one that takes a number and multiplies it by two. We'll call it `double`.

In [14]:

def double(x):
    """ multiply x by two """
    return 2*x

A few notes:
* As noted above, we start any function definition by writing `def`.
* In general, the number and type of input arguments the function takes and the type of the result the function returns are referred to as its `signature`.
* It's optional, but good practice, to put a string on the first line after the signature that describes, in plain text for a human reader, what the function does. This text is referred to as the `docstring`, and can be displayed by typing `help(function_name)`, for example `help(double)`:

In [15]:
help(double)

Help on function double in module __main__:

double(x)
    multiply x by two



The code that runs when the function is called is the `body`. 

Note that every line of the body must be _indented_. Any line that is not indented, and is instead left-aligned with the `def` statement, is considered outside the function. The special instruction `return` is part of the function's body and tells Python to make the value of the function call equal to whatever comes right after `return`.
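
To make the roles of indentation and `return` concrete, here is a small sketch (the names `add_one`, `result`, and `answer` are hypothetical):

```python
def add_one(x):
    """Return x plus one."""
    result = x + 1   # indented, so part of the function's body
    return result    # this becomes the value of the call add_one(...)

answer = add_one(4)  # not indented, so outside the function
print(answer)
```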

Now, let's call the function `double`

In [2]:
double(2) # double takes one argument (and the docstring or signature don't specify the type!)

4

In [3]:
double(3)

6

In Python, if you multiply a string by an integer, it repeats that string that many times. So we can use `double` on a string as well:

In [4]:
double("banana")

'bananabanana'

We can also nest one function call inside another, like this.  The inside is done first, so 
* `double(double(3))` turns into `double(6)` which turns into `12`:


In [5]:
double(double(3))

12

##### ChatGPT time!
Now let us understand what functions are using ChatGPT!

In [9]:
%%ai openai-chat:gpt-3.5-turbo
What is a function definition in Python? Explain it like you would to a 5 year old with a very simple example of a function.

A function definition in Python is like a set of instructions that you give to the computer to do something specific. It is like telling your friend how to do a task step by step. 

For example, imagine you want to make a peanut butter and jelly sandwich. You can define a function called "make_sandwich" that tells the computer how to make it. Here is a simple example of how you can define the function:

```python
def make_sandwich():
    print("Take two slices of bread.")
    print("Spread peanut butter on one slice.")
    print("Spread jelly on the other slice.")
    print("Put the slices together to make a sandwich.")
```

Now, whenever you want to make a sandwich, you can just tell the computer to "make_sandwich()" and it will follow the instructions in the function to make the sandwich for you.

Now, let's call the function in ChatGPT's response! Note that this function takes no arguments.

In [10]:
# function definition
def make_sandwich():
    print("Take two slices of bread.")
    print("Spread peanut butter on one slice.")
    print("Spread jelly on the other slice.")
    print("Put the slices together to make a sandwich.")

# function call 
make_sandwich() 

Take two slices of bread.
Spread peanut butter on one slice.
Spread jelly on the other slice.
Put the slices together to make a sandwich.


In [11]:
%%ai openai-chat:gpt-3.5-turbo
Now teach me about function definitions with more than 1 argument. Explain it like you would to a 5 year old with a very simple example of a function.

Sometimes, we need a function to do something different based on some extra information we give it. It's like telling your friend what to do but giving them more than one instruction. 

Let's say we want to make a sandwich, but we want to choose the types of fillings for our sandwich. We can define a function called "make_custom_sandwich" that takes two pieces of information: one for the type of bread and another for the type of filling. Here is a simple example:

```python
def make_custom_sandwich(bread, filling):
    print("Take two slices of", bread, "bread.")
    print("Spread", filling, "on one slice.")
    print("Put the slices together to make a sandwich.")
```

Now, when we want to make a sandwich, we can tell the function what kind of bread and filling we want. For example, we can say "make_custom_sandwich('wheat', 'ham')" to make a sandwich with wheat bread and ham filling. Or we can say "make_custom_sandwich('white', 'cheese')" to make a sandwich with white bread and cheese filling.

Now, let's call the function in ChatGPT's response! Note that this function takes two arguments.

In [16]:
# function definition
def make_custom_sandwich(bread, filling):
    print("Take two slices of", bread, "bread.")
    print("Spread", filling, "on one slice.")
    print("Put the slices together to make a sandwich.")

In [17]:
# function call - choose your favorite bread and filling
make_custom_sandwich(brown, cream cheese)

SyntaxError: invalid syntax. Perhaps you forgot a comma? (960831606.py, line 2)

In [14]:
%%ai openai-chat:gpt-3.5-turbo
I tried this function with arguments brown and cream cheese but got the following error - 
Cell In[13], line 8
    make_custom_sandwich(brown, cream cheese)
                                ^
SyntaxError: invalid syntax. Perhaps you forgot a comma?

Could you help me correct my mistake?

Certainly! The mistake is that Python treats `brown` and `cream cheese` as two separate variables, rather than as the actual values. To fix this, you need to put quotation marks (`'` or `"`) around the arguments like this:

```python
make_custom_sandwich('brown', 'cream cheese')
```

By adding the quotation marks, Python understands that `brown` and `cream cheese` are the specific values you want to use for the `bread` and `filling` arguments.

In [18]:
make_custom_sandwich('brown', 'cream cheese')

Take two slices of brown bread.
Spread cream cheese on one slice.
Put the slices together to make a sandwich.


### Applying functions to a column/array

It's great to have learned about arrays and tables, and to be able to read data and make new columns. But you must be wondering how we can manipulate all this data. We will now learn about methods that let you apply an operation to each entry in a column (or an array)!

##### apply 
The `apply` method calls a function on each element of a column, forming a new array of return values. To indicate which function to call, just name it (without quotation marks or parentheses). The name of the column of input values is a string that must still appear within quotation marks.

In [39]:
cones = Table.read_table("data/cones.csv")
cones

Flavor,Color,Price
strawberry,pink,3.55
chocolate,light brown,4.75
chocolate,dark brown,5.25
bubblegum,pink,4.75


In [40]:
cones.apply(double, 'Price')

array([  7.1,   9.5,  10.5,   9.5])
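
Conceptually, `apply` does roughly what the loop below does: it calls the function once per entry and collects the return values. This is a plain-Python sketch of the idea, not how the `datascience` library is actually implemented. The `prices` list copies the `Price` column shown above.

```python
def double(x):
    """ multiply x by two """
    return 2*x

prices = [3.55, 4.75, 5.25, 4.75]  # values copied from the Price column

doubled = []
for price in prices:
    doubled.append(double(price))  # one function call per entry

print(doubled)
```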

If the prices here are in dollars, and we wanted to convert to Euros, we could write a function:

In [41]:
def dollars_to_euros(dollars):
    """ convert dollars to euros """
    return dollars * 0.95

And then apply that to a column:

In [42]:
cones.apply(dollars_to_euros, 'Price')

array([ 3.3725,  4.5125,  4.9875,  4.5125])

And then add that to our table:

In [43]:
euros_column = cones.apply(dollars_to_euros, 'Price')
cones_with_euros = cones.append_column("Price in Euros", euros_column)
cones_with_euros

Flavor,Color,Price,Price in Euros
strawberry,pink,3.55,3.3725
chocolate,light brown,4.75,4.5125
chocolate,dark brown,5.25,4.9875
bubblegum,pink,4.75,4.5125


##### Grouping values in a table
The `group` method with a single argument counts the number of rows for each category in a column. The result contains one row per unique value in the grouped column.

In [27]:
cones.group('Flavor')

Flavor,count
bubblegum,1
chocolate,2
strawberry,1


The optional second argument of `group` names the function that will be used to aggregate values in other columns for all of those rows. For instance, `sum` will sum up the prices in all rows that match each category. This result also contains one row per unique value in the grouped column, but it has the same number of columns as the original table.

In [35]:
cones.group('Flavor', list)

Flavor,Color list,Price list
bubblegum,['pink'],[4.75]
chocolate,"['light brown', 'dark brown']","[4.75, 5.25]"
strawberry,['pink'],[3.5499999999999998]


In [28]:
cones = cones.drop('Color') # otherwise it will attempt to sum Color (which doesn't make sense)

In [15]:
cones.group('Flavor', sum)

Flavor,Price sum
bubblegum,4.75
chocolate,10.0
strawberry,3.55
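
The idea behind `group('Flavor', sum)` can be sketched in plain Python with a dictionary that keeps a running total per flavor. This only illustrates the concept, under the assumption that each row is a (flavor, price) pair. It is not the library's implementation, and `rows` and `totals` are hypothetical names.

```python
# (flavor, price) pairs copied from the cones table
rows = [
    ("strawberry", 3.55),
    ("chocolate", 4.75),
    ("chocolate", 5.25),
    ("bubblegum", 4.75),
]

totals = {}
for flavor, price in rows:
    # start each new flavor at 0, then add this row's price
    totals[flavor] = totals.get(flavor, 0) + price

for flavor in sorted(totals):
    print(flavor, totals[flavor])
```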


In [16]:
cones.group('Flavor', double)

Flavor,Price double
bubblegum,[ 9.5]
chocolate,[ 9.5 10.5]
strawberry,[ 7.1]


In [18]:
more_cones = Table.read_table("data/cones.csv")
more_cones.group(['Flavor', 'Color'])

Flavor,Color,count
bubblegum,pink,1
chocolate,dark brown,1
chocolate,light brown,1
strawberry,pink,1


In [19]:
more_cones.group(['Flavor', 'Color'], sum)

Flavor,Color,Price sum
bubblegum,pink,4.75
chocolate,dark brown,5.25
chocolate,light brown,4.75
strawberry,pink,3.55


##### pivot 
The same data can be displayed differently using the Table method `pivot`.

The `pivot` method is closely related to the `group` method: it groups together rows that share a combination of values. It differs from group because it organizes the resulting values in a grid. The first argument to pivot is the label of a column that contains the values that will be used to form new columns in the result. The second argument is the label of a column used for the rows. The result gives the count of all rows of the original table that share the combination of column and row values.

Like `group`, `pivot` can be used with additional arguments to find characteristics of each paired category. An optional third argument called `values` indicates a column of values that will replace the counts in each cell of the grid. These values are not displayed individually, however; the fourth argument, `collect`, indicates how to combine them into one aggregated value to be displayed in the cell.

In [20]:
more_cones.pivot('Flavor', 'Color')

Color,bubblegum,chocolate,strawberry
dark brown,0,1,0
light brown,0,1,0
pink,1,0,1
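
The counting that `pivot('Flavor', 'Color')` performs can be sketched in plain Python as a grid of counters, with one row per color and one column per flavor. This is an illustration of the idea only, not the library's implementation. The names `rows` and `grid` are hypothetical.

```python
# (flavor, color) pairs copied from the cones table
rows = [
    ("strawberry", "pink"),
    ("chocolate", "light brown"),
    ("chocolate", "dark brown"),
    ("bubblegum", "pink"),
]

flavors = sorted({flavor for flavor, color in rows})  # become the columns
colors = sorted({color for flavor, color in rows})    # become the rows

# Start every cell of the grid at zero
grid = {color: {flavor: 0 for flavor in flavors} for color in colors}

# Count each row of the original data in its (color, flavor) cell
for flavor, color in rows:
    grid[color][flavor] += 1

for color in colors:
    print(color, [grid[color][flavor] for flavor in flavors])
```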


In [21]:
more_cones.pivot('Flavor', 'Color', values='Price', collect=sum)

Color,bubblegum,chocolate,strawberry
dark brown,0.0,5.25,0.0
light brown,0.0,4.75,0.0
pink,4.75,0.0,3.55


##### ChatGPT time!
Now, let us learn more about how to apply functions to arrays/table columns using ChatGPT!

In [6]:
%%ai openai-chat:gpt-3.5-turbo --format code
I just learnt about apply that can be used on tables (a function in the datascience module). Explain it like you would to a 5 year old and give me a very simple code example.

In [7]:
from datascience import *

# Create a table with two columns: "Toys" and "Count"
toys_table = Table().with_columns(
    "Toys", ["Car", "Doll", "Train", "Ball"],
    "Count", [3, 5, 2, 4]
)

# Define a function that doubles the count of each toy
def double_count(count):
    return count * 2

# Apply the function to the "Count" column of the table
doubled_count_table = toys_table.apply(double_count, "Count")
doubled_count_table

array([ 6, 10,  4,  8])

In [8]:
%%ai openai-chat:gpt-3.5-turbo --format code
I just learnt about group that can be used on tables (a function in the datascience module). Explain it like you would to a 5 year old and give me a very simple code example.

In [9]:
from datascience import *

# Create a table with two columns: "Toys" and "Color"
toys_table = Table().with_columns(
    "Toys", ["Car", "Doll", "Car", "Doll", "Train"],
    "Color", ["Red", "Blue", "Green", "Red", "Blue"]
)

# Group the toys based on their color
grouped_table = toys_table.group("Color")
grouped_table

Color,count
Blue,2
Green,1
Red,2


In [10]:
%%ai openai-chat:gpt-3.5-turbo --format code
I just learnt about pivot that can be used on tables (a function in the datascience module). Explain it like you would to a 5 year old and give me a very simple code example.

In [11]:
from datascience import *

# Create a table with three columns: "Toys", "Category", and "Color"
toys_table = Table().with_columns(
    "Toys", ["Car", "Doll", "Train", "Car", "Doll"],
    "Category", ["Vehicle", "Doll", "Vehicle", "Vehicle", "Doll"],
    "Color", ["Red", "Blue", "Green", "Red", "Blue"]
)

# Pivot the table to organize toys by category and color
pivoted_table = toys_table.pivot("Category", "Color")
pivoted_table

Color,Doll,Vehicle
Blue,2,0
Green,0,1
Red,0,2


### Joining tables by columns!

Often, data about the same individuals is maintained in more than one table. For example, one university office might have data about each student’s time to completion of degree, while another has data about the student’s tuition and financial aid.

We have been working with the `more_cones` table. Let's make another table called `ratings`.

In [41]:
ratings = Table().with_columns(
    'Kind', make_array('strawberry', 'chocolate', 'vanilla', 'bubblegum'),
    'Stars', make_array(2.5, 3.5, 4, 3)
)
ratings

Kind,Stars
strawberry,2.5
chocolate,3.5
vanilla,4.0
bubblegum,3.0


##### join
The method `join` creates a new table in which each cone in the `more_cones` table is augmented with the `Stars` information in the `ratings` table. For each cone in `more_cones`, `join` finds a row in `ratings` whose `Kind` matches the cone’s `Flavor`. We have to tell `join` to use those columns for matching.

In [42]:
rated = more_cones.join('Flavor', ratings, 'Kind')
rated

Flavor,Color,Price,Stars
bubblegum,pink,4.75,3.0
chocolate,light brown,4.75,3.5
chocolate,dark brown,5.25,3.5
strawberry,pink,3.55,2.5
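
The matching step can be sketched in plain Python with a dictionary lookup. This is a conceptual illustration rather than the library's implementation, and `stars_by_kind` and `rated_rows` are hypothetical names. Note that `vanilla` has a rating but no cone, so it does not appear in the result, just as in the output above.

```python
# (flavor, color, price) rows copied from the more_cones table
cone_rows = [
    ("strawberry", "pink", 3.55),
    ("chocolate", "light brown", 4.75),
    ("chocolate", "dark brown", 5.25),
    ("bubblegum", "pink", 4.75),
]

# Kind -> Stars, copied from the ratings table
stars_by_kind = {"strawberry": 2.5, "chocolate": 3.5, "vanilla": 4.0, "bubblegum": 3.0}

rated_rows = []
for flavor, color, price in cone_rows:
    if flavor in stars_by_kind:  # keep only cones that have a matching rating
        rated_rows.append((flavor, color, price, stars_by_kind[flavor]))

for row in rated_rows:
    print(row)
```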


In [44]:
ratings.join('Kind', more_cones, 'Flavor') # the order in which we list the two tables changes the order in which the columns appear, and can potentially change the order of the rows, but it doesn’t make any fundamental difference

Kind,Stars,Color,Price
bubblegum,3.0,pink,4.75
chocolate,3.5,light brown,4.75
chocolate,3.5,dark brown,5.25
strawberry,2.5,pink,3.55


##### ChatGPT time!
Now let us understand how to join two tables using ChatGPT!

In [12]:
%%ai openai-chat:gpt-3.5-turbo --format code
I just learnt about join that can be used on tables (a function in the datascience module). Explain it like you would to a 5 year old and give me a very simple code example.

In [13]:
from datascience import *

# Create two tables with a common column: "Toy"
toys_table = Table().with_columns(
    "Toy", ["Car", "Doll", "Train"],
    "Count", [3, 5, 2]
)

colors_table = Table().with_columns(
    "Toy", ["Car", "Doll", "Train"],
    "Color", ["Red", "Blue", "Green"]
)

# Join the two tables based on the "Toy" column
joined_table = toys_table.join("Toy", colors_table)
joined_table

Toy,Count,Color
Car,3,Red
Doll,5,Blue
Train,2,Green


In [51]:
%%ai openai-chat:gpt-3.5-turbo
I want to learn more about joining two tables. I just learnt about join method from datascience module in Python. Can you explain it like you would to a 5 year old with very simple examples?

Certainly! Let's understand joining two tables using the `join` method from the datascience module in a simple way:

Imagine you have two sets of toy blocks, one set has shapes and the other set has colors. You want to combine these sets and find blocks that have both a specific shape and color. 

The `join` method helps you do that by matching the blocks based on a common attribute, like a label. Let's say you have the following sets of blocks:

Set 1 (Shapes):

| Block  | Shape    |
|--------|----------|
| Block1 | Circle   |
| Block2 | Triangle |
| Block3 | Square   |

Set 2 (Colors):

| Block  | Color |
|--------|-------|
| Block1 | Red   |
| Block2 | Green |
| Block3 | Blue  |

Now, you can use the `join` method to combine these sets and find blocks that have both a specific shape and color. Suppose you want to find blocks that are both "Circle" shaped and "Red" colored. You can do it like this:

```python
shapes_table = Table().with_columns('Block', make_array('Block1', 'Block2', 'Block3'),
                                    'Shape', make_array('Circle', 'Triangle', 'Square'))

colors_table = Table().with_columns('Block', make_array('Block1', 'Block2', 'Block3'),
                                    'Color', make_array('Red', 'Green', 'Blue'))

combined_table = shapes_table.join('Block', colors_table)
matched_blocks = combined_table.where('Shape', 'Circle').where('Color', 'Red')
```

After doing the join and filtering the desired shape and color, the `matched_blocks` table will look like this:

| Block  | Shape    | Color |
|--------|----------|-------|
| Block1 | Circle   | Red   |

Here, you have joined the two tables based on the common attribute "Block" and found the block that is both "Circle" shaped and "Red" colored.

So, the `join` method helps you bring together information from different tables based on a common attribute, allowing you to find matching elements or combine relevant data. Just like when you combine your shape blocks with your color blocks to find the blocks you need!

Let's try ChatGPT's code response!

In [52]:
shapes_table = Table().with_columns('Block', make_array('Block1', 'Block2', 'Block3'),
                                    'Shape', make_array('Circle', 'Triangle', 'Square'))

colors_table = Table().with_columns('Block', make_array('Block1', 'Block2', 'Block3'),
                                    'Color', make_array('Red', 'Green', 'Blue'))

combined_table = shapes_table.join('Block', colors_table)
combined_table

Block,Shape,Color
Block1,Circle,Red
Block2,Triangle,Green
Block3,Square,Blue
