# Lab 1: Expressions

Welcome to the first COMPSCI/STATS 190F: Foundations of Data Science lab!  Each week you will complete a lab assignment like this one.  You can't learn technical subjects without hands-on practice, so labs are an important part of the course.

Come to your lab section and work on the lab as directed. Before you leave, you need to **submit the lab and have a staff member check you off to confirm** that you came to lab section and attempted to make progress while you were there. You do not need to finish the lab or answer every question correctly to receive full credit. However, you do need to try; that's why we check you off. 

Collaborating on labs is more than okay -- it's encouraged! You should rarely be stuck for more than a few minutes on questions in labs, so ask a neighbor or an instructor for help. (Explaining things is beneficial, too -- the best way to solidify your knowledge of a subject is to explain it.) Please don't just share answers, though. You should leave the lab feeling confident that you understand the answers, as you will solve similar problems in the homework assignments, which are individual work.

You can read more about [course policies](https://umass-data-science.github.io/190fwebsite/policies/) on the [course website](https://umass-data-science.github.io/190fwebsite/).

#### Today's lab

In today's lab, you'll learn how to:

1. Navigate Jupyter notebooks (like this one);
2. Write and evaluate some basic *expressions* in Python, the computer language of the course;
3. Call *functions* to use code other people have written; and
4. Break down Python code into smaller parts to understand it.

This lab covers parts of [Chapter 3](https://umass-data-science.github.io/190fwebsite/textbook/03/programming-in-python/) of the online textbook. You should read the book, but not right now. Instead, let's get started!

# 1. Jupyter notebooks
This webpage is called a Jupyter notebook. A notebook is a place to write programs and view their results.

## 1.1. Text cells
In a notebook, each rectangle or block containing text or code is called a *cell*.

Text cells (like this one) can be opened and edited by double-clicking on them. They're written in a simple format called [Markdown](http://daringfireball.net/projects/markdown/syntax) to add formatting and section headings.  You don't need to learn Markdown, but you might want to.

After you open a text cell, click the "run cell" button at the top that looks like ▶| to confirm any changes or to just re-display the formatted cell. (Try not to delete the instructions of the lab.)

## 1.2. Code cells
Cells can also contain code in the Python 3 language. Running a code cell will execute all of the code it contains.

To run the code in a code cell, first click on that cell to activate it.  It'll be highlighted with a little green or blue rectangle.  Next, either press ▶| or hold down the `shift` key and press `return` or `enter`.

Try running the next cell:

In [None]:
print("Hello, World!")

The fundamental building block of Python code is an expression. Cells can contain multiple lines with multiple expressions. When you run a cell, the lines of code are executed in the order in which they appear. Every `print` expression prints a line. Run the next cell and notice the order of the output.

In [None]:
print("First this line is printed,")
print("and then this one.")

## 1.3. Coding Errors
Python is a language, and like natural human languages, it has rules.  It differs from natural language in two important ways:
1. The rules are *simple*.  You can learn most of them in a few weeks and gain reasonable proficiency with the language in a semester.
2. The rules are *rigid*.  If you're proficient in a natural language, you can understand a non-proficient speaker, glossing over small mistakes.  A computer running Python code is not smart enough to do that.

Whenever you write code, you'll make mistakes.  When you run a code cell that has errors, Python will sometimes produce error messages to tell you what you did wrong.

Errors are okay; even experienced programmers make many errors.  When you make an error, you just have to find the source of the problem, fix it, and move on.

We have made an error in the next cell.  Run it and see what happens.

In [None]:
print("This line is missing something."

You should see something like this (minus our annotations):

<img src="error.jpg"/>

The last line of the error output attempts to tell you what went wrong.  The *syntax* of a language is its structure, and this `SyntaxError` tells you that Python cannot interpret your expression because it does not follow Python's syntax.  "`EOF`" means "end of file," so the message is saying Python expected you to write something more (in this case, a right parenthesis) before finishing the cell.

There's a lot of terminology in programming languages, but you don't need to know it all in order to start programming effectively. If you see a cryptic message like this, you can often get by without deciphering it.  (Of course, if you're frustrated, ask a neighbor or a TA for help.)

Try to fix the code above so that you can run the cell and see the intended message instead of an error.

## 1.4. Submitting your work
All assignments in the course will be distributed as notebooks like this one, and you will submit your work from the notebook. We will use a system called OK that checks your work and helps you submit. You need to be connected to the Internet to connect to the OK server. At the top of each assignment, you'll see a cell like the one below that prompts you to identify yourself. Run it and follow the instructions. Please use your @umass.edu address when logging in. 

After running this cell, the last line of output should indicate that you are "Successfully logged in." Ask a staff member if you have any problems.

In [None]:
# Don't change this cell; just run it. 
# The result will give you directions about how to log in to the submission system, called OK.
# Once you're logged in, you can run this cell again, but it won't ask you who you are because
# it remembers you. However, you will need to log in once per assignment.
! pip install -U okpy

from client.api.notebook import Notebook
ok = Notebook('lab01.ok')
_ = ok.auth(inline=True)


When you finish an assignment, you need to submit it by running the submit command below. It's OK to submit multiple times; OK will only try to grade your final submission for each assignment. Don't forget to submit your lab assignment at the end of section, even if you haven't finished everything.

After running this cell, the second to last line of output should indicate "Submission successful." Remember that you need to be connected to the Internet to submit to the OK server. Ask a staff member if you have any problems.

In [None]:
_ = ok.submit()

# 2. Numbers

Quantitative information arises everywhere in data science. In addition to representing commands to print out lines, expressions can represent numbers and methods of combining numbers. The expression `3.2500` evaluates to the number 3.25. (Run the cell and see.)

In [None]:
3.2500

Notice that we didn't have to `print`. When you run a notebook cell, if the last line has a value, then Jupyter helpfully prints out that value for you. However, it won't print out prior lines automatically.

In [None]:
print(2)
3
4

Above, you should see that 4 is the value of the last expression, 2 is printed, but 3 is lost forever because it was neither printed nor last.

You don't want to print everything all the time anyway.  But if you feel sorry for 3, change the cell above to print it.

## 2.1. Arithmetic
The line in the next cell subtracts two numbers.  Its value is what you'd expect.  Run it.

In [None]:
3.25 - 1.5

Many basic arithmetic operations are built into Python.  The textbook section on [Expressions](https://umass-data-science.github.io/190fwebsite/textbook/03/1/expressions/) describes all the arithmetic operators used in the course.  The common operator that differs from typical math notation is `**`, which raises one number to the power of the other. So, `2**3` stands for $2^3$ and evaluates to 8.
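
As a quick illustration of the operators mentioned above, the sketch below prints the value of one small expression per operator. The numbers 7 and 2 are arbitrary choices for this example, not values from the lab.

```python
# Each line evaluates one arithmetic expression and prints its value.
# The operands 7 and 2 are arbitrary, chosen only for illustration.
print(7 + 2)   # addition: 9
print(7 - 2)   # subtraction: 5
print(7 * 2)   # multiplication: 14
print(7 / 2)   # division: 3.5
print(2 ** 3)  # exponentiation, 2 to the power of 3: 8
```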

The order of operations is what you learned in elementary school, and Python also has parentheses.  For example, compare the outputs of the cells below. Using parentheses, we get a different answer. It's always a good idea to use parentheses to make complicated expressions as easy as possible for someone else to read.

In [None]:
2+6*5-6*3**2*2**3/4*7

In [None]:
2+(6*5-(6*3))**2*((2**3)/4*7)

In standard math notation, the first expression is

$$2 + 6 \times 5 - 6 \times 3^2 \times \frac{2^3}{4} \times 7,$$

while the second expression is

$$2 + (6 \times 5 - (6 \times 3))^2 \times (\frac{(2^3)}{4} \times 7).$$
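
If it helps to see why the two answers differ, here is a sketch that evaluates each expression one piece at a time. The intermediate names (`power_a`, `inner`, and so on) are made up for this illustration; naming values is covered in Section 3.

```python
# First expression: exponents first, then * and / from left to right,
# then + and - from left to right.
power_a = 3 ** 2                         # 9
power_b = 2 ** 3                         # 8
product = 6 * power_a * power_b / 4 * 7  # ((6 * 9 * 8) / 4) * 7 = 756.0
first = 2 + 6 * 5 - product              # 2 + 30 - 756.0 = -724.0

# Second expression: the parentheses change which pieces are grouped.
inner = 6 * 5 - (6 * 3)                  # 30 - 18 = 12
squared = inner ** 2                     # 144
factor = (2 ** 3) / 4 * 7                # (8 / 4) * 7 = 14.0
second = 2 + squared * factor            # 2 + 2016.0 = 2018.0

print(first, second)
```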

<br><br>

**Question 2.1** Write a Python expression in this next cell that's equal to $1+5 \times (3 \frac{10}{11}) - 50 \frac{1}{3} + 2^{.5 \times 22} - \frac{7}{33}$.  That's one plus five times three and ten elevenths, minus fifty and a third, plus two to the power of half 22, minus 7 33rds.  By "$3 \frac{10}{11}$" we mean $3+\frac{10}{11}$, not $3 \times \frac{10}{11}$.

Replace the ellipses (`...`) with your expression.  

*Hint:* The correct output should start with a familiar number.

In [None]:
1+5*(3+10/11.0) -(50+1/3.0) + (2**(0.5*22)) -(7/33.0)

# 3. Names
In natural language, we have terminology that lets us quickly reference very complicated concepts.  We don't say, "That's a large mammal with brown fur and sharp teeth!"  Instead, we just say, "Bear!"

Similarly, an effective strategy for writing code is to define names for data as we compute it, like a lawyer would define terms for complex ideas at the start of a legal document to simplify the rest of the writing.

In Python, we do this with *assignment statements*. An assignment statement has a name on the left side of an `=` sign and an expression to be evaluated on the right.

In [None]:
ten = (3 * 2) + 4

When you run that cell, Python first evaluates the first line.  It computes the value of the expression `(3 * 2) + 4`, which is the number 10.  Then it gives that value the name `ten`.  At that point, the code in the cell is done running.

After you run that cell, the value 10 is bound to the name `ten`:

In [None]:
ten

The statement `ten = 3 * 2 + 4` is not asserting that `ten` is already equal to `3 * 2 + 4`, as we might expect by analogy with math notation.  Rather, that line of code sets or changes what the name `ten` means; it now refers to the value 10, whereas before it meant nothing at all.

If the designers of Python had been ruthlessly pedantic, they might have made us write

    define the name ten to hereafter have the value of (3 * 2) + 4 

instead.  You will probably appreciate the brevity of "`=`"!  But keep in mind that this is the real meaning.
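
To see that `=` sets or changes what a name means, consider this small sketch. The name `count` is hypothetical and chosen only for this example.

```python
# `count` is a made-up name used only for this illustration.
count = 3 * 2      # evaluate 3 * 2, then bind the name count to 6
print(count)       # 6

count = count + 4  # evaluate count + 4 using the old value 6, then rebind count
print(count)       # 10
```

The second assignment would make no sense as a math equation, but it is fine as an instruction: compute the right side first, then update the name.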

<br><br>

**Question 3.1.** Try writing code in the cell below that prints a name (like `eleven`) that hasn't been assigned to anything.  You'll see an error because the name has not been bound to a value yet.

In [None]:
eleven

A common pattern in Jupyter notebooks is to assign a value to a name and then immediately evaluate the name in the last line in the cell so that the value is displayed as output. 

In [None]:
close_to_pi = 355/113
close_to_pi

Another common pattern is using a series of lines in a single cell to build up a more complex computation in stages, naming the intermediate results.

In [None]:
bimonthly_salary = 840
monthly_salary = 2 * bimonthly_salary
number_of_months_in_a_year = 12
yearly_salary = number_of_months_in_a_year * monthly_salary
yearly_salary

Names in Python can have letters (upper- and lower-case letters are both okay and count as different letters), underscores, and numbers.  The first character can't be a number (otherwise a name might look like a number).  And names can't contain spaces, since spaces are used to separate pieces of code from each other.
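
One way to try out these rules is the string method `isidentifier`, which reports whether a piece of text follows the character rules for names. This is only a sketch; the example names are made up.

```python
# isidentifier() checks the character rules for names described above.
print("seconds_in_a_decade".isidentifier())  # True: letters and underscores
print("Ten".isidentifier())                  # True: upper-case letters are allowed
print("2nd_value".isidentifier())            # False: starts with a number
print("my value".isidentifier())             # False: contains a space

# Upper- and lower-case letters count as different, so these are two names.
ten = 10
Ten = 20
print(ten, Ten)  # 10 20
```

Note that `isidentifier` checks only the characters; it also returns `True` for words Python reserves for itself, such as `for`, which cannot be used as names.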

Other than those rules, what you name something doesn't matter *to Python*.  For example, the cell below does the same thing as the above cell (try running it), except everything has a different name. **However**, names are very important for making your code *readable* to yourself and others.  The cell below is shorter, but it's totally useless without an explanation of what it does.

In [None]:
a = 840
b = 2 * a
c = 12
d = c * b
d

**Question 3.2.** Assign the name `seconds_in_a_decade` to the number of seconds between midnight January 1, 2010 and midnight January 1, 2020.

*Hint:* If you're stuck, the next section shows you how to get hints.

In [None]:
# Change the next line so that it computes the number of
# seconds in a decade and assigns that number the name
# seconds_in_a_decade.
seconds_in_a_decade = 60*60*24*(365*8 + 366*2)

# We've put this line in this cell so that it will print
# the value you've given to seconds_in_a_decade when you
# run it.  You don't need to change this.
seconds_in_a_decade
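
As an independent cross-check on the arithmetic, the sketch below asks Python's standard `datetime` module for the time between the two midnights. The lab does not otherwise use this module, and the names `start`, `end`, and `elapsed` are made up for this illustration.

```python
from datetime import datetime

# Midnight at the start and end of the decade (no time zones involved).
start = datetime(2010, 1, 1)
end = datetime(2020, 1, 1)

# Subtracting two datetimes gives the elapsed time between them.
elapsed = end - start
print(elapsed.days)             # number of days in the decade
print(elapsed.total_seconds())  # number of seconds in the decade
```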

## 3.1. Checking your code
Now that you know how to name things, you can start using the built-in *tests* to check whether your work is correct. Try not to change the contents of the test cells. Running the following cell will test whether you have assigned `seconds_in_a_decade` correctly in Question 3.2. If you haven't, this test will tell you the correct answer. Resist the urge to just copy it, and instead try to adjust your expression. (Sometimes the tests will give hints about what went wrong.) Your answer is right if you get the output `[ooooooooook] 100.0% passed` as the last line.

In [None]:
# Test cell; please do not change!
_ = ok.grade('q32')

## 3.2. Comments
You may have noticed this line in the cell above:

    # Test cell; please do not change!

That is called a *comment*.  It doesn't make anything happen in Python; Python ignores anything on a line after a #.  Instead, it's there to communicate something about the code to you, the human reader.  Comments are extremely useful.
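
Here is a short sketch showing the two usual places for a comment: on a line of its own, and at the end of a line of code. The names are made up for this example.

```python
# A comment on its own line usually describes the code that follows.
seconds_per_minute = 60
minutes_per_hour = 60

seconds_per_hour = seconds_per_minute * minutes_per_hour  # a comment can also follow code
print(seconds_per_hour)  # 3600
```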


## 3.3. Application: Exponential and Linear Growth

Popular science literature and business news sometimes use the phrase "growing exponentially" to describe a process that is growing quickly, like the number of users on a new online platform. However, "exponential growth" has a very specific mathematical definition.

Let $r$ be the exponential growth rate per unit time and $x_t$ be the value of the process at time $t$. Exponential growth predicts that the value of the process at time $t + d$ will be $$x_{t+d} = x_t \times (1+r)^{d}.$$
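
To get a feel for the formula, this sketch checks that growing by a factor of $(1+r)$ twice in a row gives the same result as the formula with $d = 2$. The starting value and rate are hypothetical, and the names are chosen so they don't clash with the ones used in the questions below.

```python
# Hypothetical values for illustration only; they are not the lab's data.
start_value = 100  # value of the process at time t
rate = 0.5         # growth rate of 50% per unit time

# Grow by a factor of (1 + rate), two times in a row.
one_step_twice = start_value * (1 + rate) * (1 + rate)

# The exponential growth formula with d = 2.
formula = start_value * (1 + rate) ** 2

print(one_step_twice, formula)  # 225.0 225.0
```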

**Question 3.3.1.** Suppose a new online service has 50 users at the end of its first month and 200 users at the end of its second month. From these data, the company claims that its user base is growing exponentially with a 300% growth rate ($r=3$). Complete the cell below using the formula above to predict how many users the company will have at the end of its first year if it really is following an exponential growth model with these parameters. Assign the name `x_td` to the result.

In [None]:
x_t = 200
r   = 3
d   = 10
x_td = x_t * (1+r)**d
x_td

In [None]:
_ = ok.grade('q331')

**Question 3.3.2.** Suppose a skeptical analyst thinks that the company's growth rate is probably just linear. A linear growth model predicts that $x_{t+d} = x_t + r\times d$ where $r$ is the linear growth rate per unit time. Increasing from 50 to 200 users in one month implies the linear growth rate is $r=150$ users per month. Complete the code below to make a prediction for the number of users at the end of 12 months using this information. Assign the name `x_td_linear` to the result.

In [None]:
t   = 2
x_t = 200
r   = 150
d   = 10
x_td_linear = x_t + r*d
x_td_linear

In [None]:
_ = ok.grade('q332')
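
To compare the two models side by side, the sketch below applies both formulas to the same starting point (200 users, 10 more months) and prints the two predictions. The names `exponential_prediction` and `linear_prediction` are made up for this illustration and are not checked by any test.

```python
users_now = 200    # users at the end of month 2
months_ahead = 10  # months remaining until the end of month 12

# Exponential model with r = 3: multiply by (1 + 3) each month.
exponential_prediction = users_now * (1 + 3) ** months_ahead

# Linear model with r = 150: add 150 users each month.
linear_prediction = users_now + 150 * months_ahead

print("Exponential model:", exponential_prediction)  # 209715200
print("Linear model:", linear_prediction)            # 1700
```

Both models agree with the two observed months, yet their predictions for the end of the year are wildly different, which is why the choice of model matters.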

# 4. Calling functions

When a piece of code is useful, it tends to get re-used a lot. Instead of copying and pasting the same code into multiple places in a program, the code can be packaged into a function and that function can be called (or applied) multiple times. A function has a name and expects one or more inputs. Python functions are closely analogous to mathematical functions.

For example, the `abs` function takes a single number as its argument and returns the absolute value of that number.  The absolute value of a number is its distance from 0 on the number line, so `abs(5)` is 5 and `abs(-5)` is also 5.

In [None]:
abs(5)

In [None]:
abs(-5)

## 4.1. Application: Computing walking distances
Chunhua is on the corner of 7th Avenue and 42nd Street in Midtown Manhattan, and she wants to know how far she'd have to walk to get to Gramercy School on the corner of 10th Avenue and 34th Street.

She can't cut across blocks diagonally, since there are buildings in the way.  She has to walk along the sidewalks.  Using the map below, she sees she'd have to walk 3 avenues (long blocks) and 8 streets (short blocks).  In terms of the given numbers, she computed 3 as the difference between 7 and 10, *in absolute value*, and 8 similarly.  

Chunhua also knows that blocks in Manhattan are all about 80m by 274m (avenues are farther apart than streets).  So in total, she'd have to walk $(80 \times |42 - 34| + 274 \times |7 - 10|)$ meters to get to the school.

<img src="map.jpg"/>
<br><br>
**Question 4.1.1.** Finish the line `num_avenues_away = ...` in the next cell so that the cell calculates the distance Chunhua must walk and gives it the name `manhattan_distance`.  Everything else has been filled in for you.  **Use the `abs` function.**

In [None]:
# Here's the number of streets away:
num_streets_away = abs(42-34)

# Compute the number of avenues away in a similar way:
num_avenues_away = abs(7-10)

street_length_m = 80
avenue_length_m = 274

# Now we compute the total distance Chunhua must walk.
manhattan_distance = street_length_m*num_streets_away + avenue_length_m*num_avenues_away

# We've included this line so that you see the distance
# you've computed when you run this cell.  You don't need
# to change it, but you can if you want.
manhattan_distance

Be sure to run the next cell to test your code.

In [None]:
_ = ok.grade('q411')

##### Multiple arguments
Some functions take multiple arguments, separated by commas. For example, the built-in `max` function returns the maximum argument passed to it. Try it out!

In [None]:
max(2, -3, 4, -5)
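
A few other built-in functions also take more than one argument. The sketch below tries `min`, `round`, and `pow`; the lab itself uses only `abs` and `max`, so treat these as optional extras.

```python
# min works like max, but returns the smallest argument.
print(min(2, -3, 4, -5))  # -5

# round takes a number and how many decimal places to keep.
print(round(3.14159, 2))  # 3.14

# pow(2, 3) raises 2 to the power of 3, just like 2 ** 3.
print(pow(2, 3))          # 8
```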

# 5. Understanding nested expressions 
Function calls and arithmetic expressions can themselves contain expressions.  You saw an example in the last question:

    abs(42-34)

has 2 number expressions in a subtraction expression in a function call expression.  And you probably wrote something like `abs(7-10)` to compute `num_avenues_away`.

Nested expressions can turn into complicated-looking code. However, the way in which complicated expressions break down is very regular.

Suppose we are interested in heights that are very unusual.  We'll say that a height is unusual to the extent that it's far away on the number line from the average human height.  [An estimate](http://press.endocrine.org/doi/full/10.1210/jcem.86.9.7875?ck=nck&) of the average adult human height (averaging, we hope, over all humans on Earth today) is 1.688 meters.

So if Aditya is 1.21 meters tall, then his height is $|1.21 - 1.688|$, or $.478$, meters away from the average.  Here's a picture of that:

<img src="numberline_0.png">

And here's how we'd write that in one line of Python code:

In [None]:
abs(1.21 - 1.688)

What's going on here?  `abs` takes just one argument, so the stuff inside the parentheses is all part of that *single argument*.  Specifically, the argument is the value of the expression `1.21 - 1.688`.  The value of that expression is `-.478`.  That value is the argument to `abs`.  The absolute value of that is `.478`, so `.478` is the value of the full expression `abs(1.21 - 1.688)`.

Picture simplifying the expression in several steps:

1. `abs(1.21 - 1.688)`
2. `abs(-.478)`
3. `.478`

In fact, that's basically what Python does to compute the value of the expression.
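
The same steps can be spelled out in code by naming the intermediate value. This is a sketch with made-up names (`difference`, `distance`); because computers store decimals approximately, the printed digits may differ very slightly from $.478$.

```python
# Step 1: the argument expression is evaluated first.
difference = 1.21 - 1.688

# Step 2: abs is applied to the value of the argument.
distance = abs(difference)

print(difference)  # roughly -0.478
print(distance)    # roughly 0.478

# The step-by-step version gives the same value as the nested one-liner.
print(distance == abs(1.21 - 1.688))  # True
```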

**Question 5.1.** Say that Botan's height is 1.85 meters.  In the next cell, use `abs` to compute the absolute value of the difference between Botan's height and the average human height.  Give that value the name `botan_distance_from_average_m`.

<img src="numberline_1.png">

In [None]:
# Replace the ... with an expression to compute the absolute
# value of the difference between Botan's height (1.85m) and
# the average human height.
botan_distance_from_average_m = abs(1.85 - 1.688)

# Again, we've written this here so that the distance you
# compute will get printed when you run this cell.
botan_distance_from_average_m

In [None]:
_ = ok.grade('q51')


**Question 5.2.** Now say that we want to compute the most 'unusual' height among Aditya's and Botan's heights.  We'll use the function `max`, which (again) takes several numbers as arguments and returns the largest of them; here we'll give it two.  Combining that with the `abs` function, complete the cell below to compute the biggest distance from the average among the two heights.

In [None]:
aditya_height_m = 1.21
botan_height_m = 1.85
average_adult_human_height_m = 1.688

# The biggest distance from the average human height, among the two heights:
biggest_distance_m = max(abs(aditya_height_m - average_adult_human_height_m), abs(botan_height_m - average_adult_human_height_m))

# Print out our results in a nice readable format:
print("The biggest distance from the average height among these two people is", biggest_distance_m, "meters.")

In [None]:
_ = ok.grade('q52')

# Lab Complete!

You're done with Lab 1!  Be sure to run the tests and verify that they all pass, then choose **Save and Checkpoint** from the **File** menu, then run the final cell (two below this one) to submit your work.  If you submit multiple times, your last submission will be counted.

In [None]:
# For your convenience, you can run this cell to run all the tests at once!
import os
_ = [ok.grade(q[:-3]) for q in os.listdir("tests") if q.startswith('q')]

**Important.** Before you leave lab, run this final cell to submit your work.

In [None]:
_ = ok.submit()