<a href="https://colab.research.google.com/github/tay4real/datascience/blob/main/01_variables.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Printed copies of *Elements of Data Science* are available now, with a **full color interior**, from [Lulu.com](https://www.lulu.com/shop/allen-downey/elements-of-data-science/paperback/product-9dyrwn.html).

# Welcome

This is the Jupyter notebook for Chapter 1 of [*Elements of Data Science*](https://greenteapress.com/wp/elements-of-data-science), by Allen B. Downey.

If you are not familiar with Jupyter notebooks,
[click here for a short introduction](https://colab.research.google.com/github/AllenDowney/ElementsOfDataScience/blob/v1/jupyter_intro.ipynb).

Then, if you are not already running this notebook on Colab, [click here to run this notebook on Colab](https://colab.research.google.com/github/AllenDowney/ElementsOfDataScience/blob/v1/01_variables.ipynb).

The following cell downloads a file and runs some code that is used specifically for this book.
You don't have to understand this code yet, but you should run it before you do anything else in this notebook.
Remember that you can run the code by selecting the cell and pressing the play button (a triangle in a circle), or by holding down `Shift` and pressing `Enter`.

In [1]:
from os.path import basename, exists

def download(url):
    filename = basename(url)
    if not exists(filename):
        from urllib.request import urlretrieve

        local, _ = urlretrieve(url, filename)
        print("Downloaded " + str(local))
    return filename

download('https://raw.githubusercontent.com/AllenDowney/ElementsOfDataScience/v1/utils.py')

import utils

Downloaded utils.py


# Variables and Values

The topics in this chapter are:

* Basic programming features in Python: variables and values.

* Translating formulas from math notation to Python.

You don't need a lot of math to do data science, but at the end of this chapter I'll review one topic that comes up a lot: logarithms.

## Numbers

Python provides tools for working with many kinds of data, including numbers, words, dates, times, and locations (latitude and longitude).
Let's start with numbers.  Python can work with several types of numbers, but the two most common are:

* `int`, which represents integer values like `3`, and

* `float`, which represents numbers that have a fraction part, like `3.14159`.

Most often, we use `int` to represent counts and `float` to represent measurements.

Here's an example of an `int`:

In [2]:
3

3

When you run a cell that contains a value like this, Jupyter displays the value.

Here's an example of a `float`:

In [3]:
3.14159

3.14159

`float` is short for "floating-point", which is the name for the way these numbers are stored.
Floating-point numbers can also be written in a format similar to scientific notation:

In [None]:
1.2345e3

This value is equivalent to $1.2345 \times 10^{3}$, so the result is `1234.5`.
The `e` in `1.2345e3` stands for "exponent".

If you are not familiar with scientific notation, you can read about it at <https://en.wikipedia.org/wiki/Scientific_notation>.
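
As a small illustration of the notation above, the following sketch writes a few values in this format. The variable names are chosen only for this example. It suggests that a number written with `e` is a `float`, even when it has no fraction part.

```python
# Values written with an exponent are floats, even whole numbers.
a = 1.2345e3      # 1.2345 times 10**3
b = 1e3           # a float, not an int
c = 2.5e-3        # a negative exponent gives a small number

print(a, b, c)
print(type(b))
```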

## Arithmetic

Python provides operators that perform arithmetic.  The operators that perform addition and subtraction are `+` and `-`:

In [4]:
3 + 2 - 1

4

The operators that perform multiplication and division are `*` and `/`:

In [5]:
2 * 3

6

In [6]:
2 / 3

0.6666666666666666
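
Notice that the result of `2 / 3` is a `float`, even though both operands are `int`. The following sketch, with variable names made up for this example, shows that division with `/` produces a `float` even when the result is a whole number, while multiplying two `int` values produces an `int`.

```python
quotient = 6 / 3      # division with / yields a float
product = 2 * 3       # multiplying two ints yields an int

print(quotient, type(quotient))
print(product, type(product))
```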

And the operator for exponentiation is `**`:

In [7]:
2 ** 3

8

Unlike math notation, Python does not allow implicit multiplication.  For example, in math notation, if you write $3 (2 + 1)$, that's understood to be the same as $3 \times (2 + 1)$.
In Python, that's an error.

NOTE: The following cell uses `%%expect`, which is a Jupyter "magic command" that means we expect the code in this cell to produce an error.

For more about this magic command, see the
[Jupyter notebook introduction](https://colab.research.google.com/github/AllenDowney/ThinkPython/blob/v3/chapters/jupyter_intro.ipynb).

In [8]:
%%expect TypeError

3 (2 + 1)

  3 (2 + 1)


TypeError: 'int' object is not callable

In this example, the error message is not very helpful, which is why I am warning you now.
If you want to multiply, you have to use the `*` operator.

The arithmetic operators follow the rules of precedence you might have learned as "PEMDAS":

* Parentheses before
* Exponentiation before
* Multiplication and division before
* Addition and subtraction.

So in this expression:

In [9]:
1 + 2 * 3

7

The multiplication happens first.  If that's not what you want, you can use parentheses to make the order of operations explicit:

In [10]:
(1 + 2) * 3

9
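
Two details of precedence are not covered by "PEMDAS" and can be surprising. The sketch below, with variable names chosen only for illustration, shows that exponentiation happens before a leading minus sign, and that a chain of `**` operators is grouped from right to left.

```python
neg_square = -3 ** 2         # exponent first, then negation: -(3 ** 2)
square_of_neg = (-3) ** 2    # parentheses force the negation first
tower = 2 ** 3 ** 2          # grouped right to left: 2 ** (3 ** 2)

print(neg_square, square_of_neg, tower)
```

When in doubt, adding parentheses makes the intended order explicit.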

**Exercise:** Write a Python expression that raises `1+2` to the power `3*4`.  The answer should be `531441`.

Note: in the cell below, it should say

```
# Solution goes here
```

Lines like this that begin with `#` are "comments" -- they provide information, but they have no effect when the program runs.
When you do this exercise, you should delete the comment and replace it with your solution.

In [11]:
# Solution goes here
(1 + 2) ** (3*4)

531441

## Math Functions

Python gives you access to mathematical functions like `sin`, `cos`, `exp`, and `log`.
They are not part of the core language; instead, they come from a **library**, which is a collection of values and functions.
The one we'll use is called NumPy, which stands for "Numerical Python", and is pronounced "num pie".
Before you can use a library, you have to **import** it.
Here's how we import NumPy:  

In [None]:
import numpy as np

This line of code imports `numpy` as `np`, which means we can refer to it by the short name `np` rather than the longer name `numpy`.
Names like this are case-sensitive, which means that `numpy` is not the same as `NumPy`.
So even though the name of the library is NumPy, when we import it we have to call it `numpy`.  

In [None]:
%%expect ModuleNotFoundError

import NumPy as np

This error message might be confusing if you don't pay attention to the difference between uppercase and lowercase.
But assuming we import `np` correctly, we can use it to read the value `pi`, which represents the mathematical constant $\pi$.

In [None]:
np.pi

The result is a `float` with 16 digits.  As you might know, we can't represent $\pi$ with a finite number of digits, so this result is only approximate.

NumPy provides `log`, which computes the natural logarithm:

In [None]:
np.log(100)

NumPy also provides `exp`, which raises the constant `e` to a power.

In [None]:
np.exp(1)
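
Because `np.log` uses base $e$, `np.log(100)` is not `2`. NumPy also provides `log10` and `log2` for other bases; they are not used in this chapter, but the following sketch might help clarify the difference. The variable names are made up for this example.

```python
import numpy as np

natural = np.log(100)      # base e
common = np.log10(100)     # base 10
binary = np.log2(8)        # base 2

# Changing base: log10(x) equals log(x) / log(10)
converted = np.log(100) / np.log(10)

print(natural, common, binary, converted)
```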

**Exercise:** Use these functions to check the identity $\log(e^x) = x$.
Mathematically, it is true for any value of $x$.
With floating-point values, it only holds, at least approximately, for values of $x$ between about -700 and 700.
What happens when you try it with larger and smaller values?

In [None]:
# Solution goes here

Floating-point numbers are finite approximations, which means they don't always behave like math.
As another example, let's see what happens if we add up `0.1` three times:

In [None]:
0.1 + 0.1 + 0.1

The result is close to `0.3`, but not exact.
When you work with floating-point numbers, remember that they are only approximately correct.
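
One practical consequence is that testing floating-point results for exact equality can be misleading. The sketch below uses the `==` operator and `math.isclose` from the Python standard library, neither of which is covered in this chapter, to compare the sum with `0.3` exactly and then within a small tolerance.

```python
import math

total = 0.1 + 0.1 + 0.1

print(total == 0.3)                # exact comparison
print(math.isclose(total, 0.3))    # comparison within a small tolerance
```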

## Variables

A **variable** is a name that refers to a value.
The following statement assigns the value `5` to a variable named `x`:

In [None]:
x = 5

The variable we just created has the name `x` and the value `5`.

If a variable name appears at the end of a cell, Jupyter displays its value.

In [None]:
x

If we use `x` as part of an arithmetic operation, it represents the value `5`:

In [None]:
x + 1

In [None]:
x ** 2

We can also use a variable when we call a function:

In [None]:
np.exp(x)

Notice that the result from `exp` is a `float`, even though the value of `x` is an `int`.
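
Since a variable is a name that refers to a value, assigning to the same name again makes it refer to a new value. The following sketch uses the hypothetical names `count` and `next_count` so it does not change `x`. It suggests that a value computed earlier is not updated when the variable it was computed from changes.

```python
count = 5
next_count = count + 1    # computed from the current value of count
count = 10                # count now refers to a different value

print(count, next_count)
```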

**Exercise:** If you have not programmed before, one of the things you have to get used to is that programming languages are picky about details. Natural languages, like English, and semi-formal languages, like math notation, are more forgiving.

As an example, in math notation, parentheses and square brackets mean the same thing: you can write $\sin (\omega t)$ or $\sin [\omega t]$ -- either one is fine. And you can leave out the parentheses altogether, as long as the meaning is clear, as in $\sin \omega t$.
In Python, every character counts. For example, the following are all different, and only the first one works:

```
np.exp(x)
np.Exp(x)
np.exp[x]
np.exp x
```

While you are learning, I encourage you to make mistakes on purpose to see what goes wrong.  Read the error messages carefully.  Sometimes they are helpful and tell you exactly what's wrong.  Other times they can be misleading.  But if you have seen the message before, you might remember some likely causes.

In the next cell, try out the different versions of `np.exp(x)` above, and see what error messages you get.

In [None]:
np.exp(x)

**Exercise:** The NumPy function that computes square roots is `sqrt`.
Use it to compute a floating-point approximation of the golden ratio,
$\phi = \frac{1}{2}(1 + \sqrt{5})$. Hint: The result should be close to `1.618`.

In [None]:
# Solution goes here

## Save your work

If you are running on Colab and you want to save your work, now is a good time to press the "Copy to Drive" button (near the upper left), which saves a copy of this notebook in your Google Drive.

If you want to change the name of the file, you can click on the name in the upper left.
If you don't use Google Drive, look under the File menu to see other options.

Once you make a copy, any additional changes you make will be saved automatically, so now you can continue without worrying about losing your work.

## Calculating with Variables

Now we'll use variables to solve a problem involving compound interest.
It might not be the most exciting example, but it uses everything we have done so far, and it reviews exponentiation and logarithms, which we are going to need.

If we invest an amount of money, $P$, in an account that earns compounded interest, the total accumulated value, $V$, after an interval of time, $t$, is:

$V=P\left(1+{\frac {r}{n}}\right)^{nt}$

where $r$ is the annual interest rate and $n$ is the compounding frequency.
For example, if you deposit \$2,100 in a bank paying an annual interest rate of 3.4\% compounded four times a year, we can compute the balance after 7 years by defining these variables:

In [None]:
P = 2100
r = 0.034
n = 4
t = 7

And then computing the total accumulated value like this:

In [None]:
P * (1 + r/n)**(n*t)
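
To see how the pieces of the formula fit together, here is the same calculation broken into steps. This is only a sketch: the intermediate names `rate_per_period`, `num_periods`, and `growth_factor` are made up for this example and are not part of the formula.

```python
P = 2100     # principal, in dollars
r = 0.034    # annual interest rate
n = 4        # compounding periods per year
t = 7        # number of years

rate_per_period = r / n      # interest rate for each period
num_periods = n * t          # total number of compounding periods
growth_factor = (1 + rate_per_period) ** num_periods

V = P * growth_factor
print(V)
```

The result should match the one-line version above.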

**Exercise:** Continuing the previous example, suppose you start with the same principal and the same interest rate, but interest is compounded twice per year, so `n = 2`.
What would the total value be after 7 years?  Hint: we expect the answer to be a bit less than the previous answer.

In [None]:
# Solution goes here

**Exercise:** If interest is compounded continuously, the value after time $t$ is $V=P~e^{rt}$. Translate this equation into Python and use it to compute the value of the investment in the previous example with continuous compounding.  Hint: we expect the answer to be a bit more than the previous answers.

In [None]:
# Solution goes here

**Exercise:** If we solve the previous equation for $r$, we get $r = \log(V/P)~/~t$,
where $\log$ is the logarithm base $e$, also known as the natural logarithm.
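
In case the algebra is not obvious, here is one way to get from $V=P~e^{rt}$ to the formula for $r$. It divides both sides by $P$, takes the natural logarithm of both sides, uses the identity $\log(e^x) = x$, and then divides by $t$:

$\frac{V}{P} = e^{rt}$

$\log\left(\frac{V}{P}\right) = \log\left(e^{rt}\right) = rt$

$r = \frac{\log(V/P)}{t}$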

Harvard's tuition in 1970 was \$4,070 (not including room, board, and fees).
In 2019 it was \$46,340.
What was the annual rate of increase over that period, treating it as if it had compounded continuously?
For comparison, the average annual rate of inflation over the same period, based on the consumer price index (CPI), was 3.84%.

In [None]:
# Solution goes here

## Summary

This chapter introduces variables, which are names that refer to values, and two kinds of values, integers and floating-point numbers.

It presents mathematical operators, like `+` for addition and `*` for multiplication, and mathematical functions like `log` for logarithms and `exp` for raising `e` to a power.

In the next chapter, we'll see additional data types for representing letters and words, dates and times, and latitude and longitude.

## A little more Jupyter

Here are a few tips on using Jupyter to compute and display values.

Generally, if there is a single expression in a cell, Jupyter computes the value of the expression and displays the result.
For example, we've already seen how to display the value of `np.pi`:

In [None]:
np.pi

Here's a more complex example with functions, operators, and numbers:

In [None]:
1 / np.sqrt(2 * np.pi) * np.exp(-3**2 / 2)

If you put more than one expression in a cell, Jupyter computes them all, but it only displays the result from the last:

In [None]:
1
2 + 3
np.exp(1)
(1 + np.sqrt(5)) / 2

If you want to display more than one value, you can separate them with commas:

In [None]:
1, 2 + 3, np.exp(1), (1 + np.sqrt(5)) / 2

That result is actually a tuple, which you will learn about in the next chapter.
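
As a brief preview, the following sketch assigns comma-separated values to a hypothetical variable named `values` and checks its type. It uses the built-in functions `type` and `len`, which are not covered in this chapter.

```python
values = 1, 2 + 3, 4 / 2    # the commas create a tuple

print(values)
print(type(values))
print(len(values))          # number of elements in the tuple
```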

Here's one last Jupyter tip: when you assign a value to a variable, Jupyter does not display the value:

In [None]:
phi = (1 + np.sqrt(5)) / 2

So it is idiomatic to assign a value to a variable and immediately display the result:

In [None]:
phi = (1 + np.sqrt(5)) / 2
phi

**Exercise:** Display the value of $\phi$ and its inverse, $1/\phi$, on a single line. Do you notice anything about the relationship between the values?

In [None]:
# Solution goes here

*Elements of Data Science*

Copyright 2021 [Allen B. Downey](https://allendowney.com)

License: [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-nc-sa/4.0/)