# Calculating a line

### Introduction

Previously, we saw how a regression line can help us describe the relationship between an input variable like a movie budget and an output variable like revenue.

![](./plot-intersect.png)

One of the benefits of using a line is that we can read off the expected revenue for any budget along it. Spend 50 million, and expect to bring in about 67 million. Spend 10 million, and expect to bring in about 13 million. This approach of modeling the relationship between an input and an output with a straight line is called **linear regression**.

Let's see if we can translate this line into a function that returns the y-value corresponding to any given x-value along that line.

Let's take an initial (wrong) guess as to how to turn this into a function.

$y = x$

Here is how we write it as a function.

```python
def y(x):
    # First guess: revenue equals budget
    return x

y(0)           # returns 0
y(10000000)    # returns 10000000
```

This is pretty nice.  I just wrote a function that will automatically calculate the expected revenue given a movie budget.  This function says that for every value of $x$ that I input to the function, I will get back an equal value $y$.  So according to the function, if the movie has a budget of 30 million, it will earn 30 million.  

But take a look at the line that we drew.  Our line says something different.  The line says that spending 30 million brings predicted earnings of 40 million.  

So we need to change our function so that it lines up with our line.  In fact, we need a consistent way to turn lines into functions, and vice versa.  Ok, let's get to it.

We can start by looking at the table below, which shows how our line relates x-values and y-values, that is, budget and revenue.

| X (budget)       | Y (revenue)           | 
| ------------- |:-------------:| 
| 0      |0 | 
| 30 million      |40 million | 
| 60 million      |80 million | 

So now we need an equation that lets us input 0 and get back 0, input 30 million and get back 40 million, and input 60 million and get back 80 million. What equation is that?

It's $y = \frac{4}{3}x$. Don't believe me? Take a look.

* $0 = \frac{4}{3} \cdot 0$
* $40 \text{ million} = \frac{4}{3} \cdot 30 \text{ million}$
* $80 \text{ million} = \frac{4}{3} \cdot 60 \text{ million}$
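If you'd rather let Python do the checking, here is a minimal sketch that tests each row of the table against $y = \frac{4}{3}x$. It uses the standard-library `fractions.Fraction` type so that 4/3 is stored exactly rather than as a rounded float. The names `slope`, `table`, and `matches` are only illustrative.

```python
from fractions import Fraction

# Slope of the line, kept as an exact fraction (4/3) instead of a float.
slope = Fraction(4, 3)

# (budget, revenue) pairs read from the table above.
table = [
    (0, 0),
    (30_000_000, 40_000_000),
    (60_000_000, 80_000_000),
]

# For each row, does 4/3 times the budget equal the revenue exactly?
matches = [slope * budget == revenue for budget, revenue in table]
print(matches)  # [True, True, True]
```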

Let's see it in code. In the next section, we'll show how we figured it out.

```python
def y(x):
    # Predicted revenue is 4/3 of the budget
    return 4/3*x

y(30000000)    # returns 40000000.0
y(0)           # returns 0.0
```

Progress! We multiplied each value of $x$ by 4/3, and now our function describes the line in our chart: given any value of $x$, it returns the corresponding value of $y$ along the graphed line.

This is a common way to describe a line. You will often see it written as

$y = mx$ 

With the variables standing for the following: 

* $y$: the value that is returned, also called the **response variable**, as it responds to values of $x$
* $x$: the input variable, also called the **explanatory variable**, as it explains the value of $y$
* $m$: the **slope variable**, determines how vertical or horizontal the line will be

In our movie example, these terms make sense. The $y$ value is the revenue earned by the movie, which we say is a response to the budget. The explanatory variable, our budget $x$, explains the revenue, and $m$ is our value of 4/3, or about 1.33, which describes how much money is earned for each dollar spent. So with this value of $m$, our line says that for every dollar spent, we can expect to earn about 1.33 dollars in return. An $m$ of 2.0 would say that two dollars are earned for every dollar spent.

The variable $m$ is called the slope variable because it gives the slope of our line, so a higher value of $m$ means a steeper line. It also means that we expect more money earned per dollar spent on our movies. Imagine the line pivoting to a steeper tilt as we guess a higher amount of money earned per dollar spent.
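To see what a larger $m$ does to our predictions, here is a small sketch that evaluates $y = mx$ for two different slopes at the same budget. The helper `predict_revenue` is a hypothetical name used only for this illustration.

```python
def predict_revenue(m, budget):
    """Revenue predicted by the line y = m * x for a given budget."""
    return m * budget

budget = 30_000_000

# m = 4/3: about 1.33 dollars earned per dollar spent.
print(predict_revenue(4 / 3, budget))  # 40000000.0
# m = 2.0: a steeper line, two dollars earned per dollar spent.
print(predict_revenue(2.0, budget))    # 60000000.0
```

The same budget produces a larger prediction under the steeper line, which is what "more money earned per dollar spent" means.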

![](./plot-intersect.png)

### Calculating the slope variable 

Here is how we calculate the slope $m$. Take any two points on the straight line. Then $m$ is **the ratio of the vertical distance travelled to the horizontal distance travelled** between them. In math, it's:

$m = \Delta y \div \Delta x $
> The $\Delta$ is the Greek letter Delta. In math, Delta means change. So you can read the above formula as "$m$ equals the change in y divided by the change in x."

For example, let's take another look at our graph and our line. Let's travel along the line from x = 0 to x = 10 million. For that segment:

* $\Delta x$ = 10 million
* $\Delta y$ = 13.3 million

Notice that the change in x is just our ending x value, 10 million, minus our starting x value, 0. Likewise, the change in y is our ending y value, 13.3 million, minus our starting y value, 0.

So this means: 

* $\Delta y = y_1-  y_0$
* $\Delta x = x_1 - x_0$

Therefore, given a beginning point $(x_0, y_0)$ and an ending point $(x_1, y_1)$ along any segment of a straight line, the slope $m$ of that line is:

$m = (y_1 - y_0) \div (x_1 - x_0)$
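The formula translates directly into a small Python function. This is a sketch, and `slope` is a name chosen here for illustration. Passing the coordinates as `fractions.Fraction` values keeps the arithmetic exact. That helps for the 0 to 10 million segment above, whose ending y value (4/3 of 10 million) is not a whole number. The function refuses two points with the same x, because a vertical segment would mean dividing by zero.

```python
from fractions import Fraction

def slope(x0, y0, x1, y1):
    """Slope of the straight line through (x0, y0) and (x1, y1)."""
    if x1 == x0:
        # A vertical segment has no finite slope (division by zero).
        raise ValueError("x0 and x1 must differ")
    # Change in y divided by change in x.
    return (y1 - y0) / (x1 - x0)

# Segment from x = 0 to x = 10 million; y1 is 4/3 of 10 million, kept exact.
m = slope(0, 0, 10_000_000, Fraction(40_000_000, 3))
print(m)         # 4/3
print(float(m))  # 1.3333333333333333
```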

Ok, let's apply this formula to our line.  We can choose any two points for the formula, so let's have a starting point of (30 million, 40 million) and an ending point of (60 million, 80 million). Then plugging these coordinates into our formula, we have the following:

* $m = (y_1 - y_0) \div (x_1 - x_0) = (80{,}000{,}000 - 40{,}000{,}000) \div (60{,}000{,}000 - 30{,}000{,}000) = \frac{4}{3} \approx 1.33$

![](./m-calc.png)

So that is how we calculate the slope of a line: take any two points along that line and divide the vertical distance travelled by the horizontal distance travelled.
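To back up the claim that any two points will do, here is a sketch that computes the slope for every pair of rows in our budget/revenue table. It uses the standard-library `itertools.combinations`, and `Fraction` keeps each division exact, so every pair should give exactly 4/3.

```python
from fractions import Fraction
from itertools import combinations

# Points (budget, revenue) on the line y = 4/3 x, taken from the table.
points = [(0, 0), (30_000_000, 40_000_000), (60_000_000, 80_000_000)]

slopes = []
for (x0, y0), (x1, y1) in combinations(points, 2):
    # Change in y divided by change in x, computed exactly.
    m = Fraction(y1 - y0, x1 - x0)
    slopes.append(m)
    print((x0, y0), (x1, y1), m)
```

All three pairs print a slope of `4/3`.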

### The y intercept

Ok, there is just one more thing we need to learn before we can describe every non-vertical straight line in a two-dimensional plane: the y-intercept.

The y-intercept is the y value of the line when it intersects the y-axis.  Or to put it another way, the y-intercept is the value of y when x equals zero. 

![](plot-add.png)

So looking at the graph, what is the y-intercept of the blue line? It's the value of y where the blue line crosses the y-axis, which is zero. Now imagine shifting the entire line up so that the y-intercept increases to 20 million. For every value of x, the corresponding value of y also increases by 20 million. So our formula is no longer $y = \frac{4}{3}x$. It is $y = \frac{4}{3}x + 20{,}000{,}000$.

You will often see this written as $y = mx + b$, where $b$ is the y-intercept. Looking at our table of points on the shifted line, we can see that 20 million is our y-intercept: it is the value of y when x is 0.

| X (budget)       | Y (revenue)           | 
| ------------- |:-------------:| 
| 0      |20 million | 
| 30 million      |60 million | 
| 60 million      |100 million | 
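Since $y = mx + b$, rearranging gives $b = y - mx$ for any point on the line. The sketch below uses that rearrangement to recover both $m$ and $b$ from two rows of the table above. The variable names are illustrative, and `Fraction` is again used so the arithmetic is exact.

```python
from fractions import Fraction

# Two (budget, revenue) rows from the shifted line's table.
x0, y0 = 30_000_000, 60_000_000
x1, y1 = 60_000_000, 100_000_000

# Slope: change in y over change in x.
m = Fraction(y1 - y0, x1 - x0)

# Rearranging y = m*x + b gives b = y - m*x; any point on the line works.
b = y0 - m * x0

print(m)  # 4/3
print(b)  # 20000000
```

Using the other row, $(x_1, y_1)$, gives the same intercept, since both points lie on the same line.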

And translating our formula into a function, we have:

```python
def y(x):
    # Slope of 4/3, shifted up by a y-intercept of 20 million
    return 4/3*x + 20000000

y(30000000)    # returns 60000000.0
y(60000000)    # returns 100000000.0
```

The formula $y = mx + b$ can describe any non-vertical line in a two-dimensional space. The $m$ value changes how steep the line is, and the $b$ value shifts the line up or down, setting where it crosses the y-axis.
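One way to capture "a line is just a choice of $m$ and $b$" in code is a small factory function that builds a line function from those two numbers. This is a sketch, and `make_line` is a hypothetical helper rather than something from a library.

```python
def make_line(m, b):
    """Return a function that computes y = m * x + b."""
    def line(x):
        return m * x + b
    return line

# The original line through the origin, and the same line shifted up by 20 million.
through_origin = make_line(4 / 3, 0)
shifted_up = make_line(4 / 3, 20_000_000)
# A steeper line with the same intercept as the shifted one.
steeper = make_line(2, 20_000_000)

for budget in (0, 30_000_000, 60_000_000):
    print(budget, through_origin(budget), shifted_up(budget), steeper(budget))
```

Each returned function remembers its own `m` and `b`. This means changing the slope or intercept means building a new line rather than editing the code of `y`.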

### Summary