# Project: Linear Regression - Bouncing Balls⚽

In this project, we help optimize the ball pit in a local fast food joint's play area by investigating how bouncy balls of different sizes are. We run an experiment, record data points, and analyze them with linear regression, a statistical technique that lets us model the relationship between ball size and bounciness. The challenge is to implement linear regression from scratch in Python: we try many slopes (`m` values) and intercepts (`b` values) to find the line of best fit, the one that minimizes the error on our dataset. The result tells us how balls of different sizes are likely to behave in the ball pit, so the play area can be set up for the best possible experience.

## Part 1: Calculating Error ❓


The line we end up with will have the form:
```
y = m*x + b
```
`m` is the slope of the line and `b` is the intercept, where the line crosses the y-axis.

Fill in the function called `get_y()` that takes in `m`, `b`, and `x`. It should return what the `y` value would be for that `x` on that line!


In [1]:
def get_y(m, b, x):
    # y-value of the line y = m*x + b at the given x
    return m * x + b

print(get_y(1, 0, 7) == 7)
print(get_y(5, 10, 3) == 25)


True
True


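Next, write a function called `calculate_error()` that takes in `m`, `b`, and a `point` (an `(x, y)` pair) and returns the vertical distance between the line and the point. To find the distance: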
1. Get the x-value from the point and store it in a variable called `x_point`
2. Get the y-value from the point and store it in a variable called `y_point`
3. Use `get_y()` to get the y-value of the line at `x_point`
4. Find the difference between the y from `get_y` and `y_point`
5. Return the absolute value of the distance (you can use the built-in function `abs()` to do this)

The distance represents the error between the line `y = m*x + b` and the `point` given.


In [2]:
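# Write your calculate_error() function here

def calculate_error(m, b, point):
    # Unpack the point into its x and y coordinates
    x_point, y_point = point
    # y-value of the line y = m*x + b at x_point
    y_line = get_y(m, b, x_point)
    # Vertical distance between the line and the point
    return abs(y_line - y_point)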

In [4]:
# This line is y = x, so (3, 3) lies on it and the error should be 0:
print(calculate_error(1, 0, (3, 3)))
# The point (3, 4) should be 1 unit away from the line y = x:
print(calculate_error(1, 0, (3, 4)))
# The point (3, 3) should be 1 unit away from the line y = x - 1:
print(calculate_error(1, -1, (3, 3)))
# The point (3, 3) should be 5 units away from the line y = -x + 1:
print(calculate_error(-1, 1, (3, 3)))

0
1
1
5


Great! Reggie's datasets will be sets of points. For example, he ran an experiment comparing the width of bouncy balls to how high they bounce:


In [16]:
datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]


The first datapoint, `(1, 2)`, means that his 1cm bouncy ball bounced 2 meters. The 4cm bouncy ball bounced 4 meters.

As we try to fit a line to this data, we will need a function called `calculate_all_error`, which takes `m` and `b` that describe a line, and `points`, a set of data like the example above.

`calculate_all_error` iterates through each `point` in `points` and calculates the error from that point to the line (using `calculate_error`). It keeps a running total of the error and returns that total after the loop.


In [5]:
# Write your calculate_all_error function here
def calculate_all_error(m, b, points):
    # Running total of the error across all points
    total_error = 0
    for point in points:
        total_error += calculate_error(m, b, point)
    return total_error

print(calculate_all_error(1, 1, datapoints))

7


Let's test this function!

In [13]:
#every point in this dataset lies upon y=x, so the total error should be zero:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, 0, datapoints))

#every point in this dataset is 1 unit away from y = x + 1, so the total error should be 4:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, 1, datapoints))

#every point in this dataset is 1 unit away from y = x - 1, so the total error should be 4:
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(1, -1, datapoints))


#the points in this dataset are 1, 5, 9, and 3 units away from y = -x + 1, respectively, so total error should be
# 1 + 5 + 9 + 3 = 18
datapoints = [(1, 1), (3, 3), (5, 5), (-1, -1)]
print(calculate_all_error(-1, 1, datapoints))

0
4
4
18


In [32]:
datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]

Great! It looks like we now have a function that can take in a line and Reggie's data and return how much error that line produces when we try to fit it to the data.

Our next step is to find the `m` and `b` that minimize this error, and thus fit the data best!
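
Written out, the total error that `calculate_all_error` returns for a line and a set of points $(x_i, y_i)$ is a sum of absolute vertical distances. The symbol $E$ and the index $i$ are introduced here only for illustration and are not names used elsewhere in this project:

$$
E(m, b) = \sum_{i=1}^{n} \left| (m x_i + b) - y_i \right|
$$

For Reggie's data and the line `y = x + 1`, the five terms are 0, 3, 0, 1, and 3, which sum to the 7 printed earlier. Many treatments of linear regression minimize squared distances instead; this project uses the absolute version described above.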


## Part 2: Slopes and intercepts 📈


We will try many different slopes (`m` values) and intercepts (`b` values) and see which pair produces the smallest error for our dataset.

First, we generate lists of candidate values: `m` from -10 to 10 and `b` from -20 to 20, both in steps of 0.1.

In [38]:
possible_ms = [m * 0.1 for m in range(-100, 101)]

In [40]:
possible_bs = [b * 0.1 for b in range(-200, 201)]

Now we look for the smallest error. First, we make every possible `y = m*x + b` line by pairing each possible `m` with each possible `b`. Then, we see which line produces the smallest total error on the data stored in `datapoints`.
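
The same exhaustive search can also be sketched with `itertools.product` and `min`. This is only an illustrative alternative on a tiny grid, not the notebook's own solution; the names `total_error`, `grid_search`, `sample_points`, `candidate_ms`, `candidate_bs`, `m_hat`, and `b_hat` are hypothetical.

```python
from itertools import product

def total_error(m, b, points):
    # Sum of vertical distances between each point and the line y = m*x + b
    return sum(abs((m * x + b) - y) for x, y in points)

def grid_search(ms, bs, points):
    # Try every (m, b) pair and keep the one with the smallest total error.
    # min() returns the first minimum it meets, like a loop that uses a strict "<".
    return min(product(ms, bs), key=lambda mb: total_error(mb[0], mb[1], points))

# Assumed toy data: every point lies on y = x, so the best line is m = 1, b = 0
sample_points = [(1, 1), (3, 3), (5, 5), (-1, -1)]
candidate_ms = [m * 0.5 for m in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
candidate_bs = [b * 0.5 for b in range(-4, 5)]  # -2.0, -1.5, ..., 2.0

m_hat, b_hat = grid_search(candidate_ms, candidate_bs, sample_points)
print(m_hat, b_hat)
```

Because `product` walks the pairs in the same order as two nested loops, ties are resolved the same way as in an explicit loop that only updates on a strictly smaller error.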

In [41]:
smallest_error = float("inf")
best_m = 0
best_b = 0

In [42]:
datapoints = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
for m in possible_ms:
    for b in possible_bs:
        current_error = calculate_all_error(m, b, datapoints)
        if current_error < smallest_error:
            best_m, best_b = m, b
            smallest_error = current_error
        
print(best_m)
print(best_b)
print(smallest_error)

0.30000000000000004
1.7000000000000002
4.999999999999999
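
The long trailing digits in these values come from binary floating-point arithmetic, not from the search itself: multiplying an integer by `0.1` does not always land exactly on the decimal value we expect. A minimal sketch of the effect and one possible workaround follows; `rounded_ms` is a hypothetical name that does not appear in the notebook.

```python
# Multiplying an integer by 0.1 can pick up a tiny binary rounding error
print(3 * 0.1)

# Rounding to one decimal place recovers the intended grid value
print(round(3 * 0.1, 1))

# Hypothetical variant of the slope grid with every candidate rounded
rounded_ms = [round(m * 0.1, 1) for m in range(-100, 101)]
print(rounded_ms[103])  # the entry built from m = 3
```

For reading the results, treating `0.30000000000000004` as 0.3 and `4.999999999999999` as 5 is enough.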


## Part 3: What does our model predict?🔎

Now we have seen that for this set of observations on the bouncy balls, the line that fits the data best has an `m` of 0.3 and a `b` of 1.7:

```
y = 0.3x + 1.7
```

This line produced a total error of 5.
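
To see where that total comes from, the error can be broken down point by point. This is a small self-contained check that recomputes the distances directly instead of calling the notebook's functions; `observations`, `line_m`, `line_b`, and `errors` are names assumed for illustration.

```python
# Reggie's data: (ball width, bounce height)
observations = [(1, 2), (2, 0), (3, 4), (4, 4), (5, 3)]
line_m, line_b = 0.3, 1.7

# Vertical distance between each observation and the line y = 0.3x + 1.7
errors = [abs((line_m * x + line_b) - y) for x, y in observations]

for (x, y), err in zip(observations, errors):
    predicted = line_m * x + line_b
    print(f"width {x}: observed {y}, predicted {predicted:.1f}, error {err:.1f}")

print(f"total error: {sum(errors):.1f}")
```

The per-point errors come out to roughly 0.0, 2.3, 1.4, 1.1, and 0.2, which add up to 5.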

Using this `m` and this `b`, what bounce height does the line predict for a ball with a width of 6?
In other words, what is the output of `get_y()` when we call it with:
* m = 0.3
* b = 1.7
* x = 6

In [43]:
m = 0.3
b = 1.7
x = 6
print(get_y(m, b, x))

3.5


Our model predicts that a 6 cm ball will bounce 3.5 meters.