<a href="https://colab.research.google.com/github/yuli139304/Airbnb_app/blob/main/Fernanda_Toledo_Week_1_Project_PFDI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

> 1. DUPLICATE THIS COLAB DOCUMENT TO START WORKING ON IT: On the top-left of this page, go to File > Save a copy to drive.
> 2. SHARE SETTINGS: In the new notebook, set the sharing settings to "Anyone with the link" by clicking "Share" on the top-right corner.

<center>
  <img src=https://teamleader.fra1.cdn.digitaloceanspaces.com/corporate/production/header/_1200x630_crop_center-center_75_none/HQ_Blog_TheUltimateProjectmanager_Header.png width="500" align="center" />
</center>
<br/>

# Week 1: Clean the Airbnb Dataset (and Deploy It!)

Welcome to the first week's project for *Python for Data Science*!

This week's lecture and material on CoRise showed you how to effectively use NumPy to read, clean, and process data. Having looked at the data, you might be thinking to yourself, "Why were these latitude and longitude columns included in the dataset for Week 1?" 🤷 We'll put this data to use in this project, and you'll also get a preview of how you will be expected to make forecasts of your own with this data at the end of this course! For this week's project, we are going to use that location data to make an interactive app. We hope this project gets you excited for what's to come in the following weeks 🙌🙌!

But first, let's process our data! 

---

*All the information required to finish this week's project can be found by clicking on the **"Related section on CoRise"**-link. If you are unable to do so, please reach out to us on Slack!*

## Downloading the Dataset

You will need to download some prerequisite packages in order to run all the code below. Let's install them!

In [None]:
%%capture
!pip install numpy pandas streamlit gdown currencyconverter

In [None]:
import numpy as np

# For readability purposes, we will disable scientific notation for numbers
np.set_printoptions(suppress=True)

Taking a look at the `import` statements below shows that we are using a mix of Python out-of-the-box 🎁 libraries (os, shutil) alongside some third-party ones (gdown, numpy). Gdown allows us to download files from Google Drive, which is where we saved our modified dataset. Shutil can copy downloaded files to the right location, and os can be used to delete unneeded files.

In [None]:
import os
import shutil

import gdown
from numpy import genfromtxt

# Download file from Google Drive
# This file is based on data from: http://insideairbnb.com/get-the-data/
file_id_1 = "13fyESiH1ZEnMV6eabAyhe20t4W6peEWK"
downloaded_file_1 = "WK1_Airbnb_Amsterdam_listings_proj.csv"

# Download the file from Google Drive
gdown.download(id=file_id_1, output=downloaded_file_1)

Downloading...
From: https://drive.google.com/uc?id=13fyESiH1ZEnMV6eabAyhe20t4W6peEWK
To: /content/WK1_Airbnb_Amsterdam_listings_proj.csv
100%|██████████| 246k/246k [00:00<00:00, 36.6MB/s]


'WK1_Airbnb_Amsterdam_listings_proj.csv'

## Preprocessing the Dataset
Getting this particular dataset loaded is a tad different from what we learned in this week's content. This time we only have one CSV file, so there is nothing to merge and we can move right into data preprocessing!

#### Task 1: Find your delimiter

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/loading-inspect-dataset-kzxvy#corise_clak8pc87000g2a76vw0m8nyq)

Inspect the CSV file we just downloaded and look at the type of delimiter it has. Once you've found the right delimiter, load the file with `genfromtxt` using the dtype "unicode".

In [None]:
from numpy import genfromtxt

my_data = genfromtxt(downloaded_file_1, delimiter="|", dtype="unicode")


<details>
  <summary>Show Solution</summary>

  
```python
my_data = genfromtxt(downloaded_file_1, delimiter="|", dtype="unicode")
```

</details>
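
If you'd rather not eyeball the file, one quick way to find the delimiter is to count a few candidate characters in the first line. The sketch below is only an illustration: `guess_delimiter` and the sample line are hypothetical and not part of the course material.

```python
def guess_delimiter(first_line: str, candidates=(",", ";", "|", "\t")) -> str:
    """Return the candidate character that occurs most often in the line."""
    return max(candidates, key=first_line.count)


# Hypothetical header line, shaped like the first line of our CSV file
sample_line = "|0|1|2"
delimiter = guess_delimiter(sample_line)
print(repr(delimiter))
```

In the notebook you could pass in the real first line instead, for example by reading it with `open(downloaded_file_1).readline()`.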

Next, output the first four columns for inspection to see if you've got the data formatted how you'd like.

In [None]:
my_data[:,0:4]


array([['', '0', '1', '2'],
       ['id', '23726706', '35815036', '31553121'],
       ['price', '$88.00', '$105.00', '$152.00'],
       ['latitude', '52.34916', '52.42419', '52.43237'],
       ['longitude', '4.97879', '4.95689', '4.91821']], dtype='<U18')

<details>
  <summary>Show Expected Output</summary>

  
```
array([['', '0', '1', '2'],
    ['id', '23726706', '35815036', '31553121'],
    ['price', '$88.00', '$105.00', '$152.00'],
    ['latitude', '52.34916', '52.42419', '52.43237'],
    ['longitude', '4.97879', '4.95689', '4.91821']], dtype='<U18')
```

<details>
<summary>Show Solution</summary>

```python
my_data[:, :4]
```

</details>
</details>

Awesome! But notice our data is aligned a little differently than how we saw in the course materials. It's like we shifted our dataset by 90 degrees! You'll have to fix this a little bit later.

#### Task 2: Clean it up

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/merging-datasets-c8ztil#corise_claka2d1n000p2a76ecnog17z)

In order for your calculations to run correctly, you need to have only the "relevant" numbers/entries present in your dataset. This means no headers, footers, redundant IDs, etc. Can you remove the first row and column, since you won't be needing them? Verify your work by again printing out the first four columns.

In [None]:
# Remove the first column and row

matrix_r = my_data[1:,1:]
# Print out the first four columns

matrix_r[:,0:4]

array([['23726706', '35815036', '31553121', '34745823'],
       ['$88.00', '$105.00', '$152.00', '$87.00'],
       ['52.34916', '52.42419', '52.43237', '52.2962'],
       ['4.97879', '4.95689', '4.91821', '5.01231']], dtype='<U18')

<details>
  <summary>Show Expected Output</summary>

  
```
array([['23726706', '35815036', '31553121', '34745823'],
       ['$88.00', '$105.00', '$152.00', '$87.00'],
       ['52.34916', '52.42419', '52.43237', '52.2962'],
       ['4.97879', '4.95689', '4.91821', '5.01231']], dtype='<U18')
```

<details>
<summary>Show Solution</summary>

```python
# Remove the first column and row
matrix = my_data[1:, 1:]

# Print out the first four columns
matrix[:, :4]
```

</details>
</details>
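
If the slicing syntax still feels a bit magical, here is a minimal sketch on a toy array (made-up stand-in values, not the real dataset) showing what `[1:, 1:]` keeps.

```python
import numpy as np

# Tiny stand-in for my_data: a header row and a label column around the values
toy = np.array([
    ["", "0", "1"],
    ["id", "23726706", "35815036"],
    ["price", "$88.00", "$105.00"],
])

# "1:" means "from index 1 to the end", so row 0 and column 0 are dropped
values = toy[1:, 1:]
print(values.shape)
```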

#### Task 3: Wide to long

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/loading-inspect-dataset-kzxvy#corise_clak8qb34000h2a76ll8r2ole)

As stated previously, our dataset is shifted by 90 degrees. Let's shift it another 90 degrees to get it back to how we'd expect, which is in a much more readable format. Please find in the course material on CoRise the correct operation to do that. Again, verify your work by printing out the first five rows.

In [None]:
# Shift the matrix by 90 degrees
matrix_r_90 = matrix_r.T
matrix_r_90

# Print out the first five rows
# Entries: airbnb_id, price_usd, latitude, longitude
matrix_r_90[0:5,:]


array([['23726706', '$88.00', '52.34916', '4.97879'],
       ['35815036', '$105.00', '52.42419', '4.95689'],
       ['31553121', '$152.00', '52.43237', '4.91821'],
       ['34745823', '$87.00', '52.2962', '5.01231'],
       ['44586947', '$160.00', '52.31475', '5.0303']], dtype='<U18')

<details>
  <summary>Show Expected Output</summary>

  
```
array([['23726706', '$88.00', '52.34916', '4.97879'],
       ['35815036', '$105.00', '52.42419', '4.95689'],
       ['31553121', '$152.00', '52.43237', '4.91821'],
       ['34745823', '$87.00', '52.2962', '5.01231'],
       ['44586947', '$160.00', '52.31475', '5.0303']], dtype='<U18')
```

<details>
<summary>Show Solution</summary>

```python
# Shift the matrix by 90 degrees
matrix = matrix.T

# Print out the first five rows
matrix[:5, :]
```

</details>
</details>
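
Here is a small sketch of what the transpose does to the shape, using made-up toy values. Strictly speaking, `.T` swaps rows and columns (a mirror across the diagonal) rather than rotating the array, but the effect is what we want: one row per listing.

```python
import numpy as np

# Wide layout: one row per field, one column per listing (toy values)
wide = np.array([
    ["23726706", "35815036", "31553121"],
    ["$88.00", "$105.00", "$152.00"],
])

# .T swaps the axes: one row per listing, one column per field
long_format = wide.T
print(wide.shape, long_format.shape)
```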

Now that all data is loaded properly, let's clean it up a bit like we did before by removing string characters and setting the right type 😁.

#### Task 4: That character is not appropriate

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/merging-datasets-c8ztil#corise_claka2ge3000q2a76vqgh1db3)

String characters like commas and dollar signs are yet again present in the dataset. Please find in the CoRise course materials the correct operation to filter out these two string characters from the dataset.

In [None]:
# Remove the dollar sign
matrix_wo_dollar = np.char.replace(matrix_r_90, "$", "")

# Remove the comma
matrix_wo_dollar_comma = np.char.replace(matrix_wo_dollar, ",", "")

<details>
    <summary>Show Solution</summary>

```python
# Remove the dollar sign
matrix = np.char.replace(matrix, "$", "")

# Remove the comma
matrix = np.char.replace(matrix, ",", "")
```
</details>
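
To see the two replacements in isolation, here is a minimal sketch on a few made-up price strings. One of them includes a thousands separator so the comma step has something to do.

```python
import numpy as np

prices = np.array(["$88.00", "$1,105.00", "$152.00"])  # made-up values

# np.char.replace works element-wise on every string in the array
no_dollar = np.char.replace(prices, "$", "")
clean = np.char.replace(no_dollar, ",", "")
print(clean)
```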

Awesome! Now the dataset contains only numerical values allowing us to perform numerical operations... at least if we set the type right!

#### Task 5: Verification is the key to success!

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/merging-datasets-c8ztil#corise_claka2ge3000q2a76vqgh1db3)

Let's verify our matrix to confirm there are no more string characters present. Check to see if a dollar sign or comma still appears anywhere in the matrix. 


In [None]:
# Check if the dollar sign is in our dataset
#matrix_wo_dollar_comma[(np.char.find(matrix_wo_dollar_comma, "$") > -1) | (np.char.find(matrix_wo_dollar_comma, ",") > -1)]
matrix_wo_dollar_comma[(np.char.find(matrix_wo_dollar_comma, "$") > -1)]

array([], dtype='<U18')

<details>
  <summary>Show Expected Output</summary>

  
```
array([], dtype='<U18')
```

<details>
<summary>Show Solution</summary>

```python
# Check if the dollar sign is in our dataset
matrix[np.char.find(matrix, "$") > -1]
```

</details>
</details>

In [None]:
# Check if the comma sign is in our dataset
matrix_wo_dollar_comma[(np.char.find(matrix_wo_dollar_comma, ",") > -1)]

array([], dtype='<U18')

<details>
  <summary>Show Expected Output</summary>

  
```
array([], dtype='<U18')
```

<details>
<summary>Show Solution</summary>

```python
# Check if the comma is in our dataset
matrix[np.char.find(matrix, ",") > -1]
```

</details>
</details>
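
If you prefer a single yes/no answer over an empty array, both checks can be combined. This is just a sketch on toy data; the variable names are hypothetical.

```python
import numpy as np

cleaned = np.array([["88.00", "52.34916"], ["1105.00", "4.97879"]])  # toy data

# np.char.find returns -1 for every element that does not contain the substring
has_dollar = np.char.find(cleaned, "$") > -1
has_comma = np.char.find(cleaned, ",") > -1

# One boolean answers the question for the whole matrix
still_dirty = bool(np.any(has_dollar | has_comma))
print(still_dirty)
```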

#### Task 6: Are you my type?

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/merging-datasets-c8ztil#corise_claka2ge3000q2a76vqgh1db3)

Enabling numerical operations (calculations) requires you to change the `dtype` from string/Unicode characters to [float of 32-bit precision](https://numpy.org/doc/stable/user/basics.types.html?highlight=data%20types). Please change the dtype of the matrix to float32.


In [None]:
# Change Unicode to float32
matrix = matrix_wo_dollar_comma.astype(np.float32)

# Print out the first five rows (and inspect the dtype for correctness)
# Entries: airbnb_id, price_usd, latitude, longitude
matrix[0:5,:]

array([[23726706.     ,       88.     ,       52.34916,        4.97879],
       [35815036.     ,      105.     ,       52.42419,        4.95689],
       [31553120.     ,      152.     ,       52.43237,        4.91821],
       [34745824.     ,       87.     ,       52.2962 ,        5.01231],
       [44586948.     ,      160.     ,       52.31475,        5.0303 ]],
      dtype=float32)

<details>
  <summary>Show Expected Output</summary>

  
```
array([[23726706.     ,       88.     ,       52.34916,        4.97879],
       [35815036.     ,      105.     ,       52.42419,        4.95689],
       [31553120.     ,      152.     ,       52.43237,        4.91821],
       [34745824.     ,       87.     ,       52.2962 ,        5.01231],
       [44586948.     ,      160.     ,       52.31475,        5.0303 ]],
      dtype=float32)
```

<details>
<summary>Show Solution</summary>

```python
# Change unicode to float32
matrix = matrix.astype(np.float32)

# Print out the first five rows (and inspect the dtype for correctness)
matrix[:5, :]
```

</details>
</details>
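
Did you notice that the ID `31553121` shows up as `31553120.` in the output above? A 32-bit float has a 24-bit significand, so not every whole number above 2**24 (16,777,216) can be stored exactly. The sketch below shows the effect on one ID and compares it with float64. The project still asks for float32, so treat this as background in case you ever need to match IDs exactly.

```python
import numpy as np

listing_id = "31553121"

as_float32 = np.array([listing_id]).astype(np.float32)
as_float64 = np.array([listing_id]).astype(np.float64)

print(int(as_float32[0]))  # 31553120: the last digit is lost
print(int(as_float64[0]))  # 31553121: exact
```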

## The Price Is Right

<center>
  <img src=https://wwwimage-tve.cbsstatic.com/thumbnails/photos/w400-q80/blog/tpir-logo-promo_0_0.jpg width="500" align="center" />
</center>
<br/>

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers-jkk4pg#corise_clalbmvov00002a7683kahrhx)

Since all our values in the matrix are now recognized as numbers, we can perform some awesome calculations!

Our next objective is to change the currency from US dollars to another currency. This can be any currency you like, except for the US dollar. Let's first import the library that helps us to make these conversions. Then let's have another look at the first 5 rows of our matrix.

In [None]:



from currency_converter import CurrencyConverter

cc = CurrencyConverter()


def convert_from_usd(dollar: float, currency: str) -> float:
    # Convert an amount in US dollars to the given target currency
    return cc.convert(dollar, "USD", currency)


# Wrap the scalar function so it can be applied to a whole column
convert_vec = np.vectorize(convert_from_usd)
final_conv_ = convert_vec(matrix[:, 1], "EUR")

# Inspect the price column
matrix[:, 1]

array([ 88., 105., 152., ..., 180., 174.,  65.], dtype=float32)

The currency conversion calculations you'll be performing should be applied to the second column. 

As a reminder, you should use the number "1" which represents the second column of the matrix, since indexes start at the number zero.

Please only output the second column below:

In [None]:
matrix_f = matrix_wo_dollar_comma.astype(np.float32)
matrix_f[:,1]

array([ 88., 105., 152., ..., 180., 174.,  65.], dtype=float32)

<details>
  <summary>Show Expected Output</summary>

```
array([ 88., 105., 152., ..., 180., 174.,  65.], dtype=float32)
```

<details>
<summary>Show Solution</summary>

```python
matrix[:, 1]

```

</details>
</details>

#### Task 7: Pick any currency

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers-jkk4pg#corise_clalbmvov00002a7683kahrhx)

The tool you'll be using has a total of 42 currencies. Please select one of them, and use it to convert the dollars into your chosen currency. You can check which are available by running:

In [None]:
cc.currencies

{'AUD',
 'BGN',
 'BRL',
 'CAD',
 'CHF',
 'CNY',
 'CYP',
 'CZK',
 'DKK',
 'EEK',
 'EUR',
 'GBP',
 'HKD',
 'HRK',
 'HUF',
 'IDR',
 'ILS',
 'INR',
 'ISK',
 'JPY',
 'KRW',
 'LTL',
 'LVL',
 'MTL',
 'MXN',
 'MYR',
 'NOK',
 'NZD',
 'PHP',
 'PLN',
 'ROL',
 'RON',
 'RUB',
 'SEK',
 'SGD',
 'SIT',
 'SKK',
 'THB',
 'TRL',
 'TRY',
 'USD',
 'ZAR'}

A suggestion for those who don't know which to choose: Feel free to use GBP.

In [None]:
# Get the rate of conversion from the US dollar to your currency of choice
gbp_rate = cc.convert(1, "USD", "GBP")

# Multiply the dollar column by the conversion rate
matrix_d_c_gbp = matrix_f[:, 1] * gbp_rate



88.0


<details>
<summary>Show Solution</summary>

```python
# Get the rate of converting dollar to your currency of choice
gbp_rate = cc.convert(1, "USD", "GBP")  # British Pound

# Multiply the dollar column by your currency of choice
matrix[:, 1] = matrix[:, 1] * gbp_rate
```

</details>
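
The conversion itself is just a scalar multiplication applied to the whole column at once. The sketch below uses a made-up rate (`example_rate` is hypothetical, not a real exchange rate) so it runs without the currency library.

```python
import numpy as np

# Hypothetical rate, for illustration only; use cc.convert(1, "USD", "GBP") for the real one
example_rate = 0.8

prices_usd = np.array([88.0, 105.0, 152.0], dtype=np.float32)

# Multiplying an array by a scalar converts every price in one step
prices_converted = prices_usd * example_rate
print(prices_converted)
```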

#### Task 8: Inflation!

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers-jkk4pg#corise_cl9k5kagm003q3b6pek28zg1w)

Recent inflation all around the world has caused many companies to raise their prices. Consequently, Airbnb listings have also raised their prices by a certain amount. Find the 2022 annual inflation rate for your currency of choice. If you can't find the inflation rate online, use 7% as value. Apply this inflation rate to our newly updated prices.

In [None]:
# Multiply the converted price column by (1.00 + inflation)
matrix_infla_gbp = matrix_d_c_gbp * 1.07
matrix_infla_gbp

array([ 77.32021 ,  92.25708 , 133.5531  , ..., 158.15498 , 152.88316 ,
        57.111526], dtype=float32)

<details>
  <summary>Show Expected Output</summary>

```
array([ 77.32021 ,  92.25708 , 133.5531  , ..., 158.15498 , 152.88316 ,
        57.111526], dtype=float32)
```

<details>
<summary>Show Solution</summary>

```python
# Multiply the dollar column by the inflation percentage (1.00 + inflation)
matrix[:, 1] = matrix[:, 1] * 1.07
```

</details>
</details>

#### Task 9: Too many decimals!

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers-jkk4pg#corise_cl9k5vlbx004z3b6pz6puz9x4)

You might have some prices with more than two decimals after changing the currency and adjusting the price for inflation. Please round the prices to two decimals using a NumPy native function. [Here's a hint.](https://numpy.org/doc/stable/reference/generated/numpy.around.html)

In [None]:
# Round the new currency column to 2 decimals
matrix_infla_gbp_rounded = np.round(matrix_infla_gbp, 2)
matrix_infla_gbp_rounded

array([ 77.32,  92.26, 133.55, ..., 158.15, 152.88,  57.11], dtype=float32)


<details>
<summary>Show Solution</summary>

```python
# Round the new currency column to 2 decimals
matrix[:, 1] = np.round(matrix[:, 1], 2)
```

</details>
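
Keep in mind that `np.round` (an alias of `np.around`) rounds to the *nearest* value, which is why 92.25708 becomes 92.26 above. If you ever need to always round down instead, one possible approach is to scale, floor, and scale back. This is a sketch of that alternative, not what the solution uses.

```python
import numpy as np

prices = np.array([92.25708, 57.111526])

# np.round rounds to the nearest value at the given precision
nearest = np.round(prices, 2)

# Rounding down: scale up, floor, scale back
rounded_down = np.floor(prices * 100) / 100

print(nearest)       # [92.26 57.11]
print(rounded_down)  # [92.25 57.11]
```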

## Where u (want to be) at?


<center>
  <img src=https://media.timeout.com/images/105504583/750/422/image.jpg width="500" align="center" />
</center>
<br/>

Amsterdam is a city with a long history and a rich international culture, so there is always [something interesting to see and do](https://www.iamsterdam.com/en/see-and-do/things-to-do/top-20-things-to-do-in-amsterdam). What if you were to visit Amsterdam? You'd probably want to have your Airbnb close to your favorite spot!

#### Task 10: Choose your location

Look up a place you'd like to visit in Amsterdam's city center, along with its longitude and latitude. We want to save this for choosing an Airbnb listing to our liking. You can get coordinates from [Google](https://www.google.com/) by searching like so:

<center>
  <img src=https://i.ibb.co/XXdkH3z/Screen-Shot-2022-10-24-at-2-42-54-PM.png width="500" align="center" />
</center>
<br/>




In [None]:
# Cat cabinet 
latitude = 52.3656 # YOUR COORDINATES
longitude = 4.8915 # YOUR COORDINATES



## Listing All Listings

<center>
  <img src=https://images0.persgroep.net/rcs/vnd5KBhggcKV72YJjpLWH_-xljU/diocontent/131036963/_crop/34/170/1378/778/_fitwidth/763?appId=93a17a8fd81db0de025c8abd1cca1279&quality=0.8&desiredformat=webp width="500" align="center" />
</center>
<br/>

Imagine Airbnb Amsterdam decided to deviate from Airbnb Global and provide a feature on their website that showed the best listings for you based on the locations you were planning to visit. Wouldn't it make sense to choose a place to stay in a location closest to where you're likely to go most often?

So this is the most exciting part: You're going to calculate just that! You will limit your results to your favorite location in Amsterdam (as chosen above) and the surrounding available Airbnb listings using math and NumPy!

We've already provided you with the math calculations down below. Please make sure to run that method!

You'll have to use this method in a `for` loop or by using [`np.vectorize`](https://numpy.org/doc/stable/reference/generated/numpy.vectorize.html) as was shown on CoRise. 

In [None]:
import math

def from_location_to_airbnb_listing_in_meters(lat1: float, lon1: float, lat2: float, lon2: float):
    # Source: https://community.esri.com/t5/coordinate-reference-systems-blog
    # /distance-on-a-sphere-the-haversine-formula/ba-p/902128
    
    R = 6371000  # Radius of Earth in meters
    phi_1 = math.radians(lat1)
    phi_2 = math.radians(lat2)

    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)

    a = (
        math.sin(delta_phi / 2.0) ** 2
        + math.cos(phi_1) * math.cos(phi_2) * math.sin(delta_lambda / 2.0) ** 2
    )

    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

    meters = R * c  # Output distance in meters

    return round(meters, 0)
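
A quick way to sanity-check a distance function is to try it on two points you know. The sketch below restates the same haversine formula compactly under a hypothetical name (`haversine_m`) so that it runs on its own, and measures the distance between two coordinate pairs that appear in this notebook.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    R = 6371000  # Radius of Earth in meters
    phi_1, phi_2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2.0) ** 2
        + math.cos(phi_1) * math.cos(phi_2) * math.sin(delta_lambda / 2.0) ** 2
    )
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


# Two points in Amsterdam's center, less than a kilometer apart
distance = haversine_m(52.3656, 4.8915, 52.3600, 4.8852)
print(round(distance))
```

If the result were tens of kilometers or close to zero, that would be a hint that latitude and longitude got swapped somewhere.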

#### Task 11: Loop or vectorize!

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers#corise_cl9k5z30h005w3b6p5jv0vs82)

Please apply the `from_location_to_airbnb_listing_in_meters` function to every listing, either with a `for` loop or by vectorizing it. ***For now just calculate these numbers. Don't add it as a new column to your matrix.***

In [None]:


# Create a loop or vectorized way to calculate the distance

# Option 1: loop over all latitude and longitude entries in the dataset
loop_append_from_location_to_airbnb_listing_in_meters = []
for lat, lon in zip(matrix_f[:, 2], matrix_f[:, 3]):
    loop_append_from_location_to_airbnb_listing_in_meters.append(
        from_location_to_airbnb_listing_in_meters(latitude, longitude, lat, lon)
    )
loop_res = np.array(loop_append_from_location_to_airbnb_listing_in_meters)

# Option 2: vectorize the function and apply it to both columns at once
vect_from_location_to_airbnb_listing_in_meters = np.vectorize(
    from_location_to_airbnb_listing_in_meters
)
vect_res = vect_from_location_to_airbnb_listing_in_meters(
    latitude, longitude, matrix_f[:, 2], matrix_f[:, 3]
)

# Both approaches give the same distances
loop_res

array([6203., 7882., 7642., ..., 6783., 5407., 5342.])

<details>
<summary>Show Solution</summary>

```python
# Allow a Python-function to be used in a (semi-)vectorized way.
conv_to_meters = np.vectorize(from_location_to_airbnb_listing_in_meters)

# Apply the function
conv_to_meters(latitude, longitude, matrix[:, 2], matrix[:, 3])
```

</details>
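
It helps to know that `np.vectorize` is mostly a convenience: the NumPy documentation describes its implementation as essentially a `for` loop. The sketch below uses a small hypothetical helper (`to_radians_rounded`) to show that a plain loop and `np.vectorize` produce the same result.

```python
import math

import numpy as np


def to_radians_rounded(degrees: float) -> float:
    """Hypothetical scalar helper: works on one number at a time."""
    return round(math.radians(degrees), 4)


angles = np.array([0.0, 90.0, 180.0])

# Option 1: explicit Python loop
looped = np.array([to_radians_rounded(a) for a in angles])

# Option 2: np.vectorize wraps the same scalar function
vectorized = np.vectorize(to_radians_rounded)(angles)

print(np.array_equal(looped, vectorized))
```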

Now let's use `timeit` to see how quickly the code runs!

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers-jkk4pg#corise_cl9k73m23006d3b6p58l5fw0e)

In [None]:
%%timeit -r 4 -n 100

# Allow a Python function to be used in a (semi-)vectorized way
conv_to_meters = np.vectorize(from_location_to_airbnb_listing_in_meters)

# Apply the function, use timeit
conv_to_meters(latitude, longitude, matrix_f[:, 2], matrix_f[:, 3])

18.4 ms ± 5.71 ms per loop (mean ± std. dev. of 4 runs, 100 loops each)
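
The `%%timeit` magic only works inside notebooks. In a plain Python script, the standard-library `timeit` module can do a similar job. The sketch below times two toy functions (both hypothetical, not the distance function) with the same repeat and loop counts as above. Note that it reports the best run rather than mean ± standard deviation.

```python
import timeit

import numpy as np

values = np.arange(1_000, dtype=np.float64)


def python_loop():
    return [v * 2 for v in values]


def numpy_op():
    return values * 2


# repeat=4, number=100 mirrors "%%timeit -r 4 -n 100"
loop_times = timeit.repeat(python_loop, repeat=4, number=100)
numpy_times = timeit.repeat(numpy_op, repeat=4, number=100)

# Each entry is the total time for 100 calls; divide to get time per call
print(f"loop:  {min(loop_times) / 100:.6f} s per call")
print(f"numpy: {min(numpy_times) / 100:.6f} s per call")
```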


## Can We Do It Faster?

<center>
  <img src=https://upload.wikimedia.org/wikipedia/commons/9/9f/Serengeti_Lion_Running_saturated.jpg width="500" align="center" />
</center>
<br/>

Now you might be thinking to yourself, *can we do this faster*? 

The answer is ***YES***! 

But please remember that optimization is always a trade-off between the need for speed and the need for delivery of your results!

---

#### (Extra Credit)  Task 12: NumPy all the way!

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers-jkk4pg#corise_clatg1k0e000q3b6p3gqmi24g)

Now convert the `from_location_to_airbnb_listing_in_meters` function into a pure NumPy function. You can do this by changing all the imported math functions into their NumPy variant. ***For now just calculate these numbers. Don't add it as a new column to your matrix.***

In [None]:
def from_location_to_airbnb_listing_in_meters(lat1: float, lon1: float, lat2: np.ndarray, lon2: np.ndarray):   
    R = 6371000  # Radius of Earth in meters
    phi_1 = np.radians(lat1) # CHANGE THIS
    phi_2 = np.radians(lat2) # CHANGE THIS

    delta_phi = np.radians(lat2 - lat1) # CHANGE THIS
    delta_lambda = np.radians(lon2 - lon1) # CHANGE THIS

    a = (
        np.sin(delta_phi / 2.0) ** 2 # CHANGE THIS
        + np.cos(phi_1) * np.cos(phi_2) * np.sin(delta_lambda / 2.0) ** 2 # CHANGE THIS (3x)
    )

    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)) # CHANGE THIS (3x)

    meters = R * c  # Output distance in meters

    return np.round(meters, 0) # CHANGE THIS

In [None]:
# Run the converted NumPy method and check if it works
from_location_to_airbnb_listing_in_meters(latitude, longitude, matrix_f[:, 2], matrix_f[:, 3])

<details>

<summary>Show Solution</summary>

```python
def from_location_to_airbnb_listing_in_meters(
    lat1: float, lon1: float, lat2: np.ndarray, lon2: np.ndarray
):
    R = 6371000  # radius of Earth in meters
    phi_1 = np.radians(lat1)
    phi_2 = np.radians(lat2)

    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)

    a = (
        np.sin(delta_phi / 2.0) ** 2
        + np.cos(phi_1) * np.cos(phi_2) * np.sin(delta_lambda / 2.0) ** 2
    )

    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

    meters = R * c  # output distance in meters

    return np.round(meters, 0)
```

</details>
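
A common slip when converting is leaving a single `math.` call behind. The sketch below, on a toy array, shows why that matters: NumPy functions work element-wise on arrays, while `math` functions only accept single numbers.

```python
import math

import numpy as np

a = np.array([0.25, 4.0])

# NumPy functions operate element-wise on the whole array
roots = np.sqrt(a)
print(roots)

# math functions only accept single numbers, so an array raises a TypeError
math_failed = False
try:
    math.sqrt(a)
except TypeError as err:
    math_failed = True
    print("math.sqrt failed:", err)
```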

#### (Extra Credit) Task 13: How much faster is it?

[*\[Related section on CoRise\]*](https://corise.com/course/python-for-data-science/v2/module/math-superpowers-jkk4pg#corise_cl9k6pc1700683b6py8seb6f7)

Use the `timeit` function so we can compare it to the outcome of the prior task. This should be very similar to the `timeit`-code that was used below Task 11.

In [None]:

# Copy the code from Task 12 and add a timeit function above this comment
def from_location_to_airbnb_listing_in_meters(lat1: float, lon1: float, lat2: np.ndarray, lon2: np.ndarray):   
    R = 6371000  # Radius of Earth in meters
    phi_1 = np.radians(lat1) # CHANGE THIS
    phi_2 = np.radians(lat2) # CHANGE THIS

    delta_phi = np.radians(lat2 - lat1) # CHANGE THIS
    delta_lambda = np.radians(lon2 - lon1) # CHANGE THIS

    a = (
        np.sin(delta_phi / 2.0) ** 2 # CHANGE THIS
        + np.cos(phi_1) * np.cos(phi_2) * np.sin(delta_lambda / 2.0) ** 2 # CHANGE THIS (3x)
    )

    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)) # CHANGE THIS (3x)

    meters = R * c  # Output distance in meters
    return np.round(meters, 0) # CHANGE THIS



In [None]:

%%timeit -r 4 -n 100
from_location_to_airbnb_listing_in_meters(latitude, longitude, np.array(matrix_f[:, 2]), np.array(matrix_f[:, 3]))



314 µs ± 53.4 µs per loop (mean ± std. dev. of 4 runs, 100 loops each)


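WOW! That's a massive speed-up: in the runs above, the time per loop dropped from about 18.4 ms to about 314 µs, roughly 60 times faster, just by switching your functions from default Python functions to their NumPy variants! Awesome!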

---

## Prep the Dataset for Download!


Now that we've created a function to calculate the distance in meters for every Airbnb listing, we'll perform this calculation on the entire dataset and add the outputs to the matrix as a new column.

Next to that, we'll add another column that contains only ones and zeros to represent the "color" of an entry/row. This column can be used later if you want to turn this dataset into an app using [Streamlit](https://streamlit.io/). This resource is great for when you want to translate your Python projects into an interactive website. More on that in the next section.

As you'll see from the code, we'll also add our favorite location as an entry/row. (We've selected the coordinates of the Rijksmuseum. Feel free to change it to your favorite location in Amsterdam). 




In [None]:
# Run the previous method
meters = from_location_to_airbnb_listing_in_meters(
    latitude, longitude, matrix_f[:, 2], matrix_f[:, 3]
)

# Add an axis to make concatenation possible
meters = meters.reshape(-1, 1)

# Append the distance in meters to the matrix
matrix = np.concatenate((matrix_f, meters), axis=1)

In [None]:
# Append a color to the matrix
colors = np.zeros(meters.shape)
matrix = np.concatenate((matrix, colors), axis=1)

# Append our entry to the matrix
fav_entry = np.array([1, 0, 52.3600, 4.8852, 0, 1]).reshape(1, -1) # Change coordinates to your favorite location
matrix = np.concatenate((fav_entry, matrix), axis=0)

# Entries: airbnb_id, price, latitude, longitude,
# meters from favorite point, color
matrix[:5, :]

array([[       1.        ,        0.        ,       52.36      ,
               4.8852    ,        0.        ,        1.        ],
       [23726706.        ,       88.        ,       52.34915924,
               4.97878981,     6203.        ,        0.        ],
       [35815036.        ,      105.        ,       52.42419052,
               4.95689011,     7882.        ,        0.        ],
       [31553120.        ,      152.        ,       52.43236923,
               4.91821003,     7642.        ,        0.        ],
       [34745824.        ,       87.        ,       52.2961998 ,
               5.01231003,    11267.        ,        0.        ]])
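
If the `reshape` and `axis` arguments in the cells above are unclear, here is a minimal sketch on toy arrays (made-up values) that prints the shapes at each step.

```python
import numpy as np

matrix_2d = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # shape (3, 2)
new_column = np.array([10.0, 20.0, 30.0])                   # shape (3,)

# reshape(-1, 1) turns the 1-D array into a single column; -1 lets NumPy infer the row count
as_column = new_column.reshape(-1, 1)                       # shape (3, 1)

# axis=1 glues arrays side by side, axis=0 stacks them on top of each other
wider = np.concatenate((matrix_2d, as_column), axis=1)      # shape (3, 3)
new_row = np.array([0.0, 0.0, 0.0]).reshape(1, -1)          # shape (1, 3)
taller = np.concatenate((new_row, wider), axis=0)           # shape (4, 3)

print(wider.shape, taller.shape)
```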

In [None]:
# Export the data to use in the primer for next week
np.savetxt("WK1_Airbnb_Amsterdam_listings_proj_solution.csv", matrix, delimiter=",")

Great! By running all the cells above, you've saved the matrix here on your Google Colab instance. Let's now look into how to download the dataset to your local machine.
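
If you open the saved CSV, you may be surprised to see values like `2.372670600000000000e+07`. That is expected: by default `np.savetxt` writes every number in scientific notation (format `"%.18e"`). The sketch below, which uses a temporary file with a hypothetical name, shows that the values survive the round trip.

```python
import os
import tempfile

import numpy as np

data = np.array([[23726706.0, 88.0], [35815036.0, 105.0]])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.csv")  # hypothetical file name

    # By default np.savetxt writes every number in scientific notation ("%.18e")
    np.savetxt(path, data, delimiter=",")

    with open(path) as f:
        first_line = f.readline().strip()

    # np.loadtxt parses the scientific notation back into floats
    restored = np.loadtxt(path, delimiter=",")

print(first_line)
print(np.array_equal(data, restored))
```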

### Download the Dataset to Your Local Machine!

Google Colab comes with its own Python packages, allowing us to quickly download generated files like so:

In [None]:
from google.colab import files

# Download the file locally
files.download('WK1_Airbnb_Amsterdam_listings_proj_solution.csv')

<IPython.core.display.Javascript object>

<IPython.core.display.Javascript object>

### Make an App for Your Portfolio!

<center>
  <img src=https://griddb-pro.azureedge.net/en/wp-content/uploads/2021/08/streamlit-1160x650.png width="500" align="center" />
</center>
<br/>

**Participants such as yourselves often want to use the weekly CoRise projects for their portfolios. To facilitate that, we've created this section. It might seem like a lot, but it's actually just following instructions and copy-pasting. Reach out on Slack if you get stuck!** 

You will make an app that visualizes the dataset as a DataFrame and as a geographic visualization like:

<center>
  <img src=https://i.ibb.co/gRhj6Jd/Screen-Shot-2022-11-10-at-3-58-17-PM.png width="500" align="center" />
</center>
<br/>

Five out of the six columns in the dataset are used as follows:
- **Listing_id**: Ignored for visualization purposes
- **Price**: Hovering over a blue/red dot displays the price in **bold** at the top
- **Latitude**: Used to plot the blue/red dot on the map
- **Longitude**: Used to plot the blue/red dot on the map
- **Meters from favorite point**: Hovering over a blue/red dot displays the number of meters from the blue point
- **Color**: Dependent on the category it's assigned

To visualize this, we will again use a library called [Streamlit](https://streamlit.io/). For now you are not expected to know how Streamlit works, but you are expected to be able to copy-paste and follow instructions if you want to share this project as part of your portfolio!

We are going to use [Streamlit Share](https://share.streamlit.io/) to host your projects. It's a website that allows us to host our interactive projects for free online! Again, we don't expect you to understand how to use and/or modify the code we will show below. We do expect you to read the instructions and copy-paste our code to the Streamlit Share platform. Feel free to change it any way you like. Some great starting points are [here](https://python.plainenglish.io/how-to-build-web-app-using-streamlit-pandas-numpy-5e134f0cf552), [here](https://docs.streamlit.io/library/get-started/create-an-app), [here](https://streamlit.io/components), and [here](https://streamlit.io/gallery)!

*Please make sure to change the currency symbol in the code below to the appropriate one if you've chosen something other than GBP/pound.*

In [None]:
%%writefile streamlit_app.py
import pandas as pd
import plotly.express as px
import streamlit as st

# Display title and text
st.title("Week 1 - Data and visualization")
st.markdown("Here we can see the dataframe created during this week's project.")

# Read dataframe
dataframe = pd.read_csv(
    "WK1_Airbnb_Amsterdam_listings_proj_solution.csv",
    names=[
        "Airbnb Listing ID",
        "Price",
        "Latitude",
        "Longitude",
        "Meters from chosen location",
        "Location",
    ],
)

# We have a limited budget, therefore we would like to exclude
# listings with a price above 100 pounds per night
dataframe = dataframe[dataframe["Price"] <= 100]

# Display as integer
dataframe["Airbnb Listing ID"] = dataframe["Airbnb Listing ID"].astype(int)
# Round off values and prefix the currency symbol
# CHANGE THIS POUND SYMBOL IF YOU CHOSE A CURRENCY OTHER THAN POUND
dataframe["Price"] = "£ " + dataframe["Price"].round(2).astype(str)
# Map the numeric category to a readable label
dataframe["Location"] = dataframe["Location"].replace(
    {1.0: "To visit", 0.0: "Airbnb listing"}
)

# Display dataframe and text
st.dataframe(dataframe)
st.markdown("Below is a map showing all the Airbnb listings with a red dot and the location we've chosen with a blue dot.")

# Create the plotly express figure
fig = px.scatter_mapbox(
    dataframe,
    lat="Latitude",
    lon="Longitude",
    color="Location",
    color_discrete_sequence=["blue", "red"],
    zoom=11,
    height=500,
    width=800,
    hover_name="Price",
    hover_data=["Meters from chosen location", "Location"],
    labels={"color": "Locations"},
)
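# Center the map on the first row of the dataframe.
# (update_geos only affects "geo" maps, so the mapbox center is set via update_layout.)
fig.update_layout(
    mapbox_center=dict(lat=dataframe["Latitude"].iloc[0], lon=dataframe["Longitude"].iloc[0])
)
# Note: in recent Plotly versions the Stamen styles may require a Stadia Maps
# account; "open-street-map" is an alternative that needs no token.
fig.update_layout(mapbox_style="stamen-terrain")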

# Show the figure
st.plotly_chart(fig, use_container_width=True)

Writing streamlit_app.py


The **%%writefile [FILE_NAME].[FILE_EXTENSION]** command lets us save the code written in a cell as a file in your Google Colab instance. Once it is saved, we can download it as a file, as seen below:

In [None]:
from google.colab import files

# Download the file locally
files.download('streamlit_app.py')

In [None]:
%%writefile requirements.txt
pandas
streamlit
plotly

Writing requirements.txt
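
For intuition, the `%%writefile` cells above do roughly what the plain-Python sketch below does: take some text and write it to a file in the current working directory. This is only an illustration built on the standard library's `pathlib`, not how IPython implements the magic. The helper name `write_cell_to_file` and the file name `example_requirements.txt` are made up for this example.

```python
from pathlib import Path


def write_cell_to_file(filename: str, cell_text: str) -> Path:
    """Write text to a file, overwriting any existing file (hypothetical helper)."""
    path = Path(filename)
    path.write_text(cell_text, encoding="utf-8")
    return path


# Same content as the requirements.txt cell, written under a different
# (made-up) name so the real file is left untouched
requirements = "pandas\nstreamlit\nplotly\n"
written = write_cell_to_file("example_requirements.txt", requirements)
print(written.read_text(encoding="utf-8"))
```

Like `%%writefile`, this overwrites the target file if it already exists.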


In [None]:
from google.colab import files

# Download the file locally
files.download('requirements.txt')

Please verify that you've downloaded three files:
- `WK1_Airbnb_Amsterdam_listings_proj_solution.csv`
- `streamlit_app.py`
- `requirements.txt`
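
If you'd rather check this with code than by eye, here is a minimal sketch you can run on your own computer. It assumes the files were saved to your `Downloads` folder, which may not be true on your machine, and the helper name `find_missing_files` is made up for this example.

```python
from pathlib import Path

# The three files this walkthrough expects you to have downloaded
EXPECTED_FILES = [
    "WK1_Airbnb_Amsterdam_listings_proj_solution.csv",
    "streamlit_app.py",
    "requirements.txt",
]


def find_missing_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    base = Path(folder)
    return [name for name in expected if not (base / name).is_file()]


# Assumption: the files were saved to ~/Downloads; change the path if needed
missing = find_missing_files(Path.home() / "Downloads")
print("All files found!" if not missing else f"Missing: {missing}")
```

An empty list means all three files are in place and you are ready to upload them.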

Now let's head over to GitHub and [create an account](https://github.com/signup).

Then, once you are logged in, go to [GitHub.com](https://github.com), click the **+** icon in the top-right corner, and select **New repository**.

<center>
  <img src=https://i.ibb.co/4gkPBCp/Screen-Shot-2022-11-28-at-1-51-02-PM.png width="300" align="center" />
</center>
<br/>

Here you provide:
- **Repository name**: Up to you
- **License**: Up to you. We recommend **apache-2.0**.
- **Public or private?** Public, otherwise you can't host it on [Streamlit Share](https://share.streamlit.io)!

<center>
  <img src=https://i.ibb.co/0B533dw/Screen-Shot-2022-11-28-at-1-55-14-PM.png width="450" align="center" />
</center>
<br/>

Then upload the three files via the URL below, replacing the bracketed placeholders with your own account and repository names. ***Please modify it before copy-pasting it***:

```https://github.com/[YOUR_ACCOUNT_NAME]/[YOUR_REPOSITORY_NAME]/upload/main```

<center>
  <img src=https://i.ibb.co/jTsrgJw/Screen-Shot-2022-11-28-at-1-58-31-PM.png width="500" align="center" />
</center>
<br/>
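
If filling in the placeholders by hand feels error-prone, the upload URL can also be built with a few lines of Python. This is just a sketch: the function name `github_upload_url` and the sample values `my-account` and `airbnb-app` are made up, so swap in your own.

```python
def github_upload_url(account_name: str, repository_name: str, branch: str = "main") -> str:
    """Fill in the upload URL template from this section (hypothetical helper)."""
    return f"https://github.com/{account_name}/{repository_name}/upload/{branch}"


# Placeholder values; replace them with your own account and repository
print(github_upload_url("my-account", "airbnb-app"))
# prints: https://github.com/my-account/airbnb-app/upload/main
```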

Choose to commit directly to the `main` branch, then click **Commit changes**.

Next, you have to create an account on [Streamlit Share](https://share.streamlit.io/signup). 

<center>
  <img src=https://i.ibb.co/znFngJc/Screen-Shot-2022-11-28-at-1-59-47-PM.png width="500" align="center" />
</center>
<br/>

It's recommended to click **Continue with GitHub**. 

Then, select **New app** **>** **Deploy a new app...** **>** **From existing repo**.

<center>
  <img src=https://i.ibb.co/VQPQzt3/Screen-Shot-2022-11-28-at-2-05-04-PM.png width="500" align="center" />
</center>

Then provide your repository in this format:

```[GITHUB_ACCOUNT_NAME]/[GITHUB_REPOSITORY]```

<center>
  <img src=https://i.ibb.co/PDSQccD/Screen-Shot-2022-11-28-at-2-10-47-PM.png width="500" align="center" />
</center>

You will have to wait around 1-5 minutes while the app deploys; a link to your new website is then generated automatically. It follows this pattern:

```https://[GITHUB_ACCOUNT_NAME]-[GITHUB_REPOSITORY]-[RANDOM_6_LETTER_STRING].streamlit.app/```

***Please modify the link before copy-pasting it.***

---

# 🎉 CONGRATULATIONS!

You've made it to the end of the Week 1 assignment! You should be proud. 

If you have any lingering questions, post them on Slack! As you know, we're always here to help.

And if you want any additional challenge questions, check out the bonus extensions below.

---

## Extensions (Optional)

<center>
  <img src=https://miro.medium.com/max/4800/1*qd9TMO5j_wLxDbPT7qkdxw.png width="500" align="center" />
</center>
<br/>

We invite you to try and see if you can apply [Numba](https://numba.pydata.org/) to the project and potentially speed up some of these calculations. This is a tool commonly used in the industry to make code run faster. 
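
To give you a feel for what that could look like, here is a minimal sketch of a distance calculation compiled with Numba's `@njit` decorator. Treat it as a starting point under some assumptions: it uses the haversine (great-circle) formula, which may differ from the distance formula you used earlier in this project; the coordinates are made up; and the function name `haversine_meters` is ours. If Numba isn't installed, the sketch falls back to running as plain Python.

```python
import math

import numpy as np

try:
    from numba import njit
except ImportError:
    # Fall back to plain (uncompiled) Python if Numba is not installed
    def njit(func):
        return func


@njit
def haversine_meters(lats, lons, fav_lat, fav_lon):
    """Great-circle distance in meters from each (lat, lon) pair to one point."""
    earth_radius_m = 6371000.0
    out = np.empty(lats.shape[0])
    phi2 = math.radians(fav_lat)
    for i in range(lats.shape[0]):
        phi1 = math.radians(lats[i])
        dphi = phi2 - phi1
        dlam = math.radians(fav_lon - lons[i])
        a = (
            math.sin(dphi / 2) ** 2
            + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
        )
        out[i] = 2 * earth_radius_m * math.asin(math.sqrt(a))
    return out


# Made-up coordinates near Amsterdam, for illustration only
lats = np.array([52.37, 52.36, 52.40])
lons = np.array([4.89, 4.90, 4.95])
print(haversine_meters(lats, lons, 52.37, 4.89))
```

Note that the first call includes Numba's compilation time, so time a second call (for example with `%timeit`) on the full dataset to see whether it actually beats your NumPy version.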

Another experiment you could try (if you have access to a GPU) is to see if running your code via [CuPy](https://cupy.dev/) speeds up your implementation. A great way to start is described in [this post](https://medium.com/data-analysis-center/a-practical-approach-to-speed-up-python-code-numba-numpy-cupy-65ab52526ad4).

Lastly, can't get enough of NumPy 🥰? Try [this tutorial on Kaggle](https://www.kaggle.com/code/legendadnan/numpy-tutorial-for-beginners-data-science/notebook), which covers some more interesting NumPy uses.

# Next Up?
Next week we will delve into Pandas, a Python library focused on tabular data rather than the array (matrix) data that NumPy handles. We'll show you how those are different, and how you'll be able to harness Pandas for your machine learning journey with Python!