# Pandas for Data Analysis: Extra Useful Stuff

## Outline:

* [Fetching Data from APIs](#Fetching-Data-from-APIs)
* [Web Scraping to Build Your Own Dataset](#Web-Scraping-to-Build-Your-Own-Dataset)
* [Anonymizing Data with Faker](#Anonymizing-Data-with-Faker)
* [Plotting Data on Google Maps with gmaps](#Plotting-Data-on-Google-Maps-with-gmaps)

## Fetching Data from APIs

[JSONPlaceholder](https://jsonplaceholder.typicode.com/) is a free fake REST API for testing and prototyping. It provides six common resources:

| Endpoint  | Data         |
|-----------|--------------|
| /posts    | 100 posts    |
| /comments | 500 comments |
| /albums   | 100 albums   |
| /photos   | 5000 photos  |
| /todos    | 200 todos    |
| /users    | 10 users     |
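
All six endpoints share one base URL, and JSONPlaceholder also accepts field filters as query parameters (for example `/albums?userId=1`). The helper below, `endpoint_url`, is a hypothetical sketch (not part of any library) that builds these URLs with the standard library's `urlencode`. With `requests` you would normally pass the filters as `params=` and let it do the encoding.

```python
from urllib.parse import urlencode

BASE_URL = "https://jsonplaceholder.typicode.com"

# The six resources listed in the table
RESOURCES = {"posts", "comments", "albums", "photos", "todos", "users"}


def endpoint_url(resource, **filters):
    """Build a JSONPlaceholder URL, optionally with query-string filters."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    url = f"{BASE_URL}/{resource}"
    if filters:
        # e.g. {"userId": 1} becomes "?userId=1"
        url += "?" + urlencode(filters)
    return url
```

For example, `requests.get(endpoint_url("albums", userId=1))` would fetch only the albums belonging to user 1.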

In [None]:
import pandas as pd
import requests

In [None]:
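users_url = 'https://jsonplaceholder.typicode.com/users'
r = requests.get(users_url, timeout=10)  # don't hang forever on a slow server
r.raise_for_status()  # raise on a 4xx/5xx response instead of parsing an error body
data = r.json()
data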

In [None]:
ids = [each['id'] for each in data]
usernames = [each['username'] for each in data]

In [None]:
user_df = pd.DataFrame(data={
    'id': ids,
    'username': usernames
})
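
Building one list per column works, but `pd.DataFrame` also accepts the list of dicts directly, and `pd.json_normalize` flattens nested fields (the `/users` records contain a nested `address` object) into dotted column names. The two records below are invented and only loosely shaped like the `/users` response, so treat them as an illustration rather than the real schema.

```python
import pandas as pd

# Two made-up records with a nested "address" dict, loosely shaped like /users
sample_users = [
    {"id": 1, "username": "alice",
     "address": {"city": "Springfield", "geo": {"lat": "10.0", "lng": "20.0"}}},
    {"id": 2, "username": "bob",
     "address": {"city": "Shelbyville", "geo": {"lat": "-5.5", "lng": "7.25"}}},
]

# Flat fields: build the DataFrame in one step and keep only the columns we need
flat_df = pd.DataFrame(sample_users)[["id", "username"]]

# Nested fields: json_normalize turns address.geo.lat etc. into their own columns
nested_df = pd.json_normalize(sample_users)
```

With the real data, `pd.DataFrame(data)[['id', 'username']]` should give the same result as the manual list-building approach.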

In [None]:
user_df.head()

In [None]:
albums_url = 'https://jsonplaceholder.typicode.com/albums'
r = requests.get(albums_url, timeout=10)
r.raise_for_status()  # fail loudly if the request did not succeed
data = r.json()
data

In [None]:
album_df = pd.DataFrame(data=data)

In [None]:
album_df.head()

In [None]:
users_albums = pd.merge(user_df, album_df, how='inner', left_on='id', right_on='userId')

In [None]:
users_albums

In [None]:
users_albums.username.value_counts()
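
Two details of the merge above are worth knowing. Both frames have an `id` column, so pandas renames them `id_x` and `id_y` in the result. An inner join also drops any user who has no albums, so they never show up in `value_counts()`. The sketch below uses small invented frames to show explicit `suffixes`, a `validate` check, and a left join followed by `count()` so that users with zero albums are reported as 0.

```python
import pandas as pd

# Small made-up frames with the same shape as user_df and album_df
users = pd.DataFrame({"id": [1, 2, 3], "username": ["alice", "bob", "carol"]})
albums = pd.DataFrame({"userId": [1, 1, 2], "id": [10, 11, 12], "title": ["a", "b", "c"]})

merged = pd.merge(
    users, albums,
    how="left",                     # keep users even if they have no albums
    left_on="id", right_on="userId",
    suffixes=("_user", "_album"),   # clearer than the default _x / _y
    validate="one_to_many",         # raises MergeError if a user id is duplicated
)

# count() skips NaN, so a user with no albums gets 0 instead of disappearing
albums_per_user = merged.groupby("username")["id_album"].count()
```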

## Web Scraping to Build Your Own Dataset

In [None]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [None]:
url = 'http://econpy.pythonanywhere.com/ex/001.html'
r = requests.get(url, timeout=10)
r.raise_for_status()  # stop early if the page could not be fetched

In [None]:
# 'lxml' requires the lxml package; 'html.parser' is the built-in alternative
soup = BeautifulSoup(r.content, 'lxml')
soup

In [None]:
soup.title

In [None]:
soup.div

In [None]:
soup.find_all('div', title='buyer-name')

In [None]:
names = []
for each in soup.find_all('div', title='buyer-name'):
    names.append(each.string)
    
names

In [None]:
soup.find_all(attrs={'class': 'item-price'})

In [None]:
soup.find_all('span', class_='item-price')
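
Both `find_all` calls above select the same `<span>` elements. To experiment without hitting the network, you can parse a hand-written snippet instead. The markup below is an assumption modeled on the selectors used here (buyer names in `div title="buyer-name"`, prices in `span class="item-price"`), and the names and prices are invented. It uses Python's built-in `html.parser`, a CSS selector via `select`, and `get_text(strip=True)`, which, unlike `.string`, still returns text when a tag has more than one child.

```python
from bs4 import BeautifulSoup

# Hand-written snippet mimicking the assumed structure of the example page
html = """
<div title="buyer-name">Alice Example</div>
<span class="item-price">$29.95</span>
<div title="buyer-name">Bob Example</div>
<span class="item-price">$8.37</span>
"""
soup = BeautifulSoup(html, "html.parser")  # no lxml needed

# Attribute filter, as in the cells above
names = [div.get_text(strip=True) for div in soup.find_all("div", title="buyer-name")]

# CSS selector: equivalent to find_all("span", class_="item-price")
prices = [span.get_text(strip=True) for span in soup.select("span.item-price")]
```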

In [None]:
prices = []
for each in soup.find_all(attrs={'class': 'item-price'}):
    prices.append(each.string)
                  
prices

In [None]:
list(zip(names, prices))
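
`zip` stops at the shorter list, so if the page layout changes and one price goes missing, the last name is silently dropped and the remaining pairs can end up misaligned. A small guard catches this. `pair_up` below is a hypothetical helper; on Python 3.10+ `zip(names, prices, strict=True)` gives a similar check.

```python
def pair_up(names, prices):
    """Zip two scraped lists, refusing to silently drop unmatched items."""
    if len(names) != len(prices):
        raise ValueError(f"{len(names)} names but {len(prices)} prices")
    return list(zip(names, prices))


# Invented example: one price is missing
try:
    pair_up(["Alice Example", "Bob Example"], ["$29.95"])
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```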

In [None]:
column_headers = ['name', 'price']
df = pd.DataFrame(list(zip(names, prices)), columns=column_headers)

In [None]:
df.head()

A website may use pagination to split its data across several pages, so the data is not all on a single page. The solution is to scrape every page in turn.

In [None]:
pages = ['001', '002', '003', '004', '005']
names = []
prices = []
for each_page in pages:
    url = 'http://econpy.pythonanywhere.com/ex/' + each_page + '.html'
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'lxml')
    
    for each in soup.find_all('div', title='buyer-name'):
        names.append(each.string)

    for each in soup.find_all(attrs={'class': 'item-price'}):
        prices.append(each.string)
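
The page ids follow a zero-padded pattern, so the list does not have to be typed by hand. The sketch below generates the same five ids and their URLs; the range 1 to 5 comes from the example above. When scraping a real site across many pages, adding a short pause between requests inside the loop (for example `time.sleep(1)`) is a courteous habit.

```python
BASE = "http://econpy.pythonanywhere.com/ex/"

# Zero-padded ids '001' ... '005', matching the hand-written list above
pages = [f"{n:03d}" for n in range(1, 6)]

# One URL per page, built the same way as in the loop
page_urls = [f"{BASE}{page}.html" for page in pages]
```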

In [None]:
column_headers = ['name', 'price']
df = pd.DataFrame(list(zip(names, prices)), columns=column_headers)

In [None]:
df.head()
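
The scraped prices are strings, so numeric operations such as `df.price.mean()` will not work yet. Assuming the prices look like `"$29.95"` (the rows below are invented in that format), stripping the currency symbol and calling `pd.to_numeric` converts the column; `errors="coerce"` turns anything unparseable into `NaN` instead of raising.

```python
import pandas as pd

# Made-up rows in the assumed scraped format: prices as "$" strings
df = pd.DataFrame({
    "name": ["Alice Example", "Bob Example"],
    "price": ["$29.95", "$8.37"],
})

# Remove the literal "$" (regex=False), then convert to floats
df["price"] = pd.to_numeric(
    df["price"].str.replace("$", "", regex=False), errors="coerce"
)
```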

## Anonymizing Data with Faker

In [None]:
import pandas as pd
from faker import Faker

In [None]:
people_data_url = 'https://raw.githubusercontent.com/lawlesst/vivo-sample-data/master/data/csv/people.csv' 
df = pd.read_csv(people_data_url)
df.head()

In [None]:
fake = Faker()

In [None]:
fake.name()

In [None]:
fake.email()

In [None]:
def new_name(name):
    return fake.name()

In [None]:
df['new_name'] = df.name.map(new_name)

In [None]:
df.head()

## Plotting Data on Google Maps with gmaps

gmaps documentation: https://jupyter-gmaps.readthedocs.io/en/latest/

In [None]:
import gmaps
import gmaps.datasets

For the map to display our data correctly and to allow heavier usage, we should use an API key from Google.

In [None]:
# gmaps.configure(api_key="AI...")  # uncomment and replace with your own Google API key
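
Pasting the key into a notebook makes it easy to leak when the notebook is shared. A common alternative is to read it from an environment variable. The variable name `GOOGLE_API_KEY` and the helper `load_google_api_key` below are hypothetical choices, not something gmaps requires.

```python
import os


def load_google_api_key(var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, or None if it is unset or blank."""
    key = os.environ.get(var_name, "").strip()
    return key or None
```

In the notebook you could then write `key = load_google_api_key()` and call `gmaps.configure(api_key=key)` only when `key` is not `None`.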

In [None]:
locations = gmaps.datasets.load_dataset('taxi_rides')

In [None]:
locations

In [None]:
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations))
fig

In [None]:
earthquake_df = gmaps.datasets.load_dataset_as_df('earthquakes')
earthquake_df.head()

In [None]:
locations = earthquake_df[['latitude', 'longitude']]
weights = earthquake_df['magnitude']
fig = gmaps.figure()
fig.add_layer(gmaps.heatmap_layer(locations, weights=weights))
fig
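
Because the earthquake data is an ordinary DataFrame, it can be filtered with pandas before plotting, for example to show only stronger events. The rows below are invented and share the columns used above; the threshold of 4.0 is an arbitrary illustrative value.

```python
import pandas as pd

# Made-up rows with the same columns as earthquake_df
quakes = pd.DataFrame({
    "latitude": [35.0, -12.5, 61.2],
    "longitude": [139.0, -77.0, -150.0],
    "magnitude": [5.8, 3.1, 4.4],
})

MIN_MAGNITUDE = 4.0  # arbitrary cut-off for illustration
strong = quakes[quakes["magnitude"] >= MIN_MAGNITUDE]

# Same shapes as in the cell above: (lat, lng) pairs plus one weight per point
locations = strong[["latitude", "longitude"]]
weights = strong["magnitude"]
```

These `locations` and `weights` can be passed to `gmaps.heatmap_layer(locations, weights=weights)` exactly as in the previous cell.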