# Lecture 3: Visualization

## Zhentao Shi


<img src="graph/Minard.png" width="1000">

kernel: base (python 3.11.3)

## Graphs

* "One picture is worth ten thousand words".
* Modern graphs: web-based, interactive.

* Academia
* Journalism: Economist, SCMP, ...

In [None]:
import numpy as np  
import pandas as pd  
import matplotlib.pyplot as plt
import datetime

In [None]:
# Read the CSV file
d0 = pd.read_csv("data_example/AJR.csv")

# Plot the data
plt.scatter(d0['avexpr'], d0['logpgp95'])
plt.xlabel('avexpr')
plt.ylabel('logpgp95')
plt.show()

In [None]:

# Read the CSV file
bank_0 = pd.read_csv("data_example/bank-full.csv", sep=";")

# Display the dataframe
print(bank_0)

# Print the names of the columns
print(bank_0.columns)

In [None]:
# Scatter plot
plt.scatter(bank_0['age'], bank_0['balance'])
plt.xlabel('Age')
plt.ylabel('Balance')
plt.show()

In [None]:
# Scatter plot with groups
import seaborn as sns
sns.scatterplot(data=bank_0, x='age', y='balance', hue='education', alpha=0.5)
plt.show()

In [None]:
# Create a FacetGrid
g = sns.FacetGrid(bank_0, col='education', row='marital')

# Map a scatter plot to the FacetGrid
g.map(plt.scatter, 'age', 'balance', alpha=0.5)

# Show the plot
plt.show()

In [None]:
# Bar plot with 'education' as hue
sns.countplot(data=bank_0, x='age', hue='education')
plt.show()

In [None]:
# Horizontal bar plot (age on the y-axis); bars are dodged by 'education'
sns.countplot(data=bank_0, y='age', hue='education', dodge=True)
plt.show()
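
Because `age` takes many distinct values, a count plot keyed on it draws one cluster of bars per age. One way to make such a plot easier to read is to bin the ages first. The sketch below uses a small made-up DataFrame rather than `bank_0`; the bin edges and labels are illustrative assumptions, not part of the lecture data.

```python
import pandas as pd

# Hypothetical toy data standing in for bank_0 (values are made up)
toy = pd.DataFrame({
    "age": [19, 25, 34, 41, 58, 63, 72],
    "education": ["secondary", "tertiary", "tertiary", "primary",
                  "secondary", "primary", "secondary"],
})

# Assumed bin edges and labels; intervals are closed on the left: [18, 30), [30, 45), ...
bins = [18, 30, 45, 60, 100]
labels = ["18-29", "30-44", "45-59", "60+"]
toy["age_group"] = pd.cut(toy["age"], bins=bins, labels=labels, right=False)

# Count observations per age group and education level
counts = pd.crosstab(toy["age_group"], toy["education"])
print(counts)
```

With the real data, the same `pd.cut` call could be applied to `bank_0['age']`, and the new column passed as `x` to `sns.countplot`.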

### Data Manipulation

In [None]:
# Read the CSV file
d0 = pd.read_csv("data_example/PWT100.csv")

# Display the first few rows of the dataframe
print(d0.head())

# Print the names of the columns
print(d0.columns)

In [None]:
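# Select specific columns and filter rows
d1 = d0[['countrycode', 'year', 'rgdpe', 'pop']]
# .copy() makes d1 an independent DataFrame, so adding a column below
# does not trigger a SettingWithCopyWarning
d1 = d1[d1['countrycode'].isin(['CHN', 'RUS', 'JPN', 'USA'])].copy()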

# Create new column 'gdpcapita'
d1['gdpcapita'] = d1['rgdpe'] / d1['pop']

# Print the dataframe
print(d1)
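
The select, filter, and compute steps above can also be written as one method chain. This is a minimal sketch on a made-up DataFrame (the numbers and the `extra` column are hypothetical), not the PWT data itself.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the PWT extract
toy = pd.DataFrame({
    "countrycode": ["CHN", "USA", "FRA"],
    "year": [2019, 2019, 2019],
    "rgdpe": [20000.0, 20500.0, 3000.0],
    "pop": [1400.0, 328.0, 67.0],
    "extra": [1, 2, 3],  # a column we do not need
})

keep = ["CHN", "USA"]

result = (
    # .loc filters rows and selects columns in one step
    toy.loc[toy["countrycode"].isin(keep), ["countrycode", "year", "rgdpe", "pop"]]
    # .assign returns a new DataFrame with the derived column added
    .assign(gdpcapita=lambda df: df["rgdpe"] / df["pop"])
)
print(result)
```

Since `.assign` returns a new DataFrame, the chain never modifies `toy` in place.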

In [None]:
# Scatter plot with 'countrycode' as hue
sns.scatterplot(data=d1, x='year', y='rgdpe', hue='countrycode')
plt.show()

In [None]:
# Line plot with 'countrycode' as hue
sns.lineplot(data=d1, x='year', y='gdpcapita', hue='countrycode')
plt.show()

In [None]:
# Select specific columns
s1 = d1[['countrycode', 'year', 'pop']]

# Spread 'year' column into multiple columns with 'pop' as values
s1 = s1.pivot(index='countrycode', columns='year', values='pop')

print(s1)

In [None]:
# Gather the year columns (1950 to 2019) back into long key-value format
s1 = s1.reset_index().melt(id_vars='countrycode', var_name='year', value_name='pop')

# Print the dataframe
print(s1)
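
To see that `pivot` and `melt` are inverse reshaping steps, the following sketch runs the round trip on a tiny made-up table (the population figures are hypothetical). One detail worth checking in practice: after `melt`, the `year` column may come back with object dtype, so it is converted to integers here.

```python
import pandas as pd

# Hypothetical long-format data: one row per (country, year)
long_df = pd.DataFrame({
    "countrycode": ["CHN", "CHN", "USA", "USA"],
    "year": [2018, 2019, 2018, 2019],
    "pop": [1400.0, 1410.0, 327.0, 329.0],
})

# Long -> wide: one row per country, one column per year
wide_df = long_df.pivot(index="countrycode", columns="year", values="pop")
print(wide_df)

# Wide -> long: gather the year columns back into key-value pairs
back_df = wide_df.reset_index().melt(
    id_vars="countrycode", var_name="year", value_name="pop"
)
back_df["year"] = back_df["year"].astype("int64")  # restore an integer dtype
back_df = back_df.sort_values(["countrycode", "year"]).reset_index(drop=True)
print(back_df)
```

After sorting, `back_df` holds the same rows as `long_df`.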

## Interactive Graphs

* [Plotly Express](https://plotly.com/graphing-libraries/)
  * [YouTube](https://www.youtube.com/watch?v=_b2KXL0wHQg)


In [None]:
import plotly.express as px

d0 = pd.read_csv("data_example/AJR.csv")
fig = px.scatter(d0, x='avexpr', y='logpgp95', title='Scatter plot')
fig.show()




* Shiny for Python [posit](https://shiny.posit.co/py/docs/overview.html): [gallery](https://shiny.posit.co/py/gallery/)
* [Shiny Express](https://shiny.posit.co/blog/posts/shiny-express/)
  * Demo: [Shenzhen housing](https://zhentao-shi.shinyapps.io/ShenzhenHousing-Shiny/)
  * Web scraper: `data_example/Scrape_Lianjia.ipynb`
