## Visualization: `plotly`

### Programming for Data Science
### Created: April 11, 2023
---  

### PREREQUISITES
- variables
- data types
- numpy
- pandas
- matplotlib

### SOURCES 
- https://plotly.com/python/

### OBJECTIVES
- Introduce some basic functionality of the `plotly` package

### CONCEPTS
- creating basic visualizations (e.g., barplot, histogram, scatterplot)
- drawing specialized plots (e.g., map plot, sunburst plot)
- changing xlabel, ylabel, and title
- creating subplots or facets to display multiple plots in a single figure 
- adding animations to visualize changes over time or across data subsets

---

### `plotly`

- `plotly` is a high-level interface to [plotly.js](https://github.com/plotly/plotly.js), which is built on [d3.js](https://d3js.org/) and provides an easy-to-use interface for generating interactive D3 graphics. These interactive graphs let the user zoom in and out, hover over a point to get additional information, filter to groups of points, and much more. These interactive components contribute to an engaging user experience and allow information to be displayed in ways that are not possible with static figures.

- `plotly` is also a web application for creating and sharing data visualizations. It works with several programming languages and applications, including R, Python, and Microsoft Excel. Here we concentrate on creating different graphs with the `plotly` Python package.

- Interactive Graphics!
  - Zooming
  - Silencing
  - Hovering
  - Sliding, etc.


#### Install Plotly

Install with pip
`pip install plotly==5.14.1`

#### Load packages and import some data

In [None]:
import pandas as pd
import plotly.express as px

Import the *iris* dataset and the *gapminder* dataset from `plotly`. You can check the available data from `plotly` [here](https://plotly.com/python-api-reference/generated/plotly.express.data.html).

In [None]:
iris = px.data.iris() # load iris data
df = px.data.gapminder() # load gapminder data
iris.head()

In [None]:
df.head()

In [None]:
df.info()

---

### Barplot

Draw a barplot with *species* variable from the iris data.

In [None]:
px.bar(iris, x='species', title="Frequencies of Species")

In [None]:
p = px.bar(iris, x='species', title="Frequencies of Species")
print(p)

In [None]:
print(p.layout)

In [None]:
p.layout.title = 'frequencies of species' # change title

In [None]:
p.show()

Draw another barplot using variables from the gapminder dataset.

In [None]:
px.bar(df.query("country=='United States' | country=='Canada'"), 
           x='year', y='pop', color='country') # stacked

Create grouped (side-by-side) barplots and set the order of the countries.

In [None]:
px.bar(df.query("country=='United States' | country=='Canada'"), 
       x='year', y='pop', color='country', barmode='group',
       category_orders={"country": ["United States", "Canada"]}) # order of the bars and legend entries

---
### Histogram

Draw a histogram using *sepal_length* from the iris dataset.

In [None]:
px.histogram(iris, x="sepal_length", nbins=10)

Draw a histogram of the average *sepal width* in the bins of *sepal_length*.

In [None]:
px.histogram(iris, x="sepal_length", y='sepal_width', histfunc='avg', nbins=10,
             labels={'sepal_length': 'Sepal Length', 'sepal_width':'Sepal Width'})

Modify the plot above by varying the color by species.

In [None]:
px.histogram(iris, x="sepal_length", y='sepal_width', histfunc='avg',
             labels={'sepal_length': 'Sepal Length', 'sepal_width':'Sepal Width'}, 
             color="species", 
             barmode="overlay") # allow overlaying histograms.

---
### Line plot

In [None]:
p = px.line(df.query("country=='United States' | country=='Canada'"), 
        x="year", y="lifeExp", color='country')

In [None]:
# change layout
p.update_layout(
    legend=dict(yanchor="bottom", y=0.02, xanchor="right", x=0.99),
    title='Average Life Expectancy by Country over Time')

p.update_traces(mode="lines+markers")

---

### Scatterplot

In [None]:
px.scatter(df.query("year==2002 & continent=='Americas'"),  x="lifeExp", y="gdpPercap", color="country",
           title = 'Life Expectancy vs. GDP per Capita in the Americas (2002)')

In [None]:
p = px.scatter(df.query("year==2002 & continent=='Americas'"),  x="lifeExp", y="gdpPercap", color="country",
           size='pop', # change the size of points proportionally to the population size.
           title = 'Life Expectancy vs. GDP per Capita in the Americas (2002)',
           labels={'lifeExp': 'Life Expectancy', 'gdpPercap':'GDP per Capita'}) # change labels
p.show()

In [None]:
# adjust the width of the point edge and change the edge color to black.
p.update_traces(marker=dict(line=dict(width=1.5, color='black')))

---

### Scatter or Correlation Matrix

Draw a scatter matrix using the iris dataset.

In [None]:
p = px.scatter_matrix(iris, 
                      dimensions=["sepal_width", "sepal_length", "petal_width", "petal_length"],
                      color="species")

p.show()
#p.update_traces(diagonal_visible=False) # suppress the plots on the diagonal.

Create a correlation matrix and visualize it using the `imshow` function. `imshow` creates an image plot by mapping a 2D array of pixel values to colors and displaying the resulting image in a plot. This function can be used to create visualizations such as heatmaps, image classification results, and more. You can find available colorscales [here](https://plotly.com/python/builtin-colorscales/).

In [None]:
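# drop the non-numeric 'species' column as well; recent pandas versions raise an error otherwise.
corr_mat = iris.drop(columns=['species', 'species_id']).corr()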
corr_mat

In [None]:
px.imshow(corr_mat, x=corr_mat.index, y=corr_mat.index,
          color_continuous_scale="Viridis") # try with ["lightgrey", "grey", "black"]

---

### Map Plot

Choropleth maps display data on a map by shading regions with different colors.

In [None]:
px.choropleth(df.query("year == 2002"), 
              locations='iso_alpha',  # ISO country codes
              color='gdpPercap',
              color_continuous_scale="deep",
              title="World GDP per Capita (2002)",
              hover_name="country") 

# add `, projection="natural earth"` and see a difference.

Create a scatter plot on the map.

In [None]:
px.scatter_geo(df.query("year == 2002"),
               locations="iso_alpha",
               color="continent",
               size="pop",
               projection="natural earth",
               hover_name="country")

---
### Sunburst Plot

A sunburst plot is a type of visualization that displays hierarchical data in a radial format, starting from the root and branching outwards to the leaves. The hierarchy is defined by labels and parent attributes. The root is located at the center and the children are added to the outer rings.

In [None]:
grade = pd.DataFrame({
    'subject': ['Math', 'Math', 'Math', 'Science', 'Science', 'English', 'English', 'History'],
    'grade': ['A', 'B', 'C', 'A', 'B', 'A', 'B', 'C'], 
    'number_of_students': [20, 30, 10, 15, 25, 30, 20, 5]
})


px.sunburst(grade, path=['subject', 'grade'], # hierarchical path
            values='number_of_students')


---
### Facets

You can use facets to analyze relationships within subgroups.

In [None]:
px.scatter(df[df['year']==2002],  # alternative way to subset the data.
           x="lifeExp", y="gdpPercap", color="country", 
           facet_col="continent") # add `, facet_col_wrap=2`

---
### Animations

Create an animated scatter plot by specifying the variable(s) to be animated, the animation frames (e.g., time or categories), and the duration and/or transition style of the animation.

In [None]:
p = px.scatter(df[df['continent']=='Americas'], 
               x='lifeExp',
               y='gdpPercap', 
               color='country', 
               animation_frame="year", 
               animation_group="country",
               hover_name="country"
)

p.show()

In [None]:
p.update_xaxes(range=[df.lifeExp.min(), df.lifeExp.max()])
p.update_yaxes(range=[df.gdpPercap.min(), 50000])

p.show()

---  

### TRY FOR YOURSELF
Do the following:

- Use `plotly` to add animations to a scatter plot based on the gapminder dataset.
   - Create a new dataframe with the observations from before 2000 for countries in Asia.
   - Draw a scatterplot of life expectancy (*lifeExp*) against GDP per capita (*gdpPercap*), with point size based on population (*pop*).
   - Add x-label, y-label, and title.
   - Adjust x or y limits properly.
   - Display the scatter plot.

In [None]:
df2 = df.query("year < 2000 & continent=='Asia'")
p = px.scatter(df2,
               x='lifeExp',
               y='gdpPercap', 
               color='country', 
               size='pop',
               animation_frame="year", 
               animation_group="country",
               hover_name='country',
               title = 'Average Life Expectancy vs. GDP per Capita',
               labels={'lifeExp': 'Life Expectancy', 'gdpPercap':'GDP per Capita'}
)

p.update_xaxes(range=[df.lifeExp.min(), df.lifeExp.max()])
p.update_yaxes(range=[-5000, 50000])

p.show()