# Plotly

Plotly is a visualization tool for creating online, interactive charts and dashboards. It also provides libraries for other languages, including R and Python, for creating visualizations from within those languages. We will use the plotly R library to create plotly objects locally in our Jupyter notebooks. The following links are good references for learning plotly:

* **Reference** [Plotly R cheat sheet](https://images.plot.ly/plotly-documentation/images/r_cheat_sheet.pdf)
* **Reference** [Plotly R reference](https://plot.ly/r/reference/)

We can use plotly in two different ways in R: 
 1. By calling the `plot_ly()` function to create the graphic directly as a plotly object.
 2. By creating a ggplot object and converting it to a plotly object with `ggplotly()`.

Let's see how both approaches work.
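
A minimal sketch of the two approaches side by side, using the built-in `mtcars` data set (an assumption for illustration; the rest of this notebook uses `diamonds`). The names `p_direct`, `g`, and `p_converted` are hypothetical:

```r
library(ggplot2)
library(plotly)

# 1. Build the chart directly with plot_ly(); columns are referenced with ~
p_direct <- plot_ly(mtcars, x = ~wt, y = ~mpg, type = "scatter", mode = "markers")

# 2. Build a ggplot object first, then convert it with ggplotly()
g <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p_converted <- ggplotly(g)

# Both results are plotly htmlwidgets; print either one to display it
class(p_direct)
class(p_converted)
```

Both objects have class `plotly`, so further plotly functions such as `layout()` can be applied to either one.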

Creating plotly objects takes a little longer than creating static plots, but the result is worth the wait. **The plots are interactive**: you can hover your mouse pointer over them to see the values of individual data points, and you can zoom and pan in many of them. 

**Run the following cells, hover the mouse over the plots to see information about the data points, and try zooming and panning. If a plot does not appear on the first run, re-run the cell and it should appear on the second run.** 

In [None]:
library(ggplot2)
library(plotly)

# Let's start with the diamonds data set: take a random sample of 1,000 rows, which we'll plot directly with plotly functions in the next cell.
set.seed(100)
d <- diamonds[sample(nrow(diamonds),1000),]
head(d)

In [None]:
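# Scatterplot with marker color and size both mapped to carat; `text` adds the clarity to the hover label
plot_ly(d, x = ~carat, y = ~price, color = ~carat, size = ~carat, text = ~paste("Clarity:", clarity), type="scatter", mode="markers")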

In [None]:
# We can change what is displayed when the mouse pointer hovers over data points:
# swap the active `text` line for the commented-out one to show price and cut instead.
plot_ly(d, x = ~carat, y = ~price, color = ~carat, size = ~carat, 
        text = ~paste("Clarity:", clarity), 
        #text = ~paste0("Price: $", price, "<br>Cut: ", cut),
        type="scatter", mode="markers")
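
If you want the tooltip to show *only* your custom text, without the default x/y values, the `hoverinfo` trace attribute can be set to `"text"`. A sketch that re-creates the sample `d` so it runs on its own; `p_hover` is a hypothetical name:

```r
library(ggplot2)  # provides the diamonds data set
library(plotly)

set.seed(100)
d <- diamonds[sample(nrow(diamonds), 1000), ]

# hoverinfo = "text" hides the default x/y values and shows only the custom label;
# "<br>" inserts a line break inside the tooltip
p_hover <- plot_ly(d, x = ~carat, y = ~price,
                   text = ~paste0("Price: $", price, "<br>Cut: ", cut),
                   hoverinfo = "text",
                   type = "scatter", mode = "markers")
p_hover
```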

In [None]:
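# We can also build complex ggplot objects (here with facets and smoothers) and render them with plotly.
# `text` is not a standard ggplot2 aesthetic, so ggplot2 may warn about it, but ggplotly() shows it in the tooltip.
pf <- ggplot(data = d, aes(x = carat, y = price)) + geom_point(aes(text = paste("Clarity:", clarity))) 
pf <- pf + geom_smooth(aes(colour = cut, fill = cut)) + facet_wrap(~ cut)

ggplotly(pf)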
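
When converting a ggplot, `ggplotly()` shows every mapped aesthetic in the tooltip by default. Its `tooltip` argument restricts the label to selected aesthetics; the sketch below (hypothetical names `g` and `p_tip`) keeps only the custom `text`:

```r
library(ggplot2)
library(plotly)

set.seed(100)
d <- diamonds[sample(nrow(diamonds), 1000), ]

# ggplot2 may warn about the unknown `text` aesthetic; ggplotly() still uses it
g <- ggplot(d, aes(x = carat, y = price)) +
  geom_point(aes(text = paste("Clarity:", clarity)))

# tooltip = "text" limits the hover label to the text aesthetic only
p_tip <- ggplotly(g, tooltip = "text")
p_tip
```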

**Now, let's use ggplot to create multiple density plots and send them to plotly to get an interactive plot.**


In [None]:

p <- ggplot(diamonds, aes(x = price)) + geom_density(aes(fill = color), alpha = 0.5) + 
     ggtitle("Kernel Density estimates by group")

ggplotly(p)
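
For comparison, the same kind of chart can be built natively with plotly: compute one kernel density estimate per group with base R's `density()` and add each as a line trace. This is a sketch; `density()` and `geom_density()` both default to the `nrd0` bandwidth, but they evaluate the curve on different grids, so the tails may look slightly different. Names such as `dens` and `p_dens` are hypothetical:

```r
library(ggplot2)  # provides the diamonds data set
library(plotly)

# One kernel density estimate of price per diamond color (D through J)
dens <- lapply(split(diamonds$price, diamonds$color), density)

# Add each estimated curve as its own line trace, named after the color
p_dens <- plot_ly()
for (clr in names(dens)) {
  p_dens <- add_lines(p_dens, x = dens[[clr]]$x, y = dens[[clr]]$y, name = clr)
}
p_dens <- layout(p_dens, title = "Kernel density estimates by group",
                 xaxis = list(title = "price"), yaxis = list(title = "density"))
p_dens
```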

In [None]:
# We can also overlay a 2D density plot on a scatterplot.

# Let's create a toy data set for this purpose.
# Use `=` (not `<-`) inside data.frame() so that the columns are named x and y.
set.seed(123)
df <- data.frame(x = rchisq(1000, 10, 10), y = rnorm(1000))

# Now build the plot with ggplot
p2 <- ggplot(df, aes(x, y)) + geom_point(alpha = 0.5) + geom_density_2d()
p2 <- p2 + theme(panel.background = element_rect(fill = '#ffffff')) + ggtitle("2D density plot with scatterplot overlay")
# and send it to plotly
ggplotly(p2)
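
Why `=` rather than `<-` inside `data.frame()` matters: `<-` assigns a global variable and passes an unnamed expression, so the column name is derived from the deparsed expression instead of being `x`. A small sketch with hypothetical names `bad` and `good`:

```r
set.seed(123)

# `<-` creates global variables x and y and produces mangled column names
bad <- data.frame(x <- rchisq(5, 10, 10), y <- rnorm(5))
names(bad)

# `=` names the columns x and y
good <- data.frame(x = rchisq(5, 10, 10), y = rnorm(5))
names(good)
```

With `=`, `aes(x, y)` refers to the data frame's own columns rather than to variables that happen to exist in the global environment.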

In [None]:

# We can create a boxplot directly with plotly by setting the trace type to "box".
p3 <- plot_ly(midwest, x = ~percollege, color = ~state, type = "box")
p3
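
Box traces accept further options. For example, `boxpoints = "all"` draws every observation next to its box, and `jitter` spreads those points out so they overlap less. A sketch (`p_box` is a hypothetical name):

```r
library(ggplot2)  # provides the midwest data set
library(plotly)

# boxpoints = "all" shows each county as a point beside its state's box;
# jitter (between 0 and 1) spreads the points out sideways
p_box <- plot_ly(midwest, x = ~percollege, color = ~state, type = "box",
                 boxpoints = "all", jitter = 0.3)
p_box
```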

---

## Heatmap Examples with Plotly

In [None]:
# The volcano data set can be plotted in 3D; you can zoom and rotate. Try it. 

plot_ly(z = ~volcano, type = "surface")

In [None]:
# Or it can be drawn as a heatmap: a 2D plot where color encodes height. 
plot_ly(z = volcano, type = "heatmap")

In [None]:
# We can also change the color scale, e.g. to a blue-to-red gradient built with colorRamp().
plot_ly(z = volcano, colors = colorRamp(c("blue", "red")), type = "heatmap")
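
`colorRamp()` comes from base R (grDevices): it returns a *function* that maps numbers in [0, 1] to RGB triples, and plotly uses that function to build the color scale. A quick way to see what it does (`ramp` is a hypothetical name):

```r
# colorRamp() returns a function mapping values in [0, 1] to RGB (0-255)
ramp <- colorRamp(c("blue", "red"))

ramp(0)    # 0 0 255 (pure blue)
ramp(0.5)  # 127.5 0 127.5 (halfway between blue and red)
ramp(1)    # 255 0 0 (pure red)
```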

In [None]:
# We can create a heatmap with categorical axes. This can be used, for example, to visualize a correlation matrix.

# First create a random 3x3 matrix
m <- matrix(rnorm(9), nrow = 3, ncol = 3)
# and plot it with category labels on the axes.
plot_ly(x = c("a", "b", "c"), y = c("d", "e", "f"), z = m, type = "heatmap")
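
For the correlation-matrix use case mentioned in the comment above, a sketch with the built-in `mtcars` data (the choice of columns is arbitrary). Fixing `zmin`/`zmax` to -1 and 1 keeps the color scale symmetric, so zero correlation always maps to the middle color; `corr` and `p_corr` are hypothetical names:

```r
library(plotly)

# Correlation matrix of a few numeric columns of mtcars
corr <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# "RdBu" is a ColorBrewer palette name; zmin/zmax pin the color range to [-1, 1]
p_corr <- plot_ly(x = colnames(corr), y = rownames(corr), z = corr,
                  zmin = -1, zmax = 1, colors = "RdBu", type = "heatmap")
p_corr
```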

In [None]:
# We can also use a heatmap to represent density: a 2D histogram, where the number of points in each bin is encoded by color.

# Create some correlated bivariate normal data (this requires the mvtnorm package)
s <- matrix(c(1, -.75, -.75, 1), ncol = 2)
obs <- mvtnorm::rmvnorm(500, sigma = s)

pd <- plot_ly(x = obs[,1], y = obs[,2])
# Left: semi-transparent scatterplot; right: 2D histogram of the same points
ppd <- subplot(
  pd %>% add_markers(alpha = 0.2),
  pd %>% add_histogram2d()
)
ppd
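
To see what the 2D histogram is counting, the binning can be done by hand: cut each coordinate into intervals and tabulate the pairs. This sketch also generates the correlated data without the mvtnorm package, by mixing two independent normals (correlation -0.75, as in the covariance matrix `s`). plotly chooses its own bin edges, so the picture will not match `add_histogram2d()` exactly; names such as `rho`, `xb`, `yb`, and `counts` are hypothetical:

```r
library(plotly)

set.seed(1)
n <- 500
rho <- -0.75
# Two standard normals with correlation rho, built by mixing two
# independent normals (an alternative to mvtnorm::rmvnorm())
xs <- rnorm(n)
ys <- rho * xs + sqrt(1 - rho^2) * rnorm(n)

# Cut each coordinate into 10 equal-width intervals and count points per cell;
# this grid of counts is what a 2D histogram encodes with color
xb <- cut(xs, breaks = 10)
yb <- cut(ys, breaks = 10)
counts <- table(xb, yb)

# A heatmap's z is indexed [row = y, column = x], hence the transpose
p_counts <- plot_ly(x = levels(xb), y = levels(yb), z = t(unclass(counts)),
                    type = "heatmap")
p_counts
```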

In [None]:
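# Let's do the same with the diamonds data set: count each (cut, clarity) combination.
cnt <- with(diamonds, table(cut, clarity))
cnt  # the same counts in tabular form, for comparison
# add_histogram2d() counts the (cut, clarity) pairs itself, so no z values are needed
pd <- plot_ly(diamonds, x = ~cut, y = ~clarity) %>%
  add_histogram2d()
pd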
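
Because a contingency table already holds the counts, the same picture can also be drawn as an ordinary heatmap from the table, without letting plotly do the counting. A sketch (hypothetical name `p_cnt`); the table is transposed because a heatmap's `z` is indexed by row = y, column = x:

```r
library(ggplot2)  # provides the diamonds data set
library(plotly)

# Contingency table: rows = cut (5 levels), columns = clarity (8 levels)
cnt <- with(diamonds, table(cut, clarity))

# Transpose so that x = cut and y = clarity
p_cnt <- plot_ly(x = rownames(cnt), y = colnames(cnt), z = t(unclass(cnt)),
                 type = "heatmap")
p_cnt
```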


---

## Cluster Visualization Example

Let's try a slightly more complicated plot. 

We'll create a random data set, cluster it into **six clusters, and visualize these clusters** using `geom_polygon()` in ggplot, then render the result with plotly.

In [None]:
library("RColorBrewer")

# Create random data
nn <- 500
myData <- data.frame(X = rnorm(nn),
                     Y = rnorm(nn))
head(myData)

In [None]:
# Run k-means clustering to find six clusters
setK = 6  
clusterSolution <- kmeans(myData, centers = setK)

# Add an attribute to show the cluster number for each data point
myData$whichCluster <- factor(clusterSolution$cluster)

str(clusterSolution)
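
k-means starts from randomly chosen centers, so rerunning the cell can produce different clusters. A sketch of making the result reproducible and more stable with `set.seed()` and the `nstart` argument of `kmeans()`, plus the main components of the returned object; `pts` and `km` are hypothetical names:

```r
set.seed(42)
pts <- data.frame(X = rnorm(500), Y = rnorm(500))

# nstart = 25 runs k-means from 25 random sets of starting centers and keeps
# the solution with the lowest total within-cluster sum of squares
km <- kmeans(pts, centers = 6, nstart = 25)

km$centers       # 6 x 2 matrix of cluster centroids
km$size          # number of points in each cluster
km$tot.withinss  # the quantity k-means minimizes; betweenss + tot.withinss = totss
```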

In [None]:
# The following lines find the convex hull of each cluster, i.e. the smallest
# convex polygon that contains all data points in that cluster.
splitData <- split(myData, myData$whichCluster)
appliedData <- lapply(splitData, function(df) {
  df[chull(df$X, df$Y), ]  # chull() returns the row indices of the hull vertices
})
combinedData <- do.call(rbind, appliedData)
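
A tiny check of what `chull()` returns, on the four corners of a square plus one interior point (hypothetical data `sq`):

```r
# Four corners of a unit square plus one interior point (row 5)
sq <- data.frame(X = c(0, 1, 1, 0, 0.5),
                 Y = c(0, 0, 1, 1, 0.5))

# chull() returns the row indices of the hull vertices in clockwise order;
# the interior point is not among them
hull <- chull(sq$X, sq$Y)
hull
sq[hull, ]
```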

In [None]:
# Now create the ggplot: a scatterplot drawn on top of a layer of cluster polygons.
# This is also a nice example of how to superimpose two geoms built from different data frames.
cp <- ggplot(data = myData, aes(x = X, y = Y))
cp <- cp + geom_polygon(data = combinedData,
                        aes(x = X, y = Y, fill = whichCluster),
                        alpha = 1/2)
cp <- cp + geom_point(size = 1)
cp <- cp + coord_equal()
# One fill color per cluster, interpolated from the reversed "Spectral" palette
cp <- cp + scale_fill_manual(values = colorRampPalette(rev(brewer.pal(11, "Spectral")))(setK))

In [None]:
# Now render it with plotly.
ggplotly(cp)

### YOUR TURN:

**Create a similar clustering visualization for the iris data set**. Use `Petal.Length` and `Petal.Width` as X and Y coordinates, find **three clusters**, and visualize them.  

In [None]:
str(iris)
head(iris)

In [None]:
ggplot(iris, aes(x=Petal.Length,y=Petal.Width, color=Species)) + geom_point()

In [None]:

# You can use the following to create a 2D data set from the iris petal measurements, then run the clustering and visualization code on it.
myData <- data.frame(X = iris$Petal.Length, Y = iris$Petal.Width)

In [None]:
str(myData)

setK <- 3

clusterSolution <- kmeans(myData, centers = setK)

# Add an attribute to show the cluster number for each data point
myData$whichCluster <- factor(clusterSolution$cluster)

str(clusterSolution)

# The following lines find the convex hull of each cluster, i.e. the smallest
# convex polygon that contains all data points in that cluster.
splitData <- split(myData, myData$whichCluster)
appliedData <- lapply(splitData, function(df) {
  df[chull(df$X, df$Y), ]
})
combinedData <- do.call(rbind, appliedData)

# Now create the ggplot: a scatterplot drawn on top of a layer of cluster polygons.
cp2 <- ggplot(data = myData, aes(x = X, y = Y))
cp2 <- cp2 + geom_polygon(data = combinedData,
                          aes(x = X, y = Y, fill = whichCluster),
                          alpha = 1/2)
cp2 <- cp2 + geom_point(size = 1)
# Overlay the points again, colored by true species, to compare the clusters with the species
cp2 <- cp2 + geom_point(data = iris, aes(x = Petal.Length, y = Petal.Width, color = Species))

cp2 <- cp2 + coord_equal()
cp2 <- cp2 + scale_fill_manual(values = colorRampPalette(rev(brewer.pal(11, "Spectral")))(setK))

ggplotly(cp2)
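
A quick numeric complement to the plot: cross-tabulating the cluster labels against the true species shows how well k-means on the petal measurements recovers them. A sketch; `pet`, `km_iris`, and `tab` are hypothetical names, and `nstart = 25` is an arbitrary choice:

```r
set.seed(1)
pet <- data.frame(X = iris$Petal.Length, Y = iris$Petal.Width)
km_iris <- kmeans(pet, centers = 3, nstart = 25)

# Cluster numbers are arbitrary, so look for one dominant cell per row
# rather than a diagonal pattern
tab <- table(cluster = km_iris$cluster, species = iris$Species)
tab
```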