
    VIDHI JATIN SHAH                                                                                     UBPerson No:50207090
                                                                                                           
                                                                        
**LAB 3: Classical package: H. Wickham's ggplot2 Vignette**
---------------------------------------


- Import the required libraries.

    - ggplot2: plots the graphs.
    - ggrepel: repels overlapping text labels in plots.
    - scales: maps data to aesthetics and provides methods for determining breaks and labels for axes and legends.
    - tidyr: tidies data.

In [None]:
library(ggplot2)
library(ggrepel)
library(scales)
library(tidyr)

- Read the housing data from the CSV file

In [None]:
housing <- read.csv("landdata-states.csv")
head(housing[1:5])

*ggplot2 vs. base graphics for simple graphs, where base wins*

- Base graphics histogram example plotting the Home.Value column

In [None]:
hist(housing$Home.Value)

- ggplot2 histogram example plotting the Home.Value column

In [None]:
ggplot(housing, aes(x = Home.Value)) +
  geom_histogram()

*Base graphics vs. ggplot2 for more complex graphs, where ggplot2 wins*

- Base colored scatter plot example

In [None]:
plot(Home.Value ~ Date,
     data=subset(housing, State == "MA"))
points(Home.Value ~ Date, col="red",
       data=subset(housing, State == "TX"))
legend(1975, 400000,
       c("MA", "TX"), title="State",
       col=c("black", "red"),
       pch=c(1, 1))

- ggplot2 colored scatter plot example

In [None]:
ggplot(subset(housing, State %in% c("MA", "TX")),
       aes(x=Date,
           y=Home.Value,
           color=State))+
  geom_point()

- To get a list of available geometric objects

In [None]:
help.search("geom_", package = "ggplot2")

- Scatter plot of Land Value vs. Structure Cost

In [None]:
hp2001Q1 <- subset(housing, Date == 2001.25) 
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = Land.Value)) +
  geom_point()

- Scatter plot of log(Land Value) vs. Structure Cost

In [None]:
ggplot(hp2001Q1,
       aes(y = Structure.Cost, x = log(Land.Value))) +
  geom_point()

- Scatter plot of log(Land Value) vs. Structure Cost with a regression line (prediction line)

In [None]:
hp2001Q1$pred.SC <- predict(lm(Structure.Cost ~ log(Land.Value), data = hp2001Q1))

p1 <- ggplot(hp2001Q1, aes(x = log(Land.Value), y = Structure.Cost))

p1 + geom_point(aes(color = Home.Value)) +
  geom_line(aes(y = pred.SC))

* Smoothers: geom_smooth draws a fitted line with a confidence ribbon

In [None]:
p1 +
  geom_point(aes(color = Home.Value)) +
  geom_smooth()
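
The hand-computed prediction line and the smoother can be compared directly. The following is a minimal sketch, assuming a small made-up data frame `toy` (hypothetical values, not the housing data) with a precomputed `log.LV` column. With `method = "lm"`, `geom_smooth()` fits a linear model itself, so no prediction column is needed.

```r
library(ggplot2)

# Hypothetical toy data standing in for hp2001Q1
toy <- data.frame(
  Land.Value     = c(10000, 20000, 40000, 80000, 160000),
  Structure.Cost = c(90000, 105000, 118000, 135000, 149000)
)
toy$log.LV <- log(toy$Land.Value)

# Manual route: fit the model and store its predictions as a column
fit <- lm(Structure.Cost ~ log.LV, data = toy)
toy$pred.SC <- predict(fit)

p_manual <- ggplot(toy, aes(x = log.LV, y = Structure.Cost)) +
  geom_point() +
  geom_line(aes(y = pred.SC))

# Smoother route: ggplot2 fits y ~ x itself; se = FALSE hides the ribbon
p_smooth <- ggplot(toy, aes(x = log.LV, y = Structure.Cost)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

Printing `p_manual` and `p_smooth` should show the same straight line; the smoother route is shorter, while the manual route keeps the fitted model available for other uses.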

* geom_text labels the points in the plot

In [None]:
p1 + 
  geom_text(aes(label=State), size = 3)

In [None]:
p1 + 
  geom_point() + 
  geom_text_repel(aes(label=State), size = 3)

* Variables are mapped to aesthetics with the aes() function

In [None]:
p1 +
  geom_point(aes(size = 2),# incorrect! 2 is not a variable
             color="red") # this is fine -- all points red
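
The difference between mapping and setting can be inspected on the layer itself. This is a minimal sketch on a made-up data frame `toy` (hypothetical, not the housing data); it relies on the `aes_params` and `mapping` fields that ggplot2 stores on each layer.

```r
library(ggplot2)

# Hypothetical toy data, not the housing data set
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# Setting: constants go outside aes() and apply to every point
p_set <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(size = 2, color = "red")

# Mapping: aes() treats 2 as data, so ggplot2 builds a size scale
# and a legend with the single entry 2
p_map <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(size = 2), color = "red")

# Constants set on the layer
names(p_set$layers[[1]]$aes_params)
# Aesthetics mapped inside the layer's aes()
names(p_map$layers[[1]]$mapping)
```

In `p_set`, size appears among the layer's fixed parameters; in `p_map`, it appears among the mapped aesthetics, which is why an unwanted legend shows up.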

* Mapping Variables To Other Aesthetics

In [None]:
p1 +
  geom_point(aes(color=Home.Value, shape = region))

**Statistical Transformations**

- The default statistic for geom_histogram is stat_bin.

In [None]:
args(geom_histogram)
args(stat_bin)

 - Without binwidth (ggplot2 falls back to 30 bins)

In [None]:
p2 <- ggplot(housing, aes(x = Home.Value))
p2 + geom_histogram()

 - We can change the bin width by giving geom_histogram a binwidth argument, which it passes on to stat_bin.

In [None]:
p2 + geom_histogram(stat = "bin", binwidth=4000)
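
The effect of `binwidth` can be checked without looking at the plot. A minimal sketch, assuming made-up uniform data in a hypothetical data frame `toy`: `layer_data()` returns the data frame that stat_bin computed for a layer, one row per bar.

```r
library(ggplot2)

set.seed(42)  # reproducible made-up values
toy <- data.frame(value = runif(200, min = 0, max = 100))

base   <- ggplot(toy, aes(x = value))
wide   <- base + geom_histogram(binwidth = 20)
narrow <- base + geom_histogram(binwidth = 5)

# One row per bar, as computed by stat_bin
wide_bins   <- layer_data(wide)
narrow_bins <- layer_data(narrow)

nrow(wide_bins)         # number of bars with binwidth = 20
nrow(narrow_bins)       # number of bars with binwidth = 5
sum(narrow_bins$count)  # every observation lands in exactly one bin
```

A smaller `binwidth` yields more, narrower bars, while the counts still add up to the number of observations.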

**Changing The Statistical Transformation**

In [None]:
housing.sum <- aggregate(housing["Home.Value"], housing["State"], FUN=mean)
rbind(head(housing.sum), tail(housing.sum))

In [None]:
ggplot(housing.sum, aes(x=State, y=Home.Value)) + 
  geom_bar(stat="identity")
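
The same pattern on a tiny example makes the role of `stat = "identity"` easier to see. This is a sketch on a made-up data frame `toy` (hypothetical states and values): the means are computed first, and the bars then show those values as they are. `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical toy data: two states, two observations each
toy <- data.frame(
  State      = c("AA", "AA", "BB", "BB"),
  Home.Value = c(100, 200, 300, 500)
)

# Summarise first: one mean per state
toy.sum <- aggregate(toy["Home.Value"], toy["State"], FUN = mean)

# Bars show the precomputed means; no counting or binning happens
p_bar <- ggplot(toy.sum, aes(x = State, y = Home.Value)) +
  geom_bar(stat = "identity")

# Equivalent shorthand
p_col <- ggplot(toy.sum, aes(x = State, y = Home.Value)) +
  geom_col()
```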

*Scales*

- Distribution of the home price index by date and state

In [None]:
p3 <- ggplot(housing,
             aes(x = State,
                 y = Home.Price.Index)) + 
        theme(legend.position="top",
              axis.text=element_text(size = 6))
(p4 <- p3 + geom_point(aes(color = Date),
                       alpha = 0.5,
                       size = 1.5,
                       position = position_jitter(width = 0.25, height = 0)))

- Modify the name of the x axis and the breaks and labels of the color scale

In [None]:
p4 + scale_x_discrete(name="State Abbreviation") +
  scale_color_continuous(name="",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"))

- Changing the low and high values to blue and red

In [None]:
p4 +
  scale_x_discrete(name="State Abbreviation") +
  scale_color_continuous(name="",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"),
                         low = "blue", high = "red")

In [None]:
p4 +
  scale_color_continuous(name="",
                         breaks = c(1976, 1994, 2013),
                         labels = c("'76", "'94", "'13"),
                         low = muted("blue"), high = muted("red"))

- Using different color scales

In [None]:
p4 +
  scale_color_gradient2(name="",
                        breaks = c(1976, 1994, 2013),
                        labels = c("'76", "'94", "'13"),
                        low = muted("blue"),
                        high = muted("red"),
                        mid = "gray60",
                        midpoint = 1994)

 - Mapping each state to a different color
 
 What is the trend in housing prices in each state?

In [None]:
p5 <- ggplot(housing, aes(x = Date, y = Home.Value))
p5 + geom_line(aes(color = State))

- Faceting by state

In [None]:
(p5 <- p5 + geom_line() +
   facet_wrap(~State, ncol = 10))

*Themes*

In [None]:
p5 + theme_linedraw()

In [None]:
p5 + theme_light()

In [None]:
p5 + theme_minimal() +
  theme(text = element_text(color = "turquoise"))

- Creating and saving new themes

In [None]:
theme_new <- theme_bw() +
  theme(plot.background = element_rect(size = 1, color = "blue", fill = "black"),
        text=element_text(size = 12, family = "Serif", color = "ivory"),
        axis.text.y = element_text(colour = "purple"),
        axis.text.x = element_text(colour = "red"),
        panel.background = element_rect(fill = "pink"),
        strip.background = element_rect(fill = muted("orange")))

p5 + theme_new


- Wrong way to plot two variables in a data.frame

In [None]:
housing.byyear <- aggregate(cbind(Home.Value, Land.Value) ~ Date, data = housing, mean)
ggplot(housing.byyear,
       aes(x=Date)) +
  geom_line(aes(y=Home.Value), color="red") +
  geom_line(aes(y=Land.Value), color="blue")

- Right way: reshape the data.frame to long format, then plot the two variables as separate lines, with the color depending on which variable it is.

In [None]:

home.land.byyear <- gather(housing.byyear,
                           value = "value",
                           key = "type",
                           Home.Value, Land.Value)
ggplot(home.land.byyear,
       aes(x=Date,
           y=value,
           color=type)) +
  geom_line()
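
tidyr now marks `gather()` as superseded in favor of `pivot_longer()`. The following is a minimal sketch of the same reshape with both functions, on a made-up data frame `toy.byyear` (hypothetical values, not the aggregated housing data).

```r
library(tidyr)

# Hypothetical toy data in wide format: one column per variable
toy.byyear <- data.frame(
  Date       = c(2000, 2001, 2002),
  Home.Value = c(150000, 160000, 170000),
  Land.Value = c(40000, 45000, 50000)
)

# gather(): the form used above
long_gather <- gather(toy.byyear,
                      key = "type",
                      value = "value",
                      Home.Value, Land.Value)

# pivot_longer(): column names go to "type", cell values to "value"
long_pivot <- pivot_longer(toy.byyear,
                           cols = c(Home.Value, Land.Value),
                           names_to = "type",
                           values_to = "value")
```

Both results have the columns `Date`, `type`, and `value` and hold the same values; only the row order differs, which does not affect the plot.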