# ggplot cookbook

This cookbook covers data visualization in Python with ggplot (short for Grammar of Graphics plotting).  

The Grammar of Graphics is an abstract model of:
- what semantic elements a plot can have (points, lines, scales, title etc.)
- how to build a plot layer by combining these elements
- how to stack layers on top of each other to produce a complete plot

You specify in abstract terms exactly what plot you need.  
The computer figures out the technical details needed to create the plot.  
In short: you declare what you want, and the computer does the work of creating it.

In 1999 the academic [Leland Wilkinson](https://en.wikipedia.org/wiki/Leland_Wilkinson) published the theory of "The Grammar of Graphics".  
In 2005 Hadley Wickham implemented an interpretation of this grammar in the [ggplot2](https://en.wikipedia.org/wiki/Ggplot2) package for R.  
In 2014 Yhat released [ggplot](http://ggplot.yhathq.com/) for Python (a Python clone of R's ggplot2).  
In 2016 Hassan Kibirige released [plotnine](https://plotnine.readthedocs.io/en/stable/#) (a better clone of R's ggplot2 for Python).

We will use plotnine.

According to Yhat, ggplot is [easy to learn, fun and powerful](http://ggplot.yhathq.com/).

### example

![ggplot2 example](http://i.imgur.com/4S7r3Z3.jpg)

With only four lines we can produce a complex plot.  

Note that this example was produced with R, not with Python.  
That hardly matters, however: in Python the code looks very much the same.  
[Have a look](apdx02_replication%20of%20R%20results.ipynb).

At first glance the first three lines come across as gobbledygook.  
There is clearly some explaining to do.  
Luckily the explanation is not that hard: have a look at 101 - the basics.


# Contents

- ============ the basics ========================
- 101 - [the basics](101_ggplot_the_basics.ipynb) specify the data-layer with ggplot() and one or more visualization layers with geoms 
- 102 - [short forms](102_simple_syntax_forms.ipynb) make plots with less typing  
-   
- ============ geoms ============================   
- 210 - how to use [geom_point](210_geom_point.ipynb) to make a dotplot (scatter plot)
- 211 - how to use [geom_jitter](211_geom_jitter.ipynb) to make a jitter plot
- 
- 220 - how to use [geom_line](220_geom_line.ipynb) to make a line plot
- 221 - how to use [geom_hline](221_geom_xline.ipynb) to add a horizontal line
- 221 - how to use [geom_vline](221_geom_xline.ipynb) to add a vertical line
- 221 - how to use [geom_abline](221_geom_xline.ipynb) to add an abline (i.e. any straight line)
- 221 - how to add a linear [trendline](221_geom_xline.ipynb) to a plot 
- 222 - how to use [geom_smooth](222_geom_smooth.ipynb) to make smooth lines
- 
- 230 - how to use [geom_bar](230_geom_bar.ipynb) to make a barplot
- 231 - how to use [geom_histogram](231_geom_histogram.ipynb) to make a histogram
- 232 - how to use [geom_density](232_geom_density.ipynb) to make a density plot
- 233 - how to use [geom_ribbon](233_geom_ribbon.ipynb) to make ribbon and area plots
- 
- 240 - how to use [geom_boxplot](240_geom_boxplot.ipynb) to make boxplots
- 241 - how to use [geom_violin](241_geom_violin.ipynb) to make violin plots
-  
- ============ theory ============================
- 900 - the underlying theory (and explanation of vocab) 
- 910 - relevant visual perception theory
- 920 - explanatory vs. exploratory visualization
- 
- ============ annotate a plot ===================
- 300 - how to use [ggtitle](300_annotate_plot.ipynb) to add a title to a plot.
- 300 - how to use [xlab and ylab](300_annotate_plot.ipynb) to customize the x or y label of a plot.
- 310 - how to use [geom_text](310_geom_text.ipynb) to add text to the plot grid.
- 310 - how to use [geom_text](310_geom_text.ipynb) to add a label to each plotted point.
-  
- ============ scales ===========================
- 400 - how to use [xlim and ylim](400_xlim_ylim.ipynb) to set the upper and lower limits of a scale.
- 410 - how to use [scale_x/y_continuous](410_scale_y_continuous.ipynb) to customize the x or y scale.
- 420 - how to use [scale_x/y_log10](420_scale_xy_log.ipynb) to produce logarithmic scales. 
- 430 - how to use [scale_x/y_reverse](430_scale_xy_reverse.ipynb) to reverse scales.
-  
- 440 - how to use [scale_x/y_date](440_scale_xy_date.ipynb) to format scales that display dates
-  
- 450 - how to use [scale_color_brewer](450_scale_color_brewer.ipynb) to adjust discrete color scales.
- 460 - how to use [scale_color_gradient](460_scale_color_gradient.ipynb) to adjust continuous color scales.  
-  
- =========== coordinates ==========================
- 500 - how to use [coord_flip](500_coord_flip.ipynb) to flip the x and y-axis. 
-  
- =========== stats ================================
-  
- =========== facets ===============================
-  
- =========== themes ==============================
-  
- =========== save to file =========================
-  


# Appendices

- appendix 00 - an Exploratory Data Analysis ([EDA](apdx00_EDA_of_diamonds_data.ipynb)) of the diamonds data
- appendix 01 - examples of [ggplot2 in R](apdx01_ggplot2_in_R_Examples.ipynb)   
- appendix 02 - [compare](apdx02_replication%20of%20R%20results.ipynb) ggplot code in R and Python


# References

Wilkinson, Leland (2005). The Grammar of Graphics. [pdf](http://download.springer.com/static/pdf/700/bfm%253A978-0-387-28695-2%252F1.pdf?originUrl=http%3A%2F%2Flink.springer.com%2Fbook%2Fbfm%3A978-0-387-28695-2%2F1&token2=exp=1496103581~acl=%2Fstatic%2Fpdf%2F700%2Fbfm%25253A978-0-387-28695-2%25252F1.pdf%3ForiginUrl%3Dhttp%253A%252F%252Flink.springer.com%252Fbook%252Fbfm%253A978-0-387-28695-2%252F1*~hmac=9b3a45ecb1d791ed8177de18b9f553738fdb65c3d84528621f7f589316f7919f)   
Springer. ISBN 978-0-387-98774-3.

Hadley Wickham (2010) “The Layered Grammar of Graphics”, [pdf](http://vita.had.co.nz/papers/layered-grammar.pdf),   
Journal of Computational and Graphical Statistics, Volume 19, Number 1, Pages 3–28 

Noah Iliinsky, [Designing Data Visualizations](https://www.amazon.com/Designing-Data-Visualizations-Informational-Relationships/dp/1449312284/), 2011, O'Reilly

ggplot2 [cheatsheet](https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf)

ggplot2 on [tidyverse](http://ggplot2.tidyverse.org/index.html)

[Data visualisation](http://r4ds.had.co.nz/data-visualisation.html)
chapter 3 from [R for Data Science](http://r4ds.had.co.nz/)
(recommended)


[plotnine](https://plotnine.readthedocs.io/en/stable/) A Grammar of Graphics for Python

plotnine on [github](https://github.com/has2k1/plotnine) 

The Yhat [git repository](https://github.com/yhat/ggpy) of ggplot for Python, with docs and examples

Yhat ggplot [documentation](http://ggplot.yhathq.com/)

Sape Research Group ggplot2 Quick Reference [for R](http://sape.inf.usi.ch/quick-reference/ggplot2/)

Data Visualization with ggplot2 (Part 1) [DataCamp](https://www.datacamp.com/courses/data-visualization-with-ggplot2-1?tap_a=5644-dce66f&tap_s=93618-a68c98)

For the "philosophy" of data visualization, see [compleXDiagrams.com](http://complexdiagrams.com/)

Properties and Best Uses of Visual Encodings [link](http://complexdiagrams.com/wp-content/2012/01/VisualPropertiesTable.pdf)

The Table of Visual Attributes (2013) [link](https://richardbrath.wordpress.com/2013/09/28/the-table-of-visual-attributes-2013-2/)

A very nice intro to data visualization: https://www.youtube.com/watch?v=XIgjTuDGXYY

A good intro to, and history of, data visualization: https://github.com/ryandata/DataViz  
Also available via this playlist: https://www.youtube.com/playlist?list=PLCj1LhGni3hPGy6Kj1AFxHYkKklxenO9D