-
Notifications
You must be signed in to change notification settings - Fork 0
/
Intro to R - Tutorial.Rmd
742 lines (537 loc) · 22.9 KB
/
Intro to R - Tutorial.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
---
title: "Intro to R - Tutorial"
author: "Tyler Kukla"
date: "1/5/19"
output: learnr::tutorial
runtime: shiny_prerendered
---
```{r setup, include=FALSE}
library(learnr)
library(dplyr)
library(ggplot2)
knitr::opts_chunk$set(echo = FALSE)
breakfast <- function(myName, myNumber){print(paste(myName, "ate", myNumber, "eggs for breakfast today", sep=' '))}
add1000 <- function(my1000){if(my1000==1000){print(paste("My two numbers add to 1000! I passed!"))
}else{"My numbers do NOT equal 1000... Should I pull out my calculator???"} }
```
## Introduction
This tutorial will cover some of the basics in coding with R and is designed for beginners with no-to-little experience. We will cover a number of exercises that involve directly writing and entering code, just as you might in a script of your own. While we will eventually work on creating our own scripts, the goal right now is to cover the basic syntax.
In this exercise we will cover the following topics:
1. Comments
2. Math
3. Logical statements
4. Functions
5. Data structures
6. "For" loops
7. Data visualization
*If you have any questions about this tutorial feel free to contact me - tykukla@stanford.edu*
## 1. Comments
In R the **pound sign (or "hashtag" [#])** is used to "comment-out" text in a script. Any text that is commented out will not be seen by the computer. Commented-out text generally serves as a note for the user of the script or as a place-holder for an alternative value or variable.
Below, try commenting-out the phrase you agree less with so it does not appear. Then click "Run Code":
```{r commenting, exercise=TRUE}
print("Tyler Kukla is super cool") # This is a comment -- The "print()" function will print the text inside when this code is run
print("Tyler Kukla is only sort of cool") # Here is another comment
```
```{r commenting-hint}
print("Tyler Kukla is super cool")
# print("Tyler Kukla is only sort of cool")
```
We can also use the **#** to de-select a variable value and replace it with something else.
> Before we do this, let's **define a variable**. We do this with either the equal sign (**=**) or an arrow showing directionality (**<-**). *I prefer the arrow because it is less ambiguous*. After a variable is defined in a script, it will be stored in your environment.
Here, set the value of the variables. Set **myNumber** to any number and **myName** to a any name (use single or double quotes for text in R):
```{r myVars, exercise=TRUE}
myNumber
myName
breakfast(myName=myName, myNumber=myNumber)
```
```{r myVars-hint}
myNumber <- 9999
myName <- "Tyler Kukla"
```
Now we will show how we can comment-out values to keep in mind for later. Comment out the incorrect value and replace it with the correct one:
```{r commentOut, exercise=TRUE}
# There are this many states in the USA
NumStates <- 324
```
```{r commentOut-solution}
NumStates <- 50 #324
```
Great! *You are now the hashtag-comment master!* **Proceed to the next topic**
## 2. Math
We will graduate from using R as a sophisticated graphing calculator... someday... but until then we must go through the math basics.
This is all very intuitive and simple.
### Addition
Change the below equation so the sum is 1,000:
```{r add, exercise=TRUE}
43 + 567
```
Now set this solution as a variable
```{r add2, exercise=TRUE}
my1000
add1000(my1000=my1000)
```
```{r add2-hint}
my1000 <- 999 + 1
```
### Subtraction
Make an equation that equals -75
```{r sub, exercise=TRUE}
50 - 33
```
### Multiplication
We use the star **(asterisk * )** symbol to multiply
```{r mult, exercise=TRUE}
#... what is the product of your two favorite numbers???
FavNum1
FavNum2
FavNum1 * FavNum2
```
### Exponents
There are two ways to take a value to the *nth* power:
1. Use the "carrot" symbol ^
2. Use the asterisk twice **
```{r pwrX, exercise=TRUE}
myNum <- 5
myPower <- 3
myNum ^ myPower # this is the same as...
myNum ** myPower # this.
```
### Other useful things
Function | Description
-------- | --------
**sqrt(x)** | square root
**abs(x)** | absolute value
**cos(x)** | cosine
**sin(x)** | sine
**tan(x)** | tangent
**log(x)** | natural log
**log10(x)** | common log
**exp(x)** | e^x
*You are now a math whiz!! Let's move in!*
## 3. Logical statements
There will be many times when you will need to compare two elements in R and look for similarities/differences or other relationships. We use logical statements to help us do this. Running code with these statements typically returns a **"TRUE"** or **"FALSE"**.
> **Note** both TRUE, FALSE and T, F behave the same way in R
Here I will cover both relational and logical operators
### Relational Operators
Operator | Description
---------- | -----------
**<** | *less than*
**>** | *greater than*
**<=** | *less than or equal to*
**>=** | *greater than or equal to*
**==** | *equal to*
**!=** | *not equal to*
Now test out these operators on two numbers that you define!
```{r relOp, exercise=TRUE}
num1 <- 77
num2 <- 79
#... enter different relational operators below
num1 < num2
```
You can also use the *equal to* and *not equal to* operators for characters
```{r relOp2, exercise=TRUE}
char1 <- "Apples"
char2 <- "Oranges"
#... try out the "==" and "!="
char1 == char2
```
### Logical Operators
Operator | Description
---------- | -----------
**!** | *logical NOT*
**&** | *element-wise logical AND*
**&&** | *logical AND*
$\mid$ | *element-wise logical OR*
$\|$ | *logical OR*
> **Note** the element-wise operators are used for a vector and produce results of the vector's length. Non-element wise work for single values or the first value of a vector (for more info on vectors, see Section 5)
Let's test out the logical operators on two numeric values (un-comment the first line to try operators without a check number for comparison):
```{r logOp, exercise=TRUE}
num1 <- 5
num2 <- 0
CheckNum <- 5
#... enter different logical operators below
#num1 != num2
num1 & num2 == CheckNum
```
A helpful way to understand operators is to "read" them aloud. For example:
```{r, eval=FALSE, echo=TRUE}
#... num1 AND num2 are EQUAL TO CheckNum
num1 & num2 == CheckNum
#... num1 OR num2 is EQUAL TO CheckNum
num1 | num2 == CheckNum
#... num1 is NOT EQUAL TO num2
num1 != num2
```
*You are now logically oriented in R! Let's move on to some functions*
## 4. Functions
It is often useful to take operations that will be repeatedly used in R and turn them into a function. This way we do not re-write the code every time we wish to complete the operation.
Functions can be used for a wide range of tasks, not just mathematical operations. For example, you can build a function to save data that you generate in R, or to create figures or even email yourself automatic updates. Recently, I downloaded ~50 gigabytes of data that all needed to be processed, and I wrote one function that did it all for me in a matter of minutes (rather than processing each file by myself).
Here we will create some simple functions and discuss their utility.
### Summation function
To start, let's make a function that completes some simple addition:
```{r, eval=FALSE, echo=TRUE}
mySum <- function(num1, num2){num1 + num2}
```
```{r prepare-addfun}
mySum <- function(num1, num2){num1 + num2}
```
The function takes in two values **num1 and num2** and produces their sum. Note that we put the variables required for the function in paranthesis and the actual function in curly-brackets.
When we call the function we can use the equal sign to set **num1** and **num2** to whatever values we wish:
```{r addfun1, exercise=TRUE, exercise.setup="prepare-addfun"}
mySum(num1 = 50, num2 = 45)
```
We can also enter defined-variables into a function:
```{r addfun2, exercise=TRUE, exercise.setup="prepare-addfun"}
thisNum <- 50
thatNum <- 45
mySum(num1 = thisNum, num2 = thatNum)
```
### Quadratic Formula function
It doesn't save us too much time to have a function that adds two numbers, but let's consider something more complicated like the *quadratic formula*, where we compute both $-b + \sqrt{}$ and $-b - \sqrt{}$:
```{r, eval=FALSE, echo=TRUE}
myQuadForm <- function(a, b, c){xMinus <- (-b - sqrt(b**2 - 4*a*c)) / (2*a)
xPlus <- (-b + sqrt(b**2 - 4*a*c)) / (2*a)
print(paste("lower solution =", xMinus, "; upper solution =", xPlus, sep=' '))}
```
```{r prepare-quadfun}
myQuadForm <- function(a, b, c){xMinus <- (-b - sqrt(b**2 - 4*a*c)) / (2*a)
xPlus <- (-b + sqrt(b**2 - 4*a*c)) / (2*a)
print(paste("lower solution =", xMinus, "; upper solution =", xPlus, sep=' '))}
```
Now testing out our function
```{r quadfun1, exercise=TRUE, exercise.setup="prepare-quadfun"}
myQuadForm(a=1, b=5, c=2)
```
Functions can return values, characters, vectors, plots, dataframes and much more! We will explore some of these throughout the course. Now have a crack at writing your own function. Let's try the equation for a line $y=mx + b$. Let the function return the value $y$ given the variables $m, x, b$.
```{r linefun, exercise=TRUE}
#... define the functijon
LineFun <- function(){}
#... assign values and run the function
myY <- LineFun()
#... print the result
print(myY)
```
*How FUN are these FUNctions?! Now on to Data structures*
## 5. Data structures
There is a wide range of ways to store data in R--from a single-valued variable to *n*-dimensional arrays--and whatever you use will depend on the data you are using, the figures you wish to make, and your personal preferences.
Here I will go over the basic structures:
1. Vector
2. Data frame (or matrix)
3. List
### Vectors
A vector has 1 dimension--length. It can be a sequence of values or characters, and there are functions built into R that help you populate vectors.
We'll go through three ways to make a vector. *First*, let's build a vector by hand:
```{r, echo=TRUE}
myVec <- c(0,1,2,3,4,5,6,7,8,9,10)
print(myVec)
```
As above, any time you assign something with multiple values we must put the values in paranthesis and start with the letter _**c**_. The _**c**_ is short for *concatenate*.
Here's an example of a vector of character strings:
```{r, echo=TRUE}
myCharVec <- c("Tyler K.", "Dan I.", "Page C.")
print(myCharVec)
```
**Try making your own vector!**
```{r makeVec, exercise=TRUE}
myFirstVector
print(myFirstVector)
```
```{r makeVec-hint}
myFirstVector <- c(1, 55, 23, 9)
```
Let's make a vector now using the **sequence function**. The sequence function generates a vector that is a series of numbers based on rules that you define.
For example:
```{r, echo=TRUE}
#... a sequence of values from 0 to 10 in steps of 2
myVec <- seq(from=0, to=10, by=2)
print(myVec)
#... a sequence of values from 0 to 10 split into 15 segments
myVec1 <- seq(from=0, to=10, length=15)
print(myVec1)
```
**Now build your own sequence vector!**
```{r seqVec, exercise=TRUE}
thisVec <- seq()
print(thisVec)
```
Another vector-building function is the **repetition function**. This function repeats a single value, character string, or even another vector.
For exmaple:
```{r, echo=TRUE}
#... a vector of 42 repeated ten times
myVec <- rep(x=42, 10)
print(myVec)
```
#### Examining vectors
It is often useful to learn about vectors whose contents we may be unfamiliar with. R has a number of functions that do this.
For example:
```{r, echo=TRUE}
thisVec <- seq(33, 133, length=58)
#... and the length
length(thisVec)
#... what type of vector is this? (what type of data does it hold; double is vector of numbers)
typeof(thisVec)
#... look at the start of the vector
head(thisVec)
#... and the end
tail(thisVec)
#... and the mean
mean(thisVec)
```
*Now on to data frames*
### Data frames
This is (likely) the most common data structure you will use. It is a simple matrix with columns and rows. R has a number of datasets that are built in as matrices, but let's first build our own.
```{r, echo=TRUE}
mydf <- matrix(nrow=5, ncol=2, data=c(1:10))
print(mydf)
```
We do not need to have data in the matrix when we build it (it may be empty).
That's a good start, but it's not pretty and there are no column names. Let's use the _**dplyr**_ package to tidy our data up. I particularly like using "tibbles" because they are easy to navigate, process, and graph.
```{r, echo=TRUE}
#..."as_tibble" makes the data structure a tibble
mydf <- as_tibble(matrix(nrow=5, ncol=2, data=c(1:10)))
#... "colnames" names the columns
colnames(mydf) <- c('Col1', 'Col2')
#... now let's check it out again
print(mydf)
#... we can use the "$" symbol to identify different columns
mydf$Col1
```
For fun, let's check out a pre-loaded dataset in R.
One interesting dataset is **mtcars** which is fuel consumption and other data for 32 automobiles published in the *1974 Motor Trend US magazine*.
```{r, echo=TRUE}
#... bring data into the environment
data("mtcars")
#... let's find the dimensions of the data
dim(mtcars)
#... and the number of rows
nrow(mtcars)
#... and number of columns
ncol(mtcars)
#... and miles per gallon
mtcars$mpg
```
We can look at different parts of the data by using brackets denoting the row(s) and column(s) we wish to see.
```{r, echo=TRUE}
data("mtcars")
mtcars[1,1] # row 1 column 1 value
mtcars[3,5] # row 3 column 5 value
mtcars[ ,5] # all rows of column 5
mtcars[c(1,2), 6] # rows 1 and 2 of column 6
```
Now it's your turn! pick a pre-loaded dataset, load it, and explore it a little bit.
*Some examples are*
1. "iris" measurements of 50 flowers from 3 species of iris
2. "PlantGrowth" data from experiment of plant biomass under two treatments (and control)
3. "USArrests" violent crimes in US by state (values per 100,000 people)
```{r playdf, exercise=TRUE, exercise.lines=10}
data("")
```
*Now let's check out some Lists*
### Lists
A list is a set of elements that can vary from one to the other. For example, if you want to place a single value, a vector, and a data frame in the *same location* you could place them in a list.
For example:
```{r, echo=TRUE}
myVar <- "Tyler"
myVec <- seq(3,30, by=10)
mydf <- as_tibble(matrix(data=c(1:10), nrow=5, ncol=2))
#... lets bring these into a list
myList <- list(myVar, myVec, mydf)
#... and let's view the output
myList
```
The printed output tells you how to call each element of the list.
```{r, echo=TRUE}
myVar <- "Tyler"
myVec <- seq(3,30, by=10)
mydf <- as_tibble(matrix(data=c(1:10), nrow=5, ncol=2))
myList <- list(myVar, myVec, mydf)
#... to get just the first element
myList[[1]]
#... combine with dataframe notation to see the second column of the third element
myList[[3]]$V2
```
*Final quiz* -- can you turn objects or data from a pre-loaded dataset into a list? (remember, some data options are "iris", "USArrests", "PlantGrowth")
```{r myList, exercise=TRUE, exercise.lines=10}
data("")
```
```{r myList-hint}
data("iris")
object1 <- iris$Species
object2 <- iris$Sepal.Length[c(1,2)]
object3 <- c(iris$Petal.Length, iris$Petal.Width)
myList <- list(object1, object2, object3)
```
*Awesome! Let's move on to "For" loops*
## 6. "For" loops
A "for" loop is a part of a code where you complete some pre-defined computation _**x**_ number of times. For example, if I have a vector of data that includes 55 values, I could use a for-loop to loop through each value and calculate the square of each.
```{r, echo=TRUE}
myVec <- c(10:20)
#... "outVec" is the solution of the computation
#... "length(myVec)" equals the number of times we loop through
#... "i" counts where we are in the loop
outVec <- vector() # an empty vector
for(i in 1:length(myVec)){
outVec[i] <- sqrt(myVec[i])
}
#... see the results
print(outVec)
```
For this particular problem, we would usually just code: outVec <- sqrt(myVec)
This would accomplish the same thing. But this example is useful to understand how for-loops work.
We can also **nest** a for loop within another one. This is useful if we have 2-dimensional data (like a dataframe).
```{r, echo=TRUE}
mydf <- mydf <- as_tibble(matrix(data=c(1:10), nrow=5, ncol=2))
# Here, we loop through all columns (i) and all rows (k).
# I want to save data in a vector, so I need a unique value to
# track what iteration we're on. We'll call the tracker "dex"
# (short for "index")
outVec <- vector() # empty vector to hold output
dex <- 1 # tracker to store data in vector
for(i in 1:ncol(mydf)){ # loop through each column
for(k in 1:nrow(mydf)){ # loop through each row
outVec[dex] <- mydf[k, i] ** 2 # square the value
#... set the tracker to the next number
dex <- dex + 1
}
}
#... check out the results
outVec
```
*Use the space below to write your own for loop*. Multiply the miles-per-gallon of the "mtcars" data by the horsepower, and save in a vector.
```{r forLoop, exercise=TRUE}
data("mtcars")
mtcars$mpg # miles-per-gallon data
mtcars$hp # horsepower data
outVec <- vector()
for(i in 1:length(mtcars$mpg)){
}
```
```{r forLoop-solution}
data("mtcars")
outVec <- vector()
for(i in 1:length(mtcars$mpg)){
outVec[i] <- mtcars$mpg[i] * mtcars$hp[i]
}
```
*Great! We can now bring the data structures and for loops together to work on data visualization*
## 7. Data visualization
There is *SO MUCH* to learn with data visualization (you could do an entire PhD on it, and people do). Here, I will just cover the very basics.
R is a powerful data visualization tool, with many plotting commands that are useful in "base R" (the functions that come built-in). *However*, R has a much more powerful and easy-to-use plotting tool called **"ggplot"**. This is a set of plotting commands that follow simple, organized data structures and allow for easily-customizable visualizations.
*Some ggplot2 specifics*:
Each element of a plot is generally its own line in the script. Each line ends with a **"+"**, indicating there is another line after it (the last line does not end with a plus). We first tell ggplot what data we are using:
```{r, echo=TRUE, eval=FALSE}
data("mtcars")
ggplot(mtcars) +
```
Now we tell ggplot what kind of plot to make. These usually start with **"geom_"**, denoting the plot's "geometry". For a scatterplot, we use "geom_point".
```{r, echo=TRUE, eval=FALSE}
ggplot(mtcars) +
geom_point()
```
Next we determine the plot's _**aesthetics**_ with the **aes()** command. Within this command we tell ggplot what to put on the X and Y axes:
```{r, echo=TRUE, eval=FALSE}
ggplot(mtcars) +
geom_point(aes(x = wt, y = mpg))
```
Notice that the x and y values are just the names of the columns of data we wish to plot. Here we investigate how miles-per-gallon (mpg) relates to the weight (wt) of a car.
Now let's look at (and make) some figures!
**There are loads of amazing examples online. Here I will just cover a couple.**
1. Scatterplot
2. Density plot
3. Box and whisker or "violin"
### Scatterplot
Starting with the classic scatterplot -- what does X vs Y look like?
Let's take a look at the "mtcars" dataset
```{r, echo=TRUE, eval=TRUE}
data("mtcars")
ggplot(mtcars) +
geom_point(aes(x=wt, y=mpg)) +
theme_linedraw()
# there are many "themes" you can choose from (or you can avoid them all together). This is one that I like.
```
We can also add color to the data points, and change their size and shape. These steps involve those words exactly **("color", "size", and "shape")**. When looking for a nice color palette, I suggest checking out [Adobe color wheel](https://color.adobe.com/create/color-wheel/). *Especially the pre-defined themes under __Explore__*. You can simply copy and paste the HEX code of the color you want, then call that color by putting quotes around the code and a hashtag in front. *For example: "#FA7268"*.
```{r, echo=TRUE, eval=TRUE}
data("mtcars")
ggplot(mtcars) +
geom_point(aes(x=wt, y=mpg), color='#FA7268', shape=15, size=4) +
theme_linedraw()
# google "ggplot2 shapes R" to see what shape corresponds to what code
```
We can also change the axis labels *(labs)*, add a title *(ggtitle)*, make the points semi-transparent *(alpha)*, and color them by a different parameter.
To color points by another parameter, the "color" value must be in the aesthetic **aes()**. Then we can define the color with a variety of **"scale_color_"** commands. For example:
```{r, echo=TRUE, eval=TRUE}
data("mtcars")
ggplot(mtcars) +
#... the data
geom_point(aes(x=wt, y=mpg, color=qsec), shape=16, size=4, alpha=0.7) +
#... the color gradient
scale_color_gradient(name="Quarter mile time", low="#C5003E", high="#D9FF5B") +
#... axis labels
labs(x="Weight (x1e3 pounds)", y="Miles per gallon") +
ggtitle("Fuel Efficiency") +
theme_linedraw()
```
**Your turn!** Make your own plot or copy and paste the code above to modify mine:
```{r scatterplot, exercise=TRUE, exercise.lines=12}
data("mtcars")
ggplot(mtcars) +
geom_point()
```
### Density plot
We can use the same basic structure in ggplot that we learned in the scatter plot example to make a density plot.
Let's look at a density plot for horsepower in the mtcars data:
```{r, echo=TRUE, eval=TRUE}
data("mtcars")
ggplot(mtcars) +
geom_density(aes(x=hp), fill="#C5003E", alpha=0.6) +
labs(x="Horsepower") +
ggtitle("Horsepower density plot") +
theme_linedraw()
```
As you can see, it's almost exactly the same as the scatter plot figure but **geom_point()** is replaced with **geom_density** and there is no y-axis defined for the density plot.
We can make a **2-D density** plot in which case the Y axis must be defined:
```{r, echo=TRUE, eval=TRUE}
data("mtcars")
ggplot(mtcars) +
#... density plot
stat_density2d(aes(x=hp, y=wt, fill=..level..), geom='polygon') +
#... plus data
geom_point(aes(x=hp, y=wt), color="gray") +
labs(x="Horsepower", y="Weight") +
ggtitle("Horsepower vs Weight 2D density plot") +
theme_linedraw()
```
*Feel free to make your own density plot here!*
```{r densityplot, exercise=TRUE, exercise.lines=12}
data("mtcars")
ggplot(mtcars)
```
### Box and whisker or "violin"
Box and whisker plots are useful for statistical visualization of groups of data. The same data can be represented by density in "violin" plots as well. Both of these are easy in ggplot and we will walk through them here.
Box and whisker example (using scale_fill_ instead of scale_color_ because color controls the outline):
```{r, echo=TRUE, eval=TRUE}
data("mtcars")
ggplot(mtcars) +
geom_boxplot(aes(x=as.character(am), y=mpg, fill=as.character(am))) +
scale_fill_manual(name="Transmission", label=c("0"="Automatic", "1"="Manual"),
values=c("0"="#348899", "1"="#962D3E")) +
ggtitle("Miles per gallon by transmission") +
labs(x="Transmission (0=auto; 1=man)", y="Miles per gallon") +
theme_linedraw()
```
And almost the exact same code works for a violin plot (**geom_violin** instead of geom_boxplot):
```{r, echo=TRUE, eval=TRUE}
data("mtcars")
ggplot(mtcars) +
geom_violin(aes(x=as.character(am), y=mpg, fill=as.character(am))) +
scale_fill_manual(name="Transmission", label=c("0"="Automatic", "1"="Manual"),
values=c("0"="#348899", "1"="#962D3E")) +
ggtitle("Miles per gallon by transmission") +
labs(x="Transmission (0=auto; 1=man)", y="Miles per gallon") +
theme_linedraw()
```
*Now try your own box plot!*
```{r boxplot, exercise=TRUE, exercise.lines=12}
data("mtcars")
ggplot(mtcars)
```