# R Introduction Push-In
The goal of this Push-In is to provide a light introduction to R. We aim to build up to data frames, which are one of the most commonly used data structures in R.

# Section 1. Jupyter Notebooks

This course will be using a Jupyter Notebook to interact with R.  The bit of extra setup is well worth it because the Notebook provides code completion and other helpful features.

Jupyter Notebooks are included in the Anaconda distribution, which you installed. Notebook files have the extension ".ipynb" to distinguish them.

## Download the training material

1. Go to our GitHub repo. The URL is [**https://github.com/dlab-berkeley/R-Push-Ins**](https://github.com/dlab-berkeley/R-Push-Ins)
2. Click on the green button in the top right that says "Clone or download".
3. Click "Download ZIP".
4. Unzip the file.
5. Move the unzipped `R-Push-Ins` to your Desktop.

## Open Jupyter Notebook

1. Open the Anaconda Navigator program on your computer.
2. Open Jupyter Notebook from there.
3. Navigate to the `R-Push-Ins` folder. Depending on what your home directory is, this most likely means clicking the "Desktop" folder, at which point `R-Push-Ins` should be visible.
4. Click on `R-Push-Ins`.
5. Click on `Lessons` -> Lesson 1.

This will start a Jupyter Notebook server and open your default web browser.

- The server runs locally on your machine only and does not use an internet connection.
- The server sends messages to your browser.
- The server does the work and the web browser renders the notebook.
- You can type code into the browser and see the result when the web page talks to the server.

Using Jupyter notebooks has several advantages:

- You can easily type, edit, and copy and paste blocks of code.
- Tab completion allows you to easily access the names of things you are using and learn more about them.
- It allows you to annotate your code with links, different sized text, bullets, etc., to make it more accessible to you and your collaborators.
- It allows you to display figures next to the code that produces them to tell a complete story of the analysis.
- The notebook is stored as JSON but can be exported as a script (an .r file for an R notebook, or a .py file for a Python one) if you would like to run it from the command line.
- Just like a webpage, the saved notebook looks different than what you see when it gets rendered by your browser.

## Navigating in Jupyter Notebook

Jupyter Notebooks have more useful features for interactive use than the standard R console, but they work in the same basic way: you type things and then execute them.

The toolbar has a "Run" button, but the quickest way to run code is from the keyboard: **you run code using Shift-Enter**. This also moves you to the next box (or "cell") below the one you just ran (though this behavior may change in future versions).

# Section 2: Variable Assignment

Try assigning the value 5 to the variable `number` and then run `number` to print it.
To run a cell, place your cursor inside it and click the "Run" button or press Shift + Enter.

In [1]:
number <- 5
number

In [None]:
## You can also use the '=' operator to do variable assignment. 
number = 5
number

There are subtle differences between '<-' and '=', which won't matter in most cases. However, using '<-' is considered good code style. Following good stylistic practices makes your code easier for others to read and use.
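
One place where the difference between '<-' and '=' does show up is inside a function call. The sketch below is only an illustration: the name `values` is made up for this example, `x` is the name of the first argument of base R's `mean()` function, and `exists()` is a base R function that reports whether a variable is defined.

```r
# Make sure the illustrative names are not left over from earlier cells.
rm(list = intersect(ls(), c("x", "values")))

# Inside a function call, '=' only names an argument (here, mean()'s 'x').
# No variable called 'x' is created in your workspace.
mean(x = c(1, 2, 3))
x_was_created <- exists("x", inherits = FALSE)
x_was_created

# '<-' inside a call still performs an assignment, so 'values' now exists.
mean(values <- c(4, 5, 6))
values_was_created <- exists("values", inherits = FALSE)
values_was_created
```

For everyday top-level assignments such as `number <- 5`, both operators behave the same way.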

In [None]:
## You can perform basic arithmetic in R:
number + 1
number - 2
number * 3
number / 4

Use a hash symbol (`#`) to comment your code (e.g., write notes to your future self and your collaborators) to help keep your script organized.
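
As a small sketch of what comments can look like in practice (the variable names `price`, `quantity`, and `total` are hypothetical and chosen only for this example):

```r
# A full-line comment: R ignores everything after the '#'.
price <- 4      # an inline comment can follow code on the same line
quantity <- 3

# Comments can also temporarily disable a line of code:
# quantity <- 100

total <- price * quantity
total
```

Because the line `quantity <- 100` is commented out, it never runs and `quantity` keeps its earlier value.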

# Section 3: Functions and Arguments
**Functions** perform actions on inputs. A function's name is followed by round parentheses.

**Arguments** are the inputs - values, expressions, text, entire datasets, etc. You pass arguments to a function inside the parentheses. Sometimes, these arguments are "named". This is helpful when you need to enter multiple arguments: the names tell R which of the function's parameters each value you pass in corresponds to.
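
To make named arguments concrete, here is a minimal sketch using two base R functions that are not otherwise part of this lesson: `seq()`, which generates a sequence of numbers, and `round()`, which rounds a number. The variable names are made up for illustration.

```r
# Positional arguments: R matches them to parameters by order.
seq(1, 10, 3)

# Named arguments: the names say which parameter each value belongs to,
# so the order no longer matters.
by_name <- seq(from = 1, to = 10, by = 3)
reordered <- seq(by = 3, to = 10, from = 1)
by_name
identical(by_name, reordered)

# A common pattern: first argument by position, the rest by name.
rounded <- round(3.14159, digits = 2)
rounded
```

Naming arguments tends to make a call easier to read, especially when a function takes several numbers in a row.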

In [2]:
## Use the ls() function to see all of the variables you have defined.
## Notice that ls() can be called without any arguments!
ls()

In [None]:
## You can use the "TAB" key to autocomplete a variable.
## Place your cursor after the 'b' in 'numb' below and press TAB.
## This works for variables and functions alike.
numb

In [None]:
# The class() function tells the data class/type of the variable and requires one argument
class(number)
ls()

In [None]:
## Removing Variables. rm() will remove a variable
rm(number)
ls()
number # Error

In [None]:
## Remove all variables with rm(list = ls()).
## Notice that this is the first function we're using with a named argument!
## (In RStudio, you could instead click the broom icon at the top of the 'Environment' pane.)
rm(list = ls()) 
ls()

### Challenge 1: Variable Assignment
Define three variables and then write a mathematical expression using only those variables.



In [None]:
# Your code here



# Section 4: Data Types

There are five main types of data we will work with in R:
1. numeric: decimals (the default for ALL numbers in R).
2. integer: whole numbers (positive and negative, including zero).
3. character: text strings (always wrapped in quotation marks).
4. logical: TRUE or FALSE (1 or 0).
5. factor: nominal or ordinal categorical type.
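
As a quick preview of the five types listed above, the sketch below creates one value of each and checks it with `class()`. The variable names are hypothetical, and the `L` suffix and the `factor()` function are introduced here only for illustration.

```r
a_numeric <- 5               # numbers are numeric (decimal) by default
an_integer <- 5L             # the L suffix asks for an integer
a_character <- "five"        # text wrapped in quotation marks
a_logical <- 5 > 3           # a comparison evaluates to TRUE or FALSE
a_factor <- factor("low")    # a categorical value

class(a_numeric)
class(an_integer)
class(a_character)
class(a_logical)
class(a_factor)
```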

## Section 4.1: Numerics
Assign 5 to 'number' and check its class. 



In [None]:
number <- 5
number
class(number)

## Section 4.2: Integers
Coerce 'number' to integer type with the as.integer() function:

In [None]:
number_int <- as.integer(number)
number_int
class(number_int)
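
One detail worth knowing is what `as.integer()` does to a number that has a decimal part. The sketch below is an illustration with made-up values; `round()` is a base R function not otherwise used in this lesson.

```r
# as.integer() drops the decimal part; it does not round.
as.integer(5.9)    # 5
as.integer(-5.9)   # -5 (truncation moves toward zero)

# round() rounds to the nearest whole number but keeps the numeric type.
rounded <- round(5.9)
rounded            # 6
class(rounded)     # "numeric"

# Combine the two to get a rounded integer.
rounded_int <- as.integer(round(5.9))
class(rounded_int) # "integer"
```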

## Section 4.3: Characters
Define welcome <- "Welcome to the D-Lab" and check its class:


In [None]:
welcome <- "Welcome to the D-Lab"
class(welcome)
ls()

In [3]:
## Single and double quotes work similarly:
contraction <- 'I am hungry.'
contraction

contraction <- "I am hungry."
contraction

In [None]:
## You can nest single quotes inside of double quotes:
contraction <- "I'm hungry"
contraction

## Or, you can use all single quotes along with escape characters:
contraction <- 'I\'m hungry'
contraction

However, you cannot nest single quotes inside of single quotes unless you escape them with a backslash, as in the last example.
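
The same rules apply in the other direction for double quotes. The following sketch uses hypothetical variable names, plus the base R functions `identical()`, `print()`, and `cat()`, to show that both spellings produce the same string.

```r
# Double quotes can sit inside single quotes without escaping...
quote_a <- 'She said "hello"'
# ...or inside double quotes if each one is escaped with a backslash.
quote_b <- "She said \"hello\""

# Both spellings produce exactly the same string.
identical(quote_a, quote_b)

# print() shows the escapes; cat() shows the text as a reader would see it.
print(quote_b)
cat(quote_b, "\n")
```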

## Section 4.4: Logicals
Logical data records whether a condition is TRUE (1) or FALSE (0).


In [None]:
class(TRUE)
class(FALSE)

In [None]:
## Since TRUE and FALSE are treated as 1 and 0 in arithmetic, you can do math with them:
TRUE + 2
FALSE - 4

In [None]:
## Comparison operators evaluate whether a statement is TRUE or FALSE. Check the following:
FALSE < TRUE # less than
TRUE >= TRUE # greater than or equal to
FALSE == FALSE # equivalent to (equal to)
"Mac" == "mac" # R is case sensitive
FALSE != FALSE # not equivalent to (not equal to)
"PC" != "Windows"



In [None]:
## Boolean 'and' (all conditions must be satisfied):
TRUE & TRUE 
TRUE & FALSE

In [None]:
## Boolean "or" (just one condition must be satisfied):
TRUE | TRUE 
TRUE | FALSE
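
In practice, `&` and `|` are usually applied to comparisons rather than to literal TRUE and FALSE values. Here is a minimal sketch, assuming a made-up variable `age`; it also uses the `!` ("not") operator, which is not covered above.

```r
age <- 30   # hypothetical value for illustration

# '&' is TRUE only when both comparisons are TRUE.
is_working_age <- age >= 18 & age < 65
is_working_age

# '|' is TRUE when at least one comparison is TRUE.
is_outside_range <- age < 18 | age >= 65
is_outside_range

# '!' flips TRUE to FALSE and vice versa.
!is_working_age
```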

## Section 4.5: Factors

A **factor** variable is a set of categorical or ordinal values. We won't cover factors in this Push-In, but check out our [Fundamentals](https://github.com/dlab-berkeley/R-Fundamentals) workshop to learn more!

### Challenge 2: Data type coercion
Like `as.integer`, other "as dot" functions exist as well, such as `as.numeric`, `as.character`, `as.logical`, and `as.factor`.

1. Define three variables: one numeric, one character, and one logical

In [None]:
# Your code here



2. Can you convert numeric to integer type?
3. Convert numeric to logical?
4. Convert numeric to character?
5. Convert logical to character?
6. Convert character to numeric?


In [None]:
# Your code here




# Section 5: Data Structures

Data structures are useful ways of representing and organizing data in R. There are several data structures we can construct, but we'll focus on two:

1. `c()`: ordered groupings of the SAME type of data (called "vectors").
2. `data.frame()`: an ordered group of equal-length vectors; think of an Excel spreadsheet.

## Section 5.1: Vectors
A vector is an ordered group of the *same* type of data. We can create vectors by "concatenating" data together with the `c()` function:


In [None]:
vec <- c(2, 5, 8, 11, 14)
vec

It does not matter what type of data is contained within the vector, as long as it is all the same type:

In [None]:
numeric_vector <- c(234, 31343, 78, 0.23, 0.0000002)
numeric_vector
class(numeric_vector)
length(numeric_vector) # There are five elements in this vector.
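
If you do mix types inside `c()`, R does not raise an error. Instead, it quietly converts every element to a single common type. The sketch below, with made-up variable names, shows two common cases.

```r
# Numbers mixed with text: everything becomes character.
mixed_text <- c(1, "two", 3)
mixed_text
class(mixed_text)

# Numbers mixed with logicals: TRUE and FALSE become 1 and 0.
mixed_logical <- c(10, TRUE, FALSE)
mixed_logical
class(mixed_logical)
```

This silent conversion is a frequent source of surprises, so it is worth checking `class()` when a vector does not behave as expected.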

### Indexing a vector
To index a vector means to extract an element based on its position. For example, if we want to return just the third element from "numeric_vector", we would use square brackets and type:



In [None]:
numeric_vector[3]
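
Square brackets can do more than pull out one element. The following sketch redefines `numeric_vector` so that it runs on its own; the other variable names are hypothetical.

```r
numeric_vector <- c(234, 31343, 78, 0.23, 0.0000002)

# Positions start at 1, so the first element is [1], not [0].
first <- numeric_vector[1]
first

# A vector of positions extracts several elements at once.
first_and_third <- numeric_vector[c(1, 3)]
first_and_third

# A negative position drops that element and keeps the rest.
without_first <- numeric_vector[-1]
without_first
```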

When we want a numeric vector whose consecutive entries differ by 1, we can also use the colon operator:



In [None]:
colon_vector <- 28:36
colon_vector 
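
The colon operator always steps by 1. For other step sizes, base R's `seq()` function is one option. It is not part of the original lesson, so treat this as an optional sketch with made-up variable names.

```r
# A colon sequence can also count down.
countdown <- 5:1
countdown

# seq() is a more general tool: 'by' sets the step size.
evens <- seq(28, 36, by = 2)
evens

# 'length.out' asks for a fixed number of evenly spaced values instead.
quarters <- seq(0, 1, length.out = 5)
quarters
```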

Vectors can contain other types, too. Consider the following example:

In [4]:
character_vector <- c("Canada", "United States", "Mexico")
character_vector
class(character_vector)

# Section 6: Data frames

Why do we need a data frame? Think about datasets that you have seen before. For example, suppose we collected data on the characteristics of D-Lab Workshop learners. We might want to know the age, degree program, previous familiarity with programming, research interests, and likely many other attributes (variables). 

This kind of dataset is multidimensional. We have one row for each participant and one column for each attribute we collect data on. If we had 40 participants and collected 10 attributes for each participant, then we would have a 40 by 10 dataset.

The data structure in R that is most suited for this kind of problem is the data frame. 
A data frame is an ordered group of equal-length vectors. Data frames are the most common type of data structure used for data analyses. Most of the time when we load real data into R, we are loading that data into a data frame.

Since each column is a vector, it can only contain a single data type, but columns of different types can be lined up next to each other.

Meanwhile, rows can contain heterogeneous data.

Let's create a data frame capturing some information about countries:




In [None]:
countries <- c("Canada", "Mexico", "United States")
populations <- c(10, 20, 30)
areas <- c(30, 10, 20)


In [None]:
## We can create the data frame with the data.frame() function.
## The equal-length vectors are the arguments.
## Notice that the name of each variable becomes the name of the column.
df <- data.frame(countries, populations, areas)
df

In [None]:
## If we want to change the column names, we can specify them with named arguments:
df <- data.frame(country = countries, population = populations, area = areas)
df

In [None]:
## Check the compact structure of the data frame:
str(df)

In [None]:
## View the dimensions (nrow x ncol) of the data frame:
dim(df) 

In [None]:
## View column names:
colnames(df)

In [None]:
## View row names (we did not change these and they default to character type):
rownames(df)
class(rownames(df))

In [None]:
## You can extract a single column with the $ operator:
df$country
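
Besides `$`, square brackets also work on data frames, using a `[row, column]` pattern. The sketch below builds its own small data frame under the hypothetical name `toy_df` (with the same toy values as above) so that it runs on its own; `sapply()` is a base R function that is not otherwise covered in this lesson.

```r
# Rebuild the toy data under a separate name so this block is self-contained.
toy_df <- data.frame(
  country = c("Canada", "Mexico", "United States"),
  population = c(10, 20, 30),
  area = c(30, 10, 20)
)

# Square brackets take [row, column]; leaving one side blank means "all".
first_row <- toy_df[1, ]              # one row, every column
pop_column <- toy_df[, "population"]  # every row, one column (a vector)
one_cell <- toy_df[2, "area"]         # a single value

first_row
pop_column
one_cell

# A row can mix types because each column keeps its own type.
sapply(toy_df, class)
```

Selecting a single column with brackets gives the same vector as `toy_df$population`.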

In [None]:
## The $ operator can also be used to create new columns:
df$density <- df$population / df$area
df
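
The division above works because arithmetic on vectors is applied element by element. Here is a minimal sketch using standalone vectors with the same toy values; the names `population`, `area`, `density`, and `population_doubled` are chosen only for this illustration.

```r
population <- c(10, 20, 30)
area <- c(30, 10, 20)

# Dividing two equal-length vectors divides them element by element:
# 10/30, 20/10, 30/20.
density <- population / area
density

# A single number is reused for every element.
population_doubled <- population * 2
population_doubled

# Rounding also works element by element.
round(density, 2)
```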

### Challenge 3: Make your own data frame.
1. Create a data frame that contains four different food items and three attributes for each: name, price, and quantity.
2. Add a fourth column that calculates the total cost for each food item.
3. What function could you use to calculate the total cost of all the food items combined?

In [None]:
# Your code here



This concludes the R Introduction Push-In!

We encourage you to check out [R-Fundamentals](https://github.com/dlab-berkeley/R-Fundamentals) for a more detailed introduction to R.


