## Lab 1: Introduction to R in Statistics & Data Analysis
#### MA 189 Data Dive Into Birmingham (with R)
##### _Blazer Core: City as Classroom_

Course Website: [Github.com/kerenli/statbirmingham/](https://github.com/kerenli/statbirmingham/) 


#### Levels:
<div class="alert-success"> Concepts and general information</div>
<div class="alert-warning"> Important methods and technique details </div>
<div class="alert-info"> Extended reading </div>
<div class="alert-danger"> (Local) examples, assignments, and <b>Practice in Birmingham</b> </div>

## <div class="alert alert-block alert-success"> What Is R? </div>

Simply speaking, R is a language and environment for statistical computing and graphics.

R does:
* Graphics, statistics, machine learning, etc.
* Data acquisition, munging, management
* Literate programming (dynamic reports)
* Web applications

More specifically, R is a powerful and versatile programming language designed for statistical analysis and data visualization. It was created by statisticians for statisticians, which makes it well suited to data manipulation, analysis, and graphical representation.

One of the key strengths of R is its extensive ecosystem of packages: collections of functions and datasets that extend the capabilities of base R. These packages allow users to perform a wide range of tasks, from basic statistical analysis to complex machine learning algorithms, all within the same environment.

R is widely used in academia, research, and industry, especially in fields like data science, bioinformatics, economics, and social sciences. Its open-source nature means it is freely available to everyone, and its active community continuously contributes to its development and improvement.

In this course, you will use R to explore and analyze real-world data, gaining practical experience that will be invaluable in your studies and future career.

## <div class="alert alert-block alert-success"> Options for Running R </div>

##### <div class="alert alert-block alert-danger"> Download and Install R </div>
1. Go to [http://www.r-project.org/](http://www.r-project.org/)

2. Click on the "CRAN" link and choose a CRAN mirror close to your location.

3. Download the version of R appropriate for your operating system (Windows, macOS, or Linux).

4. Follow the installation instructions provided on the website.

Once R is installed, there are several ways to run it. The sections that follow describe each option.

##### <div class="alert alert-block alert-danger"> Run R in RStudio </div>

RStudio is a popular integrated development environment (IDE) for R. It provides a user-friendly interface, making it easier to write, run, and debug R code.

1. Install R:
    - Go to [http://www.r-project.org/](http://www.r-project.org/) and download R for your operating system.
    - Install R by following the instructions. 
    

2. Install RStudio:
    - Go to [https://www.rstudio.com/products/rstudio/download/](https://www.rstudio.com/products/rstudio/download/).
    - Download the installer for your operating system and install RStudio. 

3. Run RStudio:
    - Open RStudio.
    - You can start writing and running R code directly in the RStudio console or script editor. 

##### <div class="alert alert-block alert-danger"> Setting Up R Kernel in JupyterLab via Anaconda </div>

Anaconda is a distribution that includes Python, R, and other packages, and it simplifies managing multiple programming environments.

Follow the steps below to create a new environment and install the necessary packages.

1. Install Anaconda:
    - Download Anaconda from [https://www.anaconda.com/products/individual](https://www.anaconda.com/products/individual).
    - Follow the installation instructions.

2. Open Anaconda Navigator.
3. Create a New Environment:
    If you cannot find the `r-base` package in the default (base/root) environment, it is recommended to create a new environment. 
    - In the **Anaconda Navigator** window, navigate to the **Environments** tab on the left-hand side.
    - In the bottom left corner, click the **Create** button.
    - A dialog box will appear. Enter a name for your new environment (for example, `r_env`).
    - Under the **Packages** section, check both **Python** and **R** to include them in your environment.
    - Click **Create**. Anaconda Navigator will now set up your new environment.
       
4. Activate and Install Packages in the New Environment:
    - Once the environment is created, select it from the list of environments on the left-hand side.
    - To install additional packages, such as **JupyterLab**, click the **Search Packages** box.
    - Search for **JupyterLab**, select it from the results, and click **Apply** to install it.
    - You can also install other packages such as `r-base` and `r-irkernel` (the conda package that provides `IRkernel`) directly from the Navigator by searching for them.

5. Launch JupyterLab:
    - In Anaconda Navigator, go to the Home tab.
    - Under Applications on [Your Environment Name], find JupyterLab and click Launch.
    - In JupyterLab, when you create a new notebook, select R from the list of available kernels.

<div class="alert alert-block alert-info">
Set Up JupyterLab for R
</div>

JupyterLab lets you run R code in an interactive notebook environment, which is well suited to combining code, visualizations, and text. The steps below install JupyterLab with `pip` and register an R kernel for it:

1. Install R:
    - Go to [http://www.r-project.org/](http://www.r-project.org/) and download R for your operating system.
    - Install R by following the instructions.

2. Install Python:
    - Download and install Python from [https://www.python.org/downloads/](https://www.python.org/downloads/) if not already installed.

3. Install JupyterLab:
    - Open your command line interface and install JupyterLab by running:
      
      ---
      ```bash
      pip install jupyterlab
      ```
      ---
      
    - On macOS, you might need to use `pip3` instead:
    
      --- 
      ```bash
      pip3 install jupyterlab
      ```
      ---
      
4. Install the R Kernel for Jupyter:
    - Open the command line and type `R` to start the R console.
    - In the R console, install the IRkernel package:
    
      ---
      ```r
      install.packages('IRkernel')
      ```
      ---
      
    - When prompted to select a CRAN mirror, choose one close to your location (e.g., “USA (TX 1)”).
    - Make the R kernel available to Jupyter:
    
      ---
      ```r
      IRkernel::installspec()
      ```
      ---

5. Run JupyterLab:
    - Start JupyterLab by typing `jupyter lab` in the command line.
    - This will open JupyterLab in your web browser.
    - Create a new notebook and select the "R" kernel to start coding in R.

<div class="alert alert-block alert-info">
Run R in the Command Line
</div>

You can run R directly from the command line, which is useful for quick tasks or when you prefer working in a terminal environment.

1. Install R:
    - Go to [http://www.r-project.org/](http://www.r-project.org/) and download R for your operating system.
    - Install R by following the instructions.

2. Open the Command Line:
    - On Windows: Open Command Prompt.
    - On macOS/Linux: Open Terminal.

3. Start R:
    - Type `R` and press Enter.
    - This will start the R console in your command line interface.
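
Once the console is running, a quick sanity check can confirm which R version started. This is a minimal sketch that uses only base R:

```r
# Print the version string of the running R installation
R.version.string

# Evaluate a simple expression to confirm the console responds
1 + 1
```

Type `q()` and press Enter to leave the R console.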


## <div class="alert alert-block alert-success"> Basic R </div>

### <div class="alert alert-block alert-success"> Basic R Operations and Commonly Used Functions </div>

**Objective:** 
Introduce students to the basic operations in R and familiarize them with commonly used functions.

#### <div class="alert alert-block alert-success"> 1. Basic Arithmetic Operations </div>
   - Addition (`+`)
   - Subtraction (`-`)
   - Multiplication (`*`)
   - Division (`/`)
   - Exponentiation (`^`)
   - Modulus (`%%`)
   - Integer Division (`%/%`)

   *Example:*

In [None]:
   5 + 3   # Addition
   10 - 4  # Subtraction
   7 * 2   # Multiplication
   21 / 5  # Division
   2^3     # Exponentiation
   10 %% 3 # Modulus
   10 %/% 3 # Integer Division
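
The modulus and integer-division operators are two halves of the same calculation, and exponentiation is evaluated before negation and multiplication. The short sketch below illustrates both points; the variable names are only for illustration.

```r
a <- 10
b <- 3

quotient  <- a %/% b   # integer division: 3
remainder <- a %% b    # modulus: 1

# Rebuild the original number from quotient and remainder
quotient * b + remainder == a   # TRUE

# Exponentiation is evaluated before unary minus and multiplication
-2^2      # -4, parsed as -(2^2)
(-2)^2    # 4
2 * 3^2   # 18
```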

#### <div class="alert alert-block alert-success"> 2. Comparison Operators </div>
   - Equal to (`==`)
   - Not equal to (`!=`)
   - Greater than (`>`)
   - Less than (`<`)
   - Greater than or equal to (`>=`)
   - Less than or equal to (`<=`)

   *Example:*

In [None]:
   5 == 3  # FALSE
   5 != 3  # TRUE
   5 > 3   # TRUE
   5 < 3   # FALSE
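
One caveat with `==` is that decimal numbers are stored approximately, so an exact comparison can fail even when the arithmetic looks right. The sketch below shows the issue and one common workaround with `all.equal()`; the `ages` vector is made up for illustration.

```r
x <- 0.1 + 0.2

x == 0.3                    # FALSE: the stored values differ slightly
isTRUE(all.equal(x, 0.3))   # TRUE: equal within a small tolerance

# Comparisons are vectorized: one result per element
ages <- c(18, 25, 40)       # hypothetical ages
ages >= 21                  # FALSE TRUE TRUE
```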

#### <div class="alert alert-block alert-success"> 3. Logical Operators </div>
   - AND (`&`)
   - OR (`|`)
   - NOT (`!`)

   *Example:*

In [None]:
   (5 > 3) & (4 > 2)  # TRUE
   (5 > 3) | (4 < 2)  # TRUE
   !(5 > 3)           # FALSE
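
Logical operators also work on whole vectors, which is how conditions are usually combined in data analysis. The following sketch uses a made-up `scores` vector to show element-wise `&` and `|`, the summary functions `any()` and `all()`, and the single-value `&&` form inside `if()`.

```r
scores <- c(55, 72, 90)   # hypothetical exam scores

# & and | work element by element
scores > 60 & scores < 85    # FALSE TRUE FALSE
scores < 60 | scores > 85    # TRUE FALSE TRUE

# any() and all() collapse a logical vector to a single value
any(scores > 85)   # TRUE
all(scores > 50)   # TRUE

# && and || expect single values and are typical inside if()
if (length(scores) > 0 && mean(scores) > 70) {
  message("Average is above 70")
}
```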

#### <div class="alert alert-block alert-success"> 4. Commonly Used Functions </div>
   - `sum()`: Sum of elements
   - `mean()`: Mean of elements
   - `median()`: Median of elements
   - `sd()`: Standard deviation
   - `min()`, `max()`: Minimum and maximum values
   - `range()`: Minimum and maximum values, returned together as a vector of length 2
   - `length()`: Number of elements in an object

   *Example:*

In [None]:
   numbers <- c(1, 2, 3, 4, 5)
   sum(numbers)      # Sum: 15
   mean(numbers)     # Mean: 3
   median(numbers)   # Median: 3
   sd(numbers)       # Standard Deviation: 1.581
   min(numbers)      # Minimum: 1
   max(numbers)      # Maximum: 5
   range(numbers)    # Range: 1 5
   length(numbers)   # Length: 5
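
To see where the value 1.581 comes from, the sketch below recomputes the standard deviation by hand. `sd()` returns the sample standard deviation, which divides by `n - 1` rather than `n`. The second part shows how missing values (`NA`) affect these functions; the `with_na` vector is made up for illustration.

```r
numbers <- c(1, 2, 3, 4, 5)

n <- length(numbers)
deviations <- numbers - mean(numbers)            # -2 -1 0 1 2
manual_sd <- sqrt(sum(deviations^2) / (n - 1))   # sample SD divides by n - 1

manual_sd                          # about 1.581
all.equal(manual_sd, sd(numbers))  # TRUE

# Missing values propagate unless na.rm = TRUE is set
with_na <- c(1, 2, NA, 4)
mean(with_na)                 # NA
mean(with_na, na.rm = TRUE)   # 2.333333
```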

#### <div class="alert alert-block alert-success"> 5. Help and Documentation </div>
   - How to get help on a function using `?` or `help()`.

   *Example:*

In [None]:
?mean  # Help on the mean function
help(sd)  # Help on the standard deviation function
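
A few other base R helpers are useful when you do not remember exactly how a function is called. This is a short sketch rather than a complete list:

```r
args(sd)            # show the arguments a function accepts
example(mean)       # run the examples from the function's help page
apropos("^mean")    # list objects whose names start with "mean"
```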

### <div class="alert alert-block alert-success"> Data Structures in R </div>

**Objective:** 
Introduce students to the primary data structures in R, including vectors, matrices, data frames, and lists.

#### <div class="alert alert-block alert-success"> 1. Vectors </div>
   - Creating vectors using `c()`
   - Accessing elements in a vector
   - Vector operations (element-wise)
   - Common functions applied to vectors (`sum()`, `mean()`, etc.)

   *Example:*

In [None]:
# Creating a vector
vec <- c(1, 2, 3, 4, 5)

# Accessing elements
vec[1]    # First element
vec[1:3]  # First three elements

# Vector operations
vec + 2       # Adds 2 to each element
vec * 2       # Multiplies each element by 2
sum(vec)      # Sum of elements
mean(vec)     # Mean of elements
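
Beyond positions, vectors can be indexed with negative numbers (to drop elements) and with logical conditions (to filter). The sketch below reuses `vec` from the example and also shows `seq()` and `rep()` for building vectors.

```r
vec <- c(1, 2, 3, 4, 5)

vec[-1]          # drop the first element: 2 3 4 5
vec[vec > 2]     # keep elements greater than 2: 3 4 5
vec[c(1, 5)]     # first and last elements: 1 5

# Sequences and repetition build vectors without typing every value
seq(0, 1, by = 0.25)          # 0.00 0.25 0.50 0.75 1.00
rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"

# Element-wise arithmetic between two vectors of the same length
vec + c(10, 20, 30, 40, 50)   # 11 22 33 44 55
```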

#### <div class="alert alert-block alert-success"> 2. Matrices </div>
   - Creating matrices using `matrix()`
   - Accessing elements, rows, and columns
   - Basic matrix operations (addition, multiplication)
   
   *Example:*

In [None]:
# Creating a matrix
mat <- matrix(1:9, nrow=3, ncol=3)

# Accessing elements
mat[1, 2]     # Element in first row, second column
mat[, 1]      # First column
mat[1, ]      # First row

# Matrix operations
mat + 2       # Add 2 to each element
mat * 2       # Multiply each element by 2
mat %*% mat   # Matrix multiplication
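
Two details of the matrix example are easy to miss: `matrix()` fills values column by column unless told otherwise, and `*` is not the same as `%*%`. The sketch below makes both visible; the names `by_col` and `by_row` are only for illustration.

```r
by_col <- matrix(1:9, nrow = 3, ncol = 3)                # default: column by column
by_row <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)  # row by row

by_col[1, 2]   # 4
by_row[1, 2]   # 2

dim(by_col)                   # 3 3
all(t(by_col) == by_row)      # TRUE: transposing swaps rows and columns

# * multiplies matching entries; %*% is the linear-algebra product
(by_col * by_col)[1, 2]     # 16, i.e. 4 * 4
(by_col %*% by_col)[1, 2]   # 66, i.e. 1*4 + 4*5 + 7*6
```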

#### <div class="alert alert-block alert-success"> 3. Data Frames </div>
   - Creating data frames using `data.frame()`
   - Accessing columns by name and position
   - Adding and removing columns
   
   *Example:*

In [None]:
# Creating a data frame
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Height = c(5.5, 6.0, 5.8)
)

# Print the data frame
df

# Accessing columns
df$Name     # Access by column name
df[, 2]     # Access by column position

# Adding a column
df$Weight <- c(130, 150, 160)



# Removing a column
df$Height <- NULL
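
In practice, the most common data frame tasks are inspecting the structure and keeping only the rows or columns you need. The sketch below rebuilds the original `df` (before any columns were added or removed) so that it runs on its own.

```r
df <- data.frame(
  Name = c("Alice", "Bob", "Charlie"),
  Age = c(25, 30, 35),
  Height = c(5.5, 6.0, 5.8)
)

str(df)     # column names, types, and first values
nrow(df)    # 3
ncol(df)    # 3

# Keep rows that meet a condition (the trailing comma means "all columns")
older <- df[df$Age > 28, ]
older$Name            # "Bob" "Charlie"

# Select columns by name
df[, c("Name", "Age")]

mean(df$Age)   # 30
```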



#### <div class="alert alert-block alert-success"> 4. Lists </div>
   - Creating lists using `list()`
   - Accessing elements in a list
   - Lists containing different types of objects
   
   *Example:*

In [None]:
# Creating a list
my_list <- list(
  Name = "Alice",
  Age = 25,
  Scores = c(85, 90, 95)
)

# Print the list
my_list 

# Accessing elements
my_list$Name         # Access by name
my_list[[2]]         # Access by position
my_list$Scores[2]    # Accessing elements within a vector in a list
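
A frequent source of confusion with lists is the difference between single and double brackets. The sketch below rebuilds `my_list` and shows that `[ ]` returns a smaller list while `[[ ]]` returns the element itself; the `City` element is added only for illustration.

```r
my_list <- list(Name = "Alice", Age = 25, Scores = c(85, 90, 95))

class(my_list["Age"])     # "list": single brackets return a smaller list
class(my_list[["Age"]])   # "numeric": double brackets return the element itself

names(my_list)            # "Name" "Age" "Scores"

# Add and remove elements by name
my_list$City <- "Birmingham"
my_list$Age <- NULL
length(my_list)           # 3

mean(my_list$Scores)      # 90
```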

#### <div class="alert alert-block alert-success"> Lab Activity </div>

1. **Interactive Coding Session:**
   - Students will follow along with the instructor to practice basic operations, functions, and working with data structures in R.
   - Instructor will provide additional examples and hands-on exercises.

2. **Homework Assignment:**
   - Students will be assigned exercises to reinforce their understanding of R basics and data structures.
   - The assignment will include tasks such as creating vectors, matrices, data frames, and lists, performing basic operations, and using commonly used functions.

### <div class="alert alert-block alert-success"> Reading and Writing Data in R </div>

**Objective:** 
Introduce students to reading data from files and writing data to disk in R.

#### <div class="alert alert-block alert-success"> 1. Reading Data </div>
   - Reading CSV files using `read.csv()`
   - Reading Excel files using `readxl` package
   - Reading data from text files using `read.table()`
   
   *Example:*

In [None]:
# Reading a CSV file
data <- read.csv("data/us-states.csv")

data

# Reading an Excel file
# Install the readxl package if not already installed
# install.packages("readxl")
library(readxl)
data <- read_excel("data/House Prices.xlsx")

# Reading a text file
data <- read.table("data/house.txt", header = TRUE)
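
If the course data files are not available yet, the same reading and inspection steps can be tried on inline text. This is a minimal sketch: the CSV content, column names, and the `toy` object are made up for illustration and are not part of the course data.

```r
# Hypothetical CSV content; the column names and values are made up
csv_text <- "city,population,area_sq_mi
Alpha,1000,12.5
Beta,2500,30.0
Gamma,400,5.2"

# The text argument lets read.csv() parse a string instead of a file
toy <- read.csv(text = csv_text)

head(toy, 2)   # first two rows
str(toy)       # column names and types
dim(toy)       # 3 3
summary(toy$population)
```

After reading any file, `head()`, `str()`, and `dim()` are a quick way to check that the columns and types came through as expected.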



#### <div class="alert alert-block alert-success"> 2. Writing Data </div>
   - Writing CSV files using `write.csv()`
   - Writing data to text files using `write.table()`
      
   *Example:*

In [None]:
# Writing a CSV file
write.csv(data, "data/exit_poll_output.csv")

# Writing a text file
write.table(data, "data/exit_poll_output.txt", sep = "\t")
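
By default, `write.csv()` also writes R's row names as an extra first column, which is often not wanted. The sketch below writes a small made-up table to a temporary file with `row.names = FALSE` and reads it back to check the round trip; `toy` and `out_file` are illustrative names.

```r
# Small made-up table standing in for `data`
toy <- data.frame(id = 1:3, value = c(2.5, 3.0, 4.5))

out_file <- tempfile(fileext = ".csv")   # path inside R's temporary directory

# row.names = FALSE keeps R's row numbers out of the file
write.csv(toy, out_file, row.names = FALSE)

readLines(out_file)          # inspect the raw file contents
round_trip <- read.csv(out_file)
all.equal(toy, round_trip)   # TRUE when the contents survive unchanged
```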

### <div class="alert alert-block alert-danger"><b>Lab Discussion: Exit Poll for Presidential Election</b></div>

An exit poll was conducted during the 2020 presidential election, with results reported for different regions within each state. Take a look at the results for Alabama ([source](https://www.cnn.com/election/2020/exit-polls/president/alabama)). In the Birmingham/South Central region of Alabama, there were 336 respondents, and 56% said they voted for Biden.


**Objective:** 
Guide students on how to apply the concepts learned in the lab using a real-world example.

1. **Performing Basic Operations:**
  - Calculate the number of respondents who voted for Biden.
  - Calculate the percentage of respondents who did not vote for Biden.

In [None]:
total_respondents <- 336
voted_for_biden <- 0.56 * total_respondents
voted_for_biden

In [None]:
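188 / 336      # Proportion who voted for Biden, using the rounded count of 188
1 - 188 / 336  # Proportion who did not vote for Biden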
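
Because the reported 56% is itself rounded, multiplying it by 336 does not give a whole number of people. The sketch below rounds the count and then formats the recovered share as a percentage; the variable names are illustrative.

```r
total_respondents <- 336
share_biden <- 0.56                            # reported share, rounded to a whole percent

raw_count <- share_biden * total_respondents   # 188.16, not a whole number
biden_count <- round(raw_count)                # 188 respondents

# Recover the share from the rounded count and format it as a percentage
sprintf("%.1f%%", 100 * biden_count / total_respondents)   # "56.0%"
```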

### <div class="alert alert-block alert-danger"><b>Practice in Birmingham</b></div> 

**Question 1:** Calculate the percentage of respondents who voted for Trump.

**Question 2:** How many respondents voted for Trump if there were 336 respondents in total and 56% voted for Biden?

**Question 3:** If 10 more respondents had voted for Biden out of the 336 polled, what would the new percentage of Biden voters be?

   *Hints:*
   - Use basic arithmetic operations to calculate the answers.
   - Consider what happens to the percentages if the total number of respondents changes.

**Question 4:** The data from the exit poll includes personal and potentially sensitive information regarding political choices. Based on the ethical considerations learned in Chapter 2, discuss the following:
    
   1. How should the data from this exit poll be treated to ensure the confidentiality of the respondents?
   2. What steps should the researchers take to obtain informed consent from participants in this type of study?
   3. Considering the public nature of exit polls, is it enough to keep the data confidential, or should researchers also strive to make the data anonymous? Explain why or why not.

**Submission:**

   - Write your code in a separate notebook (.ipynb), export it as a PDF or HTML file, and submit it on the course Canvas website.
    
    
