# 11 - ECON 326: An Example of a Final Project

COMET Team
7/1/23

## Introduction

Now that you are equipped with a statistical toolkit and experience
with R, you are ready to embark on your own economic research
adventure! This project is a worked example that gives you some
intuition for the broad steps of a successful research project. It
synthesizes what you have learned in the ECON 325 and ECON 326 modules
and shows how to apply it to a research question of your own. It walks
through cleaning your data and preparing it for analysis, running the
analysis itself, and carefully interpreting and visualizing the results.
Let’s get started!

```r
library(haven)      # reads Stata, SPSS and SAS data files
library(tidyverse)  # dplyr, readr, ggplot2 and the other core tidyverse packages
```

```
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.1.8
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```

## Getting the Data

### Importing Data into R

Once you have gathered your data, R makes it straightforward to view and
manipulate it. First, you need to import your datasets into R, as you
have done in earlier modules. Depending on its source, your data may
come in one of several formats:

-   .csv (comma-separated values file)
-   .dta (Stata data file)
-   .xlsx (Excel workbook)
-   .sav (SPSS data file)
-   .sas7bdat (SAS data file)

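Fortunately, you will not need a different package for every format.
The `haven` package, which we loaded with `library(haven)` at the
beginning of this module, reads Stata (`read_dta()`), SPSS
(`read_sav()`), and SAS (`read_sas()`) files. Plain-text `.csv` files
can be read with base R’s `read.csv()`, as in the code below, or with
`read_csv()` from `readr` (part of the tidyverse). Excel `.xlsx` files
are read with `read_excel()` from the `readxl` package, which is
installed with the tidyverse but must be loaded separately with
`library(readxl)`.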
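
As a minimal sketch of how these readers fit together, the code below writes a small hypothetical data frame to temporary `.dta`, `.sav`, and `.csv` files and reads each one back with the matching function. It assumes the `haven` and `readr` packages are installed (both come with the tidyverse). `.xlsx` is left out because `readxl` only reads Excel files and cannot create one; a SAS `.sas7bdat` file would be read with `haven::read_sas()` in the same way.

```r
library(haven)  # Stata (.dta) and SPSS (.sav) readers and writers
library(readr)  # delimited text files such as .csv

# Hypothetical data, used only to demonstrate the round trip
df <- data.frame(year = c(2019L, 2020L), value = c(1.5, 2.5))

# Write the same data to temporary files in three formats
dta_path <- tempfile(fileext = ".dta")
sav_path <- tempfile(fileext = ".sav")
csv_path <- tempfile(fileext = ".csv")
write_dta(df, dta_path)
write_sav(df, sav_path)
write_csv(df, csv_path)

# Read each file back with the reader that matches its format
from_stata <- read_dta(dta_path)
from_spss  <- read_sav(sav_path)
from_csv   <- read_csv(csv_path, show_col_types = FALSE)

from_stata
```

In a real project you would skip the writing step and pass the path of your own file to the matching reader.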

```r
# Read the raw CSV files (paths are relative to the working directory)
gdp_data <- read.csv("../datasets/gdp_data.csv")
pollution_data <- read.csv("../datasets/pollution_data.csv")
```
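
Before cleaning, it helps to check how `read.csv()` has typed each column, and `str()` does this in one line. The sketch below uses small hypothetical CSV snippets passed through the `text` argument, shaped the way the cleaning code in this module assumes the two files are shaped: monthly dates such as `2019-01` in the GDP file and plain years in the pollution file. In R 4.0 or later the monthly dates come back as character strings and the years as integers, which is why the GDP dates must be cut down to a year and converted with `as.integer()` before the two tables can be joined.

```r
# Hypothetical snippets; the real files have more columns and rows
gdp_csv <- "REF_DATE,GEO,VALUE
2019-01,Canada,100
2019-02,Canada,110"
poll_csv <- "REF_DATE,GEO,VALUE
2019,Canada,50"

gdp_toy  <- read.csv(text = gdp_csv)
poll_toy <- read.csv(text = poll_csv)

# str() lists every column with its type and first few values
str(gdp_toy)
str(poll_toy)

class(gdp_toy$REF_DATE)   # "character"
class(poll_toy$REF_DATE)  # "integer"
```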

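```r
# Keep only the Canada-wide GDP series
can_gdp_data <- gdp_data %>% filter(GEO == 'Canada')
# Split REF_DATE ("YYYY-MM") into MONTH and year. MONTH is created first because
# mutate() evaluates arguments in order: once REF_DATE is cut to four
# characters, substr(REF_DATE, 6, 7) would return an empty string.
can_gdp_data <- can_gdp_data %>% mutate(MONTH = substr(REF_DATE, 6, 7),
                                        REF_DATE = substr(REF_DATE, 1, 4))
# Sum the monthly values within each year, then store the year as an integer
# so it can be joined with the yearly pollution data
can_yearly_gdp <- can_gdp_data %>% group_by(REF_DATE) %>% summarize(VALUE = sum(VALUE))
can_yearly_gdp <- can_yearly_gdp %>% mutate(REF_DATE = as.integer(REF_DATE))
```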
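
When `mutate()` overwrites a column and then derives another column from it, argument order matters: dplyr evaluates the arguments one after another, so later ones see the already-modified column. The minimal sketch below shows this on a hypothetical three-row table (values invented for illustration, dates assumed to be in `YYYY-MM` form) and then collapses it to one row per year.

```r
library(dplyr)

# Hypothetical monthly rows with REF_DATE in "YYYY-MM" form
toy <- tibble(REF_DATE = c("2019-01", "2019-02", "2020-01"),
              VALUE = c(10, 20, 30))

# Wrong order: REF_DATE is already "2019" when MONTH is computed
wrong <- toy %>% mutate(REF_DATE = substr(REF_DATE, 1, 4),
                        MONTH = substr(REF_DATE, 6, 7))
wrong$MONTH  # "" "" ""

# Right order: extract MONTH before REF_DATE is truncated
right <- toy %>% mutate(MONTH = substr(REF_DATE, 6, 7),
                        REF_DATE = substr(REF_DATE, 1, 4))
right$MONTH  # "01" "02" "01"

# One row per year, with the year stored as an integer
yearly <- right %>%
  group_by(REF_DATE) %>%
  summarize(VALUE = sum(VALUE)) %>%
  mutate(REF_DATE = as.integer(REF_DATE))
yearly
```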

```r
# Keep the economy-wide total (all industries and households) for Canada
can_poll_data <- pollution_data %>%
  filter(Sector == "Total, industries and households", GEO == 'Canada')
```
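
Inside a single `filter()` call, comma-separated conditions are combined with a logical AND, so chaining two `filter()` calls and passing both conditions to one call select the same rows. Before a join it is also worth confirming that the key (here the year) appears only once per table, since duplicated keys would duplicate rows in the merged data. The sketch below uses a hypothetical four-row table with invented values.

```r
library(dplyr)

# Hypothetical pollution rows (values invented for illustration)
poll_toy <- tibble(
  REF_DATE = c(2019L, 2019L, 2019L, 2020L),
  GEO = c("Canada", "Canada", "Ontario", "Canada"),
  Sector = c("Total, industries and households", "Households",
             "Total, industries and households", "Total, industries and households"),
  VALUE = c(500, 80, 200, 510)
)

# Two conditions in one call, and the same conditions as a chain
one_call  <- poll_toy %>%
  filter(Sector == "Total, industries and households", GEO == "Canada")
two_calls <- poll_toy %>%
  filter(Sector == "Total, industries and households") %>%
  filter(GEO == "Canada")

# Sanity check before joining: list any year that appears more than once
dup_years <- one_call %>% count(REF_DATE) %>% filter(n > 1)
nrow(dup_years)  # 0
```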

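```r
# Attach each year's pollution record to the yearly GDP table. Both tables have
# a VALUE column, so the result holds VALUE.x (GDP) and VALUE.y (pollution).
merged_data <- left_join(can_yearly_gdp, can_poll_data, by = 'REF_DATE')
```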
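
The `.x`/`.y` column names and the handling of unmatched years both come from how `left_join()` works. The sketch below uses hypothetical yearly tables (values invented) in which one GDP year has no pollution record: `left_join()` keeps that year with `NA`, and `lm()` then drops the incomplete row under its default `na.action`. The `suffix` argument shown at the end is an optional alternative that gives the columns self-describing names.

```r
library(dplyr)

# Hypothetical yearly tables; 2022 has GDP but no pollution record
gdp_y  <- tibble(REF_DATE = 2019:2022, VALUE = c(1000, 980, 1050, 1100))
poll_y <- tibble(REF_DATE = 2019:2021, VALUE = c(500, 470, 520))

# Both tables have a VALUE column, so dplyr adds the default suffixes:
# .x for the left table (GDP) and .y for the right table (pollution)
joined <- left_join(gdp_y, poll_y, by = "REF_DATE")
names(joined)   # "REF_DATE" "VALUE.x" "VALUE.y"
joined$VALUE.y  # 500 470 520 NA

# lm() drops the row with NA, so only three years enter the regression
fit <- lm(VALUE.y ~ VALUE.x, data = joined)
nobs(fit)  # 3

# Optional: self-describing suffixes instead of .x/.y
joined_named <- left_join(gdp_y, poll_y, by = "REF_DATE",
                          suffix = c("_gdp", "_poll"))
names(joined_named)  # "REF_DATE" "VALUE_gdp" "VALUE_poll"
```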

```r
# Preliminary regression: pollution (VALUE.y) on GDP (VALUE.x)
reg1 <- lm(VALUE.y ~ VALUE.x, data = merged_data)
summary(reg1)
```


```
Call:
lm(formula = VALUE.y ~ VALUE.x, data = merged_data)

Residuals:
   Min     1Q Median     3Q    Max 
-67780  -3170   7607  14124  18034 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 6.909e+05  1.045e+05   6.612 5.99e-05 ***
VALUE.x     5.024e-04  7.078e-04   0.710    0.494    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 24630 on 10 degrees of freedom
Multiple R-squared:  0.04796,   Adjusted R-squared:  -0.04724 
F-statistic: 0.5038 on 1 and 10 DF,  p-value: 0.4941
```
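
Several numbers in this output are linked by simple formulas, which makes them easy to check by hand. The t value is the estimate divided by its standard error; with a single regressor the F-statistic equals that t value squared, and R-squared equals F / (F + residual degrees of freedom). The sketch below recomputes these from the printed (rounded) `VALUE.x` row; the 10 residual degrees of freedom imply n = 12 observations, since two coefficients are estimated. Read this way, a one-unit increase in `VALUE.x` (GDP) is associated with roughly 0.0005 more units of `VALUE.y` (pollution), but with a p-value of about 0.49 the slope is not statistically distinguishable from zero, and the negative adjusted R-squared indicates that GDP alone explains very little of the year-to-year variation in this preliminary regression.

```r
# Values copied from the VALUE.x row of the summary(reg1) output above
est <- 5.024e-04
se  <- 7.078e-04
n   <- 12          # 10 residual degrees of freedom + 2 estimated coefficients
df_resid <- n - 2

t_val  <- est / se                            # about 0.710
p_val  <- 2 * pt(-abs(t_val), df = df_resid)  # two-sided p-value, about 0.494
f_stat <- t_val^2                             # one regressor: F = t^2, about 0.504
r2     <- f_stat / (f_stat + df_resid)        # about 0.048
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid   # about -0.047

round(c(t = t_val, p = p_val, F = f_stat, R2 = r2, adjR2 = adj_r2), 4)
```

Small differences from the printed output in the last digit come from starting with rounded inputs.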