# Reading and Writing Data in R
### by [Jason DeBacker](http://jasondebacker.com), October 2017

This notebook illustrates how to read and write data in R.

## Reading Data

### CSV Files

In [1]:
# see what directory we are in
getwd()
# read in Kauffman data we looked at in the Python tutorial
# Read in csv file and put in data frame named kisa
# column names are set from the header of the read file if
# header = TRUE
kisa <- read.table("../Python/kisa_2015.csv", 
                 header = TRUE,
                 sep = ",", fill = TRUE)
kisa[1:5,] # show the first 5 rows

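| month | grdatn | marstat | age | class | region | state | hours | mlr | natvty | ⋯ | homeown | hoursu1b | hoursu1b_t1 | se15u | se15u_t1 | ent015u | ent015ua | vet | wgtat | wgtat1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 42 | 5 | 57 | 4 | 1 | 14 | 40 | 1 | 57 | ⋯ |  | 40 | 40 | 0 | 0 | 0 | 0 | 0 | 269.1724 | 270.4338 |
| 12 | 39 | 7 | 26 | 4 | 1 | 14 | 40 | 1 | 57 | ⋯ |  | 40 | 40 | 0 | 0 | 0 | 0 | 0 | 403.0235 | 404.9121 |
| 12 | 41 | 1 | 43 | 4 | 2 | 41 | 46 | 1 | 110 | ⋯ |  | 46 | 40 | 0 | 0 | 0 | 0 | 0 | 402.7901 | 404.6776 |
| 12 | 39 | 1 | 38 | 4 | 2 | 41 | 40 | 1 | 57 | ⋯ |  | 40 | 30 | 0 | 0 | 0 | 0 | 0 | 342.9345 | 344.5415 |
| 12 | 42 | 1 | 51 | -1 | 3 | 58 | -1 | 6 | 57 | ⋯ |  | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 560.2244 | 562.8497 |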

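To experiment with these arguments without the Kauffman file, `read.table()` can also parse a string passed through its `text` argument. The sketch below uses a tiny made-up sample (the names `csv_text` and `sample_df` and all values are hypothetical) whose last row is one field short; with `fill = TRUE` the missing field is padded and read as `NA` instead of stopping with an error.

```r
# Hypothetical three-column sample; the second data row has only two fields
csv_text <- "id,age,hours
1,57,40
2,26"

# header = TRUE takes the column names from the first line;
# fill = TRUE pads the short row so it can still be read (blank -> NA)
sample_df <- read.table(text = csv_text, header = TRUE,
                        sep = ",", fill = TRUE)
print(sample_df)
```

Dropping `fill = TRUE` from this call makes `read.table()` stop because the last line has fewer elements than the header.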

In [5]:
# read.table() can also read tab-delimited text by setting sep = "\t".
# read.csv() is a version of read.table() specifically for csv files: it is
# essentially the same, but its defaults are header = TRUE, sep = ",", and fill = TRUE
kisa2 <- read.csv("../Python/kisa_2015.csv", header = TRUE)
kisa2[1:5,] # show the first 5 rows

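| month | grdatn | marstat | age | class | region | state | hours | mlr | natvty | ⋯ | homeown | hoursu1b | hoursu1b_t1 | se15u | se15u_t1 | ent015u | ent015ua | vet | wgtat | wgtat1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 12 | 42 | 5 | 57 | 4 | 1 | 14 | 40 | 1 | 57 | ⋯ |  | 40 | 40 | 0 | 0 | 0 | 0 | 0 | 269.1724 | 270.4338 |
| 12 | 39 | 7 | 26 | 4 | 1 | 14 | 40 | 1 | 57 | ⋯ |  | 40 | 40 | 0 | 0 | 0 | 0 | 0 | 403.0235 | 404.9121 |
| 12 | 41 | 1 | 43 | 4 | 2 | 41 | 46 | 1 | 110 | ⋯ |  | 46 | 40 | 0 | 0 | 0 | 0 | 0 | 402.7901 | 404.6776 |
| 12 | 39 | 1 | 38 | 4 | 2 | 41 | 40 | 1 | 57 | ⋯ |  | 40 | 30 | 0 | 0 | 0 | 0 | 0 | 342.9345 | 344.5415 |
| 12 | 42 | 1 | 51 | -1 | 3 | 58 | -1 | 6 | 57 | ⋯ |  | -1 | -1 | 0 | 0 | 0 | 0 | 0 | 560.2244 | 562.8497 |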

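A quick way to check that `read.csv()` is just `read.table()` with CSV-oriented defaults is to read the same small string both ways, and to read a tab-delimited version with `sep = "\t"`. The strings and variable names here are hypothetical stand-ins for a real file.

```r
# Hypothetical sample data, once comma-separated and once tab-separated
csv_text <- "id,age\n1,57\n2,26"
tsv_text <- "id\tage\n1\t57\n2\t26"

# read.csv() with its defaults vs. read.table() with the matching arguments
via_csv   <- read.csv(text = csv_text)
via_table <- read.table(text = csv_text, header = TRUE, sep = ",")

# the same read.table() call handles tab-delimited text via sep = "\t"
via_tab <- read.table(text = tsv_text, header = TRUE, sep = "\t")

all.equal(via_csv, via_table)  # TRUE
all.equal(via_csv, via_tab)    # TRUE
```

For tab-delimited files, base R also provides `read.delim()`, which plays the same role for tabs that `read.csv()` plays for commas.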

In [5]:
# install readxl from CRAN (only needed once per R installation);
# setting `repos` avoids the "trying to use CRAN without setting a mirror"
# error that non-interactive sessions, such as a Jupyter kernel, can raise
install.packages("readxl", repos = "https://cloud.r-project.org")
library(readxl)


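Calling `install.packages()` every time a notebook runs downloads the package again. A minimal sketch of an alternative is a small helper, here given the hypothetical name `ensure_package()` (it is not part of base R), that installs a package only when `requireNamespace()` reports it missing and then attaches it.

```r
# Hypothetical helper: install a package only if it is missing, then load it
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (without stopping) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() take the name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with base R, so nothing is downloaded here
ensure_package("stats")
```

In the notebook, `ensure_package("readxl")` would replace the two-line install-and-load cell.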
### Reading data from Excel

In [4]:
# load the readxl package (it must be installed first)
# install.packages("readxl") # only needed if you don't have it already
library(readxl)

# read in data used for PS #4
df <- read_excel("../LoopConditional/radio_merger_data.xlsx",
                 sheet = 1)
df[1:5,]

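`read_excel()` can also select a sheet by name, limit the number of rows, or read a rectangular cell range. The sketch below assumes readxl is installed and uses `datasets.xlsx`, an example workbook bundled with readxl, in place of the PS #4 file; the variable names are illustrative.

```r
library(readxl)

# readxl ships small example workbooks; readxl_example() returns the file path
path <- readxl_example("datasets.xlsx")

# list the sheets before choosing one
sheets <- excel_sheets(path)
print(sheets)

# select a sheet by name instead of position, reading only the first 5 rows
mt_head <- read_excel(path, sheet = "mtcars", n_max = 5)

# or read just a cell range: row 1 supplies the column names, rows 2-4 the data
iris_block <- read_excel(path, sheet = "iris", range = "A1:C4")
print(iris_block)
```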

### Reading Stata data files

In [18]:
# Load the `haven` package
# install.packages("haven") # if not already installed
library(haven)

# Read Stata data file used for PS #3
ps3_data <- read_dta("../Functions/PS3_data.dta") 
ps3_data[1:5,]


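| id68 | year | intid | relhh | hannhrs | wannhrs | hlabinc | wlabinc | nochild | wrace | ⋯ | redpregovinc | hsex | wsex | age | wage | hpersno | wpersno | hyrsed | wyrsed | pce |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1967 | 1 | 1 | 1200 | 2000 |  |  | 0 |  | ⋯ | 5614 | 1 | 2 | 52 | 46 | 1 | 2 | 8.0 | 8 | 0 |
| 2 | 1967 | 2 | 1 | 0 | 0 |  |  | 0 |  | ⋯ | 0 | 1 | 2 | 56 | 57 | 1 | 2 | 3.0 | 3 | 0 |
| 3 | 1967 | 3 | 1 | 0 | 0 |  |  | 0 |  | ⋯ | 0 | 1 | 2 | 77 | 64 | 1 | 2 |  | 3 | 0 |
| 4 | 1967 | 4 | 1 | 1560 | 0 |  |  | 6 | 1.0 | ⋯ | 3280 | 1 | 2 | 45 | 44 | 1 | 2 | 8.0 | 5 | 0 |
| 5 | 1967 | 5 | 1 | 2500 | 2000 |  |  | 3 | 1.0 | ⋯ | 7900 | 1 | 2 | 24 | 22 | 1 | 2 | 10.0 | 9 | 0 |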

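For large Stata files it can help to read only some columns or rows. This sketch assumes a haven release recent enough to support the `col_select` and `n_max` arguments of `read_dta()`; so that it runs without `PS3_data.dta`, it first writes a small hypothetical data set to a temporary file.

```r
library(haven)

# Hypothetical data written to a temporary .dta file
path <- tempfile(fileext = ".dta")
write_dta(data.frame(id = 1:4,
                     year = c(1967, 1967, 1968, 1968),
                     hours = c(1200, 0, 1560, 2500)),
          path)

# read only two of the columns and only the first two rows
subset_dta <- read_dta(path, col_select = c(id, hours), n_max = 2)
print(subset_dta)
```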

Be sure to read more about these functions in their documentation. They are quite flexible in terms of skipping lines, marking missing values, and reformatting data as it goes into your R data frame. They can also read data from URLs (as we did in Python): `read.csv()` and `read.table()` accept a URL wherever they accept a file path, while some readers, such as `read_excel()`, expect a local file, so you may need to download it first (e.g., with `download.file()`). Additionally, there are functions to read data in many other formats. DataCamp provides a good overview of these [here](https://www.datacamp.com/community/tutorials/r-data-import-tutorial#stata).

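The sketch below combines a few of these options. It writes a small hypothetical CSV (two note lines above the header, and a `-1` that we assume codes a missing value), then reads it through a `file://` URL, which stands in for an `https://` address so the example runs offline. The file contents and variable names are illustrative only.

```r
# Hypothetical CSV with two note lines before the header row
path <- tempfile(fileext = ".csv")
writeLines(c("Source: example survey",
             "Generated for this sketch",
             "id,age,hours",
             "1,57,40",
             "2,51,-1"),
           path)

# read.csv() accepts a URL in place of a path; file:// keeps this offline
file_url <- paste0("file://", path)

survey <- read.csv(file_url,
                   skip = 2,           # ignore the two note lines
                   na.strings = "-1")  # assumption: -1 marks a missing value
str(survey)
```

Note that `na.strings` replaces the default `"NA"`; pass `c("NA", "-1")` to treat both as missing.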
## Writing data

As with reading data, R can write data to many formats.

### To CSV

In [6]:
# write the radio mergers data to csv (df is the data frame read
# from Excel above); row.names = FALSE keeps R's row numbers out of the file
write.csv(df, file = "test_write.csv", row.names = FALSE)

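By default `write.csv()` also writes the data frame's row names as an unnamed first column, which comes back as an extra column called `X` when the file is read again. This sketch, using a hypothetical data frame `small` and a temporary file, shows the difference that `row.names = FALSE` makes.

```r
# Hypothetical data frame and temporary output file
small <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
path <- tempfile(fileext = ".csv")

# default: row names are written as an extra first column
write.csv(small, path)
with_rn <- read.csv(path)

# row.names = FALSE writes only the data columns
write.csv(small, path, row.names = FALSE)
without_rn <- read.csv(path)

names(with_rn)     # "X" "id" "score"
names(without_rn)  # "id" "score"
```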

### To Excel

In [32]:
# install.packages("openxlsx") # if not already installed
library(openxlsx)

# for writing a data.frame or list of data.frames to an xlsx file
write.xlsx(df, "test_write.xlsx")



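As the comment above notes, `write.xlsx()` also accepts a list of data frames. Assuming openxlsx is installed, a named list produces one worksheet per element, named after the list names. The data frames and sheet names below are hypothetical.

```r
library(openxlsx)

# two small hypothetical data frames
scores  <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))
regions <- data.frame(region = c("North", "South"), n = c(10L, 12L))

# a named list writes one sheet per element
path <- tempfile(fileext = ".xlsx")
write.xlsx(list(scores = scores, regions = regions), path)

# check the sheet names and read one sheet back by name
print(getSheetNames(path))
back <- read.xlsx(path, sheet = "scores")
```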
### To Stata

In [29]:
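# uses the haven package loaded above; `version` sets the Stata file format
# (haven supported up to Stata 14 when this was written; newer releases may
# support later versions, so check ?write_dta)
write_dta(df, "test_write.dta", version = 14)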

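A quick way to confirm that a Stata export worked is to read the file back with `read_dta()` and compare it with the original. This sketch assumes haven is installed and uses a hypothetical data frame `small` written to a temporary file; note that Stata variable names must be valid Stata identifiers (no spaces, for example).

```r
library(haven)

# hypothetical data frame with numeric and character columns
small <- data.frame(id = 1:3,
                    score = c(2.5, 3.0, 4.5),
                    name = c("a", "b", "c"),
                    stringsAsFactors = FALSE)

path <- tempfile(fileext = ".dta")
write_dta(small, path, version = 14)

# read the file back; haven returns a tibble whose columns may carry
# Stata attributes such as display formats
back <- read_dta(path)
str(back)
```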
Again, be sure to read the documentation about these functions to see the available keyword arguments that will allow you to adjust how the data are saved.