<a href="https://colab.research.google.com/github/zia207/r-colab/blob/main/NoteBook/R%20for%20Beginners/data_import_export_r.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![alt text](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

# **Data Import/Export to/from R**



## Introduction

One of the most important steps in data analysis is getting data into R and, once the analysis is done, getting results back out of R. Which function you use depends on the format of the data, such as CSV, Excel, or SQL, so it is essential to learn the most common ways to read and write data with R. Once data are in R, you can run anything from simple data visualization to complex machine learning algorithms. Mastering data import and export is therefore a fundamental skill for any data scientist or analyst.


## Install rpy2

The easiest way to run R in Colab with a Python runtime is the **rpy2** Python package. Install it with the `pip` command and load its IPython extension:

In [1]:
!pip uninstall rpy2 -y
!pip install rpy2==3.5.1
%load_ext rpy2.ipython

Found existing installation: rpy2 3.4.2
Uninstalling rpy2-3.4.2:
  Successfully uninstalled rpy2-3.4.2
Collecting rpy2==3.5.1
  Downloading rpy2-3.5.1.tar.gz (201 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.7/201.7 kB 5.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rpy2
  Building wheel for rpy2 (setup.py) ... done
  Created wheel for rpy2: filename=rpy2-3.5.1-cp310-cp310-linux_x86_64.whl size=314930 sha256=b6171a5d76232e8bee82a227f677b67891690204ed558ef342b8e8cd374c864d
  Stored in directory: /root/.cache/pip/wheels/73/a6/ff/4e75dd1ce1cfa2b9a670cbccf6a1e41c553199e9b25f05d953
Successfully built rpy2
Installing collected packages: rpy2
Successfully installed rpy2-3.5.1


## Mount Google Drive

Next, create a folder named "R" in Google Drive so that installed R packages persist across sessions. Before installing R packages from the Python runtime, mount Google Drive and follow the on-screen instructions:

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In this exercise, we will use the following R packages:

1.  [readxl](https://readxl.tidyverse.org/): to read MS **Excel** files. It is installed as part of [tidyverse](https://www.tidyverse.org/).
2.  [rjson](https://github.com/alexcb/rjson): to read and write **.json** files.
3.  [foreign](https://cran.r-project.org/web/packages/foreign/index.html): to read data stored by **Minitab**, **S**, **SAS**, **SPSS**, **Stata**, **Systat**, **dBase**, and so forth (and to write Stata files with `write.dta()`).
4.  [haven](https://haven.tidyverse.org/): to read and write data from other statistical packages. It is also installed as part of [tidyverse](https://www.tidyverse.org/).
5.  [writexl](https://cran.r-project.org/web/packages/writexl/writexl.pdf): to write MS **Excel** files.

In [None]:
%%R
pkg <- c('readxl','rjson', 'foreign', 'haven', 'writexl')
new.packages <- pkg[!(pkg %in% installed.packages(lib.loc='drive/My Drive/R/')[,"Package"])]
if(length(new.packages)) install.packages(new.packages, lib='drive/My Drive/R/')

## Load Libraries

In [28]:
%%R
# set library path
.libPaths('drive/My Drive/R')
library(readxl)
library(writexl)
library(rjson)
library(foreign)
library(haven)

## Data


All data sets used in this exercise can be downloaded from my [Dropbox](https://www.dropbox.com/scl/fo/fohioij7h503duitpl040/h?rlkey=3voumajiklwhgqw75fe8kby3o&dl=0) or my [GitHub](https://github.com/zia207/r-colab/tree/main/Data/R_Beginners) account.

It is best to set a working directory in R for reading and writing files locally. The following example shows how to check and change the working directory in R.

Before changing the working directory, you may want to check the directory of your current R session: the function `getwd()` returns the current working directory path as a string.

To change the working directory, call the `setwd()` function with the path of the new working directory folder as its argument:

> setwd("F:\\R-Project")

> setwd("F:/R-Project")

Remember that R paths must use either a forward slash `/` or a double backslash `\\`! The single backslash of Windows paths will not work, because R treats `\` as an escape character in strings.

You can list the files in the working directory with the `dir()` function:

> dir()
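The calls above can be tried safely in a throwaway folder. The sketch below uses only base R: it switches to the session's temporary directory (`tempdir()`), creates a file, lists it with `dir()`, and then restores the original working directory. It also uses `file.path()`, which joins path components with `/` and so avoids the backslash problem on Windows. The file name `example.txt` is purely illustrative.

```r
# Remember where we started so we can restore it afterwards
old_wd <- getwd()

# Move into the session's temporary directory (it always exists and is writable)
tmp_dir <- tempdir()
setwd(tmp_dir)

# Create an empty illustrative file and list the directory contents
invisible(file.create("example.txt"))
print(dir())

# file.path() builds a path with "/" separators, which R accepts on every OS
full_path <- file.path(tmp_dir, "example.txt")
print(file.exists(full_path))

# Restore the original working directory
setwd(old_wd)
```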

In [16]:
%%R
dataFolder<-"/content/drive/MyDrive/R_Website/R_Bigenner/Data/"

## Data Import to R

### Importing data using the RStudio IDE

Importing data into R by hand can be time-consuming. The easiest way to import data into R is through the RStudio IDE. This feature can be accessed from the **Environment pane** or from the **File menu**. The importers are grouped into three categories: text data, Excel data, and statistical data. The details can be found [here](https://support.posit.co/hc/en-us/articles/218611977-Importing-Data-with-the-RStudio-IDE).


To access this feature, use the "Import Dataset" dropdown from the "Environment" pane:



![alt text](http://drive.google.com/uc?export=view&id=1nRZoZ5qU-jzpC3n0FAk63VVqQHCFVDaH)






Or through the "File" menu, followed by the "Import Dataset" submenu:



![alt text](http://drive.google.com/uc?export=view&id=1nKfmrVXro6rSlCizhJXF_rS6ZFIldNUn)



### Read Text File (.txt)

The easiest form of data to import into R is a simple text file. The primary function to import from a text file is `read.table()`.

> read.table(file, header = FALSE, sep = "", quote = "\"'", ...)

In [15]:
%%R
# read .txt file
df.txt<-read.table(paste0(dataFolder,"test_data.txt"), header= TRUE)
#df.txt<-read.table("/content/drive/MyDrive/R_Website/R_Bigenner/Data/test_data.txt", header= TRUE)
head(df.txt)

   ID treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1 Low    As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2 Low    As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3 Low    As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4 Low    As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5 Low    As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6 Low    As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97


Alternatively, you can load the data directly from my [GitHub data folder](https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/test_data.txt) with the following code:

In [None]:
%%R
df.txt<-read.table("https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/test_data.txt",
                    header= TRUE)
head(df.txt)

   ID treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1 Low    As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2 Low    As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3 Low    As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4 Low    As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5 Low    As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6 Low    As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97
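The output above suggests a parsing pitfall: `ID` holds `Low` and `treat` holds `As`. With the default `sep = ""`, `read.table()` treats any run of spaces or tabs as a separator, so a value such as `Low As` is split into two fields, and because the data rows then have one more field than the header, the first column is used as row names. If the file is tab-delimited (an assumption; check the raw file), setting `sep = "\t"` keeps such values intact. The small inline data set below is illustrative only and is passed through the `text` argument so that no file is needed; the checks assume R 4.0 or later, where strings are not converted to factors by default.

```r
# Illustrative tab-delimited text: the 'treat' values contain a space
txt <- "ID\ttreat\tvar\n1\tLow As\tBR01\n2\tHigh As\tBR06"

# Default sep = "": tabs AND spaces both separate fields, so "Low As" is split.
# The header has one field fewer than the data rows, so the first column
# (the ID values) becomes the row names.
ws <- read.table(text = txt, header = TRUE)
print(ws)

# sep = "\t": only tabs separate fields, so "Low As" stays one value
tabs <- read.table(text = txt, header = TRUE, sep = "\t")
print(tabs)

# read.delim() is a shortcut for tab-delimited files (header = TRUE, sep = "\t")
print(read.delim(text = txt))
```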


The `scan()` function can also read data. It reads values from the console or from a file into a vector or a list.

> scan(file = "", what = double(), nmax = -1, n = -1, sep = "", ...)

In [None]:
%%R
# Scan data
df.scan <- scan("https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/test_data.txt", what = character(), skip = 1)  # skip the header line and read every field as text
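Unlike `read.table()`, `scan()` returns a vector, or a list with one component per field when `what` is a list, rather than a data frame. A minimal base-R sketch using the `text` argument with illustrative values:

```r
# Numeric vector: the default what = double()
nums <- scan(text = "5.2 6.0 6.6", quiet = TRUE)
print(nums)

# Character vector: what = character() keeps every token as text
tokens <- scan(text = "BR1 BR3 BR16", what = character(), quiet = TRUE)
print(tokens)

# Records: a list template reads the fields in turn into each component
recs <- scan(text = "BR1 5.2\nBR3 6.0",
             what = list(variety = "", yield = 0), quiet = TRUE)
str(recs)

# A list of equal-length vectors converts directly to a data frame
print(as.data.frame(recs))
```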




### Read Comma-Separated File (.csv)

A comma-delimited or comma-separated values (CSV) file is one where each value in the file is separated by a comma, although other characters can be used. The `read.csv()` function makes reading CSV files easy: it is a wrapper around `read.table()` with defaults suited to CSV files (`header = TRUE`, `sep = ","`).

In [17]:
%%R
df.csv<-read.csv(paste0(dataFolder,"test_data.csv"), header= TRUE)
#df.csv<-read.csv("/content/drive/MyDrive/R_Website/R_Bigenner/Data/test_data.csv", header= TRUE)
head(df.csv)

  ID  treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1  1 Low As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2  2 Low As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3  3 Low As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4  4 Low As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5  5 Low As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6  6 Low As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97


Alternatively, you can load the data directly from my GitHub data folder with the following code:

In [None]:
%%R
df.csv<-read.csv("https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/test_data.csv",
                  header= TRUE)
head(df.csv)

  ID  treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1  1 Low As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2  2 Low As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3  3 Low As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4  4 Low As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5  5 Low As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6  6 Low As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97
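Because `read.csv()` is `read.table()` with CSV-friendly defaults, the first two calls below should give the same data frame. `read.csv2()` is the variant for files that use `;` as the separator and `,` as the decimal mark, which is common in some European locales. The inline data are illustrative only.

```r
csv_text <- "Variety,Yield\nBR1,5.2\nBR3,6.0"

# read.csv(): header = TRUE and sep = "," by default
a <- read.csv(text = csv_text)

# The same result spelled out with read.table()
b <- read.table(text = csv_text, header = TRUE, sep = ",")
print(all.equal(a, b))

# read.csv2(): sep = ";" and dec = "," by default
csv2_text <- "Variety;Yield\nBR1;5,2\nBR3;6,0"
c2 <- read.csv2(text = csv2_text)
str(c2)
```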


### Excel Files (.xlsx)

If you want to get data from Excel into R, one of the easiest ways is to export the Excel file to a CSV file and then import it using the method above. If you don't want to do that, you can use the **readxl** package. It has no external dependencies, so it is easy to install and use on any operating system.

**readxl** supports both the legacy `.xls` format and the modern XML-based `.xlsx` format. It uses the libxls C library to read `.xls` files, which abstracts away many of the complexities of the underlying binary format, and the RapidXML C++ library to parse `.xlsx` files.

`read_excel()` reads both `xls` and `xlsx` files and detects the format from the extension.

In [18]:
%%R
df.xl <-readxl::read_excel(paste0(dataFolder,"test_data.xlsx"), 1)
#df.xl <- readxl::read_excel("/content/drive/MyDrive/R_Website/R_Bigenner/Data/test_data.xlsx", 1)
head(df.xl)

# A tibble: 6 × 13
     ID treat  var     rep    PH    TN    PN    GW  ster   DTM    SW   GAs  STAs
  <dbl> <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 Low As BR01      1   84   28.3  27.7  35.7  20.5  126   28.4 0.762  14.6
2     2 Low As BR01      2  112.  34    30    58.1  14.8  119   36.7 0.722  10.8
3     3 Low As BR01      3  102.  27.7  24    44.6   5.8  120.  32.9 0.858  12.7
4     4 Low As BR06      1  118   23.3  19.7  46.4  20.3  119   40   1.05   18.2
5     5 Low As BR06      2  115.  16.7  12.3  19.9  32.3  120   28.2 1.13   13.7
6     6 Low As BR06      3  111   19    15.3  35.9  14.9  116.  42.3 1.01   16.0
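`read_excel()` can also select a worksheet by name or position (`sheet`) and a block of cells (`range`), and `excel_sheets()` lists the sheets in a workbook. To keep the sketch self-contained, it first writes a small two-sheet workbook to a temporary file with **writexl**, where a named list of data frames becomes one sheet per element. The sheet names and values are illustrative.

```r
library(readxl)
library(writexl)

# Write an illustrative two-sheet workbook to a temporary .xlsx file
path <- tempfile(fileext = ".xlsx")
write_xlsx(list(
  yield = data.frame(Variety = c("BR1", "BR3", "BR16"), Yield = c(5.2, 6.0, 6.6)),
  notes = data.frame(note = "illustrative data")
), path)

# List the worksheets in the workbook
print(excel_sheets(path))

# Read a sheet by name
y <- read_excel(path, sheet = "yield")
print(y)

# Read only a block of cells: the header row plus the first two data rows
y_part <- read_excel(path, sheet = 1, range = "A1:B3")
print(y_part)
```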


### JSON Files (.json)

JSON (JavaScript Object Notation) is an open-standard, lightweight data-interchange format. A JSON file is a text file that is language independent, self-describing, and easy to understand.

R reads a JSON file as a `list` with the `fromJSON()` function of the **rjson** package.

In [19]:
%%R
# read .json file
df.json <- rjson::fromJSON(file= paste0(dataFolder, "test_data.json"),  simplify=TRUE)
print(df.json)

$ID
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42

$treat
 [1] "Low As"   "Low As"   "Low As"   "Low As"   "Low As"   "Low As"  
 [7] "Low As"   "Low As"   "Low As"   "Low As"   "Low As"   "Low As"  
[13] "Low As"   "Low As"   "Low As"   "Low As"   "Low As"   "Low As"  
[19] "Low As"   "Low As"   "Low As"   "High As " "High As " "High As "
[25] "High As " "High As " "High As " "High As " "High As " "High As "
[31] "High As " "High As " "High As " "High As " "High As " "High As "
[37] "High As " "High As " "High As " "High As " "High As " "High As "

$var
 [1] "BR01"      "BR01"      "BR01"      "BR06"      "BR06"      "BR06"     
 [7] "BR28"      "BR28"      "BR28"      "BR35"      "BR35"      "BR35"     
[13] "BR36"      "BR36"      "BR36"      "Jefferson" "Jefferson" "Jefferson"
[19] "Kaybonnet" "Kaybonnet" "Kaybonnet" "BR01"      "BR01"      "BR01"     
[25] "BR06"      "BR06"      "BR06"      

We can convert the resulting list to a regular data frame:

In [None]:
%%R
df.json <- as.data.frame(df.json)
head(df.json)

  ID  treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1  1 Low As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2  2 Low As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3  3 Low As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4  4 Low As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5  5 Low As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6  6 Low As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97


### Import Data from Other Statistical Software

The **foreign** package reads data stored by **Minitab**, **S**, **SAS**, **SPSS**, **Stata**, **Systat**, **dBase**, and so forth.

> install.packages("foreign")

**haven** enables R to read and write various data formats used by other statistical packages by wrapping the [ReadStat](https://github.com/WizardMac/ReadStat) C library. haven is part of the tidyverse and currently supports **SAS**, **SPSS**, and **Stata** files.

The `read.dta()` function of the **foreign** package reads a file in Stata version 5-12 binary format (`.dta`) into a data frame.

#### Read STATA File (.dta)

In [20]:
%%R
# Foreign - read.dta()
df.dta_01 <- foreign::read.dta(paste0(dataFolder,"test_data.dta"))
# Haven - read_dta()
df.dta_02 <- haven::read_dta(paste0(dataFolder,"test_data.dta"))
head(df.dta_02)

# A tibble: 6 × 13
     ID treat  var     rep    PH    TN    PN    GW  ster   DTM    SW   GAs  STAs
  <dbl> <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 Low As BR01      1   84   28.3  27.7  35.7  20.5  126   28.4 0.762  14.6
2     2 Low As BR01      2  112.  34    30    58.1  14.8  119   36.7 0.722  10.8
3     3 Low As BR01      3  102.  27.7  24    44.6   5.8  120.  32.9 0.858  12.7
4     4 Low As BR06      1  118   23.3  19.7  46.4  20.3  119   40   1.05   18.2
5     5 Low As BR06      2  115.  16.7  12.3  19.9  32.3  120   28.2 1.13   13.7
6     6 Low As BR06      3  111   19    15.3  35.9  14.9  116.  42.3 1.01   16.0


#### Read SPSS File (.sav)

In [21]:
%%R
# Foreign - read.spss(); to.data.frame = TRUE returns a data frame instead of a list
df.sav_01 <- foreign::read.spss(paste0(dataFolder,"test_data.sav"), to.data.frame = TRUE)
# Haven - read_sav()
df.sav_02 <- haven::read_sav(paste0(dataFolder,"test_data.sav"))
head(df.sav_02)

# A tibble: 6 × 13
     ID treat  var     rep    PH    TN    PN    GW  ster   DTM    SW   GAs  STAs
  <dbl> <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 Low As BR01      1   84   28.3  27.7  35.7  20.5  126   28.4 0.762  14.6
2     2 Low As BR01      2  112.  34    30    58.1  14.8  119   36.7 0.722  10.8
3     3 Low As BR01      3  102.  27.7  24    44.6   5.8  120.  32.9 0.858  12.7
4     4 Low As BR06      1  118   23.3  19.7  46.4  20.3  119   40   1.05   18.2
5     5 Low As BR06      2  115.  16.7  12.3  19.9  32.3  120   28.2 1.13   13.7
6     6 Low As BR06      3  111   19    15.3  35.9  14.9  116.  42.3 1.01   16.0


#### Read SAS File (.sas7bdat)

The `read_sas()` function of the **haven** package reads SAS (`.sas7bdat`) files.

In [22]:
%%R
# read .sas7bdat file
df.sas <- haven::read_sas(paste0(dataFolder,"test_data.sas7bdat"))
head(df.sas)

# A tibble: 6 × 13
     ID treat  var     rep    PH    TN    PN    GW  ster   DTM    SW   GAs  STAs
  <dbl> <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 Low As BR01      1   84   28.3  27.7  35.7  20.5  126   28.4 0.762  14.6
2     2 Low As BR01      2  112.  34    30    58.1  14.8  119   36.7 0.722  10.8
3     3 Low As BR01      3  102.  27.7  24    44.6   5.8  120.  32.9 0.858  12.7
4     4 Low As BR06      1  118   23.3  19.7  46.4  20.3  119   40   1.05   18.2
5     5 Low As BR06      2  115.  16.7  12.3  19.9  32.3  120   28.2 1.13   13.7
6     6 Low As BR06      3  111   19    15.3  35.9  14.9  116.  42.3 1.01   16.0


## Data Export from R

### Write as CSV File

In [8]:
%%R
Variety =c("BR1","BR3", "BR16", "BR17", "BR18", "BR19","BR26",
	      "BR27","BR28","BR29","BR35","BR36") # create a text vector
Yield = c(5.2,6.0,6.6,5.6,4.7,5.2,5.7,
	            5.9,5.3,6.8,6.2,5.8) # create numerical vector
rice.data= data.frame(Variety, Yield)
head(rice.data)

  Variety Yield
1     BR1   5.2
2     BR3   6.0
3    BR16   6.6
4    BR17   5.6
5    BR18   4.7
6    BR19   5.2


The most commonly used base R functions for writing data are `write.table()`, `write.csv()`, and `write.csv2()`. For tab-delimited output, use `write.table()` with `sep = "\t"`.

Before you start, specify the working or destination directory where the data will be saved.

In [23]:
%%R
write.csv(rice.data, paste0(dataFolder, "rice_data.csv"), row.names = FALSE) # no row names
# write.csv(rice.data, "rice_data.csv", row.names = FALSE) # no row names
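The `row.names = FALSE` argument matters when the file is read back. With the default `row.names = TRUE`, `write.csv()` writes the row names as an extra, unnamed first column, which `read.csv()` then imports as a column called `X`. A minimal base-R sketch using temporary files and illustrative data:

```r
rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

# Default row.names = TRUE: an extra unnamed column of row names is written
f1 <- tempfile(fileext = ".csv")
write.csv(rice, f1)
print(names(read.csv(f1)))

# row.names = FALSE: only the data columns are written
f2 <- tempfile(fileext = ".csv")
write.csv(rice, f2, row.names = FALSE)
print(names(read.csv(f2)))

# write.table() with sep = "\t" produces a tab-delimited file instead
f3 <- tempfile(fileext = ".txt")
write.table(rice, f3, sep = "\t", row.names = FALSE, quote = FALSE)
print(readLines(f3))
```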

### Write as Excel File

Exporting data from R to Excel can be done with several packages. A popular lightweight option is **writexl**, which provides the `write_xlsx()` function for writing data frames to `.xlsx` files.


In [29]:
%%R

# write as xlsx file
writexl::write_xlsx(rice.data, paste0(dataFolder, "rice_data.xlsx"))

### JSON Objects

To write a JSON object to a file, first convert the R object to a JSON string with the `toJSON()` function of the **rjson** package, then write that string to a local file with the `write()` function.

In [30]:
%%R
# create a JSON object
jsonData <-rjson::toJSON(rice.data)
# write JSON objects
write(jsonData, file= paste0(dataFolder,"rice_data.json"))
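A data frame is a list of columns, so `rjson::toJSON()` is expected to encode it as a JSON object with one array per column, and `rjson::fromJSON()` turns that text back into a named list that `as.data.frame()` can rebuild. A minimal round-trip sketch with a temporary file and illustrative data; `writeLines()` is used here as a plain base-R alternative to `write()`.

```r
library(rjson)

rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

# Encode the data frame as a JSON string and write it to a temporary file
json_txt <- toJSON(rice)
path <- tempfile(fileext = ".json")
writeLines(json_txt, path)
cat(json_txt, "\n")

# Read the file back: fromJSON() returns a named list of column vectors
back_list <- fromJSON(file = path)
str(back_list)

# Rebuild a data frame from the list
back <- as.data.frame(back_list)
print(back)
```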

### R Data File

If you want to share R objects with colleagues, possibly on different systems, so that they can load them straight into their own R workspace, save them in R's native data formats. **.rda/.RData** files can store some or all objects, including functions, from the R global environment.

The `save()` function saves one or more objects from the global environment to a single file:

In [31]:
%%R
save(rice.data, Variety, Yield,  file= paste0(dataFolder,"rice_data.RData"))
#save(rice.data, Variety, Yield,  file="rice_data.RData")

`save.image(file = "R_objects.RData")` exports all objects in the global environment (the workspace image).

To save a single object, the `saveRDS()` function, which writes an `.rds` file, is generally recommended:

In [32]:
%%R
# write .RDS file
saveRDS(rice.data,  file= paste0(dataFolder,"rice_data.rds"))
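The practical difference shows up when loading. `load()` restores objects under their original names (and can overwrite objects of the same name), whereas `readRDS()` returns the object so you choose the name. A base-R sketch with temporary files and illustrative data; loading into a fresh environment (`new.env()`) is one way to avoid overwriting anything in the workspace.

```r
rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))
yield_vec <- rice$Yield

# save() stores several named objects in one .RData file
rdata_file <- tempfile(fileext = ".RData")
save(rice, yield_vec, file = rdata_file)

# load() restores them under their original names, here into a fresh
# environment; its return value is the character vector of restored names
restore_env <- new.env()
restored <- load(rdata_file, envir = restore_env)
print(restored)

# saveRDS() stores exactly one object, without its name
rds_file <- tempfile(fileext = ".rds")
saveRDS(rice, rds_file)

# readRDS() returns the object, so it can be assigned to any name
rice_copy <- readRDS(rds_file)
print(identical(rice, rice_copy))
```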

### Export to Other Statistical Software

#### STATA File

To export data from R to Stata, use the `write.dta()` function of the **foreign** package.

In [33]:
%%R
# write dta file
foreign::write.dta(rice.data, file= paste0(dataFolder,"rice_data.dta"))
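An exported `.dta` file can be checked by reading it back with either package. In the sketch below (illustrative data, temporary file), `foreign::write.dta()` writes the file with its default `version = 7`, which lies inside the Stata 5-12 range that `foreign::read.dta()` handles; `haven::read_dta()` reads the same file as a tibble.

```r
library(foreign)
library(haven)

rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

# Write a Stata file (foreign's default is version = 7)
dta_file <- tempfile(fileext = ".dta")
write.dta(rice, dta_file)

# foreign returns a plain data.frame
rice_foreign <- read.dta(dta_file)
print(class(rice_foreign))

# haven returns a tibble
rice_haven <- read_dta(dta_file)
print(class(rice_haven))
print(rice_haven)
```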


#### SPSS File

The `write_sav()` function of the **haven** package exports an R data frame to SPSS format (`.sav`):

In [35]:
%%R
# write .sav file
haven::write_sav(rice.data, paste0(dataFolder, "rice_data.sav"))
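A quick way to confirm that the export worked is to read the file straight back with `haven::read_sav()`. The sketch below uses illustrative data and a temporary file; `as.vector()` is applied before comparing because it drops any attributes haven may attach to the columns.

```r
library(haven)

rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

# Export to SPSS format
sav_file <- tempfile(fileext = ".sav")
write_sav(rice, sav_file)

# Read it back; haven returns a tibble
back <- read_sav(sav_file)
print(back)

# Compare the numeric column as a plain vector
print(all.equal(as.vector(back$Yield), rice$Yield))
```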

#### SAS File

The `write_sas()` function of the **haven** package can be used to export an R data frame to SAS format (`.sas7bdat`):

In [38]:
%%R
# write .sas7bdat file
haven::write_sas(rice.data, paste0(dataFolder, "rice_data.sas7bdat"))
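According to the haven documentation, `write_sas()` is unreliable because the `.sas7bdat` format is undocumented, so SAS may not open the files it writes; the documentation points to `write_xpt()`, which writes the open SAS transport format, instead. The sketch below writes illustrative data with `haven::write_xpt()` to a temporary file and reads it back with `haven::read_xpt()`; the member name `"rice"` is a hypothetical choice. Transport files store numbers in a different floating-point format, so the values are compared with a tolerance.

```r
library(haven)

rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

# Write a SAS transport (.xpt) file; version 8 allows names up to 32 characters
xpt_file <- tempfile(fileext = ".xpt")
write_xpt(rice, xpt_file, version = 8, name = "rice")

# Read it back as a tibble
back <- read_xpt(xpt_file)
print(back)

# all.equal() compares with a small numeric tolerance
print(all.equal(as.vector(back$Yield), rice$Yield))
```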

## Summary and Conclusion

This guide covers the core skills for importing data into R and exporting it from R for data analysis or statistical modeling. It walks through plain text, CSV, Excel, and JSON files, files from other statistical software (Stata, SPSS, and SAS) using the **foreign** and **haven** packages, and R's native `.RData` and `.rds` formats. Practicing these techniques will streamline your workflow and leave more time for analyzing the data itself.
