<a href="https://colab.research.google.com/github/zia207/r-colab/blob/main/NoteBook/R_Beginner/01-02-01-data-import-export-r.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![alt text](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

# Data Import/Export to/from R



One of the most important steps in data analysis is importing data into R and exporting results from it. Different functions are used depending on the format of the data, such as CSV, Excel, or SQL databases, so it is essential to learn the most common ways to read and write data with R. Once data are imported, you can perform a wide range of analyses, from simple data visualization to complex machine learning algorithms. Mastering data import and export is therefore a fundamental skill for any data scientist or analyst.

## Install rpy2

The easiest way to run R in Colab with a Python runtime is the **rpy2** Python package. Install it with the pip command, then load its IPython extension, which enables the `%%R` cell magic:

In [None]:
!pip uninstall rpy2 -y
! pip install rpy2==3.5.1
%load_ext rpy2.ipython

Found existing installation: rpy2 3.5.17
Uninstalling rpy2-3.5.17:
  Successfully uninstalled rpy2-3.5.17
Collecting rpy2==3.5.1
  Downloading rpy2-3.5.1.tar.gz (201 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.7/201.7 kB 11.2 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rpy2
  Building wheel for rpy2 (setup.py) ... done
  Created wheel for rpy2: filename=rpy2-3.5.1-cp311-cp311-linux_x86_64.whl size=314978 sha256=5a50b239f286ea4760d39df914dfde92c7f52df506e9eaa0a51ac6a297695122
  Stored in directory: /root/.cache/pip/wheels/e9/55/d1/47be85a5f3f1e1f4d1e91cb5e3a4dcb40dd72147f184c5a5ef
Successfully built rpy2
Installing collected packages: rpy2
Successfully installed rpy2-3.5.1


## Mount Google Drive

Next, create a folder named "R" in your Google Drive so that installed R packages persist between sessions. Before installing R packages from the Python runtime, mount Google Drive and follow the on-screen instructions:

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Check and Install Required R Packages

In this exercise, we will use the following R-packages:

1. [readxl](https://readxl.tidyverse.org/): to read MS **Excel** files. It is installed along with the [tidyverse](https://www.tidyverse.org/).
2. [writexl](https://docs.ropensci.org/writexl/): a portable, lightweight data frame to xlsx exporter based on libxlsxwriter. No Java or Excel required.
3. [rjson](https://github.com/alexcb/rjson): to read and write **.json** files.
4. [foreign](https://cran.r-project.org/web/packages/foreign/index.html): to read data stored by **Minitab**, **S**, **SAS**, **SPSS**, **Stata**, **Systat**, **dBase**, and so forth.
5. [haven](https://haven.tidyverse.org/): to read and write data from other statistical packages. It is installed along with the [tidyverse](https://www.tidyverse.org/).


### Install Required R Packages

In [None]:
%%R
packages <- c(
          'readxl',
          'writexl',
          'rjson',
          'foreign',
          'haven'
)

### Install Missing Packages

In [None]:
%%R
# Install missing packages
new.packages <- packages[!(packages %in% installed.packages(lib.loc = 'drive/My Drive/R/')[, "Package"])]
if(length(new.packages)) install.packages(new.packages, lib='drive/My Drive/R/')

	‘/tmp/RtmpE6MXsx/downloaded_packages’



In [None]:
%%R
# set library path
.libPaths('drive/My Drive/R')
# Verify installation
cat("Installed packages:\n")
print(sapply(packages, requireNamespace, quietly = TRUE))

Installed packages:
 readxl writexl   rjson foreign   haven 
   TRUE    TRUE    TRUE    TRUE    TRUE 


### Load R-packages

In [None]:
%%R
# set library path
.libPaths('drive/My Drive/R')
# Load packages with suppressed messages
invisible(lapply(packages, function(pkg) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}))
# Check loaded packages
cat("Successfully loaded packages:\n")
print(search()[grepl("package:", search())])


Successfully loaded packages:
 [1] "package:haven"     "package:foreign"   "package:rjson"    
 [4] "package:writexl"   "package:readxl"    "package:tools"    
 [7] "package:stats"     "package:graphics"  "package:grDevices"
[10] "package:utils"     "package:datasets"  "package:methods"  
[13] "package:base"     


## Data


All datasets used in this exercise can be downloaded from my [Dropbox](https://www.dropbox.com/scl/fo/fohioij7h503duitpl040/h?rlkey=3voumajiklwhgqw75fe8kby3o&dl=0) or my [GitHub](https://github.com/zia207/r-colab/tree/main/Data/R_Beginners) account.

It is good practice to set a working directory in R so that files can be read and written locally. The following example shows how to set the working directory in R.

Before changing the working directory, you may want to check the directory of your current R session. The function `getwd()` prints the current working directory path as a string.

To change the working directory, call the `setwd()` function with the path of the new working directory folder as its argument.

> `setwd("F:\\R-Project")`

> `setwd("F:/R-Project")`

Remember that in R paths you must use a forward slash `/` or a double backslash `\\`. A single backslash, as in native Windows paths, will not work because R treats it as an escape character.

You can list the files in the working directory with the `dir()` function:

> dir()
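A minimal sketch of these functions working together follows. It uses a folder inside the session's temporary directory (`tempdir()`) as a stand-in for a real project folder such as `F:/R-Project`. The folder name `R-Project` and the file `example.txt` are illustrative only. `setwd()` invisibly returns the previous directory, which makes it easy to switch back afterwards.

```r
# Print the current working directory
print(getwd())

# Illustrative project folder inside the session's temporary directory
project_dir <- file.path(tempdir(), "R-Project")
dir.create(project_dir, showWarnings = FALSE)

# setwd() changes the working directory and invisibly returns the old one
old_wd <- setwd(project_dir)

# Create an empty file, then list the folder contents
invisible(file.create("example.txt"))
print(dir())  # dir() is an alias of list.files()

# Switch back to the original working directory
setwd(old_wd)
```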

## Data Import Into R

### Read Text File (.txt)

A `text file` is a type of computer file that contains only `plain text` — which means it includes letters, numbers, symbols, and spaces that can be read by humans and processed by computers. It doesn't include any special formatting like bold, italics, or images—just raw text.

Key points about text files:

- `File extension`: Usually ends in `.txt`
- `Editable with`: Any text editor (like Notepad, TextEdit, VS Code, etc.)
- `Encoding`: Often uses encodings like `ASCII` or `UTF-8`
- `Uses`:
  - Storing notes or data
  - Writing code (before saving with a specific programming extension)
  - Configuration files for software (like `.ini`, `.conf`, etc.)

Text files are often used for data storage and transfer because they are simple and widely supported. They can be easily created, edited, and read by both humans and machines. The easiest form of data to import into R is a simple text file. The primary function to import from a text file is `read.table()`.

```
read.table(file, header = FALSE, sep = "", quote = "\"'", ...)
```


In [None]:
%%R
dataFolder<- "/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
df.txt<-read.table(paste0(dataFolder,"test_data.txt"), header= TRUE)
head(df.txt)


   ID treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1 Low    As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2 Low    As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3 Low    As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4 Low    As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5 Low    As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6 Low    As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97


Alternatively, you can load the data directly from my GitHub data folder:

In [None]:
%%R
df.txt<-read.table("https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/test_data.txt",
                    header= TRUE)
head(df.txt)

   ID treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1 Low    As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2 Low    As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3 Low    As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4 Low    As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5 Low    As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6 Low    As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97
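In the output above, `ID` holds `Low` and `treat` holds `As`. In the Excel version of the same data, shown later in this tutorial, `ID = 1` and `treat = "Low As"`. This shift is what `read.table()` produces with its default `sep = ""`, which treats any run of whitespace as a separator. The label `Low As` becomes two fields, so each data row has one more field than the header, and `read.table()` then uses the first field as row names.

If `test_data.txt` is tab-delimited (an assumption not checked here), adding `sep = "\t"` should keep the label intact. The sketch below reproduces the effect on two made-up rows supplied through the `text` argument.

```r
# Two made-up tab-separated rows whose treatment label contains a space
txt <- "ID\ttreat\tvar\trep\tPH\n1\tLow As\tBR01\t1\t84.0\n2\tLow As\tBR01\t2\t111.7"

# Default sep = "": any whitespace separates fields, so "Low As" is split in two.
# Data rows then have one more field than the header, so the first field
# becomes the row names and every value shifts one column to the left.
split_df <- read.table(text = txt, header = TRUE)
print(split_df)

# sep = "\t": only tabs separate fields, so "Low As" stays a single value
tab_df <- read.table(text = txt, header = TRUE, sep = "\t")
print(tab_df)
```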


Alternatively, the lower-level `scan()` function reads data into a vector or a list, either from the console or from a file.

```
scan(file = "", what = double(), nmax = -1, n = -1, sep = "", ...)
```

In [None]:
%%R
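#df.scan <- scan(paste0(dataFolder, "test_data.txt"), what = character())
# what = character() reads every whitespace-separated value as a string;
# what = list("", "", "") would instead expect records of exactly three fields
df.scan<-scan("https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/test_data.txt",
                    what = character())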
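When `what` is a list, `scan()` treats each list element as one field of a record and fills the fields in turn. The sketch below shows this on made-up inline data supplied through the `text` argument instead of a file. `skip = 1` drops the header line and `quiet = TRUE` suppresses the "Read N records" message.

```r
# Made-up data: a header line followed by three records (variety, yield)
txt <- "Variety Yield\nBR1 5.2\nBR3 6.0\nBR16 6.6"

# One record = a character field followed by a numeric field
rec <- scan(text = txt, what = list(Variety = "", Yield = 0),
            skip = 1, quiet = TRUE)
str(rec)

# A named list of equal-length vectors converts directly to a data frame
rice <- as.data.frame(rec)
print(rice)
```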




### Comma-Separated File (.csv)

A comma-separated values (CSV) file stores one record per line, with the values separated by commas (although other characters, such as semicolons, can be used). The `read.csv()` function makes reading such files easy: it is a wrapper around `read.table()` with defaults suited to CSV files (`header = TRUE`, `sep = ","`).

```
read.csv(file, header = TRUE, sep = ",", quote = "\"", ...)
```


In [None]:
%%R
df.csv<-read.csv(paste0(dataFolder,"test_data.csv"), header= TRUE)

Alternatively, you can load the data directly from my GitHub data folder:

In [None]:
%%R
df.csv<-read.csv("https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/test_data.csv",
                  header= TRUE)
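A small round trip shows why the `quote` argument matters. `write.csv()` quotes character fields, so a value that itself contains a comma survives, and `read.csv()` reads it back correctly with its default `quote = "\""`. The data frame and the temporary file path are made up for illustration.

```r
# Made-up data; the second label contains a comma
df <- data.frame(Variety = c("BR1", "BR3, late"), Yield = c(5.2, 6.0))

# Write to a temporary CSV file (stand-in for a real path)
path <- tempfile(fileext = ".csv")
write.csv(df, path, row.names = FALSE)

# Inspect the raw file: character fields are wrapped in double quotes
cat(readLines(path), sep = "\n")

# Read it back; the quoted comma is not treated as a separator
back <- read.csv(path)
print(back)
```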

### Excel Files (.xlsx)

If you want to get data from Excel into R, one of the easiest ways is to export the Excel file to a CSV file and then import it using the method above. If you don't want to do that, you can use the {readxl} package. It has no external dependencies, so it is easy to install on any operating system.

The {readxl} package supports both the legacy `.xls` format and the modern XML-based `.xlsx` format. It uses the libxls C library to read `.xls` files, which abstracts away many of the complexities of the underlying binary format, and the RapidXML C++ library to parse `.xlsx` files.

To install the package, you can use the following command:

```
install.packages("readxl")
```

`read_excel()` reads both `xls` and `xlsx` files and detects the format from the extension.

In [None]:
%%R
# Import the first sheet of an Excel file
df.xl <-readxl::read_excel(paste0(dataFolder,"test_data.xlsx"), sheet = 1)
head(df.xl)

# A tibble: 6 × 13
     ID treat  var     rep    PH    TN    PN    GW  ster   DTM    SW   GAs  STAs
  <dbl> <chr>  <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 Low As BR01      1   84   28.3  27.7  35.7  20.5  126   28.4 0.762  14.6
2     2 Low As BR01      2  112.  34    30    58.1  14.8  119   36.7 0.722  10.8
3     3 Low As BR01      3  102.  27.7  24    44.6   5.8  120.  32.9 0.858  12.7
4     4 Low As BR06      1  118   23.3  19.7  46.4  20.3  119   40   1.05   18.2
5     5 Low As BR06      2  115.  16.7  12.3  19.9  32.3  120   28.2 1.13   13.7
6     6 Low As BR06      3  111   19    15.3  35.9  14.9  116.  42.3 1.01   16.0
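`read_excel()` has other useful arguments besides `sheet`. This sketch shows two of them: `range`, to read only a block of cells, and `col_types`, to set column types instead of relying on type guessing. It uses a small made-up workbook written to a temporary file with `writexl::write_xlsx()` and assumes both packages are installed, as above.

```r
# Write a small made-up data frame to a temporary workbook
rice <- data.frame(Variety = c("BR1", "BR3", "BR16"), Yield = c(5.2, 6.0, 6.6))
path <- tempfile(fileext = ".xlsx")
writexl::write_xlsx(rice, path)

# List the sheets stored in the workbook
print(readxl::excel_sheets(path))

# Read only a cell range: the header row plus the first two data rows
first_two <- readxl::read_excel(path, sheet = 1, range = "A1:B3")
print(first_two)

# Set the column types explicitly instead of letting readxl guess them
typed <- readxl::read_excel(path, col_types = c("text", "numeric"))
print(typed)
```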


### JSON Files (.json)

JSON (*J*ava*S*cript *O*bject *N*otation) is an open-standard, lightweight data-interchange format. A JSON file is a plain-text file that is language independent, self-describing, and easy to understand.

R reads a JSON file as a **list** using the `fromJSON()` function of the {rjson} package.

```
install.packages("rjson")
fromJSON(json_str, file, method = "C", unexpected.escape = "error", simplify = TRUE)
```

In [None]:
%%R
# read .json file
df.json <- rjson::fromJSON(file= paste0(dataFolder, "test_data.json"),  simplify=TRUE)
#print(df.json)

The resulting list can be converted to a regular data frame:

In [None]:
%%R
df.json <- as.data.frame(df.json)
head(df.json)

  ID  treat  var rep    PH   TN   PN   GW ster   DTM   SW   GAs  STAs
1  1 Low As BR01   1  84.0 28.3 27.7 35.7 20.5 126.0 28.4 0.762 14.60
2  2 Low As BR01   2 111.7 34.0 30.0 58.1 14.8 119.0 36.7 0.722 10.77
3  3 Low As BR01   3 102.3 27.7 24.0 44.6  5.8 119.7 32.9 0.858 12.69
4  4 Low As BR06   1 118.0 23.3 19.7 46.4 20.3 119.0 40.0 1.053 18.23
5  5 Low As BR06   2 115.3 16.7 12.3 19.9 32.3 120.0 28.2 1.130 13.72
6  6 Low As BR06   3 111.0 19.0 15.3 35.9 14.9 116.3 42.3 1.011 15.97
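`fromJSON()` can also parse JSON text directly through its `json_str` argument instead of reading a file. The sketch below uses a hypothetical column-oriented JSON string, with one array per column. With the default `simplify = TRUE`, each single-type array becomes an R vector, so the resulting named list converts cleanly to a data frame.

```r
# Hypothetical JSON text: one array per column
json_txt <- '{"Variety": ["BR1", "BR3", "BR16"], "Yield": [5.2, 6.0, 6.6]}'

# Parse the string (file = would read from a file instead)
parsed <- rjson::fromJSON(json_str = json_txt)
str(parsed)  # named list: Variety (character), Yield (numeric)

# Equal-length list elements become data frame columns
df <- as.data.frame(parsed)
print(df)
```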


### Import Data from Other Statistical Software

The {foreign} package is mostly used to read data stored by `Minitab`, `SAS`, `SPSS`, `Stata`, `Systat`, `dBase`, and so forth. It can also write a few of these formats, for example Stata files with `write.dta()`.

```
install.packages("foreign")
```

{haven} enables R to read and write various data formats used by other statistical packages by wrapping the [ReadStat](https://github.com/WizardMac/ReadStat) C library. haven is part of the tidyverse and currently supports `SAS`, `SPSS` and `Stata` files.

```
install.packages("haven")
```


#### STATA File (.dta)

##### Foreign - read.dta()

The `read.dta()` function from the {foreign} package reads a file in Stata version 5-12 binary format (`.dta`) into a data frame.

In [None]:
%%R
# read .dta file
df.dta_01 <- foreign::read.dta(paste0(dataFolder,"test_data.dta"))

##### Haven - read_dta()

The `read_dta()` function from the {haven} package reads Stata `.dta` files into a tibble. This includes files saved by Stata 13 and later, which `foreign::read.dta()` cannot read.

In [None]:
%%R
# read .dta file
df.dta_02 <- haven::read_dta(paste0(dataFolder,"test_data.dta"))
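The sketch below compares the two readers on a small made-up data frame. It writes the data to a temporary file with `foreign::write.dta()`, which produces an older Stata format that both packages can read. `read.dta()` returns a plain data frame, while `read_dta()` returns a tibble whose columns may carry Stata metadata, such as display formats, as attributes. That is why the comparison wraps the tibble columns in `as.numeric()` and `as.character()`.

```r
df <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))
path <- tempfile(fileext = ".dta")
foreign::write.dta(df, path)

dta_foreign <- foreign::read.dta(path)  # base data.frame
dta_haven   <- haven::read_dta(path)    # tibble (tbl_df)

print(class(dta_foreign))
print(class(dta_haven))

# Strip haven's column attributes before comparing values
print(as.numeric(dta_haven$Yield))
```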

#### SPSS File (.sav)

##### Foreign - read.spss()

In [None]:
%%R
# read .sav file as a data frame (read.spss() returns a list by default)
df.sav_01 <- foreign::read.spss(paste0(dataFolder,"test_data.sav"), to.data.frame = TRUE)

##### Haven - read_sav()

In [None]:
%%R
# read .sav file
df.sav_02 <- haven::read_sav(paste0(dataFolder,"test_data.sav"))
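SPSS files often store categorical variables as numeric codes with value labels. The sketch below uses made-up codes and labels written to a temporary file. `read_sav()` keeps the numeric codes and attaches the labels, and `haven::as_factor()` converts the labelled column to an R factor. For comparison, `foreign::read.spss(..., to.data.frame = TRUE)` applies value labels by default through `use.value.labels = TRUE`.

```r
# Made-up data: treatment codes 1/2 with SPSS-style value labels
df <- data.frame(id = 1:3)
df$trt <- haven::labelled(c(1, 2, 1), labels = c(Control = 1, Treated = 2))

path <- tempfile(fileext = ".sav")
haven::write_sav(df, path)

sav <- haven::read_sav(path)
print(sav$trt)  # numeric codes with the value labels attached

# Turn the labelled codes into a factor whose levels are the labels
trt_factor <- haven::as_factor(sav$trt)
print(trt_factor)
```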

#### SAS File (.sas7bdat)

The `read_sas()` function from the {haven} package reads SAS data files (`.sas7bdat`) into a tibble in the same way.

## Write as CSV Files

First, let's create a data frame that we will export in several formats.

In [None]:
%%R
Variety <- c("BR1", "BR3", "BR16", "BR17", "BR18", "BR19", "BR26",
             "BR27", "BR28", "BR29", "BR35", "BR36") # create a text vector
Yield <- c(5.2, 6.0, 6.6, 5.6, 4.7, 5.2, 5.7,
           5.9, 5.3, 6.8, 6.2, 5.8) # create a numeric vector
rice.data <- data.frame(Variety, Yield)
head(rice.data)

  Variety Yield
1     BR1   5.2
2     BR3   6.0
3    BR16   6.6
4    BR17   5.6
5    BR18   4.7
6    BR19   5.2


The most commonly used base R functions for writing data are `write.table()`, `write.csv()` and `write.csv2()`. Base R has no `write.delim()` function; to write a tab-delimited file, use `write.table()` with `sep = "\t"`.

Before you start, specify the destination directory where the data will be saved.

In [None]:
%%R
dataFolder<- "/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
write.csv(rice.data, paste0(dataFolder, "rice_data.csv"), row.names = FALSE) # no row names
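The sketch below shows the other two base R writers on a made-up data frame, using temporary files in place of `dataFolder`. `write.table()` with `sep = "\t"` produces a tab-delimited file that `read.delim()` reads back. `write.csv2()` uses `;` as the separator and `,` as the decimal mark, a convention common in parts of Europe, and pairs with `read.csv2()`.

```r
rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

# Tab-delimited text file
txt_path <- tempfile(fileext = ".txt")
write.table(rice, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)
back_txt <- read.delim(txt_path)

# Semicolon-separated CSV with a comma as decimal mark
csv2_path <- tempfile(fileext = ".csv")
write.csv2(rice, csv2_path, row.names = FALSE)
cat(readLines(csv2_path), sep = "\n")
back_csv2 <- read.csv2(csv2_path)
```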


## Write as Excel File

Exporting data from R to Excel can be done with several packages. A popular, lightweight option is **writexl**, which provides the `write_xlsx()` function and needs neither Java nor Excel.

In [None]:
%%R
dataFolder<- "/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
# write as xlsx file
writexl::write_xlsx(rice.data, paste0(dataFolder, "rice_data.xlsx"))
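`write_xlsx()` also accepts a named list of data frames and writes each element to its own worksheet, named after the list element. The sketch below uses made-up data and a temporary file. `readxl::excel_sheets()` is used only to check the result.

```r
rice <- data.frame(Variety = c("BR1", "BR3", "BR16"), Yield = c(5.2, 6.0, 6.6))
stats <- data.frame(n = nrow(rice), mean_yield = mean(rice$Yield))

path <- tempfile(fileext = ".xlsx")
# One worksheet per list element: "data" and "summary"
writexl::write_xlsx(list(data = rice, summary = stats), path)

# Check the sheet names in the new workbook
print(readxl::excel_sheets(path))
```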

## Write as JSON Objects

To write a JSON object to a file, use the `toJSON()` function from the **rjson** package to convert an R object into a JSON string. Then use the `write()` function to save that string to a local file.

In [None]:
%%R
dataFolder<-"/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
# create a JSON object
jsonData <-rjson::toJSON(rice.data)
# write JSON objects
write(jsonData, file= paste0(dataFolder,"rice_data.json"))
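A round-trip sketch with a made-up data frame and a temporary file follows. `toJSON()` turns a data frame into a JSON object with one array per column, and reading the file back with `fromJSON(file = ...)` and `as.data.frame()` rebuilds an equal data frame. `writeLines()` is used here as an equivalent of `write()` for a single string.

```r
rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

# Convert to a JSON string and show it
json_txt <- rjson::toJSON(rice)
cat(json_txt, "\n")

# Save the string to a temporary .json file
path <- tempfile(fileext = ".json")
writeLines(json_txt, path)

# Read the file back and rebuild the data frame
back <- as.data.frame(rjson::fromJSON(file = path))
print(back)
```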

## R Data File

To share R objects with colleagues so that they can load them directly into their own R workspace, save them in R's native formats. `.RData` (also `.rda`) files can store some or all objects, including functions, from the global environment, while `.rds` files store a single object.

The `save()` function saves multiple objects from the global environment into one file:

In [None]:
%%R
dataFolder<-"/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
save(rice.data, Variety, Yield,  file= paste0(dataFolder,"rice_data.RData"))

To export all objects in the workspace (the workspace image), use `save.image(file = "R_objects.RData")`.
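Files written by `save()` are read back with `load()`, which restores each object under its original name. The sketch below saves made-up objects to a temporary file and loads them into a fresh environment, so nothing already in the workspace is overwritten. `load()` invisibly returns the names of the restored objects.

```r
Variety <- c("BR1", "BR3")
Yield <- c(5.2, 6.0)
rice.data <- data.frame(Variety, Yield)

path <- tempfile(fileext = ".RData")
save(rice.data, Variety, Yield, file = path)

# Load into a separate environment instead of the global workspace
env <- new.env()
loaded <- load(path, envir = env)
print(loaded)  # names of the restored objects
```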

To save a single object, it is usually better to save it as an RDS file with the `saveRDS()` function:

In [None]:
%%R
saveRDS(rice.data,  file= paste0(dataFolder,"rice_data.rds"))

Both `save()` and `saveRDS()` compress their output with gzip by default (`compress = TRUE`). Setting `compress = "bzip2"` or `compress = "xz"` usually produces smaller files at the cost of slower writing.
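An `.rds` file is read back with `readRDS()`. Unlike `load()`, it returns the object, so you can assign it to any name. The sketch below uses a made-up data frame, a temporary file and `xz` compression.

```r
rice <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

path <- tempfile(fileext = ".rds")
# compress accepts TRUE/"gzip" (the default), "bzip2" or "xz"
saveRDS(rice, file = path, compress = "xz")

# readRDS() returns the object, so any name can be used
my_rice <- readRDS(path)
print(my_rice)
```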

## Export to Other Statistical Software


### Write STATA File (.dta)

To export data from R to Stata, use the `write.dta()` function of the **foreign** package, which provides functions for reading and writing data stored by other statistical software.

In [None]:
%%R
dataFolder<-"/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
foreign::write.dta(rice.data, file= paste0(dataFolder,"rice_data.dta"))
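As an alternative, haven's `write_dta()` also writes Stata files. Its `version` argument selects the Stata file format; 14 is used below. This is a round-trip sketch with a made-up data frame in a temporary file. The `as.numeric()` and `as.character()` calls strip the format attributes that haven attaches to columns on reading.

```r
df <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))
path <- tempfile(fileext = ".dta")

# Write with haven, choosing the Stata file version explicitly
haven::write_dta(df, path, version = 14)

# Read back as a tibble and compare plain values
back <- haven::read_dta(path)
print(back)
```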

### SPSS File

The **haven** package also writes SPSS files: its `write_sav()` function exports an R data frame to an SPSS `.sav` file.

In [None]:
%%R
dataFolder<-"/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
# write .sav file
haven::write_sav(rice.data, paste0(dataFolder, "rice_data.sav"))

### Write as a SAS File (.sas7bdat)

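The `write_sas()` function of the **haven** package exports an R data frame to a SAS data file (`.sas7bdat`). Be aware that haven's documentation flags `write_sas()` as experimental or deprecated, depending on the version, because SAS may not be able to open the files it creates. `write_xpt()`, which writes the SAS transport (`.xpt`) format, is the safer choice.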

In [None]:
%%R
haven::write_sas(rice.data, paste0(dataFolder, "rice_data.sas7bdat"))
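haven also provides `write_xpt()` and `read_xpt()` for the SAS transport (`.xpt`) format, which SAS can import. The sketch below round-trips a made-up data frame through a temporary `.xpt` file. Transport files store numbers in a different floating-point representation, so the check compares values with a tolerance rather than exactly.

```r
df <- data.frame(Variety = c("BR1", "BR3"), Yield = c(5.2, 6.0))

path <- tempfile(fileext = ".xpt")
haven::write_xpt(df, path)

# Read the transport file back as a tibble
back <- haven::read_xpt(path)
print(back)
```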

## Summary

This tutorial showed how to move data into and out of R, a critical step in the data analysis workflow. It covered importing and exporting text, CSV, Excel, and JSON files, R's native `.RData` and `.rds` formats, and the formats of other statistical software, including SAS, SPSS, and Stata. Understanding these techniques is pivotal for sharing your findings and insights with others or integrating your results into different applications.

Beyond the functions themselves, pay attention to details such as row and column names, delimiters, and quoting, and to the problems that can arise when converting between formats. Doing so ensures that exported data keeps its integrity and is readily usable by others. Exporting data efficiently is not just about generating output; it also facilitates collaboration and communication, whether you are sharing results with colleagues, collaborating on research, or integrating data into external systems.

As your data analysis journey progresses, consider exploring more advanced techniques, such as connecting to databases for direct import and export or automating data export processes. These skills will streamline your workflow, letting you spend more time analyzing data and less time on data management. Practice these techniques on diverse datasets and adapt them to your specific needs. With these skills, you'll be well-equipped to share your data-driven discoveries with the world and make a meaningful impact in your field.

## Further Reading

1.  [How do I read data into R?](https://www.datafiles.samhsa.gov/get-help/format-specific-issues/how-do-i-read-data-r)

2.  [R Coder](https://r-coder.com/export-data-r/)

3.  [Introduction to bioinformatics](https://uclouvain-cbio.github.io/WSBIM1207/sec-bioinfo.html)

4.  [Many Ways of Reading Data Into R --- 1](https://medium.com/analytics-vidhya/many-ways-of-reading-data-into-r-1-52b02825cb27)