<a href="https://colab.research.google.com/github/zia207/r-colab/blob/main/NoteBook/R_Beginner/01-02-02-readr-data-import-export.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

![alt text](http://drive.google.com/uc?export=view&id=1bLQ3nhDbZrCCqy_WCxxckOne2lgVvn3l)

#  Data Import and Export with readr in R



[**readr**](https://readr.tidyverse.org/) provides a fast and user-friendly way to read rectangular data from delimited files, such as comma-separated values (CSV) and tab-separated values (TSV). It is designed to parse the many data types encountered in real-world files, and it produces an informative problem report when parsing leads to unexpected results.

![alt text](http://drive.google.com/uc?export=view&id=1oWBJ6wsDdVaThjQnOzkbSu-aNERJanSD)

### Installation and use

The easiest way to get readr is to install the [**tidyverse**](https://www.tidyverse.org/). The tidyverse is a collection of R packages designed for data science. It includes several packages that work well together to facilitate data manipulation, visualization, and analysis in a consistent and coherent manner. Some key packages within the tidyverse include {ggplot2} for plotting, {dplyr} for data manipulation, {tidyr} for data tidying, {readr} for data import, and more.



![alt text](http://drive.google.com/uc?export=view&id=1nIxQ3b5no2Pk3ivCYzgiuabCSf4djdgP)



The core tidyverse includes the packages that you're likely to use in everyday data analyses. As of tidyverse 1.3.0, the following packages are included in the core tidyverse:

![alt text](http://drive.google.com/uc?export=view&id=1oR3AgKt5MVj4eUn1HswZ60YUapVmqXns)



As well as **readr**, for reading flat files, the tidyverse package installs a number of other packages for reading data:

-   **DBI** for relational databases. You'll need to pair DBI with a database-specific backend such as *RSQLite*, *RMariaDB*, *RPostgres*, or *odbc*. Learn more at https://db.rstudio.com.

-   **haven** for SPSS, Stata, and SAS data.

-   **httr** for web APIs.

-   **readxl** for .xls and .xlsx sheets.

-   **googlesheets4** for Google Sheets via the Sheets API v4.

-   **googledrive** for Google Drive files.

-   **rvest** for web scraping.

-   **jsonlite** for JSON. (Maintained by Jeroen Ooms.)

-   **xml2** for XML.


## Install rpy2

An easy way to run R in Colab with a Python runtime is the **rpy2** Python package. We install it with the pip command:

In [1]:
!pip uninstall rpy2 -y
! pip install rpy2==3.5.1
%load_ext rpy2.ipython

Found existing installation: rpy2 3.5.17
Uninstalling rpy2-3.5.17:
  Successfully uninstalled rpy2-3.5.17
Collecting rpy2==3.5.1
  Downloading rpy2-3.5.1.tar.gz (201 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 201.7/201.7 kB 3.6 MB/s eta 0:00:00
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: rpy2
  Building wheel for rpy2 (setup.py) ... done
  Created wheel for rpy2: filename=rpy2-3.5.1-cp311-cp311-linux_x86_64.whl size=314973 sha256=a1f46f6ef2af12de37637c9c2f27d530fbe255a38b1563e08a72b17ab412928b
  Stored in directory: /root/.cache/pip/wheels/e9/55/d1/47be85a5f3f1e1f4d1e91cb5e3a4dcb40dd72147f184c5a5ef
Successfully built rpy2
Installing collected packages: rpy2
Successfully installed rpy2-3.5.1


##  Mount Google Drive

Next, create a folder named "R" in Google Drive so that installed packages persist between sessions. Before installing R packages in the Python runtime, mount Google Drive and follow the on-screen instructions:

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


## Check and Install Required R Packages

In [3]:
%%R
packages <- c(
          'tidyverse'
)

In [None]:
%%R
# Install missing packages
new_packages <- packages[!(packages %in% installed.packages()[,"Package"])]
if(length(new_packages)) install.packages(new_packages)

# Verify installation
cat("Installed packages:\n")
print(sapply(packages, requireNamespace, quietly = TRUE))

## Load R-packages

In [4]:
%%R
# Load packages with suppressed messages
invisible(lapply(packages, function(pkg) {
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
}))


In [6]:
%%R
# Check loaded packages
cat("Successfully loaded packages:\n")
print(search()[grepl("package:", search())])

Successfully loaded packages:
 [1] "package:lubridate" "package:forcats"   "package:stringr"  
 [4] "package:dplyr"     "package:purrr"     "package:readr"    
 [7] "package:tidyr"     "package:tibble"    "package:ggplot2"  
[10] "package:tidyverse" "package:tools"     "package:stats"    
[13] "package:graphics"  "package:grDevices" "package:utils"    
[16] "package:datasets"  "package:methods"   "package:base"     


## Data


All data sets used in this exercise can be downloaded from my [Dropbox](https://www.dropbox.com/scl/fo/fohioij7h503duitpl040/h?rlkey=3voumajiklwhgqw75fe8kby3o&dl=0) or [GitHub](https://github.com/zia207/r-colab/tree/main/Data/R_Beginners) accounts.



## Read CSV files with readr

A `tibble`, or `tbl_df`, is a modern reimagining of the data frame that keeps the features that are still useful and drops those that are not. R is an old language, and some things that were useful 10 or 20 years ago now get in your way. It's difficult to change base R without breaking existing code, so most innovation occurs in packages such as **tibble**, which provides the `tibble()` data frame.

*Key features of Tibble*

-   A tibble never alters the input type.

-   With tibbles, there is no need to worry about character vectors being automatically converted to factors.

-   Tibbles can contain list columns.

-   Tibbles allow non-standard (non-syntactic) column names.

-   A column name can start with a number or contain spaces.

-   To refer to such names, wrap them in backticks.

-   A tibble only recycles vectors of length 1.

-   A tibble never creates row names.

source: https://www.educative.io/answers/what-is-tibble-versus-data-frame-in-r
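
The features listed above can be checked with a small made-up tibble. This is a minimal sketch: the object names `tb` and `recycle_result` and all values are hypothetical and are not part of the tutorial data.

```r
library(tibble)

tb <- tibble(
  `1st measure` = c(1.5, 2.5, 3.5),   # non-syntactic name, kept as written
  label = c("a", "b", "c"),           # stays character, never becomes a factor
  nested = list(1:2, "x", TRUE),      # list column
  flag = TRUE                         # length-1 value is recycled to 3 rows
)

class(tb$label)       # character vector, not a factor
tb$`1st measure`      # backticks are needed to refer to this column
has_rownames(tb)      # tibbles do not carry row names

# Vectors longer than 1 are not recycled, so mismatched lengths raise an error
recycle_result <- tryCatch(
  tibble(x = 1:4, y = 1:2),
  error = function(e) "error: incompatible sizes"
)
recycle_result
```

The `tryCatch()` wrapper is only there so the sketch runs to the end; in interactive use the call to `tibble()` would simply stop with an error.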

The following functions from the {readr} package import tabular data into R as a tibble.

`read_csv()` and `read_tsv()` are special cases of the more general `read_delim()`. They're useful for reading the most common types of flat file data, comma separated values and tab separated values, respectively. `read_csv2()` uses `;` for the field separator and `,` for the decimal point. This format is common in some European countries.
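
To see how these readers relate, the sketch below parses the same two made-up rows from four differently delimited inputs. It assumes readr 2.0 or later, where literal data is passed by wrapping a string in `I()`; the object names and values are hypothetical.

```r
library(readr)

# The same small table written with four different conventions
csv_text  <- I("id,value\n1,2.5\n2,3.75\n")
tsv_text  <- I("id\tvalue\n1\t2.5\n2\t3.75\n")
pipe_text <- I("id|value\n1|2.5\n2|3.75\n")
csv2_text <- I("id;value\n1;2,5\n2;3,75\n")   # ';' separator, ',' decimal mark

from_csv  <- read_csv(csv_text, show_col_types = FALSE)
from_tsv  <- read_tsv(tsv_text, show_col_types = FALSE)
from_pipe <- read_delim(pipe_text, delim = "|", show_col_types = FALSE)
from_csv2 <- read_csv2(csv2_text, show_col_types = FALSE)

# All four calls should recover the same numeric column
from_csv$value
from_csv2$value
```

Only `read_delim()` needs an explicit `delim` argument here; the other three functions fix the delimiter for you.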

For example, we will use `read_csv()` to import a CSV file and then use the `glimpse()` function of the {dplyr} package to explore its structure.



In [7]:
%%R
df.chem_01<-readr::read_csv("https://github.com/zia207/r-colab/raw/main/Data/R_Beginners/PAHdata.csv")
dplyr::glimpse(df.chem_01)


Rows: 20 Columns: 23
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (1): Subject
dbl (22): Napthalene, 1-Methyl Napthalene, 2-Methyl Napthalene, Acenapthylen...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 20
Columns: 23
$ Subject                     <chr> "P1", "P3", "P4", "P5", "P6", "P7", "P8", …
$ Napthalene                  <dbl> 0.8993, 3.6257, 3.3921, 3.5772, 4.4907, NA…
$ `1-Methyl Napthalene`       <dbl> 4.9681, 4.6941, 3.5386, 4.7475, 5.1147, NA…
$ `2-Methyl Napthalene`       <dbl> 2.1508, 3.9316, 1.6955, 2.9361, 3.9976, NA…
$ Acenapthylene               <dbl> 0.0131, 3.0151, 1.3859, 3.3943, 6.6593, NA…
$ `1,2 Dimethyl napthalene`   <dbl> NA, NA, 1.2389, 2.6427, 2.1442, NA, 0.3623…
$ `1,6 Dimethyl Napthalene`   <dbl> 0.7003, 2.6382, 1.3807, 1.1006, 2.2575, NA…
$ Fluorene                    <dbl> 2.2481, 7.349
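
The message above suggests specifying the column types yourself. A minimal sketch of that idea follows, using a three-row inline sample modeled on the first columns of the output (not the full file) and assuming readr 2.0 or later for `I()`. The object names `pah_sample`, `pah`, and `bad`, and the deliberately malformed value, are hypothetical.

```r
library(readr)

# Small inline sample; "NA" is read as a missing value by default
pah_sample <- I(
  "Subject,Napthalene,1-Methyl Napthalene\nP1,0.8993,4.9681\nP3,3.6257,4.6941\nP7,NA,NA\n"
)

# Declare Subject as character and every other column as double
pah <- read_csv(
  pah_sample,
  col_types = cols(
    Subject = col_character(),
    .default = col_double()
  )
)
spec(pah)   # shows the column specification that was applied

# A value that does not match the declared type becomes NA and is recorded
bad <- read_csv(
  I("Subject,Napthalene\nP1,0.8993\nP3,not-a-number\n"),
  col_types = cols(Subject = col_character(), Napthalene = col_double())
)
problems(bad)   # lists the row, column, and offending value
```

Supplying `col_types` also silences the column specification message, because readr no longer has to guess.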

## Write CSV files with readr

The `write_*()` family of functions in readr is an improvement on analogous functions such as `write.csv()` because they are approximately twice as fast. Unlike `write.csv()`, these functions do not include row names as a column in the written file. A generic function, `output_column()`, is applied to each variable to coerce columns to suitable output.

We can use the following functions of the **readr** package to export tabular data from R:

![alt text](http://drive.google.com/uc?export=view&id=1oJZIXaocGY6FoaHmXInqTNx6_j-TKYJd)

In [8]:
%%R
dataFolder<- "/content/drive/MyDrive/R_Website/R_Bigenner/Data/"
# write as CSV file
readr::write_csv(df.chem_01, paste0(dataFolder,"df.chem_01.csv"))
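
The difference in row-name handling can be seen without Google Drive by writing to temporary files. This is a sketch with a made-up data frame; `df`, the file path variables, and `back` are hypothetical names.

```r
library(readr)

# A base data frame that carries explicit row names
df <- data.frame(
  id = 1:3,
  value = c(2.5, NA, 3.75),
  row.names = c("a", "b", "c")
)

csv_path  <- tempfile(fileext = ".csv")
tsv_path  <- tempfile(fileext = ".tsv")
txt_path  <- tempfile(fileext = ".txt")
base_path <- tempfile(fileext = ".csv")

write_csv(df, csv_path)                  # comma-separated
write_tsv(df, tsv_path)                  # tab-separated
write_delim(df, txt_path, delim = "|")   # custom delimiter
write.csv(df, base_path)                 # base R, for comparison

readLines(csv_path)[1]    # header has only the two data columns
readLines(base_path)[1]   # header starts with an empty name for the row-name column

# Round trip: the file written by readr reads back as a tibble
back <- read_csv(csv_path, show_col_types = FALSE)
back
```

Missing values are written as `NA` by default and are read back as missing, so the round trip preserves them.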

We can also read a CSV file with base R's `read.csv()` and convert the result to a tibble with the `as_tibble()` function of the **tibble** package:

In [9]:
%%R
df.chem_03<-tibble::as_tibble(read.csv(paste0(dataFolder,"PAHdata.csv"), check.names = FALSE))
str(df.chem_03)

tibble [20 × 23] (S3: tbl_df/tbl/data.frame)
 $ Subject                  : chr [1:20] "P1" "P3" "P4" "P5" ...
 $ Napthalene               : num [1:20] 0.899 3.626 3.392 3.577 4.491 ...
 $ 1-Methyl Napthalene      : num [1:20] 4.97 4.69 3.54 4.75 5.11 ...
 $ 2-Methyl Napthalene      : num [1:20] 2.15 3.93 1.7 2.94 4 ...
 $ Acenapthylene            : num [1:20] 0.0131 3.0151 1.3859 3.3943 6.6593 ...
 $ 1,2 Dimethyl napthalene  : num [1:20] NA NA 1.24 2.64 2.14 ...
 $ 1,6 Dimethyl Napthalene  : num [1:20] 0.7 2.64 1.38 1.1 2.26 ...
 $ Fluorene                 : num [1:20] 2.25 7.35 7.16 8.44 9.24 ...
 $ 1,6,7 Trimethylnapthalene: num [1:20] 5.1 6.79 6.52 4.68 6.46 ...
 $ Anthracene               : num [1:20] 10.17 9.64 22.4 26.38 20.96 ...
 $ Dibenzothiopene          : num [1:20] 1.16 4.12 4.23 3.99 3.26 ...
 $ 2-Methyl Anthracene      : num [1:20] 0.541 4.519 8.401 13.01 4.49 ...
 $ 1-Methylphenanthrene     : num [1:20] 14.96 12.09 19.49 11.21 2.03 ...
 $ 2-Methylphenanthrene     : num [

## Summary and Conclusion

This tutorial provides an overview of the data import and export capabilities of the R package **readr**, which simplifies reading and writing rectangular data in R. It introduces `read_csv()`, `read_tsv()`, and `read_delim()` for reading CSV, TSV, and custom-delimited files, respectively, and demonstrates `read_csv()` on a real data set. Automatic column type inference and a streamlined reading process make **readr** a valuable tool for handling diverse datasets. The tutorial also covers export with `write_csv()`, `write_tsv()`, and `write_delim()`, which write data frames to external files. The consistent syntax across reading and writing functions simplifies the import-export workflow, and readr handles large datasets efficiently.

As you integrate **readr** into your data analysis workflow, explore its additional features, such as custom column types, locale settings, and flexible options for handling missing or malformed data. These capabilities will help you handle a wider range of data scenarios.

## References

1.  [Many Ways of Reading Data Into R 1](https://medium.com/analytics-vidhya/many-ways-of-reading-data-into-r-1-52b02825cb27)

2.  [readr](https://readr.tidyverse.org/)