---
title: "Data Wrangling"
author: "Robert Gilmore"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Data Wrangling}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
library(MicrobiomeR)
pkg.data <- MicrobiomeR:::pkg.private
```
# Import
Data is imported via the [phyloseq](https://github.com/joey711/phyloseq) package. [MicrobiomeR](https://github.com/vallenderlab/MicrobiomeR) was developed and tested using data from [Nephele's 16S Qiime](https://nephele.niaid.nih.gov/user_guide_pipes/#amplicon_pipes) pipeline, which influences the workflow in these vignettes. While any phyloseq object can be used with MicrobiomeR, we suggest using the function `MicrobiomeR::create_phyloseq()`, and we will demonstrate using it to import the Nephele data, which includes a [biom file](http://biom-format.org/), a phylogenetic tree file, and a metadata file ([Nephele mapping file](https://nephele.niaid.nih.gov/user_guide/)).

(_**Note**_: *Test data can be found in a private environment accessed with `MicrobiomeR:::pkg.private`*)
```{r message=FALSE, warning=FALSE}
# Get the data files from package
input_files <- pkg.data$input_files
biom_file <- input_files$biom_files$silva # Path to silva biom file
tree_file <- input_files$tree_files$silva # Path to silva tree file
metadata_file <- input_files$metadata$two_groups # Path to Nephele metadata
parse_func <- parse_taxonomy_silva_128 # A custom phyloseq parsing function for silva annotations
# Get the phyloseq object
phyloseq_object <- create_phyloseq(
biom_file = biom_file,
tree_file = tree_file,
metadata_file = metadata_file,
parse_func = parse_func
)
```
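Since any phyloseq object can be used, the import can also be done with phyloseq's own functions. The chunk below is a minimal sketch of that route, not a description of what `create_phyloseq()` does internally. It assumes the same three file paths as above, uses phyloseq's default taxonomy parser rather than the silva parser, and is not evaluated when the vignette is built. The name `phyloseq_object_alt` is only illustrative.

```{r eval=FALSE}
# Sketch only: build a phyloseq object directly with phyloseq.
library(phyloseq)

# Read the biom file and attach the phylogenetic tree.
# parse_taxonomy_default is phyloseq's generic parser (an assumption here;
# the vignette itself uses a silva-specific parser).
biom_data <- import_biom(
  BIOMfilename = biom_file,
  treefilename = tree_file,
  parseFunction = parse_taxonomy_default
)

# Read the QIIME-style mapping (metadata) file.
sample_metadata <- import_qiime_sample_data(metadata_file)

# Combine the components into a single phyloseq object.
phyloseq_object_alt <- merge_phyloseq(biom_data, sample_metadata)
```

The resulting object could then be passed to `as_MicrobiomeR_format()` in the same way as `phyloseq_object`.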
# Formatting
After importing amplicon data with phyloseq, you will need to convert the phyloseq object to a
[taxa::taxmap](https://github.com/ropensci/taxa) object for use with the
[metacoder](https://github.com/grunwaldlab/metacoder) package. MicrobiomeR adds additional
"formatting" to the taxmap object in the form of specific table names. For more information, please
consult the Formatting help page in your R console via `?MicrobiomeR::MicrobiomeR_Formats`.
```{r message=FALSE, warning=FALSE}
p_obj <- phyloseq_object
# Create the various formats
phy_format <- as_MicrobiomeR_format(obj = p_obj, format = "phyloseq_format")
raw_format <- as_MicrobiomeR_format(obj = p_obj, format = "raw_format")
basic_format <- as_MicrobiomeR_format(obj = p_obj, format = "basic_format")
analyzed_format <- as_MicrobiomeR_format(obj = p_obj, format = "analyzed_format")
```
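For comparison, the phyloseq-to-taxmap conversion can also be done with metacoder alone. The chunk below is a hedged sketch using `metacoder::parse_phyloseq()`. It assumes the `phyloseq_object` created earlier, is not evaluated when the vignette is built, and the name `plain_taxmap` is only illustrative. How the resulting table names compare with the MicrobiomeR formats is left for the reader to inspect.

```{r eval=FALSE}
# Sketch only: convert a phyloseq object to a regular taxmap object
# without applying any MicrobiomeR formatting.
library(metacoder)

plain_taxmap <- parse_phyloseq(phyloseq_object)

# Inspect the observation tables of the plain taxmap object.
print(names(plain_taxmap$data))
```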
The various formats that have been defined include the "phyloseq", "raw", "basic", and "analyzed"
formats. These formats define the level or stage that the taxmap object is in. The only difference
between MicrobiomeR taxmap objects and regular taxmap objects is the naming convention used for the
observation data (e.g. `names(MicrobiomeR_obj$data)`).
```{r message=FALSE, warning=FALSE}
# Show the difference in MicrobiomeR formats
print(names(phy_format$data))
print(names(raw_format$data))
print(names(basic_format$data))
print(names(analyzed_format$data))
```
MicrobiomeR provides a group of helpful functions that can be used to format taxmap objects including:
* `which_format()` for returning the format of the taxmap object.
* `is_*_format()` for testing if an object is in a specific format.
* `as_*_format()` for converting an object to a specific format.
* `order_metacoder_data()` for ordering the observation data for analysis.
* `validate_MicrobiomeR_format()` for validating that the taxmap object is in a specific format.
```{r message=FALSE, warning=FALSE}
# Determine format
which_format(analyzed_format)
is_analyzed_format(analyzed_format)
# Forcing validation
low_level_format <- validate_MicrobiomeR_format(obj = raw_format, valid_formats = c("analyzed_format", "basic_format"), force_format = TRUE, min_or_max = min)
which_format(low_level_format)
high_level_format <- validate_MicrobiomeR_format(obj = raw_format, valid_formats = c("analyzed_format", "basic_format"), force_format = TRUE, min_or_max = max)
which_format(high_level_format)
```
However, most of the time you will simply use `as_MicrobiomeR_format(format=...)` to get an object
into the proper format.
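A common pattern is to convert only when needed. The helper below is a hypothetical sketch, not part of MicrobiomeR; it combines `is_analyzed_format()` and `as_MicrobiomeR_format()` from the chunks above. It assumes that `as_MicrobiomeR_format()` also accepts a taxmap object such as `raw_format`, which this vignette does not show directly, and it is not evaluated when the vignette is built.

```{r eval=FALSE}
# Hypothetical helper (not part of MicrobiomeR): return the object in the
# analyzed format, converting it only if it is not already there.
ensure_analyzed_format <- function(obj) {
  if (!is_analyzed_format(obj)) {
    # Assumption: as_MicrobiomeR_format() can convert a taxmap object.
    obj <- as_MicrobiomeR_format(obj = obj, format = "analyzed_format")
  }
  obj
}

# Usage: start from the raw format and check the result.
ready_obj <- ensure_analyzed_format(raw_format)
which_format(ready_obj)
```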
(**Please note that MicrobiomeR _expects_ the data to be imported with phyloseq and then converted to a taxmap object**)