This README describes the code used to generate the results from the given data. It is written in Markdown (`.md`), so it reads best in a Markdown viewer such as Typora.
- Title : Blood biomarkers in chronic tinnitus
- Principal Investigator: Christopher Cederroth (christopher.cederroth@ki.se)
- Organization : Karolinska Institutet
- NBIS experts : Mun-Gwan Hong (mungwan.hong@nbis.se)
Copied from Redmine
The overarching aim of this project is to identify novel blood biomarkers for tinnitus, which is one of the aims of the UNITI EU project and ranks among the first research priorities of the BTA. Biomarkers for tinnitus are currently lacking, and they are essential in order to i) understand the pathophysiology of tinnitus; ii) stratify patients; iii) provide read-outs for clinical trials. Similarities in neuropathophysiology between tinnitus and pain led us to hypothesize that inflammation is involved in tinnitus. A preliminary analysis of 548 cases of constant tinnitus and 548 age- and sex-matched controls showed that 10% of the proteins were strongly correlated with smoking, and that 50% of the proteins were strongly correlated with age and with self-reported hearing ability (Bonferroni-adjusted P < 0.05). We performed a linear regression adjusted for age, sex, BMI, smoking status, sample collection site, and hearing ability. This allowed us to identify 5 proteins with close-to-significant adjusted p-values, namely FGF-21, MCP-4, CXCL9, GDNF, and MCP-1 (Fig. 1). These proteins were higher in the tinnitus group and were not associated with stress, anxiety, depression, temporomandibular joint disorder, headache, or hyperacusis. The goal of this project is to extend the analysis to the neurology panel from Olink on the same samples.
Every script starts with a header that describes the script and gives basic information about it. All input and output file names are listed in that header.

The input and output files are expected to be stored in the neighboring folders shown below. Every script is written to be executed from this current folder and accesses those folders through relative paths.

    ../data/raw_internal   # raw input data
           /               # intermediate derived data from raw data
           /raw_external   # data from external sources, e.g. public database
    ../reports             # all the reports
    ../results/            # main output folder
              /figures
              /tables
All R functions from installed packages, except those listed below, are called with double colons (`::`) to state the source package explicitly.

- Packages in R-core (e.g. `base`, `stats`, `utils`, ...)
- Packages in `tidyverse` (e.g. `dplyr`, `ggplot2`, `purrr`, ...)
- `kableExtra`
- `janitor`
- Packages whose functions cannot be accessed using `::`
  - `ggfortify` # `ggplot2::autoplot` for PCA
  - `lme4` # `predict` for class `merMod`

All R code follows the tidyverse R style guide.
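
As an illustration of the `::` convention described above, here is a minimal sketch. The `MASS` package is only a stand-in for "any installed package outside the exception list" and is an assumption of this example, not a package known to be used in the project; the variable names are likewise hypothetical.

```r
# Toy data, made up for this illustration only
x <- c(2.1, 3.4, 2.9, 10.5)

# `stats` belongs to R-core, so its functions are called without a prefix
med <- median(x)

# Any other installed package is called as `package::function()`,
# which makes the source package visible at the call site.
# (MASS is a stand-in here; it is an assumption of this sketch.)
fit <- MASS::huber(x)

print(med)
print(names(fit))
```

Packages in the last group of the list, such as `ggfortify`, would instead be attached with `library()` because the relevant methods are not reachable through `::`.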
A software environment that differs from the one used during development of the code, e.g. different versions of R packages, can produce unexpected error messages or dissimilar analysis results. To make the results reproducible, the software environment around the code was saved. It can be restored before using the code by following the steps below. The process relies on the `conda` management system and the `renv` R package. The environment handling software `conda` is assumed to be pre-installed. Please refer to Conda for questions about the installation.

- Open Terminal (Mac / Linux) or Command line (Windows).
- Navigate to the directory where this README.md file is located using the `cd` command.
- Run the commands below. They create the same Conda environment and activate it (Note: the name `nbis5797` can be changed to your preferred project name).

      conda env create -n nbis5797 -f environment.yml
      conda activate nbis5797

- Run `R` and execute these commands in `R` to restore the same R environment.

      renv::init()
      renv::restore()

Please refer to "Introduction to renv" for details about the R environment handling package, `renv`.

The Conda environment is stored in this file:

- `environment.yml`

The R environment is saved in the folder and file below. They are supposed to be managed only by the `renv` package under R.

- `renv/`
- `renv.lock`

Run the lines below in an R console to generate all intermediate data files for the analysis and the final reports. Please make sure the input files are in the expected folders. The input files are listed below the code lines.

    source("master_data_generation.R")                           # data generation
    source("master_heavy_analyses.R")                            # computationally heavy analyses
    rmarkdown::render("report-5797-Tinnitus.Rmd")                # main report
    rmarkdown::render("report-5797-Tinnitus-Inflammation.Rmd")   # Inflammation panel
    rmarkdown::render("report-5797-Tinnitus-Neurology.Rmd")      # Neurology panel

- `../data/raw_internal/20210504/All STOP questionnaire data 180118_v14_BloodAnalysis_v2.xlsx`
- `../data/raw_internal/20210505/Variable Key STOP Frågeformulär LG 2018.xls`
- `../data/raw_internal/20210505/Variablekey_ESITSQ.xlsx`
- `../data/raw_internal/20210504/cederroth_tinnitus_protein_profiling_NPX_belowLOD.xlsx`
- `../data/raw_internal/20210505/Tinnitus2_Cederroth_NPX_below_LOD.xlsx`

The dates in the folder names above indicate when the files were transferred to NBIS. If multiple versions of one file were transferred, please use the one delivered on that day.
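
Because the scripts assume that these files are already in place, it can help to check for them before running anything. The following is a minimal sketch rather than part of the project code: the helper `find_missing()` is hypothetical, and the paths are simply the ones listed above, resolved relative to the current folder.

```r
# Input files listed in this README (paths relative to the code folder)
input_files <- c(
  "../data/raw_internal/20210504/All STOP questionnaire data 180118_v14_BloodAnalysis_v2.xlsx",
  "../data/raw_internal/20210505/Variable Key STOP Frågeformulär LG 2018.xls",
  "../data/raw_internal/20210505/Variablekey_ESITSQ.xlsx",
  "../data/raw_internal/20210504/cederroth_tinnitus_protein_profiling_NPX_belowLOD.xlsx",
  "../data/raw_internal/20210505/Tinnitus2_Cederroth_NPX_below_LOD.xlsx"
)

# Hypothetical helper: return the paths that do not exist on disk
find_missing <- function(paths) {
  paths[!file.exists(paths)]
}

missing_files <- find_missing(input_files)
if (length(missing_files) > 0) {
  message("Missing input files:\n", paste0("  ", missing_files, collapse = "\n"))
} else {
  message("All input files found.")
}
```

The check only reports missing files and does not stop; whether to abort at that point is left to the user.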

`master_data_generation.R` is the master file for R data generation. It executes the following scripts in the proper order.

- `gen-s1-clinc.v01.R`: Read clinical info data
- `gen-s1-olink_proteomic.v01.R`: Read Olink proteomic data
- `gen-s1-olink_proteomic.v02.R`: QC and preprocess - details in `report-QC-Olink_proteomic_data.Rmd`
- `gen-s1-clinc.v02.R`: Limit samples to those with proteomic data
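
The idea of a master script that runs the generation scripts in a fixed order can be sketched as follows. The actual content of the master file is not reproduced in this README, so the loop and the helper `run_in_order()` are assumptions made for illustration; only the script names come from the list above.

```r
# Script names taken from the list above, in execution order
gen_scripts <- c(
  "gen-s1-clinc.v01.R",
  "gen-s1-olink_proteomic.v01.R",
  "gen-s1-olink_proteomic.v02.R",
  "gen-s1-clinc.v02.R"
)

# Hypothetical helper: source each script in turn and stop at the
# first one that cannot be found, so later steps never run on stale data
run_in_order <- function(scripts) {
  for (s in scripts) {
    if (!file.exists(s)) {
      stop("Script not found: ", s)
    }
    message("Running ", s)
    source(s, echo = FALSE)
  }
  invisible(scripts)
}

# Uncomment to run from the folder that holds the scripts:
# run_in_order(gen_scripts)
```

Stopping at the first missing script is a design choice of this sketch; each later script depends on the output of the earlier ones.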

`master_heavy_analyses.R` is the master script for computationally heavy analyses.

- `anal-lm_resample_t.R`: T statistics from resampling

`report-5797-Tinnitus.Rmd` is the master R markdown file that creates the final report. Individual chapters were written in the separate R markdown files listed below. The master file runs all of them in the right order after adding the information about the project and NBIS support.

- `report-sample_info.Rmd`: About samples and clinical info
- `report-QC-clinical_info.Rmd`: Clinical info table QC
- `report-QC-Olink_proteomic_data.Rmd`: About QC and preprocessing of Olink proteomic data
- `report-overview_after_QC.Rmd`: Overview of the data after QC
- `report-assoc-overall_samples.Rmd`: Association of individual proteins - overall samples
- `report-assoc-by_sex.Rmd`: Association of individual proteins - females/males only
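
`report-5797-Tinnitus-Inflammation.Rmd` is the main R markdown file for the analysis of the Olink inflammation panel data. It reuses some of the files above.

`report-5797-Tinnitus-Neurology.Rmd` is the main R markdown file for the analysis of the Olink neurology panel data. It reuses some of the files above.
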
- `styles.css`: CSS style, used by Rmarkdown (`.Rmd`) files for reports
- `utils.R`: A collection of useful R functions
- `anal-assoc_protein-utils.R`: Functions used in the reports, mainly for association tests
- `citations_in_report.bib`: Bibliography for the reports
- `export_clean_data.R`: cleaned clinical info + proteomic data
- `export-assoc_results.R`: Export association test results