## Project Proposal

### Integrative Analysis of the Cervicovaginal Microbiome and Systemic Inflammation in HPV Persistence among Nigerian Women

#### By Tiffany Tang in collaboration with Mykhalo Usyk

**State the problem:** Persistent infection with high-risk human papillomavirus (hrHPV), particularly types 16 and 18, is a major driver of cervical cancer. However, not all hrHPV infections persist; many are cleared naturally. Understanding the factors that influence viral persistence is critical for identifying individuals at elevated risk. This project investigates how the cervicovaginal microbiome and systemic inflammatory cytokines are associated with the persistence or clearance of hrHPV in a Nigerian cohort.


**What’s its biological relevance:** Cervical cancer remains a significant cause of morbidity and mortality, particularly in sub-Saharan Africa. The biological mechanisms by which the host immune environment and microbiome influence HPV infection outcomes are not fully understood. Identifying microbial or cytokine biomarkers predictive of hrHPV persistence could improve early detection and stratification of high-risk individuals, and potentially guide targeted interventions or vaccine development.


**Where are the resources for analysis:** This project will leverage a rich, multi-layered dataset that includes:

- ~2,000 cervicovaginal samples with 16S rRNA V4 amplicon sequencing (FASTQ format)
- Serum cytokine data (~30 analytes) for ~900 samples
- hrHPV PCR results (presence/absence of types 16 and 18)
- Clinical and demographic metadata (e.g., age, parity, sexual history, contraceptive use)

All raw sequencing data and metadata are housed on a secure high-performance computing (HPC) cluster.


**What tools exist for analysis:** The project will use a combination of R and Python tools.

R packages:

- DADA2 for ASV inference and taxonomy assignment
- phyloseq, vegan, and ggplot2 for diversity analysis, ordination, and visualization
- DESeq2 or ANCOM-BC for differential abundance testing

Python libraries:

- pandas and numpy for data handling
- scikit-learn for predictive modeling (random forest, logistic regression)


Data integration (microbiome + cytokines) and correlation network analyses will also be performed using these tools.
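As a rough illustration of the integration step, the sketch below joins a taxon count table to a cytokine table on a shared sample ID, applies a centered log-ratio (CLR) transform to the counts, and computes rank-based (Spearman) correlations between taxa and cytokines. Everything in it is an assumption made for illustration: the data are randomly generated, the taxon and cytokine column names are placeholders rather than the cohort's actual features, and the CLR transform with a pseudocount of 0.5 is one common choice for compositional data, not a method fixed by this proposal.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical toy data: 6 samples with taxon counts, 4 of them with cytokines.
# Column names are placeholders, not the cohort's actual features.
sample_ids = [f"S{i}" for i in range(1, 7)]
counts = pd.DataFrame(
    rng.integers(0, 500, size=(6, 3)),
    index=sample_ids,
    columns=["Lactobacillus", "Gardnerella", "Prevotella"],
)
cytokines = pd.DataFrame(
    rng.lognormal(mean=1.0, sigma=0.5, size=(4, 2)),
    index=["S1", "S2", "S4", "S6"],
    columns=["IL6", "TNFa"],
)


def clr(count_table: pd.DataFrame, pseudocount: float = 0.5) -> pd.DataFrame:
    """Centered log-ratio transform applied per sample (row)."""
    logged = np.log(count_table + pseudocount)
    # Subtract each sample's mean log abundance so every row sums to zero.
    return logged.sub(logged.mean(axis=1), axis=0)


# Keep only samples that have both microbiome and cytokine measurements.
merged = clr(counts).join(cytokines, how="inner")

# Spearman correlation is the Pearson correlation of the ranked values.
ranked = merged.rank()
corr = ranked.corr(method="pearson")

# Extract the taxon-by-cytokine block of the correlation matrix.
taxon_by_cytokine = corr.loc[counts.columns, cytokines.columns]
print(taxon_by_cytokine.round(2))
```

With the real data, the inner join would restrict the analysis to the roughly 900 samples that have cytokine measurements, and the resulting correlation matrix could serve as the input to a correlation network after appropriate multiple-testing correction.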


**Why perform analysis:** By combining microbiome profiles and systemic immune signatures, this project aims to uncover host–microbiome interactions that influence hrHPV persistence. Such integrative analyses are underexplored in African populations and can contribute to a more comprehensive understanding of cervical cancer risk. The outcomes may provide candidate biomarkers for prevention, prognosis, and future mechanistic studies.

**How will you apply your knowledge of NGS, Machine Learning, Transcriptomics, Python and R to the project:** This project will draw upon my training in the following areas:

- *NGS:* I will process and analyze raw 16S sequencing data using tools like DADA2 and phyloseq, building quality-controlled taxonomic profiles.
- *Machine Learning:* I will use supervised models (e.g., random forests) to predict HPV persistence based on microbial and cytokine features.
- *Transcriptomics:* Although this project is not based on RNA-seq, the principles of high-dimensional data normalization, integration, and differential expression analysis (e.g., DESeq2) are directly applicable to microbiome data.
- *Python and R:* I will use R for microbiome analysis and visualization, and Python for data integration and machine learning modeling.
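The supervised modeling step could look something like the following minimal sketch, which trains a scikit-learn random forest and scores it with stratified 5-fold cross-validation. The feature matrix and labels here are synthetic stand-ins, generated so that two features carry signal; the coding of the outcome (1 = persistent, 0 = cleared), the number of features, and the hyperparameters are all assumptions for illustration, not settings defined by the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)

# Synthetic stand-in for a combined microbial + cytokine feature matrix.
n_samples, n_features = 200, 20
X = rng.normal(size=(n_samples, n_features))

# Hypothetical binary outcome (1 = persistent, 0 = cleared), driven by the
# first two features plus noise so the model has something to learn.
signal = 1.5 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=1.0, size=n_samples)
y = (signal > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0)

# Stratified folds keep the class balance similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print(f"Mean cross-validated ROC AUC: {auc_scores.mean():.2f}")

# Fit on all data to inspect which features the forest relied on most.
model.fit(X, y)
top_features = np.argsort(model.feature_importances_)[::-1][:3]
print("Top feature indices:", top_features)
```

Cross-validated ROC AUC is used here because it is insensitive to the classification threshold, which matters if persistent and cleared infections are imbalanced in the cohort. Feature importances from a model fit on all samples are only a first look at candidate biomarkers and would need confirmation on held-out data.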
