Project portfolio for Statistical Learning for Big Data, containing the written report and the accompanying scripts.
This repository contains my analyses and exploratory methods for the Statistical Learning for Big Data spring 2018 class with Prof. Rebecka Jörnsten at Chalmers University of Technology/Gothenburg University.
The exam was broken into three sections:
- MINI review and assignments
- TCGA genomics data and mislabeled data
- Simulation studies of K clusters and L classes
The assignment questions are available in Exam2018.pdf.
Further information is available at http://www.math.chalmers.se/Stat/Grundutb/GU/MSA220/S18/
The MINI assignments were weekly or bi-weekly and were intended to introduce various statistical/machine learning methods. I used a variety of existing datasets and also generated some data of my own. The R scripts are in this main directory, with the associated data in the data/ directory.
data/mini3/ contains artificial data and the Python script used to generate it.
Note that the MINI3 script is quite messy.