# Machine learning for medicine
## Understanding Correlation

## Overview
In medicine, we care about how different physiologic processes *relate* to each other.
Correlation is one way to measure how related two things are.
In this notebook we get hands-on with correlation.

## Code Setup

In [28]:
import numpy as np
import scipy

from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import scipy.stats as stats
from example_systems import *

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="darkgrid")
matplotlib.rcParams['figure.figsize'] = [15, 10]

## What is correlation
Correlations are the backbone of science.
Correlations are one way to assess whether two variables are *related* to each other.

Correlation checks to see whether there's a *linear* relationship between two variables $X$ and $Y$.
In other words: if we *double* $X$, do we double $Y$?
For some things, this is reasonable.
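As a minimal sketch of what "checking for a linear relationship" looks like in code, the snippet below computes the Pearson correlation coefficient with `scipy.stats.pearsonr` for two made-up variables. The names `y_linear` and `y_unrelated`, the slope of 2.0, and the noise level are assumptions chosen for illustration; they are not taken from `example_systems`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

x = np.linspace(-5, 5, 100)

# Hypothetical variable that depends linearly on x, plus a little noise
y_linear = 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

# Hypothetical variable that ignores x entirely (noise around zero)
y_unrelated = rng.normal(0.0, 1.0, size=x.size)

# pearsonr returns the correlation coefficient r and a p-value
r_linear, p_linear = stats.pearsonr(x, y_linear)
r_unrelated, p_unrelated = stats.pearsonr(x, y_unrelated)

print(f"linear:    r = {r_linear:.3f}, p = {p_linear:.3g}")
print(f"unrelated: r = {r_unrelated:.3f}, p = {p_unrelated:.3g}")
```

A value of r near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little or none. The unrelated variable here is noise rather than an exact constant because the Pearson coefficient is undefined for a variable that never changes.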

In [29]:
interact(simple_eg,slope=(-5,5,0.1),noise=(0.0,10.0,0.5),samples=fixed(100));


We can play around with these two variables (blue and red).

The red variable *isn't* linearly correlated with x.
No matter what we set x at, negative, zero, positive, the value of y is 0.

The blue variable *is* linearly correlated with x.

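The slope of the relationship tells us a bit about how *robust* the relationship is: the steeper the slope, the more noise it takes to hide it.
In the absence of noise, the size of the slope doesn't mean much, since any nonzero slope gives a perfectly linear relationship.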

Now, start adding some *noise*.
The key thing here is that noise doesn't change the underlying relationship between our variables.
It may *look* like it does, but that's just because other things are interfering with what we care about.
Imagine static over the radio, or buffering in your music stream.
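One way to see that noise hides the relationship without changing it is to fit a line at several noise levels. The sketch below uses `scipy.stats.linregress` on synthetic data; the true slope of 1.0 and the three noise levels are assumptions picked for illustration, not values from `simple_eg`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)  # fixed seed so the sketch is repeatable

true_slope = 1.0  # assumed underlying relationship
x = np.linspace(-5, 5, 100)

results = {}
for noise in (0.5, 2.0, 10.0):
    # Same underlying line each time; only the noise level changes
    y = true_slope * x + rng.normal(0.0, noise, size=x.size)
    fit = stats.linregress(x, y)
    results[noise] = (fit.slope, fit.rvalue)
    print(f"noise = {noise:4.1f}: fitted slope = {fit.slope:.2f}, r = {fit.rvalue:.2f}")
```

The fitted slope scatters around the true value, more widely as the noise grows, while the correlation coefficient drops. The line that generated the data is the same in every case.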

## Limited sample size

Let's do the same sort of analysis, but change the number of samples we have available to us.

In [30]:
interact(simple_eg,slope=fixed(0.2),noise=fixed(2.0),samples=(2,50,1));


In [31]:
interact(simple_eg,slope=fixed(0.2),noise=(0.0,10.0,0.1),samples=(2,50,1));


You can think about this like the number of samples of a patient's 
Try sliding the samples down from 50 to 2 and see what happens to the p-value.
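The same experiment can be sketched without the slider. The snippet below keeps the slope (0.2) and noise (2.0) from the cell above and computes the p-value from `scipy.stats.pearsonr` using only the first `n` points of one synthetic dataset. The sample sizes, including the large value of 1000, and the way `x` is drawn are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)  # fixed seed so the sketch is repeatable

slope, noise = 0.2, 2.0  # same values as the fixed sliders above

# Draw one large synthetic dataset, then look at growing subsets of it
x_all = rng.uniform(-5, 5, size=1000)
y_all = slope * x_all + rng.normal(0.0, noise, size=x_all.size)

p_values = {}
for n in (5, 20, 50, 1000):
    r, p = stats.pearsonr(x_all[:n], y_all[:n])
    p_values[n] = p
    print(f"n = {n:4d}: r = {r:+.2f}, p = {p:.3g}")
```

With only a handful of samples, a weak relationship like this one is hard to tell apart from chance. With many samples the p-value typically becomes very small even though the underlying slope has not changed.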

## Signal-to-Noise

The last point we'll make is an important one that applies to everything in medicine (and science).
The idea of a *signal-to-noise* ratio is important.

We can see this by looking at the previous example: we can control the *slope* and the *noise*.
Turns out the slope is what we're interested in: that's the **signal** that we're trying to understand.

Noise, on the other hand, comes in and messes things up indiscriminately.
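A rough sketch of why the *ratio* matters more than either number alone: below, the same noise draws are scaled to different levels, and the correlation is computed for three slope and noise pairs. The helper `correlation_for` and the specific values are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)  # fixed seed so the sketch is repeatable

x = np.linspace(-5, 5, 100)
z = rng.normal(0.0, 1.0, size=x.size)  # unit noise, reused and rescaled below


def correlation_for(slope, noise):
    """Pearson r between x and y = slope * x + noise * z (hypothetical helper)."""
    y = slope * x + noise * z
    r, _ = stats.pearsonr(x, y)
    return r


r_low = correlation_for(slope=1.0, noise=2.0)     # slope/noise = 0.5
r_scaled = correlation_for(slope=2.0, noise=4.0)  # slope/noise = 0.5 again
r_high = correlation_for(slope=2.0, noise=2.0)    # slope/noise = 1.0

print(f"slope 1, noise 2: r = {r_low:.3f}")
print(f"slope 2, noise 4: r = {r_scaled:.3f}")
print(f"slope 2, noise 2: r = {r_high:.3f}")
```

Doubling both the slope and the noise leaves the correlation unchanged here, because `y` is simply doubled. Raising the slope while holding the noise fixed increases it.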

In [32]:
interact(simple_eg,slope=(0,5,0.01),noise=(0.0,10.0,0.1),samples=fixed(20));
