# Machine learning for medicine
## Understanding Correlation

## Overview
In medicine, we care about how different physiologic processes *relate* to each other.
Correlation is one way to measure how strongly two things are related.
In this notebook we get hands-on with correlation.

## Code Setup

In [8]:
import numpy as np
import scipy
import matplotlib.pyplot as plt
from ipywidgets import interact, interactive, fixed, interact_manual
import ipywidgets as widgets
import scipy.stats as stats

## What is correlation
Correlations are a backbone of science.
They are one way to assess whether two variables are *related* to each other.

Correlation checks to see whether there's a *linear* relationship between two variables $X$ and $Y$.
In other words: if we *double* $X$, do we double $Y$?
For some things, this is a reasonable expectation.
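
To make the idea concrete, here is a minimal sketch of the Pearson correlation coefficient computed straight from its textbook definition,

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}}$$

and checked against `stats.pearsonr`. The helper name `pearson_r`, the seed, the slope of 2, and the noise level of 3 are all assumptions picked for illustration; they are not part of the rest of this notebook.

```python
import numpy as np
import scipy.stats as stats

def pearson_r(x, y):
    """Pearson correlation computed directly from its definition (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
x = rng.uniform(-10, 10, size=100)
y = 2.0 * x + rng.normal(0, 3.0, size=x.shape)  # noisy line (assumed values)

r_manual = pearson_r(x, y)
r_scipy = stats.pearsonr(x, y)[0]
print("from definition:", r_manual)
print("scipy.stats    :", r_scipy)

# Rescaling y by a positive constant leaves r unchanged:
# r measures how tightly the points follow a line, not how steep the line is.
r_steeper = pearson_r(x, 10.0 * y)
print("after scaling y:", r_steeper)
```

The two values should agree up to floating-point error. The last line hints at something to watch for with the sliders: a steeper slope alone does not make the correlation stronger.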

In [9]:

def simple_eg(slope=1.0,noise=0.0,samples=100):
    # Draw x uniformly, then build y as a line through the origin plus Gaussian noise
    x = np.random.uniform(-10,10,size=(samples,))
    y = slope * x + np.random.normal(0,noise,size=x.shape)

    plt.figure()
    # Blue points: y depends linearly on x. Red points: pure noise, independent of x.
    plt.scatter(x,y,label='Correlated')
    plt.scatter(x,np.random.normal(0,noise,size=x.shape),color='red',alpha=0.4,label='Uncorrelated')
    plt.ylim(-10,10)
    plt.xlim(-10,10)
    plt.axis('off')
    plt.legend()
    # pearsonr returns the correlation coefficient and its p-value
    corr_val = stats.pearsonr(x,y)
    plt.text(2,-10,s=f'Pearson: {corr_val[0]:.3f}\n p={corr_val[1]:.3g}')
    plt.show()

In [12]:
interact(simple_eg,slope=(-5,5,0.1),noise=(0.0,10.0,0.5),samples=fixed(100));

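
If the widget is not available, the effect of the noise slider can be sketched without it. The snippet below is an illustration under assumed settings (slope fixed at 1, 100 samples, an arbitrary seed, and four hand-picked noise levels): it keeps the same $X$ and prints the Pearson correlation as the noise grows.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily
x = rng.uniform(-10, 10, size=100)

noise_levels = [0.5, 2.0, 5.0, 10.0]  # assumed values within the slider's range
r_by_noise = []
for noise in noise_levels:
    # Same underlying line each time; only the noise standard deviation changes
    y = 1.0 * x + rng.normal(0, noise, size=x.shape)
    r = stats.pearsonr(x, y)[0]
    r_by_noise.append(r)
    print(f"noise={noise:5.1f}  Pearson r={r:.3f}")
```

With little noise the points hug the line and the correlation sits close to 1; as the noise grows, the same underlying relationship yields a weaker correlation.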

## Limited sample size

Let's do the same sort of analysis, but change the number of samples we have available to us.

In [24]:
interact(simple_eg,slope=(0,5,0.01),noise=(0.0,10.0,0.1),samples=(2,50,1));

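
Moving the samples slider redraws a single random dataset each time, so it can be hard to see how much the estimate itself wobbles. As a rough sketch, we can repeat the experiment many times at each sample size and look at the spread of the Pearson values. The function name `r_spread` and its defaults (slope 1, noise 5, 500 repeats, seed 0) are assumptions made for this illustration.

```python
import numpy as np
import scipy.stats as stats

def r_spread(samples, slope=1.0, noise=5.0, repeats=500, seed=0):
    """Pearson r from `repeats` independent datasets of size `samples` (illustrative helper)."""
    rng = np.random.default_rng(seed)
    r_values = np.empty(repeats)
    for i in range(repeats):
        # Same generative model as simple_eg: a noisy line through the origin
        x = rng.uniform(-10, 10, size=samples)
        y = slope * x + rng.normal(0, noise, size=samples)
        r_values[i] = stats.pearsonr(x, y)[0]
    return r_values

for n in (5, 20, 50):
    r = r_spread(n)
    print(f"samples={n:3d}  mean r={r.mean():.2f}  std of r={r.std():.2f}")
```

The underlying relationship is identical in every run, yet the estimates from small samples scatter much more widely than those from larger ones. A single correlation from a handful of patients should be read with that spread in mind.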