# Machine Learning for Medicine: Workshop
## What is a distribution?

### Overview
Data is the way we study our patients. Whether it's lab values, MRIs, the physical examination, or even history taking, data is our window into what's happening in our patients.

In this post we'll talk a bit more about what a distribution is. This will be a little heavier on math than some might be comfortable with, but if you're interested in getting deeper insight into ML, this is a good starting point.

#### How to run this notebook
This Jupyter notebook can be run simply by going to the menu up top, selecting 'Kernel -> Restart and Run All'.

You'll need some basic libraries to get this to work.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact, interact_manual

### Evidence and Data
Data and evidence are related but they're not actually the same. Data is a list of observations, typically objective.

Evidence is data paired with a *hypothesis*. In evidence-based medicine (EBM), we focus our attention on a single hypothesis: the *null* hypothesis.

This approach, while rigorous if you have an eternity, is far from ideal for the setting of medicine.
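To make the null-hypothesis framing concrete, here is a minimal sketch of a permutation test on simulated data. The function name `permutation_p_value`, the group sizes, and the simulated values are all assumptions made for illustration; none of them come from a real study.

```python
import numpy as np

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    observed = abs(group_a.mean() - group_b.mean())
    pooled = np.concatenate([group_a, group_b])
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffling them should produce differences like the observed one.
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n_a].mean() - shuffled[n_a:].mean())
        if diff >= observed:
            count += 1
    # Add-one correction keeps the estimate strictly above zero
    return (count + 1) / (n_permutations + 1)

# Simulated (not real) measurements for two hypothetical groups
rng = np.random.default_rng(42)
control = rng.normal(loc=0.0, scale=1.0, size=50)
treated = rng.normal(loc=1.0, scale=1.0, size=50)
p = permutation_p_value(control, treated)
print(f"p-value under the null hypothesis: {p:.4f}")
```

Note that the data only becomes evidence once it is paired with the hypothesis being tested: the same two arrays say nothing on their own.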

ML provides a different approach to *inferring* the relationships we care about, and to building up *evidence* for and against hypotheses.


### What is ML?


### tPA use in strokes: an example
We're taught a hard-and-fast rule: if it's been 4 hours since the onset of stroke symptoms, don't give tPA.

This is a gross misunderstanding of how to interpret studies that look at outcomes following an intervention.

First, let's characterize the hard-and-fast rule.
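Written as code, the rule is a single threshold on time since onset. The sketch below assumes the 4-hour cutoff quoted above; the function name `hard_rule_gives_tpa` and the example onset times are hypothetical and purely illustrative, not clinical guidance.

```python
import numpy as np

CUTOFF_HOURS = 4.0  # cutoff quoted in the text; illustrative only

def hard_rule_gives_tpa(hours_since_onset, cutoff=CUTOFF_HOURS):
    """Return True where the hard-and-fast rule would allow tPA."""
    return np.asarray(hours_since_onset, dtype=float) < cutoff

# Hypothetical onset times, in hours
onset_times = np.array([0.5, 3.9, 4.0, 4.1, 6.0])
decisions = hard_rule_gives_tpa(onset_times)
for t, d in zip(onset_times, decisions):
    print(f"{t:.1f} h since onset -> give tPA: {bool(d)}")
```

The rule treats 3.9 hours and 4.1 hours as categorically different, which is exactly the kind of all-or-nothing reading the rest of this section questions.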

In [23]:
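# Draw 1000 samples from a standard normal distribution and plot their histogram
xset = np.random.normal(0.0, 1.0, size=1000)
mfig = plt.figure()
ax = mfig.add_subplot(1, 1, 1)
histo = ax.hist(xset)
# Vertical line marking the hard-and-fast cutoff
line = ax.axvline(x=0)

# Slider range matches the spread of the samples (roughly -4 to 4)
@interact(x=(-4.0, 4.0, 0.1))
def simple_distr(x=0.0):
    # axvline returns a Line2D, which has set_xdata rather than set_x
    line.set_xdata([x, x])
    mfig.canvas.draw()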
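One way to read the vertical line in the cell above is that it splits the samples into those below the cutoff and those above it. The sketch below computes that split numerically; the helper name `fraction_below` and the three cutoff values are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=1000)  # same kind of draw as xset above

def fraction_below(samples, cutoff):
    """Empirical fraction of samples strictly below the cutoff."""
    return float(np.mean(np.asarray(samples) < cutoff))

for cutoff in (-1.0, 0.0, 1.0):
    frac = fraction_below(samples, cutoff)
    print(f"cutoff={cutoff:+.1f}: {frac:.3f} of samples fall below")
```

Moving the cutoff changes the fraction smoothly, even though any single sample lands on one side or the other.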


In [18]:
%matplotlib notebook

import numpy as np
import matplotlib.pyplot as plt
import time

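def pltsin(ax, colors=('b',)):
    """Redraw the lines on `ax` with fresh uniform random y-values (no sine is plotted, despite the name)."""
    x = np.linspace(0, 1, 100)
    if ax.lines:
        # Lines already exist: update their data in place
        for line in ax.lines:
            line.set_xdata(x)
            line.set_ydata(np.random.random(size=100))
    else:
        # First call: create one line per requested color
        for color in colors:
            ax.plot(x, np.random.random(size=100), color)
    # Use the axes' own figure instead of relying on a global `fig`
    ax.figure.canvas.draw()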

fig,ax = plt.subplots(1,1)
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_xlim(0,1)
ax.set_ylim(0,1)
for f in range(5):
    pltsin(ax, ['b', 'r'])
    time.sleep(1)
