# Bayesian Statistics with Stan in R and Python

Welcome to the Environmental Bayesian Statistics course! We will use the Stan package in Python and R to perform spatio-temporal statistics.

# Preparing for the workshop: Requirements

Knowledge requirements: Basic knowledge of Python OR R (working with data, plotting, etc.)

For this workshop we use Jupyter notebooks, which provide a convenient way to share scripts and run them right in the notebook! Jupyter comes automatically with the Anaconda Python platform. Download the latest version here: https://www.continuum.io/downloads and follow the instructions to install.

Once you have installed Anaconda, you should also install the required package for today: pystan, the Python Stan package. You can use the conda command on the command line to install the package:

In [None]:
sudo conda install pystan

After you enter your password, the package will be installed.
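
To confirm that the installation worked, a quick check like the following can be run in a Python session. This is only a minimal sketch: the helper name `missing_packages` is hypothetical, and it only tests whether Python can locate the packages, not whether Stan models will compile.

```python
import importlib.util

def missing_packages(names):
    """Return the names from `names` that Python cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# pystan and numpy are the two packages imported later in this notebook.
missing = missing_packages(["pystan", "numpy"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("All required packages were found.")
```

If a package is reported as not installed, repeat the install step above before continuing.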

You can launch Jupyter by typing "jupyter notebook" on the command line. This opens the notebook interface in your browser, where you can navigate to the notebooks on your computer and open them. So first you need to clone or download this notebook from GitHub:

Go to the course GitHub page (DSI Environmental Bayesian Stats) and click on Clone or Download. The easiest option is to download the whole repository as a zip file ("Download ZIP" button).

Remember where you save the repository. After launching Jupyter, navigate to this location and open the notebook.

# Course Organization: Schedule

# Session 1

# Example Code

In [1]:
#-- Import Packages
import pystan
import numpy as np

In [2]:
#-- Define a model: normal distribution with mean mu and standard deviation 1.
Nmod = """
data {
    int<lower=1> N;
    real y[N];
}
parameters {
    real mu;
}
model {
    y ~ normal(mu, 1);
}
"""

In [3]:
#-- Now compile the model.
mod = pystan.StanModel(model_code=Nmod)

INFO:pystan:COMPILING THE C++ CODE FOR MODEL anon_model_3aaa1aff3be33470f8a5bfa56085d51c NOW.


In [4]:
#-- Create a random sample of 20 points from a standard normal distribution.
sample = np.random.normal(size=20)
print(sample.mean())

-0.153842632755


In [5]:
#-- Now we want to maximize the posterior by using the optimizing function.
#-- This gives a point estimate of the population mean from the sample.
population = mod.optimizing(data=dict(y=sample, N=len(sample)))
print(population)

OrderedDict([('mu', array(-0.15384263275511367))])
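
The estimate of `mu` matches the sample mean printed earlier. The model block states no prior for `mu`, so Stan treats the prior as flat, and maximizing the posterior is then the same as maximizing the normal likelihood, whose maximum is the sample mean. The sketch below illustrates this in plain Python without Stan. It uses its own seeded sample as a hypothetical stand-in for the notebook's data, and plain gradient ascent as a stand-in for Stan's optimizer, so the numbers will differ from those above.

```python
import random

random.seed(0)
# Hypothetical stand-in for the notebook's sample: 20 standard normal draws.
y = [random.gauss(0.0, 1.0) for _ in range(20)]

def log_posterior(mu):
    """Log posterior of mu up to a constant: flat prior, normal(mu, 1) likelihood."""
    return -0.5 * sum((yi - mu) ** 2 for yi in y)

def gradient(mu):
    """Derivative of log_posterior with respect to mu."""
    return sum(yi - mu for yi in y)

# Plain gradient ascent. For this quadratic objective a step size of 1/N
# reaches the maximum after the first step; later steps leave mu unchanged.
mu = 0.0
for _ in range(10):
    mu += gradient(mu) / len(y)

sample_mean = sum(y) / len(y)
print(mu, sample_mean)
```

The two printed values should agree up to floating-point rounding.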


In [None]:
#-- Draw samples from the posterior distribution of mu.
fit = mod.sampling(data=dict(y=sample, N=len(sample)))
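
Where optimizing returns a single point estimate, sampling returns many draws from the posterior distribution of `mu`. To give a feel for what that means, here is a minimal sketch of a random-walk Metropolis sampler for the same model in plain Python. This is not Stan's algorithm (Stan uses the No-U-Turn Sampler by default); the function name `metropolis`, the step size, the number of draws, and the seeded sample are all assumptions chosen for illustration.

```python
import math
import random

random.seed(1)
# Hypothetical stand-in for the notebook's sample: 20 standard normal draws.
y = [random.gauss(0.0, 1.0) for _ in range(20)]

def log_posterior(mu):
    """Log posterior of mu up to a constant: flat prior, normal(mu, 1) likelihood."""
    return -0.5 * sum((yi - mu) ** 2 for yi in y)

def metropolis(n_draws, step=0.5, start=0.0):
    """Random-walk Metropolis: propose a nearby mu, accept or keep the current one."""
    draws = []
    mu = start
    current = log_posterior(mu)
    for _ in range(n_draws):
        proposal = mu + random.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if random.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        draws.append(mu)
    return draws

draws = metropolis(20000)[2000:]  # drop the first 2000 draws as warm-up
post_mean = sum(draws) / len(draws)
post_sd = math.sqrt(sum((d - post_mean) ** 2 for d in draws) / len(draws))
print(post_mean, post_sd)
```

For this model the exact posterior of `mu` is normal with mean equal to the sample mean and standard deviation 1/sqrt(N), so the two printed values should land close to those quantities.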