# `patsy` package

`patsy` is a Python package for describing statistical models and building design matrices. Models are described using a formula mini-language inspired by the formula language used in R.

In [1]:
from patsy import ModelDesc

In [2]:
ModelDesc.from_formula("y~x")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]), Term([EvalFactor('x')])])

In [3]:
ModelDesc.from_formula("y~1+x")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]), Term([EvalFactor('x')])])

In [4]:
ModelDesc.from_formula("y~x-1")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([EvalFactor('x')])])

In [5]:
ModelDesc.from_formula("y~0+x")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([EvalFactor('x')])])

In [6]:
ModelDesc.from_formula("y~x1+x2")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]),
                        Term([EvalFactor('x1')]),
                        Term([EvalFactor('x2')])])

In [7]:
ModelDesc.from_formula("y~x1*x2")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]),
                        Term([EvalFactor('x1')]),
                        Term([EvalFactor('x2')]),
                        Term([EvalFactor('x1'), EvalFactor('x2')])])

In [10]:
ModelDesc.from_formula("y~x1:x2")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]), Term([EvalFactor('x1'), EvalFactor('x2')])])

In [11]:
ModelDesc.from_formula("y~x1/x2")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]),
                        Term([EvalFactor('x1')]),
                        Term([EvalFactor('x1'), EvalFactor('x2')])])

In [12]:
ModelDesc.from_formula("y~x+ x*x")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]), Term([EvalFactor('x')])])

Inside a formula, `*` is the interaction operator rather than arithmetic multiplication, so `x*x` expands to `x + x + x:x`, and the duplicates collapse to the single term `x`. To get the arithmetic product, wrap the expression in `I(...)`.

In [13]:
ModelDesc.from_formula("y~x+ I(x*x)")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]),
                        Term([EvalFactor('x')]),
                        Term([EvalFactor('I(x * x)')])])

In [15]:
ModelDesc.from_formula("y~ (x+y)*z")

ModelDesc(lhs_termlist=[Term([EvalFactor('y')])],
          rhs_termlist=[Term([]),
                        Term([EvalFactor('x')]),
                        Term([EvalFactor('y')]),
                        Term([EvalFactor('z')]),
                        Term([EvalFactor('x'), EvalFactor('z')]),
                        Term([EvalFactor('y'), EvalFactor('z')])])
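
The outputs above suggest several equivalences between formula spellings. The following is a minimal sketch that checks a few of them by comparing the parsed right-hand-side term lists. The helper `rhs_terms` is a hypothetical name introduced only for this illustration, and the comparison assumes that patsy `Term` objects compare equal when they hold the same factors.

```python
from patsy import ModelDesc


def rhs_terms(formula):
    """Return the right-hand-side term list of a parsed formula (illustrative helper)."""
    return ModelDesc.from_formula(formula).rhs_termlist


# The intercept is implicit, so writing "1 +" explicitly changes nothing.
same_intercept = rhs_terms("y ~ x") == rhs_terms("y ~ 1 + x")

# "- 1" and "0 +" both remove the intercept.
same_no_intercept = rhs_terms("y ~ x - 1") == rhs_terms("y ~ 0 + x")

# "a * b" is shorthand for "a + b + a:b".
same_star = rhs_terms("y ~ x1 * x2") == rhs_terms("y ~ x1 + x2 + x1:x2")

# "a / b" is shorthand for "a + a:b".
same_slash = rhs_terms("y ~ x1 / x2") == rhs_terms("y ~ x1 + x1:x2")

print(same_intercept, same_no_intercept, same_star, same_slash)
```

Comparing term lists rather than `repr` strings keeps the check independent of how patsy formats its output.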

## Constructing design matrices

`patsy` provides the `dmatrices` function, which builds the design matrices for both sides of a model formula: the response (left-hand side) and the regressors (right-hand side).


In [1]:
from patsy import demo_data, dmatrices
data = demo_data("a", "b", "x1", "x2", "y", "z column") # Generate demo data
data

{'a': ['a1', 'a1', 'a2', 'a2', 'a1', 'a1', 'a2', 'a2'],
 'b': ['b1', 'b2', 'b1', 'b2', 'b1', 'b2', 'b1', 'b2'],
 'x1': array([ 1.76405235,  0.40015721,  0.97873798,  2.2408932 ,  1.86755799,
        -0.97727788,  0.95008842, -0.15135721]),
 'x2': array([-0.10321885,  0.4105985 ,  0.14404357,  1.45427351,  0.76103773,
         0.12167502,  0.44386323,  0.33367433]),
 'y': array([ 1.49407907, -0.20515826,  0.3130677 , -0.85409574, -2.55298982,
         0.6536186 ,  0.8644362 , -0.74216502]),
 'z column': array([ 2.26975462, -1.45436567,  0.04575852, -0.18718385,  1.53277921,
         1.46935877,  0.15494743,  0.37816252])}

### Design matrix for a linear model

In [4]:
y, X = dmatrices("y ~ x1 + x2", data) # dmatrices returns a tuple
y

DesignMatrix with shape (8, 1)
         y
   1.49408
  -0.20516
   0.31307
  -0.85410
  -2.55299
   0.65362
   0.86444
  -0.74217
  Terms:
    'y' (column 0)

In [5]:
X

DesignMatrix with shape (8, 3)
  Intercept        x1        x2
          1   1.76405  -0.10322
          1   0.40016   0.41060
          1   0.97874   0.14404
          1   2.24089   1.45427
          1   1.86756   0.76104
          1  -0.97728   0.12168
          1   0.95009   0.44386
          1  -0.15136   0.33367
  Terms:
    'Intercept' (column 0)
    'x1' (column 1)
    'x2' (column 2)
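
The matrices returned by `dmatrices` can be passed on to numerical code. As a small sketch, assuming NumPy is available (patsy depends on it), the snippet below reads the column labels from the attached `design_info` object and converts the design matrix to a plain array.

```python
import numpy as np
from patsy import demo_data, dmatrices

# Same demo data as above; demo_data is seeded, so the values are reproducible.
data = demo_data("a", "b", "x1", "x2", "y", "z column")
y, X = dmatrices("y ~ x1 + x2", data)

# Column labels live on the design_info object attached to the matrix.
column_names = X.design_info.column_names

# A DesignMatrix is an ndarray subclass; np.asarray returns a plain ndarray.
X_array = np.asarray(X)

print(column_names)
print(X_array.shape)
```

Keeping the labels alongside the plain array makes it easier to match fitted coefficients back to formula terms later.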

### Design matrix for a quadratic model

If we want only the design matrix of the regressors, we can use the `dmatrix` function, which takes a formula with no left-hand side.

In [13]:
from patsy import dmatrix
dmatrix("x1 + I(x1**2)", data)

DesignMatrix with shape (8, 3)
  Intercept        x1  I(x1 ** 2)
          1   1.76405     3.11188
          1   0.40016     0.16013
          1   0.97874     0.95793
          1   2.24089     5.02160
          1   1.86756     3.48777
          1  -0.97728     0.95507
          1   0.95009     0.90267
          1  -0.15136     0.02291
  Terms:
    'Intercept' (column 0)
    'x1' (column 1)
    'I(x1 ** 2)' (column 2)
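
As a quick sanity check on the quadratic design matrix, the sketch below confirms that the `I(x1 ** 2)` column holds the element-wise square of `x1`, and shows that appending `- 1` to the formula drops the intercept column, as in the `ModelDesc` examples earlier. The variable names are illustrative only.

```python
import numpy as np
from patsy import demo_data, dmatrix

data = demo_data("a", "b", "x1", "x2", "y", "z column")

quad = dmatrix("x1 + I(x1**2)", data)
# Column 2 is the I(x1 ** 2) term: the element-wise square of x1.
squares_match = np.allclose(np.asarray(quad)[:, 2], data["x1"] ** 2)

# Appending "- 1" removes the intercept column from the design matrix.
quad_no_intercept = dmatrix("x1 + I(x1**2) - 1", data)

print(squares_match)
print(quad_no_intercept.design_info.column_names)
```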