## Introduction to Statsmodels API: Part A


Statsmodels is a Python package built specifically for statistics. It is built on top of NumPy and SciPy, and it can be used both to fit statistical models, such as linear regressions, and to carry out statistical analysis, such as hypothesis tests.

## Step 1: Importing Libraries



Let's import the libraries we need to load a dataset and build a model on top of it with statsmodels.

- Import the NumPy library for numerical operations
- Import the **statsmodels.api** and **statsmodels.formula.api** modules for statistical modeling (the latter lets you specify models with R-style formula strings)


```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
```

## Step 2: Loading the Dataset

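- Load the **Guerry** dataset from the R package **HistData** with **get_rdataset()**, which downloads it from the Rdatasets repository (an internet connection is required)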
- Display the first few rows of the data using **head()**


```python
# Download the Guerry dataset (R package HistData) and keep its DataFrame
data = sm.datasets.get_rdataset('Guerry', 'HistData').data
```

```python
data.head()
```

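| | dept | Region | Department | Crime_pers | Crime_prop | Literacy | Donations | Infants | Suicides | MainCity | ... | Crime_parents | Infanticide | Donation_clergy | Lottery | Desertion | Instruction | Prostitutes | Distance | Area | Pop1831 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | E | Ain | 28870 | 15890 | 37 | 5098 | 33120 | 35039 | 2:Med | ... | 71 | 60 | 69 | 41 | 55 | 46 | 13 | 218.372 | 5762 | 346.03 |
| 1 | 2 | N | Aisne | 26226 | 5521 | 51 | 8901 | 14572 | 12831 | 2:Med | ... | 4 | 82 | 36 | 38 | 82 | 24 | 327 | 65.945 | 7369 | 513.0 |
| 2 | 3 | C | Allier | 26747 | 7925 | 13 | 10973 | 17044 | 114121 | 2:Med | ... | 46 | 42 | 76 | 66 | 16 | 85 | 34 | 161.927 | 7340 | 298.26 |
| 3 | 4 | E | Basses-Alpes | 12935 | 7289 | 46 | 2733 | 23018 | 14238 | 1:Sm | ... | 70 | 12 | 37 | 80 | 32 | 29 | 2 | 351.399 | 6925 | 155.9 |
| 4 | 5 | E | Hautes-Alpes | 17488 | 8174 | 69 | 6962 | 23076 | 16171 | 1:Sm | ... | 22 | 23 | 64 | 79 | 35 | 7 | 1 | 320.28 | 5549 | 129.1 |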
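The model built in the next step uses the natural logarithm of **Pop1831**, which is only defined for strictly positive values. As a small sketch, this applies the transform to the five **Pop1831** values shown in the `head()` output above, using a plain NumPy array rather than the downloaded DataFrame so that it runs offline.

```python
import numpy as np

# Pop1831 values for the first five departments, copied from the head() output
pop1831 = np.array([346.03, 513.0, 298.26, 155.9, 129.1])

# The natural log requires strictly positive inputs, so check before transforming
assert (pop1831 > 0).all()

log_pop = np.log(pop1831)
print(np.round(log_pop, 3))
```

On the full dataset, the same check could be written as `(data['Pop1831'] > 0).all()`.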


## Step 3: Building the Model

Let's build an ordinary least squares (OLS) regression model, one of the regression models available in statsmodels, using the natural logarithm of one of the independent variables.

- Call the **ols()** function from **statsmodels.formula.api** to build a linear regression model on the Guerry dataset
- The dependent variable is **Lottery**
- The independent variables are **Literacy** and the natural logarithm of **Pop1831**
- Call the **fit()** method to fit the model to the data


```python
# Regress Lottery on Literacy and the natural log of Pop1831, then fit by OLS
results = smf.ols('Lottery ~ Literacy + np.log(Pop1831)', data=data).fit()
```
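Inside a formula string, a NumPy function such as `np.log` is applied to the column before the model is fitted, and the transformed term keeps that expression as its name (which is why `np.log(Pop1831)` appears as a row label in the summary). The sketch below uses a hypothetical synthetic stand-in for the three Guerry columns, not the real data, and checks that the in-formula transform matches a precomputed log column (the hypothetical `log_pop`).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic stand-in for the Guerry columns (NOT the real data)
rng = np.random.default_rng(seed=2)
n = 86
df = pd.DataFrame({
    "Literacy": rng.uniform(10, 75, size=n),
    "Pop1831": rng.uniform(120, 1000, size=n),
})
df["Lottery"] = (250 - 0.5 * df["Literacy"]
                 - 30 * np.log(df["Pop1831"])
                 + rng.normal(scale=15, size=n))

# Transform applied inside the formula string
res_inline = smf.ols("Lottery ~ Literacy + np.log(Pop1831)", data=df).fit()

# Same model with the log column computed beforehand
df["log_pop"] = np.log(df["Pop1831"])
res_precomputed = smf.ols("Lottery ~ Literacy + log_pop", data=df).fit()

print(res_inline.params.index.tolist())  # term names used in the summary table
print(np.allclose(res_inline.params.values, res_precomputed.params.values))
```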

## Step 4: Analyzing the Model Results

Now, let's display the summary of the regression results.


```python
# Display the summary of the regression results
results.summary()
```

| | | | |
|---|---|---|---|
| Dep. Variable: | Lottery | R-squared: | 0.348 |
| Model: | OLS | Adj. R-squared: | 0.333 |
| Method: | Least Squares | F-statistic: | 22.2 |
| Date: | Wed, 25 May 2022 | Prob (F-statistic): | 1.9e-08 |
| Time: | 06:17:13 | Log-Likelihood: | -379.82 |
| No. Observations: | 86 | AIC: | 765.6 |
| Df Residuals: | 83 | BIC: | 773.0 |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| Intercept | 246.4341 | 35.233 | 6.995 | 0.000 | 176.358 | 316.510 |
| Literacy | -0.4889 | 0.128 | -3.832 | 0.000 | -0.743 | -0.235 |
| np.log(Pop1831) | -31.3114 | 5.977 | -5.239 | 0.000 | -43.199 | -19.424 |

| | | | |
|---|---|---|---|
| Omnibus: | 3.713 | Durbin-Watson: | 2.019 |
| Prob(Omnibus): | 0.156 | Jarque-Bera (JB): | 3.394 |
| Skew: | -0.487 | Prob(JB): | 0.183 |
| Kurtosis: | 3.003 | Cond. No. | 702.0 |
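Several headline numbers in the summary are linked by standard OLS formulas. As a sanity check, this sketch recomputes the adjusted R-squared, F-statistic, AIC, BIC, and the intercept's t-statistic from other values reported in the tables; small differences in the last digit come from the rounding of the reported inputs.

```python
import math

# Values copied from the summary tables
n = 86                        # No. Observations
k = 2                         # Df Model (regressors, excluding the intercept)
r2 = 0.348                    # R-squared
llf = -379.82                 # Log-Likelihood
coef, se = 246.4341, 35.233   # Intercept coefficient and its std err

p = k + 1  # number of estimated parameters, including the intercept

adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)   # adjusted R-squared
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))    # overall F-statistic
aic = -2 * llf + 2 * p                          # Akaike information criterion
bic = -2 * llf + p * math.log(n)                # Bayesian information criterion
t_intercept = coef / se                         # t = coef / std err

print(f"Adj. R-squared ~ {adj_r2:.3f}")
print(f"F-statistic    ~ {f_stat:.1f}")
print(f"AIC            ~ {aic:.1f}")
print(f"BIC            ~ {bic:.1f}")
print(f"t (Intercept)  ~ {t_intercept:.3f}")
```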


**Observation**

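The OLS regression summary reports:
- The model and estimation method used
- The dependent variable
- The R-squared, adjusted R-squared, AIC, BIC, and other statistics needed to judge how well the model fits
- The estimated coefficients with their standard errors, t-statistics, p-values, and 95% confidence intervals

Here, both Literacy and np.log(Pop1831) have negative coefficients with p-values below 0.001, and the model explains about 35% of the variance in Lottery (R-squared = 0.348).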
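Every value in the summary is also available as an attribute of the fitted results object, which is handy when you need the numbers in code rather than in a printed table. This minimal sketch uses hypothetical synthetic data (columns `x` and `y`), not the Guerry dataset, and rebuilds the coefficient table from `params`, `bse`, `tvalues`, `pvalues`, and `conf_int()`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical synthetic data (not the Guerry dataset)
rng = np.random.default_rng(seed=3)
df = pd.DataFrame({"x": rng.normal(size=50)})
df["y"] = 1.0 + 0.8 * df["x"] + rng.normal(scale=0.3, size=50)

results = smf.ols("y ~ x", data=df).fit()

# Model-level statistics from the top block of the summary
print("R-squared:     ", results.rsquared)
print("Adj. R-squared:", results.rsquared_adj)
print("F-statistic:   ", results.fvalue, "p =", results.f_pvalue)
print("Log-Likelihood:", results.llf)
print("AIC / BIC:     ", results.aic, results.bic)

# Coefficient-level statistics from the middle block of the summary
coef_table = pd.DataFrame({
    "coef": results.params,
    "std err": results.bse,
    "t": results.tvalues,
    "P>|t|": results.pvalues,
})
ci = results.conf_int().rename(columns={0: "[0.025", 1: "0.975]"})
print(coef_table.join(ci))
```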