In [1]:
from __future__ import division
import pandas as pd
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf
import os
import matplotlib.pyplot as plt

In [2]:
os.chdir('/Users/minpan/Desktop/Data Analysis  Data') # change working di
d = pd.read_csv("gss_2021.csv")
d.head()

Unnamed: 0,year,id,wrkslf,wrkgovt,occ10,prestg10,indus10,marital,martype,divorce,...,relitennv,biblenv,postlifenv,kidssolnv,uscitznnv,fucitznnv,fepolnv,scibnftsv,abanyg,fileversion
0,2021,0,1,-1,265,21,196,0,-1,1,...,1,1,1,3,-1,-1,-1,1,1,0
1,2021,1,1,-1,3,40,179,2,-1,-1,...,3,-1,-1,-1,-1,-1,-1,0,-1,0
2,2021,2,1,-1,341,18,108,4,-1,-1,...,-1,-1,-1,-1,-1,-1,1,-1,0,0
3,2021,3,0,-1,223,18,208,1,-1,1,...,-1,-1,-1,-1,-1,-1,-1,-1,-1,0
4,2021,4,1,-1,282,21,166,4,-1,-1,...,1,1,0,-1,0,-1,1,-1,0,0


#### Variables of Interest

Data description:

Y: eqwlth

Some people think that the government in Washington ought to reduce the income differences between the rich and the poor, perhaps by raising the taxes of wealthy families or by giving income assistance to the poor. Others think that the government should not concern itself with reducing this income difference between the rich and the poor. Here is a card with a scale from 1 to 7. Think of a score of 1 as meaning that the government ought to reduce the income differences between rich and poor, and a score of 7 meaning that the government should not concern itself with reducing income differences. What score between 1 and 7 comes closest to the way you feel?

Xs: 

educ: years of education the respondent completed;

polviews: seven-point scale of political views, ranging from extremely liberal (1) to extremely conservative (7);

prestg10: This standard prestige score is a simple mean value of ratings for each occupation category, converted to a scale of 0 (bottom) to 100 (top).

I selected these variables with the aim of exploring how education, political views, and occupational prestige impact people's views on government intervention in reducing income differences. 

In [3]:
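# Drop rows with missing values on the model variables; necessary for predictions later.
# .copy() makes sub an independent DataFrame, so adding a column later does not trigger SettingWithCopyWarning.
sub = d.dropna(subset=["eqwlth", "polviews", "prestg10", "educ"]).copy()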

In [4]:
sub.head()

Unnamed: 0,year,id,wrkslf,wrkgovt,occ10,prestg10,indus10,marital,martype,divorce,...,relitennv,biblenv,postlifenv,kidssolnv,uscitznnv,fucitznnv,fepolnv,scibnftsv,abanyg,fileversion
0,2021,0,1,-1,265,21,196,0,-1,1,...,1,1,1,3,-1,-1,-1,1,1,0
1,2021,1,1,-1,3,40,179,2,-1,-1,...,3,-1,-1,-1,-1,-1,-1,0,-1,0
2,2021,2,1,-1,341,18,108,4,-1,-1,...,-1,-1,-1,-1,-1,-1,1,-1,0,0
3,2021,3,0,-1,223,18,208,1,-1,1,...,-1,-1,-1,-1,-1,-1,-1,-1,-1,0
4,2021,4,1,-1,282,21,166,4,-1,-1,...,1,1,0,-1,0,-1,1,-1,0,0


In [5]:
#recode eqwlth to binary outcome (0 or 1), representing 
# whether individuals support (1) or do not support (0) government intervention.
conditions = [
    (sub['eqwlth'] == 1) ,
    (sub['eqwlth'] > 1)]
choices = [1, 0]
sub['eqw'] = np.select(conditions, choices, default=np.nan)

In [6]:
pd.crosstab(index=sub["eqw"], columns="count")  ## check that the recode worked okay ##

col_0,count
eqw,Unnamed: 1_level_1
0.0,1537
1.0,242


# 1. Multiple Linear Probability Model 

In [7]:
lm1 = smf.ols(formula='eqw ~ polviews + educ + prestg10', data=sub).fit()
print(lm1.summary())

                            OLS Regression Results                            
Dep. Variable:                    eqw   R-squared:                       0.114
Model:                            OLS   Adj. R-squared:                  0.113
Method:                 Least Squares   F-statistic:                     76.15
Date:                Tue, 28 Nov 2023   Prob (F-statistic):           2.46e-46
Time:                        13:38:58   Log-Likelihood:                -512.10
No. Observations:                1779   AIC:                             1032.
Df Residuals:                    1775   BIC:                             1054.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2095      0.040      5.244      0.0

##### Data Analysis: 

I hypothesized that people who hold more liberal political views would be more inclined to support government intervention to reduce income differences, given the common association of liberal ideologies with advocacy for social and economic equality.

Similarly, I expected individuals who are more educated and those with more prestigious occupations to be more supportive of government intervention to reduce income differences, because they are likely to have greater awareness of societal issues and may be more willing to endorse measures that address economic disparities.

The regression output shows that for each one-unit increase in political views (moving towards more conservative perspectives), the predicted probability of supporting government intervention decreases by 7.27 percentage points, holding other variables constant. This finding supports the notion that more liberal views are associated with a greater likelihood of endorsing government intervention to address income inequality.

Likewise, for each additional year of education, the predicted probability of supporting government intervention increases by 0.87 percentage points, and for each one-unit increase in occupational prestige, the predicted probability of support rises by 0.14 percentage points, holding other variables constant. These results suggest that higher education and occupational prestige are positively associated with the likelihood of supporting government intervention.

All independent variables (polviews, educ, prestg10) have p-values less than 0.05, indicating that their coefficients are statistically significant. The model, as indicated by the R-squared value of 11.4%, explains a modest but statistically significant portion of the variance in the dependent variable.

In summary, the analysis affirms the initial hypotheses. 
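
To make the percentage-point reading concrete, the sketch below plugs the coefficients quoted in the prose above into the linear prediction equation. The slopes are rounded values taken from the text (the printed summary table is truncated), and `lpm_predict` is a hypothetical helper, so the numbers should be treated as approximate. It also illustrates a known limitation of linear probability models: fitted values are not constrained to lie between 0 and 1.

```python
# Rounded coefficients as quoted in the write-up (assumed, not re-estimated here).
INTERCEPT = 0.2095
B_POLVIEWS = -0.0727
B_EDUC = 0.0087
B_PRESTG10 = 0.0014


def lpm_predict(polviews, educ, prestg10):
    """Linear-probability prediction: a plain weighted sum, not bounded to [0, 1]."""
    return INTERCEPT + B_POLVIEWS * polviews + B_EDUC * educ + B_PRESTG10 * prestg10


liberal = lpm_predict(polviews=1, educ=16, prestg10=70)
conservative = lpm_predict(polviews=7, educ=16, prestg10=70)

print(f"extremely liberal:      {liberal:.3f}")
print(f"extremely conservative: {conservative:.3f}")
# Each one-unit step in polviews shifts the prediction by the same amount.
print(f"change per unit:        {(conservative - liberal) / 6:.4f}")
```

Under these rounded coefficients, the prediction for an extremely conservative respondent with the same education and prestige falls slightly below zero, which is one reason a logistic model is usually preferred for a binary outcome.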

# 2. Multiple (binary) Logistic Model

In [8]:
logit1 = smf.logit(formula="eqw ~ polviews + educ + prestg10", data=sub).fit()
print(logit1.summary())

Optimization terminated successfully.
         Current function value: 0.339536
         Iterations 7
                           Logit Regression Results                           
Dep. Variable:                    eqw   No. Observations:                 1779
Model:                          Logit   Df Residuals:                     1775
Method:                           MLE   Df Model:                            3
Date:                Tue, 28 Nov 2023   Pseudo R-squ.:                  0.1462
Time:                        13:38:58   Log-Likelihood:                -604.04
converged:                       True   LL-Null:                       -707.50
Covariance Type:            nonrobust   LLR p-value:                 1.346e-44
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.7154      0.402     -4.269      0.000      -2.503      -0.928
polviews      -0.6598      0.

##### Data Analysis: 

Holding the other variables constant, each one-unit shift in political views towards conservatism lowers the log-odds of supporting government intervention to reduce income differences by 0.66. Each additional year of education raises the log-odds of support by 0.095, and each one-unit rise in occupational prestige raises them by 0.0105.

The logistic regression results affirm that more conservative political views are associated with decreased odds of supporting government intervention, while higher education and occupational prestige are associated with increased odds of support. 

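The pseudo R-squared of 0.1462 indicates that the model improves the log-likelihood by about 14.6% relative to an intercept-only model. It is not a share of variance explained, so it should not be compared directly with the R-squared of the linear probability model.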
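
The `Pseudo R-squ.` figure in the summary can be reproduced from the two log-likelihoods printed alongside it. The sketch below assumes the reported statistic is McFadden's version, defined as one minus the ratio of the fitted log-likelihood to the null log-likelihood; `mcfadden_r2` is a hypothetical helper name.

```python
# Log-likelihoods as printed in the Logit summary above.
LL_MODEL = -604.04  # fitted model
LL_NULL = -707.50   # intercept-only model


def mcfadden_r2(ll_model, ll_null):
    """McFadden's pseudo R-squared: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null


pseudo_r2 = mcfadden_r2(LL_MODEL, LL_NULL)
print(round(pseudo_r2, 4))
```

Because the statistic is a ratio of log-likelihoods, it measures improvement in fit over the null model and does not describe variance in the outcome.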

# 3. Odds Ratios

In [9]:
np.exp(logit1.params)

Intercept    0.179898
polviews     0.516956
educ         1.099208
prestg10     1.010512
dtype: float64

##### Data Analysis: 

For every one-unit increase in polviews (towards conservative), the odds of supporting government intervention to reduce income differences decrease by approximately 48.3%, holding other variables constant.
(0.5170 - 1) × 100% = -48.3%.

For each additional year of education, the odds of supporting government intervention increase by approximately 9.9%, holding other variables constant.
(1.0992 - 1) × 100% = 9.9%.

For each one-unit increase in occupational prestige, the odds of supporting government intervention increase by approximately 1.1%, holding other variables constant.
(1.0105 - 1) × 100% = 1.1%.
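
The conversion from an odds ratio to a percent change in the odds can be sketched as a small function. The odds ratios are copied from the output above; `pct_change_in_odds` and its `units` parameter are hypothetical names introduced for illustration.

```python
# Odds ratios as printed by np.exp(logit1.params) above.
odds_ratios = {"polviews": 0.516956, "educ": 1.099208, "prestg10": 1.010512}


def pct_change_in_odds(odds_ratio, units=1):
    """Percent change in the odds for a `units`-unit increase in the predictor."""
    return (odds_ratio ** units - 1.0) * 100.0


for name, ratio in odds_ratios.items():
    print(f"{name}: {pct_change_in_odds(ratio):+.1f}% per unit")

# Odds ratios compound multiplicatively, e.g. four extra years of education:
four_years = pct_change_in_odds(odds_ratios["educ"], units=4)
print(f"educ, 4 units: {four_years:+.1f}%")
```

Note that a multi-unit change multiplies the odds ratio by itself rather than adding the per-unit percentages, so four years of education changes the odds by more than four times 9.9%.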

# 4. Get Predicted Probabilities

#### Define a function that converts logits to predicted probabilities, and apply it to the parameters of the logit model above.

In [10]:
def logit2prob(logit):
    """Convert a logit (log-odds) to a probability."""
    odds = np.exp(logit)
    prob = odds / (1 + odds)
    return prob


# Extract the fitted coefficients from the logit model
intercept = logit1.params.Intercept
b_poliv = logit1.params.polviews
b_educ = logit1.params.educ
b_prestg10 = logit1.params.prestg10

In [11]:
## CHOOSE REPRESENTATIVE VALUES FOR ALL Xs ##
logits_eqw = intercept + (1 * b_poliv) + (16 * b_educ) + (70 * b_prestg10)
logit2prob(logits_eqw)

0.46761283428833644

##### Data Analysis: 
For an individual who is extremely liberal (polviews = 1), has 16 years of education, and has an occupational prestige score of 70, the predicted probability of supporting government intervention to reduce income differences is about 47%.
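
The `logit2prob` helper works well for the moderate logits in this notebook. For very large positive logits, exponentiating the logit first can overflow, so a sketch of an algebraically equivalent, overflow-safe form is given below. `stable_logit2prob` is a hypothetical name, and this version handles scalars only.

```python
import math


def stable_logit2prob(logit):
    """Logistic function written to avoid overflow for large |logit|."""
    if logit >= 0:
        # exp(-logit) is at most 1 here, so it cannot overflow.
        return 1.0 / (1.0 + math.exp(-logit))
    odds = math.exp(logit)  # safe: logit < 0, so odds is below 1
    return odds / (1.0 + odds)


print(stable_logit2prob(0.0))
print(stable_logit2prob(1000.0))
print(stable_logit2prob(-1000.0))
```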

In [12]:
logits_eqw = intercept + (1 * b_poliv) + (sub.educ.mean() * b_educ) + (sub.prestg10.mean() * b_prestg10) 
logit2prob(logits_eqw)

0.3303955380180103

##### Data Analysis: 
The predicted probability of supporting government intervention to reduce income differences is about 33% for a person with average education and occupational prestige but extremely liberal political views (polviews = 1).
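
To see how the predicted probability moves across the whole political-views scale, the calculation can be repeated for each value of polviews. The sketch below recovers approximate coefficients by taking logs of the odds ratios printed in section 3, so it runs without the fitted model; `predicted_prob` is a hypothetical helper, and the fixed profile (16 years of education, prestige score 70) is the one used in the first prediction.

```python
import math

# Coefficients recovered as logs of the odds ratios printed in section 3.
b0 = math.log(0.179898)
b_polviews = math.log(0.516956)
b_educ = math.log(1.099208)
b_prestg10 = math.log(1.010512)


def predicted_prob(polviews, educ, prestg10):
    """Predicted probability of support from the fitted logit equation."""
    logit = b0 + b_polviews * polviews + b_educ * educ + b_prestg10 * prestg10
    return 1.0 / (1.0 + math.exp(-logit))


# Sweep the full 1-7 polviews scale for a fixed profile (educ=16, prestg10=70).
probs = {pv: predicted_prob(pv, educ=16, prestg10=70) for pv in range(1, 8)}
for pv, p in probs.items():
    print(f"polviews={pv}: {p:.3f}")
```

Unlike the linear probability model, the change in predicted probability per unit of polviews is not constant here; it depends on where on the logistic curve the respondent starts.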