Mortgage lenders are interested in determining borrower and loan factors that may lead to delinquency or foreclosure. The file $\textit{lasvegas.dat}$ contains 1,000 observations on mortgages for single-family homes in Las Vegas, Nevada, during 2008. The variable of interest is $\textit{DELINQUENT}$, an indicator variable $= 1$ if the borrower missed at least three payments ($90+$ days late), and $0$ otherwise. The explanatory variables are: $\textit{LVR}=$ the ratio of the loan amount to the value of the property; $\textit{REF}=1$ if the purpose of the loan was a "refinance" and $=0$ if the loan was for a purchase; $\textit{INSUR}=1$ if the mortgage carries mortgage insurance, $0$ otherwise; $\textit{RATE}=$ initial interest rate of the mortgage; $\textit{AMOUNT}=$ dollar value of the mortgage (in units of $\$100,000$); $\textit{CREDIT}=$ credit score; $\textit{TERM}=$ number of years between disbursement of the loan and the date it is expected to be fully repaid; $\textit{ARM}=1$ if the mortgage has an adjustable rate, and $=0$ if the mortgage has a fixed rate.

(a) Estimate the linear probability (regression) model explaining DELINQUENT as a function of the remaining variables. Use White robust standard errors. Are the signs of the estimated coefficients reasonable?

In [1]:
clear all
use https://www.stata.com/data/s4poe4/lasvegas.dta
reg delinquent lvr ref insur rate amount credit term arm
predict phat





      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(8, 991)       =     62.77
       Model |  53.6062717         8  6.70078397   Prob > F        =    0.0000
    Residual |  105.792728       991   .10675351   R-squared       =    0.3363
-------------+----------------------------------   Adj R-squared   =    0.3309
       Total |     159.399       999  .159558559   Root MSE        =    .32673

------------------------------------------------------------------------------
  delinquent |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lvr |   .0016239   .0007846     2.07   0.039     .0000843    .0031634
         ref |  -.0593237   .0238299    -2.49   0.013    -.1060865   -.0125609
       insur |  -.4815849   .0236365   -20.37   0.000    -.5279683   -.4352015
        rate |   .0343761   .0085999     4.00  
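
The exercise asks for White robust standard errors, which the `reg` call above does not request. A minimal sketch, assuming the same variable list: the coefficient estimates are unchanged, and only the standard errors, t statistics, and confidence intervals differ.

```stata
* Same linear probability model with heteroskedasticity-robust (White-type) standard errors
regress delinquent lvr ref insur rate amount credit term arm, vce(robust)
```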

(b) Use probit to estimate the model in (a). Are the signs and significance of the estimated coefficients the same as for the linear probability model?

In [9]:
probit delinquent lvr ref insur rate amount credit term arm
predict phat2



Iteration 0:   log likelihood =   -499.013  
Iteration 1:   log likelihood = -338.38904  
Iteration 2:   log likelihood = -332.81547  
Iteration 3:   log likelihood = -332.79661  
Iteration 4:   log likelihood = -332.79661  

Probit regression                               Number of obs     =      1,000
                                                LR chi2(8)        =     332.43
                                                Prob > chi2       =     0.0000
Log likelihood = -332.79661                     Pseudo R2         =     0.3331

------------------------------------------------------------------------------
  delinquent |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         lvr |   .0076007   .0045911     1.66   0.098    -.0013977    .0165991
         ref |  -.2884561   .1259446    -2.29   0.022    -.5353029   -.0416092
       insur |  -1.772714   .1158088   -15.31   0.000    -1.







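LVR and CREDIT are significant in the linear probability model but not in the probit model (for LVR, p = 0.039 versus p = 0.098).
AMOUNT is significant in the probit model but not in the linear probability model.

All other signs and significance results agree across the two models.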

(c) Compute the predicted value of DELINQUENT for the 500th and 1,000th observations using both the linear probability model and the probit model. Interpret the values.

In [3]:
di phat[500]
di phat2[500]

di phat[1000]
di phat2[1000]


.18278283

.1404525

.57852966

.6167872
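
Each value is an estimated probability of delinquency for that borrower: about 0.18 (linear probability) and 0.14 (probit) for observation 500, and about 0.58 and 0.62 for observation 1,000. A small sketch that lists both predictions next to the observed outcome, assuming `phat` and `phat2` were created by the `predict` calls above:

```stata
* Observed outcome and both predicted probabilities for observations 500 and 1,000
list delinquent phat phat2 if inlist(_n, 500, 1000)
```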


(d) Construct a histogram of CREDIT. Using both linear probability and probit models, calculate the probability of delinquency for $\textit{CREDIT} = 500, 600, \text{and } 700$ for a loan of $\$250,000$ ($\textit{AMOUNT} = 2.5$). For the other variables, loan to value ratio (LVR) is $80\%$, initial interest rate is $8\%$, indicator variables take the value one, and $\textit{TERM} = 30$. Discuss similarities and differences among the predicted probabilities from the two models.
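
The histogram requested in this part can be drawn with Stata's `histogram` command. A minimal sketch, assuming the Las Vegas data are still in memory; the title text is only illustrative:

```stata
* Histogram of credit scores, with counts on the vertical axis
histogram credit, frequency title("Distribution of CREDIT")
```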

The fitted linear probability model (as used here, without an $\textit{AMOUNT}$ term) is
$$ \widehat{\textit{DELINQUENT}} = 0.6884913 + 0.0016239\,\textit{LVR} - 0.0593237\,\textit{REF} - 0.4815849\,\textit{INSUR} + 0.0343761\,\textit{RATE} - 0.0004419\,\textit{CREDIT} - 0.0126195\,\textit{TERM} + 0.8091109\,\textit{ARM}$$

For a borrower with the characteristics above and a credit score of 500, 600, and 700, the predicted probabilities are $0.7620794$, $0.7178894$, and $0.6736994$, respectively.

And the probit model, using an index without a constant, $\textit{LVR}$, or $\textit{CREDIT}$ term, is
$$ z = -0.2884561\,\textit{REF} - 1.772714\,\textit{INSUR} + 0.1711988\,\textit{RATE} + 0.121236\,\textit{AMOUNT} - 0.0775769\,\textit{TERM} + 0.8091109\,\textit{ARM}$$
$$ P(\textit{DELINQUENT} = 1) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2 \pi}} e^{-0.5 u^2}\,du$$

Because $\textit{CREDIT}$ does not enter this index, $z$ is the same for credit scores of 500, 600, and 700: $z = -1.9066858 \Rightarrow \Phi(-1.9066858) \approx 0.0283$
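
In Stata, `normal()` returns the standard normal CDF $\Phi$ and `normalden()` returns the density $\phi$. A quick check of both at the index value above; the probit probability is the CDF value:

```stata
* Standard normal CDF at z: the probit probability (roughly 0.028)
display normal(-1.9066858)

* Standard normal density at z: not a probability (roughly 0.065)
display normalden(-1.9066858)
```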

As expected, under the linear probability model the probability of delinquency falls as the credit score rises. However, its predicted probabilities are very high compared with the probit model, whose prediction here does not vary with the credit score.
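
The hand calculations can be cross-checked with `margins`, which uses every estimated coefficient, including the constant, so its results need not match the figures above. A sketch under the assumption that each model is re-estimated immediately before its `margins` call:

```stata
* Linear probability model: linear predictions at the values given in (d)
regress delinquent lvr ref insur rate amount credit term arm
margins, at(lvr=80 ref=1 insur=1 rate=8 amount=2.5 credit=(500(100)700) term=30 arm=1)

* Probit: predicted probabilities at the same values
probit delinquent lvr ref insur rate amount credit term arm
margins, at(lvr=80 ref=1 insur=1 rate=8 amount=2.5 credit=(500(100)700) term=30 arm=1)
```

After `regress`, `margins` reports the linear prediction; after `probit`, it reports `Pr(delinquent)` by default.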


(e) Compute the marginal effect of $\textit{CREDIT}$ on the probability of delinquency for $\textit{CREDIT} = 500, 600, \text{ and } 700$, given that the other explanatory variables take the values in (d). Discuss the interpretation of the marginal effect.


In [11]:
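* dydx(credit) requests the marginal effect; a numlist inside at() needs its own parentheses
margins, dydx(credit) at(lvr=80 ref=1 insur=1 rate=8 amount=2.5 credit=(500(100)700) term=30 arm=1)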
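
For the probit model, the marginal effect of a continuous regressor is the standard normal density at the index times that regressor's coefficient. As a sketch, with $\beta_{\textit{CREDIT}}$ denoting the probit coefficient on $\textit{CREDIT}$ (notation assumed here; its estimate does not appear in the truncated output above) and $z$ the full probit index:

$$ \frac{\partial P(\textit{DELINQUENT} = 1)}{\partial \textit{CREDIT}} = \phi(z)\,\beta_{\textit{CREDIT}}, \qquad \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-0.5 z^2}$$

The effect therefore changes with $\textit{CREDIT}$ through $z$. In the linear probability model it is constant at the $\textit{CREDIT}$ coefficient, $-0.0004419$, so a 100-point higher credit score lowers the predicted probability by about $0.044$.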





Using a dataset introduced in one of the past pre-class work questions, check whether the **margins** command in Stata correctly estimates the marginal effect of an interaction term in a nonlinear model. Note: Use a probit model and interact two continuous variables.

In [1]:
clear all
use https://www.stata.com/data/s4poe4/nels_small.dta

In [2]:
summ


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   psechoice |      1,000       2.305    .8103281          1          3
      hscath |      1,000        .019     .136593          0          1
      grades |      1,000     6.53039    2.265855       1.74      12.33
      faminc |      1,000     51.3935    40.16579          0        250
      famsiz |      1,000       4.206    1.291988          1         10
-------------+---------------------------------------------------------
     parcoll |      1,000        .308    .4618976          0          1
      female |      1,000        .496    .5002342          0          1
       black |      1,000        .056    .2300368          0          1


In [5]:
gen college = (psechoice == 2 | psechoice == 3)
summ 




    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
   psechoice |      1,000       2.305    .8103281          1          3
      hscath |      1,000        .019     .136593          0          1
      grades |      1,000     6.53039    2.265855       1.74      12.33
      faminc |      1,000     51.3935    40.16579          0        250
      famsiz |      1,000       4.206    1.291988          1         10
-------------+---------------------------------------------------------
     parcoll |      1,000        .308    .4618976          0          1
      female |      1,000        .496    .5002342          0          1
       black |      1,000        .056    .2300368          0          1
     college |      1,000        .778    .4157991          0          1


In [6]:
reg college grades faminc famsiz parcoll female black


      Source |       SS           df       MS      Number of obs   =     1,000
-------------+----------------------------------   F(6, 993)       =     38.80
       Model |  32.8040232         6  5.46733721   Prob > F        =    0.0000
    Residual |  139.911977       993  .140898265   R-squared       =    0.1899
-------------+----------------------------------   Adj R-squared   =    0.1850
       Total |     172.716       999  .172888889   Root MSE        =    .37536

------------------------------------------------------------------------------
     college |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      grades |  -.0674628   .0056661   -11.91   0.000    -.0785817   -.0563439
      faminc |   .0008605   .0003307     2.60   0.009     .0002115    .0015096
      famsiz |  -.0091176   .0092538    -0.99   0.325    -.0272768    .0090416
     parcoll |   .0914926   .0293783     3.11   0.

In [15]:
margins, at(grades = 5 faminc = 51.39 famsiz = 5 parcoll = 1 female = 1 black =1)


Adjusted predictions                            Number of obs     =      1,000
Model VCE    : OIM

Expression   : Pr(college), predict()
at           : grades          =           5
               faminc          =       51.39
               famsiz          =           5
               parcoll         =           1
               female          =           1
               black           =           1

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   .9904545   .0067251   147.28   0.000     .9772735    1.003636
------------------------------------------------------------------------------


In [14]:
margins, dydx(grades faminc)


Average marginal effects                        Number of obs     =      1,000
Model VCE    : OIM

Expression   : Pr(college), predict()
dy/dx w.r.t. : grades faminc

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      grades |  -.0689669   .0053522   -12.89   0.000     -.079457   -.0584768
      faminc |   .0012627   .0004222     2.99   0.003     .0004353    .0020902
------------------------------------------------------------------------------
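
The exercise asks for an interaction of two continuous variables, which the main-effects specification does not include. A minimal sketch, assuming `college` has been generated as above and choosing `grades` and `faminc` as the interacted pair (an illustrative choice; `xbhat` and `cross` are hypothetical variable names). For a probit with index $xb$, the cross-partial is $\phi(xb)\,[\beta_{gf} - xb\,(\beta_g + \beta_{gf}\,\textit{faminc})(\beta_f + \beta_{gf}\,\textit{grades})]$, which can be compared with what `margins` reports:

```stata
* Probit with main effects and the grades x faminc interaction
probit college c.grades##c.faminc famsiz parcoll female black

* margins folds the interaction into each variable's own marginal effect;
* it does not report a separate effect for the product term
margins, dydx(grades faminc)

* Cross-partial of Pr(college) with respect to grades and faminc, computed by hand
predict double xbhat, xb
scalar b_g  = _b[grades]
scalar b_f  = _b[faminc]
scalar b_gf = _b[c.grades#c.faminc]
generate double cross = normalden(xbhat) * (scalar(b_gf) - xbhat * (scalar(b_g) + scalar(b_gf)*faminc) * (scalar(b_f) + scalar(b_gf)*grades))
summarize cross
```

The mean of `cross` is the average interaction effect. In general it differs from the coefficient on the product term, because the cross-partial also depends on the index and on both main effects.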


In [13]:
probit college grades faminc famsiz parcoll female black


Iteration 0:   log likelihood = -529.42766  
Iteration 1:   log likelihood = -421.36098  
Iteration 2:   log likelihood = -416.23435  
Iteration 3:   log likelihood = -416.21967  
Iteration 4:   log likelihood = -416.21967  

Probit regression                               Number of obs     =      1,000
                                                LR chi2(6)        =     226.42
                                                Prob > chi2       =     0.0000
Log likelihood = -416.21967                     Pseudo R2         =     0.2138

------------------------------------------------------------------------------
     college |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      grades |  -.2945521   .0274883   -10.72   0.000    -.3484281   -.2406761
      faminc |    .005393   .0018099     2.98   0.003     .0018457    .0089404
      famsiz |  -.0531059   .0374572    -1.42   0.156    -.12

In [9]:
logit college grades faminc famsiz parcoll female black


Iteration 0:   log likelihood = -529.42766  
Iteration 1:   log likelihood = -428.80877  
Iteration 2:   log likelihood = -415.34069  
Iteration 3:   log likelihood = -414.97662  
Iteration 4:   log likelihood = -414.97656  
Iteration 5:   log likelihood = -414.97656  

Logistic regression                             Number of obs     =      1,000
                                                LR chi2(6)        =     228.90
                                                Prob > chi2       =     0.0000
Log likelihood = -414.97656                     Pseudo R2         =     0.2162

------------------------------------------------------------------------------
     college |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      grades |  -.5174378   .0504639   -10.25   0.000    -.6163452   -.4185304
      faminc |   .0130416   .0038975     3.35   0.001     .0054027    .0206804
      famsiz |  