+ - 0:00:00
Notes for current slide
Notes for next slide

ScPoEconometrics: Advanced

Binary Response Models

Bluebery Planterose

SciencesPo Paris
2023-04-04

1 / 27

Where Are We At?

Last Time

  • Panel Data Estimation

  • The fixed effects estimator

  • fixest package

2 / 27

Where Are We At?

Last Time

  • Panel Data Estimation

  • The fixed effects estimator

  • fixest package

Today

  1. Binary Response Models!

  2. Another cool app! 😎

2 / 27

Binary Response Models

3 / 27

Binary Response Models

So far, our models looked like this:

y=b0+b1x+eeN(0,σ2)

  • The distributional assumption on e:

  • In priniciple implies that yR.

  • test scores, earnings, crime rates, etc. are all continuous outcomes. ✅

4 / 27

Binary Response Models

So far, our models looked like this:

y=b0+b1x+eeN(0,σ2)

  • The distributional assumption on e:

  • In priniciple implies that yR.

  • test scores, earnings, crime rates, etc. are all continuous outcomes. ✅

But some outcomes are clearly binary (i.e., either TRUE or FALSE):

  • You either work or you don't,

  • You either have children or you don't,

  • You either bought a product or you didn't,

  • You flipped a coin and it came up either heads or tails.

4 / 27

Binary Outcomes

  • Outcomes restricted to FALSE vs TRUE, or 0 vs 1.

  • We'd have y{0,1}.

  • In those situations we are primarily interested in estimating the response probability or the probability of success:

p(x)=Pr(y=1|x)

  • how does p(x) change as we change x?

  • we ask

    If we increase x by one unit, how would the probability of y=1 change?

5 / 27

Remembering Bernoulli Fun

Remember the Bernoulli Distribution?: We call a random variable y{0,1} such that

Pr(y=1)=pPr(y=0)=1pp[0,1]

a Bernoulli random variable.

6 / 27

Remembering Bernoulli Fun

Remember the Bernoulli Distribution?: We call a random variable y{0,1} such that

Pr(y=1)=pPr(y=0)=1pp[0,1]

a Bernoulli random variable.

For us: condition those probabilities on a covariate x

Pr(y=1|X=x)=p(x)Pr(y=0|X=x)=1p(x)p(x)[0,1]

  • Partcularly: expected value (i.e. the average) of Y given x

E[y|x]=p(x)×1+(1p(x))×0=p(x)

  • We often model conditional expectations 😉
6 / 27

The Linear Probability Model (LPM)

  • The simplest option. Model the response probability as

Pr(y=1|x)=p(x)=β0+β1x1++βKxK

  • Interpretation: a 1 unit change in x1, say, results in a change of p(x) of β1.

Example: Mroz (1987)

  • Female labor market participation

  • How does inlf (in labor force) status depend on non-wife household income, her education, age and number of small children?

7 / 27

Mroz 1987

data(mroz, package = "wooldridge")
plot(factor(inlf) ~ age, data = mroz,
ylevels = 2:1,
ylab = "in labor force?")

8 / 27

Running the LPM

LPM = lm(inlf ~ nwifeinc + educ + exper
+ I(exper^2) + age +I(age^2) + kidslt6, mroz)
broom::tidy(LPM)
## # A tibble: 8 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.322 0.486 0.662 5.08e- 1
## 2 nwifeinc -0.00343 0.00145 -2.36 1.86e- 2
## 3 educ 0.0375 0.00735 5.10 4.33e- 7
## 4 exper 0.0383 0.00577 6.63 6.44e-11
## 5 I(exper^2) -0.000565 0.000189 -2.98 2.96e- 3
## 6 age -0.00112 0.0225 -0.0497 9.60e- 1
## 7 I(age^2) -0.000182 0.000258 -0.706 4.80e- 1
## 8 kidslt6 -0.260 0.0341 -7.64 6.72e-14
  • identical to our previous linear regression models

  • Just inlf takes on only two values, 0 or 1.

  • Results: non-wife income increases by 10 (i.e 10,000 USD), p(x) falls by 0.034 (that's a small effect!),

  • an additional small child would reduce the probability of work by 0.26 (that's large).

  • So far, so simple. ✌️

9 / 27

LPM: Predicting negative probabilities?!



  • LPM predictions of p(x) are not guaranteed to lie in unit interval [0,1].

  • Remember: eN(0,σ2)

  • here, some probs smaller than zero!

  • Particularly annoying if you want predictions: What is a probability of -0.3? 🤔

10 / 27

LPM in Saturated Model: No Problem!

library(dplyr)
mroz %<>%
# classify age into 3 and huswage into 2 classes
mutate(age_fct = cut(age,breaks = 3,labels = FALSE),
huswage_fct = cut(huswage, breaks = 2,labels = FALSE)) %>%
mutate(classes = paste0("age_",age_fct,"_hus_",huswage_fct))
LPM_saturated = mroz %>%
lm(inlf ~ classes, data = .)
broom::tidy(LPM_saturated)
## # A tibble: 6 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 0.611 0.0277 22.0 2.98e-83
## 2 classesage_1_hus_2 -0.611 0.350 -1.75 8.11e- 2
## 3 classesage_2_hus_1 -0.0257 0.0404 -0.635 5.25e- 1
## 4 classesage_2_hus_2 -0.277 0.203 -1.37 1.72e- 1
## 5 classesage_3_hus_1 -0.149 0.0494 -3.01 2.72e- 3
## 6 classesage_3_hus_2 -0.111 0.350 -0.317 7.51e- 1
  • saturated model : only have dummy explanatory variables

  • Each class: p(x) within that cell.

11 / 27

LPM in Saturated Model: No Problem!

  • Each line segment: p(x) within that cell.

  • E.g. women from the youngest age category and lowest husband income (class age_1_hus_1) have the highest probability of working (0.611).

12 / 27

Task 1 (10 Minutes): Saturated LPM

Define a saturated LPM as before

Pr(y=1|x)=p(x)=β0+β1x1++βKxK but restrict all xj{0,1}.

  1. Create a binary indicator age_lt_50 = 1 for age smaller than 50 and 0 else and same for husage_lt_50.

  2. Run a full interactions model (use the * syntax in your formula) of age_lt_50 = 1 interacted with husage_lt_50. I.e. run the following LPM: Pr(y=1|x)=β0+β1age_lt_50+β2husage_lt_50+β3×age_lt_50×husage_lt_50

  3. predict Pr(y=1|x) for each observation using your LPM.

  4. What's the probability for a woman younger than 50 with a husband younger than 50?

  5. make a plot similar to the one on the previous slide.

13 / 27

Nonlinear Binary Response Models

In this class of models we change the way we model the response probability p(x). Instead of the simple linear structure from above, we write

Pr(y=1|x)=p(x)=G(β0+β1x1++βKxK)

  • almost identical to LPM!

  • except the linear index β0+β1x1++βKxK is now inside some function G().

  • Main property of G: transforms any zR into a number in the interval (0,1).

  • This immediately solves our problem of getting weird predictions for probabilities.

14 / 27

G: probit and logit


For both probit and logit we see that:

  1. any value x results in a value p(x) between 0 and 1

  2. the higher x, the higher the resulting p(x).

  3. Logit has fatter tails than Probit.

15 / 27

Running probit and logit in R: the glm function

  • We use the glm function to run a generalized linear model

  • This generalizes our standard linear model. We have to specify a family and a link:

probit <- glm(inlf ~ age,
data = mroz,
family = binomial(link = "probit"))
logit <- glm(inlf ~ age,
data = mroz,
family = binomial(link = "logit"))
16 / 27

Interpretation

modelsummary::modelsummary(list("probit" = probit,"logit" = logit))
probit logit
(Intercept) 0.707 1.136
(0.248) (0.398)
age −0.013 −0.020
(0.006) (0.009)
Num.Obs. 753 753
AIC 1028.9 1028.9
BIC 1038.1 1038.1
Log.Lik. −512.442 −512.431
F 4.828 4.858
RMSE 0.49 0.49
  • probit coefficient for age is -0.013

  • logit: -0.02 for logit,

  • impact of age on the prob of working is negative

  • However, how negative? We can't tell!

17 / 27

Interpretation

The model is

Pr(y=1|age)=G(xβ)=G(β0+β1age) and the marginal effect of age on the response probability is

Pr(y=1|age)age=g(β0+β1age)β1

  • function g is defined as g(z)=dGdz(z) - the first derivative function of G (i.e. the slope of G).

  • given G that is nonlinear, this means that g will be non-constant. You are able to try this out yourself using this app here:

ScPoApps::launchApp("marginal_effects_of_logit_probit")

or online

18 / 27

Interpretation

So you can see that there is not one single marginal effect in those models, as that depends on where we evaluate the previous expression. In practice, there are two common approaches:

  1. report effect at the average values of x: g(x¯β)βj

  2. report the sample average of all marginal effects: 1ni=1Ng(xiβ)βj

Thankfully there are packages available that help us to compute those marginal effects fairly easily. One of them is called mfx, and we would use it as follows:

19 / 27

Interpretation

f <- "inlf ~ age + kidslt6 + nwifeinc" # setup a formula
glms <- list()
glms$probit <- glm(formula = f,
data = mroz,
family = binomial(link = "probit"))
glms$logit <- glm(formula = f,
data = mroz,
family = binomial(link = "logit"))
# now the marginal effects versions
glms$probitMean <- mfx::probitmfx(formula = f,
data = mroz, atmean = TRUE)
glms$probitAvg <- mfx::probitmfx(formula = f,
data = mroz, atmean = FALSE)
glms$logitMean <- mfx::logitmfx(formula = f,
data = mroz, atmean = TRUE)
glms$logitAvg <- mfx::logitmfx(formula = f,
data = mroz, atmean = FALSE)
20 / 27

Interpretation

probit logit probitMean probitAvg logitMean logitAvg
(Intercept) 2.080*** 3.394*** 2.080*** 2.080*** 3.394*** 3.394***
(0.309) (0.516) (0.309) (0.309) (0.516) (0.516)
age −0.035*** −0.057*** −0.014*** −0.013*** −0.014*** −0.013***
−0.035*** −0.057*** −0.014*** −0.013*** −0.014*** −0.057***
−0.035*** −0.057*** −0.014*** −0.013*** −0.057*** −0.013***
−0.035*** −0.057*** −0.014*** −0.013*** −0.057*** −0.057***
−0.035*** −0.057*** −0.014*** −0.035*** −0.014*** −0.013***
−0.035*** −0.057*** −0.014*** −0.035*** −0.014*** −0.057***
−0.035*** −0.057*** −0.014*** −0.035*** −0.057*** −0.013***
−0.035*** −0.057*** −0.014*** −0.035*** −0.057*** −0.057***
−0.035*** −0.057*** −0.035*** −0.013*** −0.014*** −0.013***
−0.035*** −0.057*** −0.035*** −0.013*** −0.014*** −0.057***
−0.035*** −0.057*** −0.035*** −0.013*** −0.057*** −0.013***
−0.035*** −0.057*** −0.035*** −0.013*** −0.057*** −0.057***
−0.035*** −0.057*** −0.035*** −0.035*** −0.014*** −0.013***
−0.035*** −0.057*** −0.035*** −0.035*** −0.014*** −0.057***
−0.035*** −0.057*** −0.035*** −0.035*** −0.057*** −0.013***
−0.035*** −0.057*** −0.035*** −0.035*** −0.057*** −0.057***
(0.007) (0.011) (0.003) (0.002) (0.003) (0.003)
(0.007) (0.011) (0.003) (0.002) (0.003) (0.011)
(0.007) (0.011) (0.003) (0.002) (0.011) (0.003)
(0.007) (0.011) (0.003) (0.002) (0.011) (0.011)
(0.007) (0.011) (0.003) (0.007) (0.003) (0.003)
(0.007) (0.011) (0.003) (0.007) (0.003) (0.011)
(0.007) (0.011) (0.003) (0.007) (0.011) (0.003)
(0.007) (0.011) (0.003) (0.007) (0.011) (0.011)
(0.007) (0.011) (0.007) (0.002) (0.003) (0.003)
(0.007) (0.011) (0.007) (0.002) (0.003) (0.011)
(0.007) (0.011) (0.007) (0.002) (0.011) (0.003)
(0.007) (0.011) (0.007) (0.002) (0.011) (0.011)
(0.007) (0.011) (0.007) (0.007) (0.003) (0.003)
(0.007) (0.011) (0.007) (0.007) (0.003) (0.011)
(0.007) (0.011) (0.007) (0.007) (0.011) (0.003)
(0.007) (0.011) (0.007) (0.007) (0.011) (0.011)
kidslt6 −0.800*** −1.313*** −0.314*** −0.290*** −0.322*** −0.292***
−0.800*** −1.313*** −0.314*** −0.290*** −0.322*** −1.313***
−0.800*** −1.313*** −0.314*** −0.290*** −1.313*** −0.292***
−0.800*** −1.313*** −0.314*** −0.290*** −1.313*** −1.313***
−0.800*** −1.313*** −0.314*** −0.800*** −0.322*** −0.292***
−0.800*** −1.313*** −0.314*** −0.800*** −0.322*** −1.313***
−0.800*** −1.313*** −0.314*** −0.800*** −1.313*** −0.292***
−0.800*** −1.313*** −0.314*** −0.800*** −1.313*** −1.313***
−0.800*** −1.313*** −0.800*** −0.290*** −0.322*** −0.292***
−0.800*** −1.313*** −0.800*** −0.290*** −0.322*** −1.313***
−0.800*** −1.313*** −0.800*** −0.290*** −1.313*** −0.292***
−0.800*** −1.313*** −0.800*** −0.290*** −1.313*** −1.313***
−0.800*** −1.313*** −0.800*** −0.800*** −0.322*** −0.292***
−0.800*** −1.313*** −0.800*** −0.800*** −0.322*** −1.313***
−0.800*** −1.313*** −0.800*** −0.800*** −1.313*** −0.292***
−0.800*** −1.313*** −0.800*** −0.800*** −1.313*** −1.313***
(0.111) (0.188) (0.044) (0.036) (0.046) (0.047)
(0.111) (0.188) (0.044) (0.036) (0.046) (0.188)
(0.111) (0.188) (0.044) (0.036) (0.188) (0.047)
(0.111) (0.188) (0.044) (0.036) (0.188) (0.188)
(0.111) (0.188) (0.044) (0.111) (0.046) (0.047)
(0.111) (0.188) (0.044) (0.111) (0.046) (0.188)
(0.111) (0.188) (0.044) (0.111) (0.188) (0.047)
(0.111) (0.188) (0.044) (0.111) (0.188) (0.188)
(0.111) (0.188) (0.111) (0.036) (0.046) (0.047)
(0.111) (0.188) (0.111) (0.036) (0.046) (0.188)
(0.111) (0.188) (0.111) (0.036) (0.188) (0.047)
(0.111) (0.188) (0.111) (0.036) (0.188) (0.188)
(0.111) (0.188) (0.111) (0.111) (0.046) (0.047)
(0.111) (0.188) (0.111) (0.111) (0.046) (0.188)
(0.111) (0.188) (0.111) (0.111) (0.188) (0.047)
(0.111) (0.188) (0.111) (0.111) (0.188) (0.188)
nwifeinc −0.011** −0.019** −0.004** −0.004** −0.005** −0.004**
−0.011** −0.019** −0.004** −0.004** −0.005** −0.019**
−0.011** −0.019** −0.004** −0.004** −0.019** −0.004**
−0.011** −0.019** −0.004** −0.004** −0.019** −0.019**
−0.011** −0.019** −0.004** −0.011** −0.005** −0.004**
−0.011** −0.019** −0.004** −0.011** −0.005** −0.019**
−0.011** −0.019** −0.004** −0.011** −0.019** −0.004**
−0.011** −0.019** −0.004** −0.011** −0.019** −0.019**
−0.011** −0.019** −0.011** −0.004** −0.005** −0.004**
−0.011** −0.019** −0.011** −0.004** −0.005** −0.019**
−0.011** −0.019** −0.011** −0.004** −0.019** −0.004**
−0.011** −0.019** −0.011** −0.004** −0.019** −0.019**
−0.011** −0.019** −0.011** −0.011** −0.005** −0.004**
−0.011** −0.019** −0.011** −0.011** −0.005** −0.019**
−0.011** −0.019** −0.011** −0.011** −0.019** −0.004**
−0.011** −0.019** −0.011** −0.011** −0.019** −0.019**
(0.004) (0.007) (0.002) (0.001) (0.002) (0.002)
(0.004) (0.007) (0.002) (0.001) (0.002) (0.007)
(0.004) (0.007) (0.002) (0.001) (0.007) (0.002)
(0.004) (0.007) (0.002) (0.001) (0.007) (0.007)
(0.004) (0.007) (0.002) (0.004) (0.002) (0.002)
(0.004) (0.007) (0.002) (0.004) (0.002) (0.007)
(0.004) (0.007) (0.002) (0.004) (0.007) (0.002)
(0.004) (0.007) (0.002) (0.004) (0.007) (0.007)
(0.004) (0.007) (0.004) (0.001) (0.002) (0.002)
(0.004) (0.007) (0.004) (0.001) (0.002) (0.007)
(0.004) (0.007) (0.004) (0.001) (0.007) (0.002)
(0.004) (0.007) (0.004) (0.001) (0.007) (0.007)
(0.004) (0.007) (0.004) (0.004) (0.002) (0.002)
(0.004) (0.007) (0.004) (0.004) (0.002) (0.007)
(0.004) (0.007) (0.004) (0.004) (0.007) (0.002)
(0.004) (0.007) (0.004) (0.004) (0.007) (0.007)
Num.Obs. 753 753 753 753 753 753
Log.Lik. −478.395 −478.377
F 21.784 20.280
RMSE 0.47 0.47 0.47 0.47 0.47 0.47
+ p
21 / 27

Goodness of Fit in Binary Models

22 / 27

GOF in Binary Models

  • There is no universally accepted R2 for binary models.

  • We can think of a pseudo R2 which compares our model to one without any regressors:

glms$probit0 <- update(glms$probit, formula = . ~ 1) # intercept model only
1 - as.vector(logLik(glms$probit)/logLik(glms$probit0))
## [1] 0.07084972
23 / 27

GOF in Binary Models

  • There is no universally accepted R2 for binary models.

  • We can think of a pseudo R2 which compares our model to one without any regressors:

glms$probit0 <- update(glms$probit, formula = . ~ 1) # intercept model only
1 - as.vector(logLik(glms$probit)/logLik(glms$probit0))
## [1] 0.07084972
  • But that's not super informative (unlike the standard R2). Changes in likelihood value are highly non-linear, so that's not great.

  • Let's check accuracy - what's the proportion correctly predicted! round(fitted(x)) assigns 1 if the predicted prob >0.5.

prop.table(table(true = mroz$inlf, pred = round(fitted(glms$probit))))
## pred
## true 0 1
## 0 0.1699867 0.2616202
## 1 0.1221780 0.4462151
23 / 27

GOF in Binary Models: ROC Curves

  • The 0.5 cutoff is arbitrary. What if all predicted probs are >0.5 but in the data there are about 50% of zeros?

  • Let's choose an arbitrary cutoff c(0,1) and check accuracy for each value. This gives a better overview.

24 / 27

GOF in Binary Models: ROC Curves

  • The 0.5 cutoff is arbitrary. What if all predicted probs are >0.5 but in the data there are about 50% of zeros?

  • Let's choose an arbitrary cutoff c(0,1) and check accuracy for each value. This gives a better overview.

  • Also, we can confront the true positives rate (TPR) with the false positives rate (FPR).

    1. TPR: number of women correctly predicted to work divided by num of working women.
    2. FPR: number of women incorrectly predicted to work divided by num of non-working women.
24 / 27

GOF in Binary Models: ROC Curves

  • The 0.5 cutoff is arbitrary. What if all predicted probs are >0.5 but in the data there are about 50% of zeros?

  • Let's choose an arbitrary cutoff c(0,1) and check accuracy for each value. This gives a better overview.

  • Also, we can confront the true positives rate (TPR) with the false positives rate (FPR).

    1. TPR: number of women correctly predicted to work divided by num of working women.
    2. FPR: number of women incorrectly predicted to work divided by num of non-working women.
  • Plotting FPR vs TPR for each c defines the ROC (Receiver Operating Characteristics) Curve.

  • A good model has a ROC curve in the upper left corner: FPR = 0, TPR = 1.

24 / 27

GOF in Binary Models: ROC Curves

library(ROCR)
pred <- prediction(fitted(glms$probit), mroz$inlf)
par(mfrow = c(1,2), mar = lowtop)
plot(performance(pred,"acc"))
plot(performance(pred,"tpr","fpr"))
abline(0,1,lty = 2, col = "red")



  • Best accuracy at around c=0.6

  • ROC always above 45 deg line. Better than random assignment (flipping a coin)! Yeah!

25 / 27

Task 2 (10 Minutes): SwissLabor

  1. Load the SwissLabor Dataset from the AER package with data(SwissLabor, package = "AER")

  2. skim the data to get a quick overview. How many foreigners are in the data?

  3. Run a probit model of participation on all other variables plus age squared. Which age has the largest impact on participation?

  4. What is the marginal effect at the mean of all x of being a foreigner on participation?

  5. Produce a ROC curve of this probit model and discuss it!

26 / 27

END

bluebery.planterose@sciencespo.fr
Original Slides from Florian Oswald
Book
@ScPoEcon
@ScPoEcon
27 / 27

Where Are We At?

Last Time

  • Panel Data Estimation

  • The fixed effects estimator

  • fixest package

2 / 27
Paused

Help

Keyboard shortcuts

, , Pg Up, k Go to previous slide
, , Pg Dn, Space, j Go to next slide
Home Go to first slide
End Go to last slide
Number + Return Go to specific slide
b / m / f Toggle blackout / mirrored / fullscreen mode
c Clone slideshow
p Toggle presenter mode
t Restart the presentation timer
?, h Toggle this help
Esc Back to slideshow