Applied inference tools to regression analysis
Standard error of regression coefficients
Statistical significance of regression coefficients
Exploits changes in policy over time that don't affect everyone
Need to find (or construct) appropriate control group(s)
Key assumption: parallel trends
Empirical application: impact of minimum wage on employment
Multiple regression often does not provide causal estimates because of selection on unobservables.
RCTs are one way to solve this problem but they are often impossible to do.
Four main causal evaluation methods used in economics:
These methods are used to identify causal relationships between treatments and outcomes.
In this lecture, we will cover a popular and rigorous program evaluation method: differences-in-differences.
2 time periods: before and after treatment.
2 groups:
control group: never receives treatment,
treatment group: initially untreated and then fully treated.
Under certain assumptions, control group can be used as the counterfactual for treatment group
Imagine you are interested in assessing the causal impact of increasing the minimum wage on (un)employment.
Why is this not that straightforward? What should the control group be?
Seminal 1994 paper by prominent labor economists David Card and Alan Krueger entitled "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania"
Estimates the effect of an increase in the minimum wage on the employment rate in the fast-food industry. Why this industry?
In the US, there is a national minimum wage, but states can depart from it.
April 1, 1992: New Jersey minimum wage increases from $4.25 to $5.05 per hour.
Neighboring Pennsylvania did not change its minimum wage level.
Pennsylvania and New Jersey are very similar: similar institutions, similar habits, similar consumers, similar incomes, similar weather, etc.
Surveyed 410 fast-food establishments in New Jersey (NJ) and eastern Pennsylvania
Timing:
What comparisons do you think they did?
Let's take a closer look at their data.

    # install the package that contains the cleaned data
    remotes::install_github("b-rodrigues/diffindiff")
    # load packages (dplyr provides %>% and select())
    library(diffindiff)
    library(dplyr)
    # load data
    ck1994 <- njmin
    # preview the identifying and employment variables
    ck1994 %>% select(sheet, chain, state, observation, empft, emppt) %>% head()

    ## # A tibble: 6 × 6
    ##   sheet chain  state        observation   empft emppt
    ##   <chr> <chr>  <chr>        <chr>         <dbl> <dbl>
    ## 1 46    bk     Pennsylvania February 1992  30    15
    ## 2 49    kfc    Pennsylvania February 1992   6.5   6.5
    ## 3 506   kfc    Pennsylvania February 1992   3     7
    ## 4 56    wendys Pennsylvania February 1992  20    20
    ## 5 61    wendys Pennsylvania February 1992   6    26
    ## 6 62    wendys Pennsylvania February 1992   0    31

Take a look at the dataset and list the variables. Check the variable definitions with `?njmin`.
Tabulate the number of stores by state and by survey wave (`observation`). Does it match what's in Table 1 of the paper?
Create a full-time equivalent (FTE) employees variable called `empfte`, equal to `empft + 0.5*emppt + nmgrs`. `empft` and `emppt` are the numbers of full-time and part-time employees, respectively, and `nmgrs` is the number of managers. This is how Card and Krueger compute their FTE employment variable (p. 775 of the paper).
Compute average FTE employment, the average percentage of full-time employees (out of the number of FTE employees), and the average starting wage (`wage_st`) by state and by survey wave. Compare your results with Table 2 of the paper.
How different are New Jersey and Pennsylvania's fast-food restaurants before the minimum wage increase?
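As a starting point for the task, here is a minimal sketch of the FTE computation and the by-group tabulation, run on a small made-up data frame rather than on `njmin` itself. The column names follow the variables named in the task; the data frame `toy` and all of its values are invented for illustration, and the sketch covers only the FTE part of the task.

```r
# Toy data standing in for the survey (values are invented, not from njmin)
toy <- data.frame(
  state       = c("Pennsylvania", "Pennsylvania", "New Jersey", "New Jersey"),
  observation = c("February 1992", "November 1992", "February 1992", "November 1992"),
  empft       = c(10, 8, 6, 7),
  emppt       = c(20, 18, 24, 26),
  nmgrs       = c(3, 3, 4, 4)
)

# Full-time equivalent employment: a part-timer counts as half a worker
toy$empfte <- toy$empft + 0.5 * toy$emppt + toy$nmgrs

# Number of stores by state and survey wave
print(table(toy$state, toy$observation))

# Mean FTE employment by state and survey wave
fte_means <- aggregate(empfte ~ state + observation, data = toy, FUN = mean)
print(fte_means)
```

On the real data, the same two calls can be applied to `ck1994` once `empfte` has been created.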
Average Employment Per Store Before and After the Rise in NJ Minimum Wage
Variables | Pennsylvania | New Jersey |
---|---|---|
FTE employment before | 23.33 | 20.44 |
FTE employment after | 21.17 | 21.03 |
Change in mean FTE employment | -2.17 | 0.59 |
Differences-in-differences causal estimate: 0.59−(−2.17)=2.76
Yes, the essence of differences-in-differences is that simple! 😀
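The subtraction can be reproduced in a few lines of R. This sketch uses the rounded cell means from the table, so it returns 2.75 rather than 2.76; the gap is a rounding effect (the rounded Pennsylvania means differ by -2.16, while the table reports -2.17). The object names are illustrative.

```r
# Mean FTE employment per store, as reported in the table
pa_before <- 23.33; pa_after <- 21.17   # Pennsylvania (control)
nj_before <- 20.44; nj_after <- 21.03   # New Jersey (treated)

# First differences: change over time within each state
change_pa <- pa_after - pa_before
change_nj <- nj_after - nj_before

# Second difference: change in NJ net of the change in PA
did <- change_nj - change_pa
print(round(did, 2))
```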
Let's look at these results graphically.
In practice, DiD is usually estimated on more than 2 periods (4 observations)
There are more data points before and after the policy change
3 ingredients:
Treatment dummy variable: $\text{TREAT}_s$, where the $s$ subscript reminds us that the treatment is at the state level
Post-treatment periods dummy variable: $\text{POST}_t$, where the $t$ subscript reminds us that this variable varies over time
Interaction term between the two: $\text{TREAT}_s \times \text{POST}_t$ 👉 the coefficient on this term is the DiD causal effect!
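Treatment dummy variable:
$$\text{TREAT}_s = \begin{cases} 0 & \text{if } s = \text{Pennsylvania} \\ 1 & \text{if } s = \text{New Jersey} \end{cases}$$
Post-treatment periods dummy variable:
$$\text{POST}_t = \begin{cases} 0 & \text{if } t < \text{April 1, 1992} \\ 1 & \text{if } t \geq \text{April 1, 1992} \end{cases}$$
Which observations correspond to $\text{TREAT}_s \times \text{POST}_t = 1$?
Let's put all these ingredients together:
$$\text{EMP}_{st} = \alpha + \beta\,\text{TREAT}_s + \gamma\,\text{POST}_t + \delta\,(\text{TREAT}_s \times \text{POST}_t) + \varepsilon_{st}$$
$\delta$: causal effect of the minimum wage increase on employment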
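We have the following:
$$E(\text{EMP}_{st} \mid \text{TREAT}_s = 0, \text{POST}_t = 0) = \alpha$$
$$E(\text{EMP}_{st} \mid \text{TREAT}_s = 0, \text{POST}_t = 1) = \alpha + \gamma$$
$$E(\text{EMP}_{st} \mid \text{TREAT}_s = 1, \text{POST}_t = 0) = \alpha + \beta$$
$$E(\text{EMP}_{st} \mid \text{TREAT}_s = 1, \text{POST}_t = 1) = \alpha + \beta + \gamma + \delta$$
$$\big[E(\text{EMP}_{st} \mid \text{TREAT}_s = 1, \text{POST}_t = 1) - E(\text{EMP}_{st} \mid \text{TREAT}_s = 1, \text{POST}_t = 0)\big] - \big[E(\text{EMP}_{st} \mid \text{TREAT}_s = 0, \text{POST}_t = 1) - E(\text{EMP}_{st} \mid \text{TREAT}_s = 0, \text{POST}_t = 0)\big] = \delta$$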
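Substituting the four conditional expectations into the double difference makes the cancellation explicit:

```latex
\begin{aligned}
&\big[(\alpha+\beta+\gamma+\delta) - (\alpha+\beta)\big] - \big[(\alpha+\gamma) - \alpha\big] \\
&\quad = (\gamma+\delta) - \gamma \\
&\quad = \delta
\end{aligned}
```

The first bracket is the before/after change in the treated state, which mixes the common time effect $\gamma$ with the treatment effect $\delta$; the second bracket isolates $\gamma$ from the control state.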
In table form:
| | Pre mean | Post mean | Δ(post - pre) |
|---|---|---|---|
| Pennsylvania (PA) | α | α+γ | γ |
| New Jersey (NJ) | α+β | α+β+γ+δ | γ+δ |
| Δ(NJ - PA) | β | β+δ | δ |
This table generalizes to other settings by substituting Pennsylvania with Control and New Jersey with Treatment
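The mapping between coefficients and cell means can be checked numerically. The sketch below fits the regression to the four rounded cell means from the earlier employment table (one row per state-period cell). With four observations and four parameters the fit is exact, so there are no standard errors here; it only illustrates which coefficient corresponds to which table entry. The data frame name `cells` is illustrative.

```r
# One row per state-period cell, using the rounded means reported earlier
cells <- data.frame(
  emp   = c(23.33, 21.17, 20.44, 21.03),
  treat = c(0, 0, 1, 1),   # 1 = New Jersey
  post  = c(0, 1, 0, 1)    # 1 = after April 1, 1992
)

# treat * post expands to treat + post + treat:post
fit <- lm(emp ~ treat * post, data = cells)
est <- coef(fit)
print(est)

# (Intercept) = PA pre mean, treat = NJ - PA pre gap,
# post = PA change over time, treat:post = DiD estimate
```

The interaction coefficient equals 2.75, the double difference of the rounded means.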
Create a dummy variable, `treat`, equal to `FALSE` if `state` is Pennsylvania and `TRUE` if New Jersey.
Create a dummy variable, `post`, equal to `FALSE` if `observation` is February 1992 and `TRUE` otherwise.
Estimate the following regression model. Do you obtain the same results as in slide 9?
$$\text{empfte}_{st} = \alpha + \beta\,\text{treat}_s + \gamma\,\text{post}_t + \delta\,(\text{treat}_s \times \text{post}_t) + \varepsilon_{st}$$
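A possible shape for this task, sketched on invented store-level data rather than on `njmin`. The variable names `treat`, `post`, `state`, `observation` and `empfte` follow the task; the data frame `toy` and its values are made up. The sketch also checks that the interaction coefficient matches the double difference of cell means.

```r
# Invented store-level data (not the real njmin values)
toy <- data.frame(
  state       = rep(c("Pennsylvania", "New Jersey"), each = 4),
  observation = rep(c("February 1992", "November 1992"), times = 4),
  empfte      = c(24, 22, 22, 20, 20, 21, 21, 21)
)

# Logical dummies as described in the task
toy$treat <- toy$state == "New Jersey"
toy$post  <- toy$observation != "February 1992"

# DiD regression: the interaction coefficient is the DiD estimate
fit   <- lm(empfte ~ treat * post, data = toy)
delta <- coef(fit)[["treatTRUE:postTRUE"]]

# Same number from the four cell means
m <- tapply(toy$empfte, list(toy$treat, toy$post), mean)
did_means <- (m["TRUE", "TRUE"] - m["TRUE", "FALSE"]) -
             (m["FALSE", "TRUE"] - m["FALSE", "FALSE"])
print(c(regression = delta, cell_means = did_means))
```

Because the dummies are logical, R names the interaction coefficient `treatTRUE:postTRUE`; with 0/1 numeric dummies it would be `treat:post`.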
Common or parallel trends assumption: absent any minimum wage increase, New Jersey's fast-food employment would have followed the same trend as Pennsylvania's.
In other words, Pennsylvania's fast-food employment trend between February and November 1992 provides a reliable counterfactual for the trend New Jersey's fast-food industry would have experienced had New Jersey not increased its minimum wage.
Impossible to completely validate or invalidate this assumption.
Intuitive check: compare trends before policy change (and after policy change if no expected medium-term effects)
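A minimal sketch of that intuitive check, assuming several pre-policy periods are available (the Card and Krueger survey has only two waves, so this uses invented group means). All numbers, the data frame `trends` and the variable `policy_start` are hypothetical. The idea is to compare period-to-period changes across groups before the policy starts.

```r
# Invented mean employment by group over five periods (policy starts in period 4)
trends <- data.frame(
  period = rep(1:5, times = 2),
  group  = rep(c("control", "treated"), each = 5),
  emp    = c(23.0, 22.5, 22.0, 21.5, 21.0,    # control
             20.0, 19.5, 19.0, 19.5, 20.0)    # treated
)
policy_start <- 4

# Period-to-period changes within each group
chg_control <- diff(trends$emp[trends$group == "control"])
chg_treated <- diff(trends$emp[trends$group == "treated"])

# Gap in changes between groups, one entry per transition
gap <- chg_treated - chg_control
names(gap) <- paste0(1:4, "->", 2:5)

# Transitions that lie entirely before the policy (1->2 and 2->3)
pre_gap <- gap[seq_len(policy_start - 2)]
print(gap)
```

Pre-policy gaps close to zero are consistent with parallel trends, but they do not prove that the trends would have stayed parallel afterwards.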
Here are the actual trends for Pennsylvania and New Jersey:
Let:
$Y_{1ist}$: fast-food employment at restaurant $i$ in state $s$ at time $t$ if there is a high state MW;
$Y_{0ist}$: fast-food employment at restaurant $i$ in state $s$ at time $t$ if there is a low state MW.
These are potential outcomes: you can only observe one of the two.
The key assumption underlying DiD estimation is that, in the absence of treatment, restaurant $i$'s outcome in state $s$ at time $t$ is given by:
$$E[Y_{0ist} \mid s, t] = \gamma_s + \lambda_t$$
2 implicit assumptions:
Selection bias: differences between states come only from fixed state characteristics ($\gamma_s$)
Time trend: the time trend is the same for the treatment and control groups ($\lambda_t$)
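To see why the additive form amounts to parallel trends, take the change in the untreated potential outcome between February and November in each state; the state effect cancels and the same time difference remains in both:

```latex
\begin{aligned}
E[Y_{0ist} \mid s=\text{NJ}, t=\text{Nov}] - E[Y_{0ist} \mid s=\text{NJ}, t=\text{Feb}]
  &= (\gamma_{NJ} + \lambda_{Nov}) - (\gamma_{NJ} + \lambda_{Feb}) = \lambda_{Nov} - \lambda_{Feb} \\
E[Y_{0ist} \mid s=\text{PA}, t=\text{Nov}] - E[Y_{0ist} \mid s=\text{PA}, t=\text{Feb}]
  &= (\gamma_{PA} + \lambda_{Nov}) - (\gamma_{PA} + \lambda_{Feb}) = \lambda_{Nov} - \lambda_{Feb}
\end{aligned}
```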
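Outcomes in the comparison group:
$$E[Y_{ist} \mid s=\text{Pennsylvania}, t=\text{Feb}] = \gamma_{PA} + \lambda_{Feb}$$
$$E[Y_{ist} \mid s=\text{Pennsylvania}, t=\text{Nov}] = \gamma_{PA} + \lambda_{Nov}$$
$$E[Y_{ist} \mid s=\text{Pennsylvania}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{Pennsylvania}, t=\text{Feb}] = \gamma_{PA} + \lambda_{Nov} - (\gamma_{PA} + \lambda_{Feb}) = \underbrace{\lambda_{Nov} - \lambda_{Feb}}_{\text{time trend}}$$
→ the comparison group allows us to estimate the time trend.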
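Let $\delta$ denote the true impact of the minimum wage increase:
$$E[Y_{1ist} - Y_{0ist} \mid s, t] = \delta$$
Outcomes in the treatment group:
$$E[Y_{ist} \mid s=\text{New Jersey}, t=\text{Feb}] = \gamma_{NJ} + \lambda_{Feb}$$
$$E[Y_{ist} \mid s=\text{New Jersey}, t=\text{Nov}] = \gamma_{NJ} + \delta + \lambda_{Nov}$$
$$E[Y_{ist} \mid s=\text{New Jersey}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{New Jersey}, t=\text{Feb}] = \gamma_{NJ} + \delta + \lambda_{Nov} - (\gamma_{NJ} + \lambda_{Feb}) = \delta + \underbrace{\lambda_{Nov} - \lambda_{Feb}}_{\text{time trend}}$$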
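Therefore we have:
$$E[Y_{ist} \mid s=\text{PA}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{PA}, t=\text{Feb}] = \underbrace{\lambda_{Nov} - \lambda_{Feb}}_{\text{time trend}}$$
$$E[Y_{ist} \mid s=\text{NJ}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{NJ}, t=\text{Feb}] = \delta + \underbrace{\lambda_{Nov} - \lambda_{Feb}}_{\text{time trend}}$$
$$\begin{aligned} DD &= E[Y_{ist} \mid s=\text{NJ}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{NJ}, t=\text{Feb}] - \big(E[Y_{ist} \mid s=\text{PA}, t=\text{Nov}] - E[Y_{ist} \mid s=\text{PA}, t=\text{Feb}]\big) \\ &= \delta + \lambda_{Nov} - \lambda_{Feb} - (\lambda_{Nov} - \lambda_{Feb}) \\ &= \delta \end{aligned}$$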
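The derivation can be illustrated with a small simulation. The sketch below generates outcomes from the additive model with a known treatment effect and checks that the double difference of sample means recovers it. All parameter values (`gamma_s`, `lambda_t`, `delta`, `n`) and the helper `sim_cell` are made up for illustration; they are not estimates from the paper.

```r
set.seed(1)

# Assumed (made-up) parameters of the additive model
gamma_s  <- c(PA = 23, NJ = 20)     # fixed state effects
lambda_t <- c(Feb = 0, Nov = -2)    # common time effects
delta    <- 2.75                    # true treatment effect
n        <- 5000                    # restaurants per state-period cell

# Simulate one cell: Y = gamma_s + lambda_t + delta * treated + noise
sim_cell <- function(s, t) {
  treated <- as.numeric(s == "NJ" && t == "Nov")
  gamma_s[[s]] + lambda_t[[t]] + delta * treated + rnorm(n, sd = 1)
}

mean_pa_feb <- mean(sim_cell("PA", "Feb"))
mean_pa_nov <- mean(sim_cell("PA", "Nov"))
mean_nj_feb <- mean(sim_cell("NJ", "Feb"))
mean_nj_nov <- mean(sim_cell("NJ", "Nov"))

# Double difference of sample means
dd <- (mean_nj_nov - mean_nj_feb) - (mean_pa_nov - mean_pa_feb)
print(dd)   # close to delta, up to sampling noise
```

Changing `gamma_s` or `lambda_t` leaves `dd` centered on `delta`, since both are differenced out; giving the two states different time effects would break this, which is the parallel trends assumption at work.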