IV estimation
Some important applications
Some pitfalls
Today
Revisit the Ability Bias in an App 😎
Introduce Panel Data
So far, we dealt with data that looks like this:
County | CrimeRate | ProbofArrest |
---|---|---|
1 | 0.0398849 | 0.289696 |
3 | 0.0163921 | 0.202899 |
5 | 0.0093372 | 0.406593 |
7 | 0.0219159 | 0.431095 |
9 | 0.0075178 | 0.631579 |
We have a unit identifier (like County here),
Observables on each unit.
Usually called a cross-sectional dataset
Provides single snapshot view
Each row, in other words, is one observation.
Now, let's add a time index: Year.
County | Year | CrimeRate | ProbofArrest |
---|---|---|---|
1 | 81 | 0.0398849 | 0.289696 |
1 | 82 | 0.0383449 | 0.338111 |
1 | 83 | 0.0303048 | 0.330449 |
1 | 84 | 0.0347259 | 0.362525 |
1 | 85 | 0.0365730 | 0.325395 |
1 | 86 | 0.0347524 | 0.326062 |
1 | 87 | 0.0356036 | 0.298270 |
3 | 81 | 0.0163921 | 0.202899 |
3 | 82 | 0.0190651 | 0.162218 |
Next to the unit identifier (County) we now have Year.
Now a pair (County, Year) indexes one observation.
We call this a panel or longitudinal dataset
We can track units over time.
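A quick way to see what "a pair indexes one observation" means is to check it on the rows shown above. The sketch below retypes the first rows of the table into a small data frame (the column names `county`, `year`, `crmrte` follow the `crime4` dataset; the object names `panel`, `n_dup` and `obs_per_county` are just illustrative) and verifies that no (county, year) pair appears twice.

```r
# Toy panel: the first rows of the table above, retyped by hand.
panel <- data.frame(
  county = c(1, 1, 1, 1, 1, 1, 1, 3, 3),
  year   = c(81, 82, 83, 84, 85, 86, 87, 81, 82),
  crmrte = c(0.0398849, 0.0383449, 0.0303048, 0.0347259, 0.0365730,
             0.0347524, 0.0356036, 0.0163921, 0.0190651)
)

# In a panel, the pair (county, year) should identify exactly one row.
n_dup <- sum(duplicated(panel[, c("county", "year")]))

# Number of time periods observed for each county in this excerpt.
obs_per_county <- table(panel$county)

n_dup
obs_per_county
```

If `n_dup` were larger than zero, some unit-period combination would be recorded more than once and the data would need cleaning before any panel estimator is applied.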
The above data can be loaded with
data(crime4,package = "wooldridge")
They are from C. Cornwell and W. Trumbull (1994), “Estimating the Economic Model of Crime with Panel Data”.
One question here: how big is the deterrent effect of law enforcement? If you know you are more likely to get arrested, will you be less likely to commit a crime?
This is tricky: does high crime cause a stronger police response, which acts as a deterrent, or is crime low because deterrence is strong to start with?
This is sometimes called a simultaneous equation model situation: police response impacts crime, but crime impacts police response.
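$$
\begin{aligned}
\text{police} &= \alpha_0 + \alpha_1 \, \text{crime} \\
\text{crime} &= \beta_0 + \beta_1 \, \text{police}
\end{aligned}
$$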
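To see why this matters for a simple regression, here is a small simulation sketch of the two equations. All parameter values and the shock terms `e` and `v` are assumptions made for illustration, not estimates from the crime data. Police is set to deter crime (`beta1 < 0`), yet the OLS slope of crime on police comes out positive because police also responds to crime.

```r
set.seed(1)
n <- 10000

# Assumed structural parameters (illustrative only).
alpha0 <- 1; alpha1 <- 0.8   # police responds positively to crime
beta0  <- 2; beta1  <- -0.5  # police reduces crime (the "true" deterrent effect)

e <- rnorm(n)  # unobserved shock to crime
v <- rnorm(n)  # unobserved shock to police

# Solve both equations jointly: substitute police into the crime equation.
denom  <- 1 - alpha1 * beta1
crime  <- (beta0 + beta1 * alpha0 + beta1 * v + e) / denom
police <- alpha0 + alpha1 * crime + v

# Naive OLS of crime on police mixes up both directions of causality.
ols_slope <- unname(coef(lm(crime ~ police))[2])
ols_slope
```

With unit-variance shocks, the large-sample OLS slope in this setup is (alpha1 + beta1) / (alpha1^2 + 1), roughly 0.18 here, so it even has the wrong sign relative to `beta1`.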
Most literature prior to that paper estimated simultaneous equations off cross-sectional data.
Cornwell and Trumbull are worried about unobserved heterogeneity between jurisdictions.
Why? What could possibly go wrong?
Let's pick out 4 counties from their dataset
Let's look at the crime rate vs probability of arrest relationship
First for all of them together as a single cross section
Then taking advantage of the panel structure (i.e. each county over time).
Subset the data to 4 counties,
then plot probability of arrest vs crime rate.
library(dplyr)    # provides %>% and filter()
library(ggplot2)  # provides ggplot() and friends
css <- crime4 %>% filter(county %in% c(1, 3, 145, 23))
ggplot(css, aes(x = prbarr, y = crmrte)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  labs(x = 'Probability of Arrest', y = 'Crime Rate')
We see an upward sloping line!
Higher probability of arrest is associated with higher crime rates.
How strong is the effect?
xsection <- lm(crmrte ~ prbarr, css)
coef(xsection)[2]  # gets slope coef
##     prbarr
## 0.06480104
Increasing the probability of arrest by 1 unit (i.e. 100 percentage points) is associated with an increase in the crime rate of 0.064801. So, if the probability of arrest went from 0 to 1, crime would increase by about 0.065 crimes per person.
An increase of 10 percentage points in the probability of arrest (e.g. prbarr goes from 0.2 to 0.3) is associated with an increase in the crime rate from 0.021 to 0.028, or a 33.33 percent increase in the crime rate.
Literally: counties with a higher probability of being arrested also have a higher crime rate.
So, does it mean that as there is more crime in certain areas, the police become more efficient at arresting criminals, and so the probability of getting arrested on any committed crime goes up?
What does police efficiency depend on?
Does the poverty level in a county matter for this?
The local laws?
🤯 wow, there seem to be too many things left out of this simple picture.
Fixed Characteristics: vary by county
LocalStuff: things that describe the County, like geography, and other persistent features.
LawAndOrder: commitment to law and order politics of local politicians
CivilRights: how many civil rights you have
Time-varying Characteristics: vary by county and by year
Police budget: an elected politician has some discretion over police spending
Poverty level: varies with the national/global state of the economy.
You will often hear the terms within and between variation in panel data contexts.
Things that change within each group over time:
here we said police budgets and poverty levels would change within each group and over time.
Things that are fixed for each group over time:
LocalStuff, LawAndOrder and CivilRights differ only across or between groups.
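The split into within and between variation can be made concrete with a few lines of base R. The sketch below uses a simulated balanced panel (4 units, 7 periods; every number in it is made up for illustration) and decomposes the total variation of a variable `x` into a between part (unit means around the overall mean) and a within part (observations around their own unit mean).

```r
set.seed(2)

# Hypothetical balanced panel: 4 units observed over 7 periods.
d <- data.frame(unit = rep(1:4, each = 7), t = rep(1:7, times = 4))
d$x <- rep(c(0.2, 0.3, 0.4, 0.5), each = 7) + rnorm(nrow(d), sd = 0.05)

d$x_bar_i <- ave(d$x, d$unit)        # mean of x within each unit
d$between <- d$x_bar_i - mean(d$x)   # how units differ from each other
d$within  <- d$x - d$x_bar_i         # how a unit moves around its own mean

# Sums of squares: total = between + within.
ss_total   <- sum((d$x - mean(d$x))^2)
ss_between <- sum(d$between^2)
ss_within  <- sum(d$within^2)
c(total = ss_total, between = ss_between, within = ss_within)
```

A variable that is fixed over time for each unit, like LocalStuff, would have a within sum of squares of exactly zero.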
Let's add the mean of prbarr and crmrte for each of those counties to the scatter plot!
And then a regression through those 4 points!
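A regression through the group means uses only between variation. The following sketch reproduces the idea on simulated data rather than on `crime4`: the unit effect `c_i`, the slope of -0.03 and all noise levels are assumptions chosen so that units with a high `x` also have a high `c_i`. Both the pooled regression and the regression through the 4 group means then slope upward, even though the effect of `x` inside every unit is negative.

```r
set.seed(3)
n_units <- 4; n_periods <- 7
unit <- rep(1:n_units, each = n_periods)

# Assumed data-generating process (illustrative only).
c_i <- c(0.00, 0.02, 0.04, 0.06)[unit]                    # unit-level effect
x   <- 0.2 + 5 * c_i + rnorm(length(unit), sd = 0.03)     # x is higher where c_i is higher
y   <- 0.02 - 0.03 * x + c_i + rnorm(length(unit), sd = 0.001)
sim <- data.frame(unit, x, y)

# One row per unit: the means of x and y.
means <- aggregate(cbind(x, y) ~ unit, data = sim, FUN = mean)

pooled_slope  <- unname(coef(lm(y ~ x, data = sim))[2])    # all 28 points
between_slope <- unname(coef(lm(y ~ x, data = means))[2])  # only the 4 means
c(pooled = pooled_slope, between = between_slope)
```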
Collect all group-specific time-invariant features in the factor County.
This takes care of all factors which do not vary over time within each unit.
We can net out the group effect!
We call County a fixed effect.
We've seen omitted variable bias (OVB). For example, if the true model read:
$$y_i = \beta_0 + \beta_1 x_i + c_i + u_i$$
then if $c_i$ is unobservable and $Cov(x_i, c_i) \neq 0$, we have $E[u_i + c_i \mid x_i] \neq 0$, with $u_i + c_i$ the total unobserved component.
We saw this with $c_i = A_i$ (ability) and $x_i = s_i$ (schooling):
this was the ability bias.
The solution there: find an IV correlated with schooling but not with ability.
$$y_{it} = \beta_1 x_{it} + c_i + u_{it}, \quad t = 1, 2, \dots, T$$
$c_i$: individual fixed effect, also called unobserved effect or unobserved heterogeneity.
$c_i$ is fixed over time (ability $A_i$ for example), but can be correlated with $x_{it}$!
Simplest approach: include a dummy variable for each group i.
This is literally controlling for county i
Each i has basically their own intercept ci
In R you achieve this like so:
mod <- list()
mod$dummy <- lm(crmrte ~ prbarr + factor(county), css)  # county is the unit ID i
broom::tidy(mod$dummy)
## # A tibble: 5 × 5
##   term              estimate std.error statistic  p.value
##   <chr>                <dbl>     <dbl>     <dbl>    <dbl>
## 1 (Intercept)        0.0449    0.00456      9.87 9.85e-10
## 2 prbarr            -0.0284    0.0136      -2.08 4.86e- 2
## 3 factor(county)3   -0.0250    0.00254     -9.82 1.07e- 9
## 4 factor(county)23  -0.00850   0.00166     -5.13 3.41e- 5
## 5 factor(county)145 -0.00650   0.00160     -4.07 4.70e- 4
Within each county, there is now a negative relationship!
Different intercepts (county 1 is the reference group),
A single common slope coefficient β1 (you can see that the lines are parallel).
We are shifting lines down from the reference group 1.
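If we only had $T = 2$ periods, we could just difference both periods, which leaves us with
$$
\begin{aligned}
y_{i1} &= \beta_1 x_{i1} + c_i + u_{i1} \\
y_{i2} &= \beta_1 x_{i2} + c_i + u_{i2} \\
\Rightarrow y_{i1} - y_{i2} &= \beta_1 (x_{i1} - x_{i2}) + c_i - c_i + u_{i1} - u_{i2} \\
\Delta y_i &= \beta_1 \Delta x_i + \Delta u_i
\end{aligned}
$$
where $\Delta$ denotes the difference over time. To recover the parameter of interest $\beta_1$ we would run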
lm(deltay ~ deltax, diff_data)
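Here is a minimal sketch of that first-difference regression on simulated data. The true `beta1 = -0.5`, the unit effect `c_i` and the noise levels are all assumptions for illustration; `diff_data`, `deltay` and `deltax` mirror the names used in the call above. Because `c_i` enters both periods, it cancels in the difference, while a pooled regression in levels remains contaminated by it.

```r
set.seed(4)
n <- 500
beta1 <- -0.5                 # assumed true effect

c_i <- rnorm(n)               # unobserved, fixed over time
x1  <- c_i + rnorm(n)         # x is correlated with c_i in both periods
x2  <- c_i + rnorm(n)
y1  <- beta1 * x1 + c_i + rnorm(n, sd = 0.1)
y2  <- beta1 * x2 + c_i + rnorm(n, sd = 0.1)

# Differencing removes c_i.
diff_data <- data.frame(deltay = y1 - y2, deltax = x1 - x2)
fd_slope  <- unname(coef(lm(deltay ~ deltax, diff_data))[2])

# Pooled levels regression ignores c_i and is biased.
pooled_slope <- unname(coef(lm(c(y1, y2) ~ c(x1, x2)))[2])

c(first_difference = fd_slope, pooled = pooled_slope)
```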
With T>2 we need a different approach
One important concept is called the within transformation
So, controlling for group identity and only looking at time variation
Remember DAG!
$$\bar{x}_i = \frac{1}{T} \sum_{t=1}^{T} x_{it}$$
R: Manual Solution
This works for our problem with fixed effect $c_i$ because $c_i$ is not time-varying by assumption! Hence it drops out:
$$y_{it} - \bar{y}_i = \beta_1 (x_{it} - \bar{x}_i) + c_i - c_i + u_{it} - \bar{u}_i$$
It's easy to do yourself! First let's compute the demeaned values:
cdata <- css %>%
  group_by(county) %>%
  mutate(mean_crime = mean(crmrte),
         mean_prob = mean(prbarr)) %>%
  mutate(demeaned_crime = crmrte - mean_crime,
         demeaned_prob = prbarr - mean_prob)
Then, run both models with simple OLS:
mod$xsect <- lm(crmrte ~ prbarr, data = cdata)
mod$demeaned <- lm(demeaned_crime ~ demeaned_prob, data = cdata)
We get this table:
Term | xsect | dummy | demeaned |
---|---|---|---|
(Intercept) | 0.009 | 0.045 | 0.000 |
(s.e.) | (0.005) | (0.005) | (0.001) |
prbarr | 0.065 | -0.028 | |
(s.e.) | (0.016) | (0.014) | |
demeaned_prob | | | -0.028 |
(s.e.) | | | (0.013) |
R2 | 0.390 | 0.893 | 0.159 |
The estimate for prbarr is positive in the cross-section.
Taking care of the unobserved heterogeneity ci...
...either by including an intercept for each i or by time-demeaning the data,
we obtain: -0.028.
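The dummy and demeaned regressions give the same slope, but their default standard errors differ slightly. A plausible reason is degrees of freedom: plain OLS on demeaned data does not know that one mean per unit was estimated first. The sketch below checks this on simulated data (4 units, 7 periods; the data-generating values are assumptions for illustration) by rescaling the demeaned standard error with the ratio of residual degrees of freedom, NT - 2 for the demeaned fit versus NT - N - 1 for the dummy fit.

```r
set.seed(5)
n_units <- 4; n_periods <- 7
d <- data.frame(unit = rep(1:n_units, each = n_periods))

# Assumed data-generating process (illustrative only).
d$c_i <- c(0.00, 0.02, 0.04, 0.06)[d$unit]
d$x   <- 0.2 + 5 * d$c_i + rnorm(nrow(d), sd = 0.03)
d$y   <- -0.03 * d$x + d$c_i + rnorm(nrow(d), sd = 0.002)

# Within transformation: subtract unit means.
d$x_dm <- d$x - ave(d$x, d$unit)
d$y_dm <- d$y - ave(d$y, d$unit)

dummy    <- lm(y ~ x + factor(unit), data = d)
demeaned <- lm(y_dm ~ x_dm, data = d)

se_dummy    <- summary(dummy)$coefficients["x", "Std. Error"]
se_demeaned <- summary(demeaned)$coefficients["x_dm", "Std. Error"]

# Rescale by the ratio of residual degrees of freedom.
nt <- nrow(d)
se_corrected <- se_demeaned * sqrt((nt - 2) / (nt - n_units - 1))

c(dummy = se_dummy, demeaned = se_demeaned, corrected = se_corrected)
```

With 28 observations and 4 units the factor is sqrt(26 / 23), about 1.06, which is in line with the gap between 0.013 and 0.014 in the table above.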
How to interpret those negative slopes?
We look at a single unit i and ask:
if the arrest probability in i increases by 10 percentage points (i.e. from 0.2 to 0.3) from year t to t+1, we expect crimes per person to fall from 0.039 to 0.036, or by 7.69 percent (in the reference county number 1).
R: use a Package!
In real life you will hardly ever perform the within transformation yourself; you will use a package instead!
There are several options (fixest is fastest). In our context:
mod$FE = fixest::feols(crmrte ~ prbarr | county, cdata)
Notice the similar setup to the estimatr::iv_robust two-part formula. Here the fixed effects come after the |.
Also, we can have more than one fixed effect! For a cool example with three fixed effects see the package vignette.
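To illustrate what "more than one fixed effect" means, here is a sketch with a unit effect and a year effect on simulated data, estimated with plain dummies in base R so that it runs without extra packages. The data-generating values are assumptions; the `fixest` call in the final comment is the analogous two-way specification and is shown for orientation only.

```r
set.seed(6)
n_units <- 30; n_periods <- 6
d <- expand.grid(unit = 1:n_units, year = 1:n_periods)

# Assumed unit and year effects, both correlated with x.
unit_fe <- rnorm(n_units)[d$unit]
year_fe <- rnorm(n_periods)[d$year]
d$x <- 0.5 * unit_fe + 0.5 * year_fe + rnorm(nrow(d))
d$y <- -0.5 * d$x + unit_fe + year_fe + rnorm(nrow(d), sd = 0.1)

# Two sets of dummies: one intercept shift per unit and one per year.
twoway <- lm(y ~ x + factor(unit) + factor(year), data = d)
twoway_slope <- unname(coef(twoway)["x"])
twoway_slope

# With fixest installed, the analogous call would be:
# fixest::feols(y ~ x | unit + year, data = d)
```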
R: use fixest 🙂
Term | xsect | dummy | demeaned | FE |
---|---|---|---|---|
(Intercept) | 0.009 | 0.045 | 0.000 | |
(s.e.) | (0.005) | (0.005) | (0.001) | |
prbarr | 0.065 | -0.028 | | -0.028 |
(s.e.) | (0.016) | (0.014) | | (0.005) |
demeaned_prob | | | -0.028 | |
(s.e.) | | | (0.013) | |
R2 | 0.390 | 0.893 | 0.159 | 0.893 |
Same estimates! 😅
Notice the standard errors: robust?!
fixest computes cluster-robust standard errors.
We suspect there is strong correlation in residuals within each county (over time).
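The idea behind cluster-robust standard errors can be sketched by hand for the one-regressor within estimator. The example below is a simplified sandwich formula on simulated data with serially correlated errors inside each unit; the AR(1) coefficient of 0.8 and all other values are assumptions. It applies no small-sample adjustment, so it should not be expected to reproduce the exact numbers a package reports.

```r
set.seed(7)
n_units <- 50; n_periods <- 7
d <- data.frame(unit = rep(1:n_units, each = n_periods))

# Helper (hypothetical name): one AR(1) series per unit.
ar1 <- function(n) as.numeric(arima.sim(model = list(ar = 0.8), n = n))
d$x <- unlist(lapply(1:n_units, function(i) ar1(n_periods)))
u   <- unlist(lapply(1:n_units, function(i) ar1(n_periods)))
d$y <- -0.5 * d$x + rnorm(n_units)[d$unit] + u

# Within transformation, then OLS without intercept.
d$x_dm <- d$x - ave(d$x, d$unit)
d$y_dm <- d$y - ave(d$y, d$unit)
fit    <- lm(y_dm ~ 0 + x_dm, data = d)
u_hat  <- resid(fit)

# Sandwich: bread * meat * bread, with scores summed within each unit.
bread         <- 1 / sum(d$x_dm^2)
score_by_unit <- tapply(d$x_dm * u_hat, d$unit, sum)
se_cluster    <- sqrt(bread * sum(score_by_unit^2) * bread)

# Conventional (iid) standard error, counting the N unit means as parameters.
se_iid <- sqrt(sum(u_hat^2) / (nrow(d) - n_units - 1) * bread)

c(iid = se_iid, cluster = se_cluster)
```

Summing the scores within a unit before squaring is what allows residuals of the same county to be correlated over time; the iid formula instead treats all observations as independent.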
The within transformation centers the data!
By time-demeaning y and x, we project out the fixed factors related to county
Only within county variation is left.
Made by Nick C Huntington-Klein. 🙏