Who Am I
This Course
Recap 1 of topics from intro course
Quiz 1 (before next time)
Recap 2
I'm a PhD candidate at the Paris School of Economics. Check out my website!
I work on tax evasion, climate policies, and macro topics:
Acceptability of climate policies: who supports/opposes climate policies, and why?
Offshore real estate in Dubai using leaked data: how large is it, who owns it, and what does it tell us about global offshore real estate?
Excess Profit Tax: how to tax excess profits from energy firms that benefited from the war in Ukraine?
This course is the follow-up to Introduction to Econometrics with R which is taught to 2nd years.
You are supposed to be familiar with all the econometrics material from the slides of that course and/or chapters 1-9 in our textbook.
We also assume you have basic working knowledge of R at the level of the intro course: data.frame, data manipulation with dplyr, lm, and ggplot2.
There will be four quizzes on Moodle roughly every two weeks => 40%
There will be two take-home exams / case studies => 60%
There will be no final exam 😅.
The book, chapter 10 onwards
The Slides
The interactive shiny apps
Quizzes on Moodle
1. Intro, Recap 1 (Quiz 1)
2. Recap 2 (Quiz 2)
3. Intro, Difference-in-Differences
4. Tools: Rmarkdown and data.table
5. Instrumental Variables 1 (Quiz 3)
6. Instrumental Variables 2 (Midterm exam)
7. Panel Data 1
8. Panel Data 2 (Quiz 4)
9. Discrete Outcomes
10. Intro to Machine Learning 1
11. Intro to Machine Learning 2
12. Recap / Buffer (Final Project)
Let's get cracking! 💪
We write our (simple) population model
$$y_i = \beta_0 + \beta_1 x_i + u_i$$
and our sample-based estimated regression model as
$$y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i$$
An estimated regression model produces estimates for each observation:
$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$$
which gives us the best-fit line through our dataset.
(A lot of these slides - in particular the pictures! - have been taken from Ed Rubin's outstanding material. Thanks Ed 🙏)
Load the data from here, in dta format. (Hint: use haven::read_dta("filename") to read this format.)
Obtain common summary statistics for the variables classize, avgmath and avgverb. (Hint: use the skimr package.)
Estimate the linear model $\text{avgmath}_i = \beta_0 + \beta_1 \text{classize}_i + u_i$.
Load the data
grades = haven::read_dta(file = "https://www.dropbox.com/s/wwp2cs9f0dubmhr/grade5.dta?dl=1")
Describe the dataset:
library(dplyr)
grades %>%
  select(classize, avgmath, avgverb) %>%
  skimr::skim()
Run OLS to estimate the relationship between class size and student achievement:
summary(lm(formula = avgmath ~ classize, data = grades))
Population
Population relationship: $y_i = 2.53 + 0.57 x_i + u_i$, i.e. $y_i = \beta_0 + \beta_1 x_i + u_i$

Sample 1: 30 random individuals
Population relationship: $y_i = 2.53 + 0.57 x_i + u_i$
Sample relationship: $\hat{y}_i = 2.36 + 0.61 x_i$

Sample 2: 30 random individuals
Population relationship: $y_i = 2.53 + 0.57 x_i + u_i$
Sample relationship: $\hat{y}_i = 2.79 + 0.56 x_i$

Sample 3: 30 random individuals
Population relationship: $y_i = 2.53 + 0.57 x_i + u_i$
Sample relationship: $\hat{y}_i = 3.21 + 0.45 x_i$
Let's repeat this 10,000 times.
(This exercise is called a (Monte Carlo) simulation.)
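A minimal sketch of this simulation in R (the population intercept and slope are taken from the slides; the distributions of x and u are assumptions for illustration):

```r
# Monte Carlo: draw many samples of size 30 from the population
# y = 2.53 + 0.57 x + u, estimate the slope on each sample.
set.seed(1)
n_sims <- 10000
slopes <- numeric(n_sims)

for (s in 1:n_sims) {
  x <- runif(30, min = 0, max = 10)  # assumed distribution of x
  u <- rnorm(30)                     # assumed disturbance: N(0, 1)
  y <- 2.53 + 0.57 * x + u
  slopes[s] <- coef(lm(y ~ x))["x"]  # store the estimated slope
}

mean(slopes)   # very close to the true 0.57 on average...
sd(slopes)     # ...but individual samples miss the mark
hist(slopes)
```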
Question: Why do we care about population vs. sample?
On average, our regression lines match the population line very nicely.
However, individual lines (samples) can really miss the mark.
Differences between individual samples and the population lead to uncertainty for the econometrician.
Answer: Uncertainty matters.
Every random sample of data is different.
Our (OLS) estimators are computed from those samples of data.
If there is sampling variation, there is variation in our estimates.
OLS inference depends on certain assumptions.
If violated, our estimates will be biased or imprecise.
Or both. 😧
We can estimate a regression line in R (lm(y ~ x, my_data)) and in Stata (reg y x). But where do these estimates come from?
A few slides back:
$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, which gives us the best-fit line through our dataset.
But what do we mean by "best-fit line"?
Question: What do we mean by best-fit line?
Answer: the line that minimizes the sum of squared errors (SSE),
$$SSE = \sum_{i=1}^{n} e_i^2 \quad \text{where} \quad e_i = y_i - \hat{y}_i$$
Let's consider the dataset we previously generated.
For any line $(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x)$, we can calculate the errors: $e_i = y_i - \hat{y}_i$
SSE squares the errors ($\sum e_i^2$): bigger errors get bigger penalties.
The OLS estimate is the combination of $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimizes SSE.
ScPoApps::launchApp("reg_simple")
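If you cannot launch the app, here is a minimal sketch of the same idea (assuming the grades data loaded above): compute the SSE for any candidate line and compare candidates by hand.

```r
# SSE of the candidate line y-hat = b0 + b1 * x through the data
sse <- function(b0, b1, x, y) {
  e <- y - (b0 + b1 * x)  # error of each observation
  sum(e^2)                # squared errors: bigger misses, bigger penalties
}

# try a few candidate lines (values picked arbitrarily)
sse(b0 = 65, b1 = 0.0, x = grades$classize, y = grades$avgmath)
sse(b0 = 60, b1 = 0.2, x = grades$classize, y = grades$avgmath)

# no candidate line beats the OLS coefficients
coef(lm(avgmath ~ classize, data = grades))
```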
In simple linear regression, the OLS estimator comes from choosing the $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared errors (SSE), i.e.,
$$\min_{\hat{\beta}_0, \hat{\beta}_1} SSE$$
but we already know that $SSE = \sum_i e_i^2$. Now use the definitions of $e_i$ and $\hat{y}$:
$$e_i^2 = (y_i - \hat{y}_i)^2 = (y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i)^2 = y_i^2 - 2 y_i \hat{\beta}_0 - 2 y_i \hat{\beta}_1 x_i + \hat{\beta}_0^2 + 2 \hat{\beta}_0 \hat{\beta}_1 x_i + \hat{\beta}_1^2 x_i^2$$
Recall: Minimizing a multivariate function requires (1) that all first derivatives equal zero (the first-order conditions) and (2) second-order conditions (convexity).
ScPoApps::launchApp("SSR_cone")
We skipped the maths. We now have the OLS estimators for the slope,
$$\hat{\beta}_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$$
and the intercept,
$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$
Remember that these two formulae are among the very few from the intro course that you should know by heart! ❤️
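A quick check in R (again assuming the grades data from above) that these two formulae reproduce what lm() computes:

```r
# OLS slope and intercept "by hand"
x <- grades$classize
y <- grades$avgmath

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)                 # by-hand estimates
coef(lm(avgmath ~ classize, data = grades))   # should be identical
```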
We now turn to the assumptions and (implied) properties of OLS.
Question: What properties might we care about for an estimator?
Tangent: Let's review statistical properties first.
Refresher: Density functions
Recall that we use probability density functions (PDFs) to describe the probability that a continuous random variable takes on a range of values. (The total area under the PDF is 1.)
These PDFs characterize probability distributions, and the most common/famous/popular distributions get names (e.g., normal, t, Gamma).
Here is the definition of a PDF $f_X$ for a continuous random variable $X$:
$$\Pr[a \leq X \leq b] \equiv \int_a^b f_X(x)\,dx$$
The probability a standard normal random variable takes on a value between -2 and 0: $\Pr(-2 \leq X \leq 0) = 0.48$
The probability a standard normal random variable takes on a value between -1.96 and 1.96: $\Pr(-1.96 \leq X \leq 1.96) = 0.95$
The probability a standard normal random variable takes on a value beyond 2: $\Pr(X > 2) = 0.023$
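You can verify each of these areas in R with the standard normal CDF pnorm(), or directly from the integral definition with integrate():

```r
pnorm(0) - pnorm(-2)         # P(-2 <= X <= 0)       ~ 0.48
pnorm(1.96) - pnorm(-1.96)   # P(-1.96 <= X <= 1.96) ~ 0.95
1 - pnorm(2)                 # P(X > 2)              ~ 0.023

# same first area via the definition of the PDF
integrate(dnorm, lower = -2, upper = 0)
```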
Imagine we are trying to estimate an unknown parameter β, and we know the distributions of three competing estimators. Which one would we want? How would we decide?
Question: What properties might we care about for an estimator?
Answer one: Bias.
On average (after many samples), does the estimator tend toward the correct value?
More formally: does the mean of the estimator's distribution equal the parameter it estimates?
$$\text{Bias}_\beta(\hat{\beta}) = E[\hat{\beta}] - \beta$$
Unbiased estimator: $E[\hat{\beta}] = \beta$
Biased estimator: $E[\hat{\beta}] \neq \beta$
Answer two: Variance.
The central tendencies (means) of competing distributions are not the only things that matter. We also care about the variance of an estimator:
$$\text{Var}(\hat{\beta}) = E\left[(\hat{\beta} - E[\hat{\beta}])^2\right]$$
Lower-variance estimators produce estimates that are closer to the mean in each sample.
Subtlety: The bias-variance tradeoff.
Should we be willing to take a bit of bias to reduce the variance?
In econometrics, we generally stick with unbiased (or consistent) estimators. But other disciplines (especially computer science) think a bit more about this tradeoff.
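A minimal illustration of the tradeoff (a hypothetical shrinkage estimator, not anything we use in this course): shrinking the sample mean toward zero adds bias but reduces variance, and in this particular setup it even lowers the mean squared error.

```r
set.seed(1)
mu <- 1  # true parameter (assumed for the simulation)

# sampling distribution of the plain mean (n = 10, draws from N(mu, 1))
est_plain  <- replicate(10000, mean(rnorm(10, mean = mu)))
est_shrunk <- 0.9 * est_plain  # hypothetical shrinkage estimator

c(bias_plain = mean(est_plain) - mu, bias_shrunk = mean(est_shrunk) - mu)
c(var_plain  = var(est_plain),       var_shrunk  = var(est_shrunk))
c(mse_plain  = mean((est_plain  - mu)^2),
  mse_shrunk = mean((est_shrunk - mu)^2))  # the biased estimator wins here
```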
As you might have guessed by now, OLS has both of these nice properties: under certain conditions, it is unbiased and has the lowest variance among all linear unbiased estimators.
But... these (very nice) properties depend upon a set of assumptions:
1. The population relationship is linear in parameters with an additive disturbance.
2. Our $X$ variable is exogenous, i.e., $E[u|X] = 0$.
3. The $X$ variable has variation. And if there are multiple explanatory variables, they are not perfectly collinear.
4. The population disturbances $u_i$ are independently and identically distributed as normal random variables with mean zero ($E[u] = 0$) and variance $\sigma^2$ (i.e., $E[u^2] = \sigma^2$). Independently distributed and mean zero jointly imply $E[u_i u_j] = 0$ for any $i \neq j$.
Different assumptions guarantee different properties. We will discuss solutions to violations of these assumptions; see also the discussion in the book.
For many applications, our most important assumption is exogeneity, i.e., $E[u|X] = 0$. But what does it actually mean?
One way to think about this definition:
For any value of $X$, the mean of the disturbances must be zero.
E.g., $E[u|X = 1] = 0$ and $E[u|X = 100] = 0$
E.g., $E[u|X_2 = \text{Female}] = 0$ and $E[u|X_2 = \text{Male}] = 0$
Notice: $E[u|X] = 0$ is more restrictive than $E[u] = 0$.
Graphically...
Valid exogeneity, i.e., $E[u|X] = 0$
Invalid exogeneity, i.e., $E[u|X] \neq 0$
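A minimal simulation of the difference (the data-generating processes are assumptions for illustration): both disturbances below have mean zero overall, but only the first satisfies $E[u|X] = 0$.

```r
set.seed(1)
x <- runif(1000, 0, 10)

# valid exogeneity: u is unrelated to x, so E[u | X] = 0 for any x
u_valid <- rnorm(1000)

# invalid exogeneity: u depends on x, yet E[u] = 0 still holds overall
u_invalid <- (x - mean(x)) + rnorm(1000)

c(mean(u_valid), mean(u_invalid))  # both approximately zero

# conditional means tell them apart: compare x >= 5 versus x < 5
tapply(u_valid,   x < 5, mean)     # both groups near zero
tapply(u_invalid, x < 5, mean)     # about +2.5 and -2.5: E[u | X] = 0 fails
```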