class: center, middle, inverse, title-slide

.title[# ScPoEconometrics: Advanced]
.subtitle[## Intro and Recap 1]
.author[### Bluebery Planterose]
.date[### SciencesPo Paris 2023-01-24]

---

layout: true

<div class="my-footer"><img src="data:image/png;base64,#../../img/logo/ScPo-shield.png" style="height: 60px;"/></div>

---

# Welcome to *ScPoEconometrics: Advanced*!

.pull-left[
## Today

1. Who Am I
2. This Course
3. Recap 1 of topics from the intro course
]

.pull-right[
### Next time

* Quiz 1 (before next time)
* Recap 2
]

---

# Who Am I

* I'm a PhD candidate at the Paris School of Economics. Check out my [website](http://bluebery-planterose.com)!
* I work on tax evasion, climate policies, and macro topics:
  1. *Acceptability of climate policies*: who supports/opposes climate policies, and why?
  2. *Offshore real estate in Dubai using leaked data*: how large is it, who owns it, and what does it tell us about global offshore real estate?
  3. *Excess profit tax*: how should we tax the excess profits of energy firms that benefited from the war in Ukraine?

---

# This Course

## Prerequisites

* This course is the *follow-up* to [Introduction to Econometrics with R](https://github.com/ScPoEcon/ScPoEconometrics-Slides), which is taught to 2nd years.
* You are expected to be familiar with all the econometrics material from [the slides](https://github.com/ScPoEcon/ScPoEconometrics-Slides) of that course and/or chapters 1-9 of our [textbook](https://scpoecon.github.io/ScPoEconometrics/).
* We also assume you have basic working knowledge of `R` at the level of the intro course:
  * basic `data.frame` manipulation with `dplyr`
  * simple linear models with `lm`
  * basic plotting with `ggplot2`
* Quiz 1 will try to test for that 😉, so be on top of [this chapter](https://r4ds.had.co.nz/transform.html)

---

# This Course

.pull-left[
## Grading

1. There will be ***four quizzes*** on Moodle, roughly every two weeks => 40%
1. There will be ***two take-home exams / case studies*** => 60%
1. There will be _no_ final exam 😅.
]

--

.pull-right[
## Course Materials

1. [Book](https://scpoecon.github.io/ScPoEconometrics/), chapter 10 onwards
1. The [Slides](https://github.com/ScPoEcon/Advanced-Metrics-slides)
1. The interactive [shiny apps](https://github.com/ScPoEcon/ScPoApps)
1. Quizzes on [Moodle](https://moodle.sciences-po.fr)
]

---

# Syllabus

.pull-left[
1. Intro, Recap 1 (*Quiz 1*)
2. Recap 2 (*Quiz 2*)
3. Intro, Difference-in-Differences
4. Tools: `Rmarkdown` and `data.table`
5. Instrumental Variables 1 (*Quiz 3*)
6. Instrumental Variables 2 (*Midterm exam*)
]

.pull-right[
7\. Panel Data 1

8\. Panel Data 2 (*Quiz 4*)

9\. Discrete Outcomes

10\. Intro to Machine Learning 1

11\. Intro to Machine Learning 2

12\. Recap / Buffer (*Final Project*)
]

---

# Course Organization

<img src="data:image/png;base64,#../../img/course_availability.png" width="3531" style="display: block; margin: auto;" />

---

class: separator, middle

# Recap 1

Let's get cracking! 💪

---

# Population *vs.* sample

## Models and notation

We write our (simple) population model

$$ y_i = \beta_0 + \beta_1 x_i + u_i $$

and our sample-based estimated regression model as

$$ y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + e_i $$

An estimated regression model produces estimates for each observation:

$$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i $$

which gives us the _best-fit_ line through our dataset.

(Many of these slides, and in particular the pictures, have been taken from [Ed Rubin's](https://edrub.in/index.html) outstanding material. Thanks Ed 🙏)
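---

# Population *vs.* sample

## Notation to code

To make the notation concrete, here is a minimal sketch of the population model in `R`. The intercept and slope (2.53 and 0.57) match the population relationship on the next slides; the distributions of `\(x_i\)` and `\(u_i\)` are made up for illustration.

```r
set.seed(1)

n <- 30                             # sample size (30 individuals)
x <- runif(n, min = 0, max = 10)    # assumed regressor distribution
u <- rnorm(n)                       # disturbance with E[u|x] = 0
y <- 2.53 + 0.57 * x + u            # population model: y = b0 + b1*x + u

coef(lm(y ~ x))                     # sample estimates of beta_0 and beta_1
```

The estimated coefficients will be close to, but not exactly, `\((2.53, 0.57)\)`. That gap is the theme of the next slides.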
---

class: inverse

# Task 1: Run Simple OLS (4 minutes)

1. Load the data from [here](https://www.dropbox.com/s/wwp2cs9f0dubmhr/grade5.dta?dl=1), which is in `dta` format. (Hint: use `haven::read_dta("filename")` to read this format.)
1. Obtain common summary statistics for the variables `classize`, `avgmath` and `avgverb`. Hint: use the `skimr` package.
1. Estimate the linear model `$$\text{avgmath}_i = \beta_0 + \beta_1 \text{classize}_i + u_i$$`

---

class: inverse

# Task 1: Solution

1. Load the data:

```r
grades = haven::read_dta(file = "https://www.dropbox.com/s/wwp2cs9f0dubmhr/grade5.dta?dl=1")
```

1. Describe the dataset:

```r
library(dplyr)
grades %>%
  select(classize, avgmath, avgverb) %>%
  skimr::skim()
```

1. Run OLS to estimate the relationship between class size and student achievement:

```r
summary(lm(formula = avgmath ~ classize, data = grades))
```

---

layout: true

# **Question:** Why do we care about *population vs. sample*?

---

.pull-left[
<img src="data:image/png;base64,#recap1_files/figure-html/pop1-1.svg" style="display: block; margin: auto;" />
.center[**Population**]
]

--

.pull-right[
<img src="data:image/png;base64,#recap1_files/figure-html/scatter1-1.svg" style="display: block; margin: auto;" />
.center[**Population relationship**]

$$ y_i = 2.53 + 0.57 x_i + u_i $$

$$ y_i = \beta_0 + \beta_1 x_i + u_i $$
]

---

.pull-left[
<img src="data:image/png;base64,#recap1_files/figure-html/sample1-1.svg" style="display: block; margin: auto;" />
.center[**Sample 1:** 30 random individuals]
]

--

.pull-right[
<img src="data:image/png;base64,#recap1_files/figure-html/sample1_scatter-1.svg" style="display: block; margin: auto;" />
.center[
**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 2.36 + 0.61 x_i\)`
]
]

---

count: false

.pull-left[
<img src="data:image/png;base64,#recap1_files/figure-html/sample2-1.svg" style="display: block; margin: auto;" />
.center[**Sample 2:** 30 random individuals]
]

.pull-right[
<img src="data:image/png;base64,#recap1_files/figure-html/sample2_scatter-1.svg" style="display: block; margin: auto;" />
.center[
**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 2.79 + 0.56 x_i\)`
]
]

---

count: false

.pull-left[
<img src="data:image/png;base64,#recap1_files/figure-html/sample3-1.svg" style="display: block; margin: auto;" />
.center[**Sample 3:** 30 random individuals]
]

.pull-right[
<img src="data:image/png;base64,#recap1_files/figure-html/sample3_scatter-1.svg" style="display: block; margin: auto;" />
.center[
**Population relationship** <br> `\(y_i = 2.53 + 0.57 x_i + u_i\)`

**Sample relationship** <br> `\(\hat{y}_i = 3.21 + 0.45 x_i\)`
]
]

---

layout: false
class: clear, middle

Let's repeat this **10,000 times**. (This exercise is called a (Monte Carlo) simulation.)

---

count: false

<img src="data:image/png;base64,#recap1_files/figure-html/simulation_scatter-1.png" style="display: block; margin: auto;" />

---

# Population *vs.* sample

**Question:** Why do we care about *population vs. sample*?

.pull-left[
<img src="data:image/png;base64,#recap1_files/figure-html/simulation_scatter2-1.png" style="display: block; margin: auto;" />
]

.pull-right[
- On **average**, our regression lines match the population line very nicely.
- However, **individual lines** (samples) can really miss the mark.
- Differences between individual samples and the population lead to **uncertainty** for the econometrician.
]
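---

# Population *vs.* sample

## The simulation in code

A minimal sketch of the Monte Carlo exercise above, assuming the same population relationship `\(y_i = 2.53 + 0.57 x_i + u_i\)` and samples of 30 individuals (with fewer repetitions than the 10,000 above, so it runs quickly):

```r
set.seed(1)

# a large simulated "population"
pop_n <- 1e5
pop_x <- runif(pop_n, 0, 10)
pop_y <- 2.53 + 0.57 * pop_x + rnorm(pop_n)

# draw 30 random individuals and estimate the slope
one_draw <- function() {
  i <- sample(pop_n, size = 30)
  coef(lm(pop_y[i] ~ pop_x[i]))[2]
}

slopes <- replicate(1000, one_draw())
mean(slopes)   # on average, close to the population slope 0.57
sd(slopes)     # ...but individual samples can really miss the mark
```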
---

layout: false

# Population *vs.* sample

**Question:** Why do we care about *population vs. sample*?

--

**Answer:** Uncertainty matters.

.pull-left[
* Every random sample of data is different.
* Our (OLS) estimators are computed from those samples of data.
* If there is sampling variation, there is variation in our estimates.
]

--

.pull-right[
* OLS inference depends on certain assumptions.
* If these are violated, our estimates will be biased or imprecise.
* Or both. 😧
]

---

# Linear regression

## The estimator

We can estimate a regression line in `R` (`lm(y ~ x, my_data)`) and in Stata (`reg y x`). But where do these estimates come from?

A few slides back:

> $$ \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i $$
> which gives us the *best-fit* line through our dataset.

But what do we mean by "best-fit line"?

---

layout: false

# Being the "best"

**Question:** What do we mean by *best-fit line*?

**Answers:**

- In general (econometrics), *best-fit line* means the line that minimizes the sum of squared errors (SSE):

.center[
`\(\text{SSE} = \sum_{i = 1}^{n} e_i^2\quad\)` where `\(\quad e_i = y_i - \hat{y}_i\)`
]

- Ordinary **least squares** (**OLS**) minimizes the sum of the squared errors.
- Based upon a set of (mostly palatable) assumptions, OLS
  - Is unbiased (and consistent)
  - Is the *best* (minimum variance) linear unbiased estimator (BLUE)

---

layout: true

# OLS *vs.* other lines/estimators

---

Let's consider the dataset we previously generated.

<img src="data:image/png;base64,#recap1_files/figure-html/ols_vs_lines_1-1.svg" style="display: block; margin: auto;" />

---

count: false

For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)`

<img src="data:image/png;base64,#recap1_files/figure-html/vs_lines_2-1.svg" style="display: block; margin: auto;" />

---

count: false

For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)`, we can calculate errors: `\(e_i = y_i - \hat{y}_i\)`

<img src="data:image/png;base64,#recap1_files/figure-html/ols_vs_lines_3-1.svg" style="display: block; margin: auto;" />

---

count: false

For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)`, we can calculate errors: `\(e_i = y_i - \hat{y}_i\)`

<img src="data:image/png;base64,#recap1_files/figure-html/ols_vs_lines_4-1.svg" style="display: block; margin: auto;" />

---

count: false

For any line `\(\left(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\right)\)`, we can calculate errors: `\(e_i = y_i - \hat{y}_i\)`

<img src="data:image/png;base64,#recap1_files/figure-html/ols_vs_lines_5-1.svg" style="display: block; margin: auto;" />

---

count: false

SSE squares the errors `\(\left(\sum e_i^2\right)\)`: bigger errors get bigger penalties.

<img src="data:image/png;base64,#recap1_files/figure-html/ols_vs_lines_6-1.svg" style="display: block; margin: auto;" />

---

count: false

The OLS estimate is the combination of `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimizes SSE.

<img src="data:image/png;base64,#recap1_files/figure-html/ols_vs_lines_7-1.svg" style="display: block; margin: auto;" />

---

layout: false
class: middle

```r
ScPoApps::launchApp("reg_simple")
```

---

layout: true

# OLS

## Formally

---

In simple linear regression, the OLS estimator comes from choosing the `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)` that minimize the sum of squared errors (SSE), _i.e._,

$$ \min_{\hat{\beta}_0,\, \hat{\beta}_1} \text{SSE} $$

--

but we already know `\(\text{SSE} = \sum_i e_i^2\)`. Now use the definitions of `\(e_i\)` and `\(\hat{y}\)`.

$$
`\begin{aligned}
  e_i^2 &= \left( y_i - \hat{y}_i \right)^2 = \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2 \\
        &= y_i^2 - 2 y_i \hat{\beta}_0 - 2 y_i \hat{\beta}_1 x_i + \hat{\beta}_0^2 + 2 \hat{\beta}_0 \hat{\beta}_1 x_i + \hat{\beta}_1^2 x_i^2
\end{aligned}`
$$

--

**Recall:** Minimizing a multivariate function requires (**1**) setting the first derivatives equal to zero (the *first-order conditions*) and (**2**) checking the second-order conditions (convexity).
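---

Before jumping to the closed-form solution, we can let a numerical optimizer do the minimization for us. A minimal sketch, assuming the `grades` data from Task 1 is loaded:

```r
# SSE as a function of candidate coefficients b = (b0, b1)
sse <- function(b, y, x) sum((y - b[1] - b[2] * x)^2)

# numerical minimization, starting from (0, 0)
optim(par = c(0, 0), fn = sse,
      y = grades$avgmath, x = grades$classize)$par

# ...agrees (up to numerical tolerance) with OLS:
coef(lm(avgmath ~ classize, data = grades))
```

Both approaches return (essentially) the same `\(\hat{\beta}_0\)` and `\(\hat{\beta}_1\)`: OLS is nothing more than the solution to this minimization problem.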
---

layout: false

# OLS

## Interactively

```r
ScPoApps::launchApp("SSR_cone")
```

<img src="data:image/png;base64,#../../img/photos/SSR_cone.png" width="3363" style="display: block; margin: auto;" />

---

# OLS

## The estimators

We skipped the maths. We now have the OLS estimators for the slope

$$ \hat{\beta}_1 = \dfrac{\sum_i (x_i - \overline{x})(y_i - \overline{y})}{\sum_i (x_i - \overline{x})^2} $$

and the intercept

$$ \hat{\beta}_0 = \overline{y} - \hat{\beta}_1 \overline{x} $$

Remember that *these* two formulae are among the very few from the intro course that you should know by heart! ❤️

--

We now turn to the assumptions and (implied) properties of OLS.

---

layout: true

# OLS: Assumptions and properties

---

**Question:** What properties might we care about for an estimator?

--

**Tangent:** Let's review statistical properties first.

---

**Refresher:** Density functions

Recall that we use **probability density functions** (PDFs) to describe the probability that a **continuous random variable** takes on a range of values. (The total area = 1.)

These PDFs characterize probability distributions, and the most common/famous/popular distributions get names (_e.g._, normal, *t*, Gamma).

Here is the definition of the *PDF* `\(f_X\)` of a *continuous* RV `\(X\)`:

$$ \Pr[a \leq X \leq b] \equiv \int_a^b f_X (x) dx $$

---

**Refresher:** Density functions

The probability a standard normal random variable takes on a value between -2 and 0: `\(\mathop{\text{P}}\left(-2 \leq X \leq 0\right) = 0.48\)`

<img src="data:image/png;base64,#recap1_files/figure-html/example_pdf-1.svg" style="display: block; margin: auto;" />

---

**Refresher:** Density functions

The probability a standard normal random variable takes on a value between -1.96 and 1.96: `\(\mathop{\text{P}}\left(-1.96 \leq X \leq 1.96\right) = 0.95\)`

<img src="data:image/png;base64,#recap1_files/figure-html/example_pdf_2-1.svg" style="display: block; margin: auto;" />

---

**Refresher:** Density functions

The probability a standard normal random variable takes on a value beyond 2: `\(\mathop{\text{P}}\left(X > 2\right) = 0.023\)`

<img src="data:image/png;base64,#recap1_files/figure-html/example_pdf_3-1.svg" style="display: block; margin: auto;" />

---

Imagine we are trying to estimate an unknown parameter `\(\beta\)`, and we know the distributions of three competing estimators. Which one would we want? How would we decide?

<img src="data:image/png;base64,#recap1_files/figure-html/competing_pdfs-1.svg" style="display: block; margin: auto;" />

---

**Question:** What properties might we care about for an estimator?

--

**Answer one: Bias.**

On average (after *many* samples), does the estimator tend toward the correct value?

**More formally:** Does the mean of the estimator's distribution equal the parameter it estimates?

$$ \mathop{\text{Bias}}_\beta \left( \hat{\beta} \right) = \mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] - \beta $$
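---

**Answer one: Bias** can be checked by simulation. A minimal sketch, reusing the made-up population from earlier (true slope `\(\beta_1 = 0.57\)`): the average of `\(\hat{\beta}_1\)` across many samples should equal the truth.

```r
set.seed(1)

beta1_hat <- replicate(1000, {
  x <- runif(30, 0, 10)
  y <- 2.53 + 0.57 * x + rnorm(30)   # true slope: 0.57
  coef(lm(y ~ x))[2]
})

mean(beta1_hat) - 0.57   # estimated bias: approximately zero
```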
---

**Answer one: Bias.**

.pull-left[
**Unbiased estimator:** `\(\mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] = \beta\)`

<img src="data:image/png;base64,#recap1_files/figure-html/unbiased_pdf-1.svg" style="display: block; margin: auto;" />
]

--

.pull-right[
**Biased estimator:** `\(\mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] \neq \beta\)`

<img src="data:image/png;base64,#recap1_files/figure-html/biased_pdf-1.svg" style="display: block; margin: auto;" />
]

---

**Answer two: Variance.**

The central tendencies (means) of competing distributions are not the only things that matter. We also care about the **variance** of an estimator.

$$ \mathop{\text{Var}} \left( \hat{\beta} \right) = \mathop{\boldsymbol{E}}\left[ \left( \hat{\beta} - \mathop{\boldsymbol{E}}\left[ \hat{\beta} \right] \right)^2 \right] $$

Lower-variance estimators produce estimates closer to the mean in each sample.

---

count: false

**Answer two: Variance.**

<img src="data:image/png;base64,#recap1_files/figure-html/variance_pdf-1.svg" style="display: block; margin: auto;" />

---

**Answer one: Bias.** **Answer two: Variance.**

**Subtlety:** The bias-variance tradeoff.

Should we be willing to take a bit of bias to reduce the variance?

In econometrics, we generally stick with unbiased (or consistent) estimators. But other disciplines (especially computer science) think a bit more about this tradeoff.

---

layout: false

# The bias-variance tradeoff

<img src="data:image/png;base64,#recap1_files/figure-html/variance_bias-1.svg" style="display: block; margin: auto;" />

---

# OLS: Assumptions and properties

## Properties

As you might have guessed by now,

- OLS is **unbiased**.
- OLS has the **minimum variance** of all unbiased linear estimators.

---

# OLS: Assumptions and properties

## Properties

But... these (very nice) properties depend upon a set of assumptions:

1. The population relationship is linear in parameters with an additive disturbance.
2. Our `\(X\)` variable is **exogenous**, _i.e._, `\(\mathop{\boldsymbol{E}}\left[ u | X \right] = 0\)`.
3. The `\(X\)` variable has variation. And if there are multiple explanatory variables, they are not perfectly collinear.
4. The population disturbances `\(u_i\)` are independently and identically distributed as normal random variables with mean zero `\(\left( \mathop{\boldsymbol{E}}\left[ u \right] = 0 \right)\)` and variance `\(\sigma^2\)` (_i.e._, `\(\mathop{\boldsymbol{E}}\left[ u^2 \right] = \sigma^2\)`). Independently distributed and mean zero jointly imply `\(\mathop{\boldsymbol{E}}\left[ u_i u_j \right] = 0\)` for any `\(i \neq j\)`.

---

# OLS: Assumptions and properties

## Assumptions

Different assumptions guarantee different properties:

- Assumptions (1), (2), and (3) make OLS unbiased.
- Assumption (4) gives us an unbiased estimator for the variance of our OLS estimator.

We will discuss solutions to **violations of these assumptions**. See also our discussion [in the book](https://scpoecon.github.io/ScPoEconometrics/std-errors.html#class-reg):

- Non-linear relationships in our parameters/disturbances (or misspecification).
- Disturbances that are not identically distributed and/or not independent.
- Violations of exogeneity (especially omitted-variable bias): see the sketch on the next slide.
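---

# OLS: Assumptions and properties

## Omitted-variable bias: a sketch

To see why violations of exogeneity matter, here is a minimal simulated example (all variable names and parameter values are made up): `ability` drives both `school` and `wage`, so regressing `wage` on `school` alone leaves `ability` in the disturbance and violates `\(\mathop{\boldsymbol{E}}\left[ u | X \right] = 0\)`.

```r
set.seed(1)

n       <- 1e4
ability <- rnorm(n)
school  <- 2 + 0.5 * ability + rnorm(n)               # regressor correlated with ability
wage    <- 1 + 1.0 * school + 2 * ability + rnorm(n)  # true effect of school: 1.0

coef(lm(wage ~ school))[2]             # biased: well above the true 1.0
coef(lm(wage ~ school + ability))[2]   # approximately 1.0 once ability is included
```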
---

# OLS: Assumptions and properties

## Conditional expectation

For many applications, our most important assumption is **exogeneity**, _i.e._,

$$
`\begin{align}
  \mathop{E}\left[ u | X \right] = 0
\end{align}`
$$

but what does it actually mean?

--

One way to think about this definition:

> For *any* value of `\(X\)`, the mean of the disturbance `\(u\)` must be zero.

- _E.g._, `\(\mathop{E}\left[ u | X=1 \right]=0\)` *and* `\(\mathop{E}\left[ u | X=100 \right]=0\)`
- _E.g._, `\(\mathop{E}\left[ u | X_2=\text{Female} \right]=0\)` *and* `\(\mathop{E}\left[ u | X_2=\text{Male} \right]=0\)`
- Notice: `\(\mathop{E}\left[ u | X \right]=0\)` is more restrictive than `\(\mathop{E}\left[ u \right]=0\)`

---

layout: false
class: clear, middle

Graphically...

---

exclude: true

---

class: clear

Valid exogeneity, _i.e._, `\(\mathop{E}\left[ u | X \right] = 0\)`

<img src="data:image/png;base64,#recap1_files/figure-html/ex_good_exog-1.svg" style="display: block; margin: auto;" />

---

class: clear

Invalid exogeneity, _i.e._, `\(\mathop{E}\left[ u | X \right] \neq 0\)`

<img src="data:image/png;base64,#recap1_files/figure-html/ex_bad_exog-1.svg" style="display: block; margin: auto;" />

---

layout: false
class: title-slide-final, middle
background-image: url(data:image/png;base64,#../../img/logo/ScPo-econ.png)
background-size: 250px
background-position: 9% 19%

# END

| | |
| :--- | :--- |
| <a href="mailto:bluebery.planterose@sciencespo.fr">.ScPored[<i class="fa fa-paper-plane fa-fw"></i>] | bluebery.planterose@sciencespo.fr |
| <a href="https://github.com/ScPoEcon/Advanced-Metrics-slides">.ScPored[<i class="fa fa-link fa-fw"></i>] | Original Slides from Florian Oswald |
| <a href="https://scpoecon.github.io/ScPoEconometrics">.ScPored[<i class="fa fa-link fa-fw"></i>] | Book |
| <a href="http://twitter.com/ScPoEcon">.ScPored[<i class="fa fa-twitter fa-fw"></i>] | @ScPoEcon |
| <a href="http://github.com/ScPoEcon">.ScPored[<i class="fa fa-github fa-fw"></i>] | @ScPoEcon |