01.06.2024 Applications#
Rescaling Variables#
General: Scaling does not change important things:
R-Squared = no change
Significance (t statistics) = no change
Residuals and SSR = change
SE and CI = change (divided by the scaling factor, like the estimates)
OLS estimates change (but converted back to the same units they are identical)
Example: Salary equation with thousands of dollars instead of dollars
Why do this? It makes the numbers easier to interpret
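The rescaling rules above can be checked numerically. The following is a minimal sketch with a hand-written simple regression; the data (`exper`, `salary_usd`) are made up for illustration and are not the salary data from the lecture.

```python
import math

def simple_ols(x, y):
    """OLS of y on a constant and x; returns slope, its SE, t statistic, R-squared, SSR."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    ssr = sum(u ** 2 for u in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    se = math.sqrt(ssr / (n - 2) / sxx)
    return {"slope": slope, "se": se, "t": slope / se, "r2": 1 - ssr / sst, "ssr": ssr}

# Hypothetical data: years of experience and salary in dollars.
exper = [1, 3, 4, 6, 8, 10]
salary_usd = [31000, 36500, 37000, 44000, 47500, 55000]
salary_k = [s / 1000 for s in salary_usd]  # same salaries in thousands of dollars

fit_usd = simple_ols(exper, salary_usd)
fit_k = simple_ols(exper, salary_k)

for name in ("slope", "se", "t", "r2", "ssr"):
    print(f"{name:>5}: dollars = {fit_usd[name]:.6g}, thousands = {fit_k[name]:.6g}")
```

The slope and its SE shrink by the factor 1000, SSR shrinks by 1000 squared, and the t statistic and R-squared stay the same.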
Functional Form Specifications#
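Example: Housing prices and Nitrogen Oxide Emissions (nox)
Model form implied by the interpretations below: \(\ln price = \beta_0 + \beta_1 \ln nox + \beta_2 rooms + u\)
\(\beta_1\) = elasticity of price with respect to nox (1% nox \(\uparrow\) => 0.718% price \(\downarrow\))
\(\beta_2\) = semi-elasticity of price with respect to rooms
multiply by 100 = approximate percentage effect
\(\Delta room=1\) => 30% price \(\uparrow\)
Note: the approximation becomes inaccurate for larger percentage changes!
Alternative: exact percentage change \(100 [exp(\beta_2)-1]\) (use when the coefficient is large, e.g. > 0.2)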
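A small sketch comparing the approximate and the exact percentage effect of a semi-elasticity. The value 0.30 is taken from the 30% figure in the notes; the other coefficients are only there for comparison, and the function names are illustrative.

```python
import math

def approx_pct(beta):
    """Approximate % change in y for a one-unit change in x (log-level model)."""
    return 100 * beta

def exact_pct(beta):
    """Exact % change in y for a one-unit change in x (log-level model)."""
    return 100 * (math.exp(beta) - 1)

for beta in (0.05, 0.20, 0.30):
    print(f"beta = {beta:.2f}: approx = {approx_pct(beta):.1f}%, exact = {exact_pct(beta):.1f}%")
```

For a coefficient of 0.30 the exact effect is roughly 35%, noticeably above the 30% approximation, while for 0.05 the two are nearly identical.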
Percentages and Percentage Points!#
\(\ln wage = \beta_1 - 0.05 \ unempl. rate\)
\(\ln wage = \beta_1 - 0.05 \ln (unempl. rate)\)
Level model: unempl. rate increases by one percentage point (8 -> 9) => wages \(\downarrow\) by 5%
Log model: unempl. rate increases by one percent (8.00 -> 8.08) => wages \(\downarrow\) by 0.05%
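The difference between a percentage-point change and a percent change can be sketched with the two equations above. This assumes the coefficient -0.05 in both models, as in the notes; the function names are illustrative.

```python
import math

COEF = -0.05  # coefficient on the unemployment rate (level model) or its log (log model)

def pct_change_level_model(u_old, u_new):
    """Approximate % change in wage when the unemployment rate enters in levels."""
    return 100 * COEF * (u_new - u_old)

def pct_change_log_model(u_old, u_new):
    """Approximate % change in wage when the unemployment rate enters in logs."""
    return 100 * COEF * (math.log(u_new) - math.log(u_old))

# One percentage point: 8 -> 9 (level model)
print(pct_change_level_model(8.0, 9.0))
# One percent: 8.00 -> 8.08 (log model)
print(pct_change_log_model(8.0, 8.08))
```

Both effects are negative: about -5% in the level model and about -0.05% in the log model.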
Quadratic terms: used to capture increasing or decreasing marginal effects
find marginal effect: \(\frac{ \partial wage }{\partial exper} = 0.298 - 2 \cdot 0.0061 \cdot exper\)
Result: diminishing marginal effect of experience (negative sign on the quadratic term)
1st year = \(.298 - 2\cdot (0.0061)\cdot (0) = .298\) (no experience before the first year of the job)
about .298 dollars (roughly 30 cents) per hour increase
2nd year = \(.298 - 2\cdot (0.0061)\cdot (1) = .286\)
10th to 11th year = \(.298 - 2\cdot (0.0061)\cdot (10) = .176\)
Wage is maximized at \(exper = \frac{ -\beta_1 }{2 \beta_2} = \frac{ 0.298 }{2 \cdot 0.0061} \approx 24.4\) years
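The marginal effects and the turning point above can be reproduced in a few lines. The coefficients 0.298 and -0.0061 are the ones from the notes; the function names are illustrative.

```python
B1 = 0.298     # coefficient on exper
B2 = -0.0061   # coefficient on exper squared

def marginal_effect(exper):
    """Derivative of the fitted equation with respect to exper."""
    return B1 + 2 * B2 * exper

def turning_point():
    """Experience level at which the fitted quadratic reaches its maximum."""
    return -B1 / (2 * B2)

for years in (0, 1, 10):
    print(f"exper = {years:2d}: marginal effect = {marginal_effect(years):.3f}")
print(f"turning point = {turning_point():.1f} years")
```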
Interaction Terms#
Partial Effect of bdrms on price
\(\frac{ \partial price }{\partial bdrms} = \beta_2 + \beta_3 sqrft\)
interaction effect: if \(\beta_3 > 0\), an additional bedroom raises the housing price by more in a larger house
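A sketch of how the partial effect of a bedroom varies with house size. The coefficient values below are hypothetical and only chosen to illustrate the formula; they are not estimates from the lecture.

```python
BETA_2 = 5000.0  # hypothetical coefficient on bdrms (dollars)
BETA_3 = 10.0    # hypothetical coefficient on bdrms * sqrft (dollars per square foot)

def bedroom_effect(sqrft):
    """Partial effect of one more bedroom on price at a given house size."""
    return BETA_2 + BETA_3 * sqrft

for size in (1500, 3000):
    print(f"sqrft = {size}: one more bedroom changes price by {bedroom_effect(size):.0f}")
```

Because the hypothetical interaction coefficient is positive, the effect of an additional bedroom is larger for the 3000 square foot house than for the 1500 square foot one.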
Goodness of Fit#
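Problem with R-squared: adding a variable never decreases it (it almost always increases)
=> adjusted R-squared (has a penalty for additional variables)
How to go from R-squared to adjusted? \(\bar{R}^2 = 1 - (1 - R^2) \frac{n-1}{n-k-1}\) (n = observations, k = regressors)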
adj. R-squared can be negative!
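The conversion from R-squared to adjusted R-squared can be written as a short function using the standard textbook formula. The example numbers are made up to show that the result can indeed be negative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (excluding the intercept)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical case: weak fit, small sample, several regressors.
print(adjusted_r2(r2=0.05, n=20, k=3))
# Hypothetical case: strong fit, large sample.
print(adjusted_r2(r2=0.80, n=500, k=3))
```

With a low R-squared and few degrees of freedom the penalty dominates and the adjusted value drops below zero.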
Selection of Regressors#
Example of choosing between different models
Second Model:
better R-squared
better adjusted R^2 (even though it has one independent variable more)
never use R-squared or adjusted R-squared to compare models whose dependent variables are specified differently (e.g. \(y\) vs. \(\ln y\))
Overcontrolling Example: Beer Tax#
Idea: Beer Tax => lower Beer Consumption => lower traffic deaths
\(fatalities = \beta_0 + \beta_1 tax + \beta_2 miles + u\)
why not include beer consumption?
is leaving it out omitted variable bias?
NO, because holding consumption fixed is not what we are interested in: the tax works through consumption
we want the total effect of the tax, so we include tax but not beer consumption
Confidence interval for the prediction \(\theta_0\): \(\hat{ \theta }_0 \pm 2 \cdot se(\hat{ \theta }_0)\)
variance of prediction = smallest at mean values of \(x_j\)
How to predict \(y\) if the model is only for \(\ln y\), e.g. \(\ln salary = \beta_0 + \beta_1 x_k + ... + u\)
For given \(x_k = 5000\) => \(\widehat{\ln salary} = 7.013\)
Naive: \(\hat{ y } = exp(\widehat{\ln y}) = exp(7.013) = 1110.983\) (Underestimates the result!)
Correction: \(\hat{ y } = \hat{ \alpha }_0 \cdot exp(\widehat{\ln y})\)
calculate \(\hat{ \alpha_0 } = n^{-1} \sum_{i=1}^n exp(\hat{ u_i })\)
Result: \(y = 1.136 \cdot exp(7.013) = 1262.076\)
Alternative (more involved): estimate \(\alpha_0\) from a separate regression of \(y_i\) on \(exp(\widehat{\ln y_i})\) without an intercept
Result: \(1.117 \cdot exp(7.013) = 1240.967\)
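Under the normality assumption (\(\hat{ \sigma }\) = standard error of the regression):
\(\hat{ y } = \exp \left( \frac{ \hat{ \sigma }^2 }{2} \right) \exp (\widehat{\ln y}) = \exp \left( \frac{0.50477^2 }{2} \right) \exp (7.013)= 1261.929 \)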
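The three corrections for predicting \(y\) from a \(\ln y\) model can be sketched as follows. The inputs 7.013, 1.136, 1.117 and 0.50477 are the numbers from the notes; the function names are illustrative, and in practice the residuals and fitted values would come from the estimated log regression.

```python
import math

def smearing_factor(residuals):
    """alpha_0 as the sample average of exp(residual) from the log regression."""
    return sum(math.exp(u) for u in residuals) / len(residuals)

def origin_regression_factor(y, log_fitted):
    """alpha_0 as the slope from regressing y on exp(fitted log y) without an intercept."""
    m = [math.exp(f) for f in log_fitted]
    return sum(mi * yi for mi, yi in zip(m, y)) / sum(mi ** 2 for mi in m)

def normal_factor(sigma_hat):
    """alpha_0 under normal errors, from the standard error of the regression."""
    return math.exp(sigma_hat ** 2 / 2)

def predict_level(log_prediction, alpha_0=1.0):
    """Prediction of y; alpha_0 = 1 gives the naive prediction."""
    return alpha_0 * math.exp(log_prediction)

LOG_PRED = 7.013  # predicted ln(salary) from the notes

print(f"naive:             {predict_level(LOG_PRED):.2f}")
print(f"smearing (1.136):  {predict_level(LOG_PRED, 1.136):.2f}")
print(f"regression (1.117): {predict_level(LOG_PRED, 1.117):.2f}")
print(f"normality:         {predict_level(LOG_PRED, normal_factor(0.50477)):.2f}")
```

All three corrected predictions lie above the naive one, since each correction factor is greater than one here.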