\documentclass [12pt] { article}
\usepackage { natbib}
\usepackage { url}
\usepackage [utf8x] { inputenc}
\usepackage { mathtools} %
\usepackage { graphicx}
\usepackage { parskip}
\usepackage { xcolor} %
\usepackage { fancyhdr}
\usepackage { vmargin}
\usepackage { booktabs} %
\usepackage { sectsty} % for coloring sections
\setmarginsrb { 3 cm} { 2.5 cm} { 3 cm} { 2.5 cm} { 1 cm} { 1.5 cm} { 1 cm} { 1.5 cm}
% define your own custom colors
% If you want to change the colors you would need to update the HTML hex code in the
% last brackets. Better not change the name of the color as it is used elsewhere
\definecolor { report_ main} { HTML} { 200045}
\definecolor { report_ second} { HTML} { F39912}
\definecolor { report_ third} { HTML} { 8B0010}
\title { \color { report_ main} { Assignment Econometrics 2024} } % Title
\author { Hendrik Marcel W Tillemans} % Author
\date { \today } % Date
\makeatletter
\let \thetitle \@title
\let \theauthor \@author
\let \thedate \@date
\makeatother
\pagestyle { fancy}
\fancyhf { }
\rhead { \theauthor } % header on the right
\lhead { \thetitle } % header on the left
\cfoot { \thepage } % footer in the center
\sectionfont { \color { report_ main} }
\subsectionfont { \color { report_ third} }
%% Add pagebreak before each section
\let \oldsection \section
\renewcommand \section { \clearpage \oldsection }
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This is where the actual document starts
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin { document}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This section details the group information
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin { titlepage}
\centering
\vspace * { 0.5 cm}
\includegraphics [scale = 0.95] { ../figures/vub.png} \\ [1.0 cm] % University Logo
\textsc { \LARGE \newline \newline Free University Brussels} \\ [2.0 cm] % University Name
\textsc { \Large \color { report_ main} { Class: Econometrics} } \\ [0.5 cm] % Course Code
\rule { \linewidth } { 0.2 mm} \\ [0.4 cm]
{ \huge \bfseries \thetitle } \\
\rule { \linewidth } { 0.2 mm} \\ [1.5 cm]
\begin { minipage} { 0.5\textwidth }
\begin { flushleft} \large
\emph { Professor:} \\
Jeroen Kerkhof\\
Faculty of Economic Sciences\\
\end { flushleft}
\end { minipage} ~
\begin { minipage} { 0.4\textwidth }
\begin { flushright} \large
\emph { Group:} \\
Hendrik Marcel W Tillemans\\
\end { flushright}
\end { minipage} \\ [2 cm]
% takes the current date
\thedate
\end { titlepage}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This details the inclusion (or not) of the table of contents
% and list of figures and tables.
% You can add/remove page breaks as you see fit.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\tableofcontents
\pagebreak
\listoffigures
\listoftables
\pagebreak
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This is the start of the actual document content
% You can just write text in here as you would in any other word processor.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section { Simulation Study}
\subsection { Question 1.1}
We investigate a linear model with noise
\[ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + u \]
where
\[ x_1 \sim \mathcal{N}(3, \, 36) \]
\[ x_2 \sim \mathcal{N}(2, \, 25) \]
\[ u \sim \mathcal{N}(0, \, 9) \]
In figure \ref { fig::plot_ 1_ 1} we have a 3D representation of the generated model.
\begin { figure} [hb]
\includegraphics [width=0.6\paperwidth] { ../figures/question_ 1_ 1}
\caption { Generated points for Question 1.1.}
\label { fig::plot_ 1_ 1}
\end { figure}
\subsection { Question 1.2}
We now estimate the parameters $\beta_0$, $\beta_1$ and $\beta_2$ using the \textbf{Ordinary Least Squares} (OLS) method, with the model:
\[ y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i \]
\begin { table} [ht]
\centering
\input { table_ 1_ 2}
\caption { Linear Fit on Generated Data}
\label { tab::table_ 1_ 2}
\end { table}
In table \ref { tab::table_ 1_ 2} we can see that the estimates are within 2\% of their true values. This is what we expect from OLS here: the estimation model is the same as the model used to generate the data, we have a sufficient number of points, and the OLS assumptions are satisfied.
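
As a reminder of what is being computed, the OLS estimates can be written in matrix form. The notation is an assumption of this sketch and is not taken from the assignment: $X$ is the $n \times 3$ matrix holding a column of ones, $x_1$ and $x_2$, $y$ is the vector of observations and $\hat{u}_i$ are the residuals.
\[ \hat{\beta} = (X^\top X)^{-1} X^\top y, \qquad \hat{\sigma}^2 = \frac{1}{n-3} \sum_{i=1}^{n} \hat{u}_i^2, \qquad \widehat{\operatorname{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^\top X)^{-1} \]
Assuming the table reports the usual (non-robust) standard errors, they are the square roots of the diagonal elements of $\widehat{\operatorname{Var}}(\hat{\beta})$.
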
\subsection { Question 1.3}
Comparing the estimates in table \ref { tab::table_ 1_ 3} with those of question 1.2, we see that the estimate of the intercept is not close to the true value: the difference is about 4.

We can explain the bias of $\beta_0$ because this new model has a new error term: $\beta_2 x_{2i} + u_i$. This error term no longer has an expected value of 0, but $\beta_2 E(x_{2i}) + E(u_i) = 4$, which is very close to the bias we find. For $\beta_1$ there is little to no bias. This can be explained because $x_2$ and $u$ are stochastically independent of $x_1$. The standard error is bigger because $\beta_2 x_{2i} + u_i$ has a bigger variance than $u_i$ alone.

Which model would you choose? With sufficient computing power I would choose the model of question 1.2, as it is much more accurate. However, in a very resource-constrained situation the model of question 1.3 might give acceptable estimates.
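
The expected size of these effects can be worked out from the data-generating process. This is a sketch that assumes the true values $\beta_0 = 3$, $\beta_1 = -4$ and $\beta_2 = 2$ (the values substituted in question 1.5) and that $x_1$, $x_2$ and $u$ are mutually independent. The symbol $v_i$ for the combined error term is introduced here for illustration.
\begin{align*}
y_i &= \beta_0 + \beta_1 x_{1i} + \underbrace{\beta_2 x_{2i} + u_i}_{v_i} \\
E(v_i) &= \beta_2 E(x_{2i}) + E(u_i) = 2 \cdot 2 + 0 = 4 \\
\operatorname{Var}(v_i) &= \beta_2^2 \operatorname{Var}(x_{2i}) + \operatorname{Var}(u_i) = 4 \cdot 25 + 9 = 109
\end{align*}
Under these assumptions the one-variable regression should give an intercept of about $\beta_0 + E(v_i) = 3 + 4 = 7$, and the standard error of the slope should be roughly $\sqrt{109/9} \approx 3.5$ times that of question 1.2.
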
\begin { table} [ht]
\centering
\input { table_ 1_ 3}
\caption { Linear Fit with 1 Variable}
\label { tab::table_ 1_ 3}
\end { table}
\subsection { Question 1.4}
In figure \ref { fig::plot_ 1_ 4} we have a 3D representation of the generated model.
\begin { figure} [ht]
\includegraphics [width=0.6\paperwidth] { ../figures/question_ 1_ 4}
\caption { Generated points for Question 1.4.}
\label { fig::plot_ 1_ 4}
\end { figure}
The estimation results are similar to those of question 1.2: there is very little bias. We expected this because both regressors are included in the model and $x_2^{new}$ has a large part that is independent of $x_1$. The standard errors of the estimates of $\beta_1$ and $\beta_2$ are about 25\% higher, which can be explained by the correlation between $x_1$ and $x_2^{new}$: less independent variation is left in each regressor.
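
The size of this increase can be checked with a variance inflation argument. This is a sketch that assumes $x_2^{new}$ is built as described in question 1.5, that is $x_2^{new} = 0.5 x_1 + x_2'$ with $x_2' \sim \mathcal{N}(5, \, 16)$ independent of $x_1$. The symbol $\rho$ for the correlation between the two regressors is introduced here for illustration.
\begin{align*}
\operatorname{Cov}(x_1, x_2^{new}) &= 0.5 \operatorname{Var}(x_1) = 0.5 \cdot 36 = 18 \\
\operatorname{Var}(x_2^{new}) &= 0.25 \cdot 36 + 16 = 25 \\
\rho &= \frac{18}{\sqrt{36} \sqrt{25}} = 0.6 \\
\frac{1}{\sqrt{1 - \rho^2}} &= \frac{1}{\sqrt{0.64}} = 1.25
\end{align*}
With two regressors the standard errors of both slope estimates scale with $1 / \sqrt{1 - \rho^2}$, so under these assumptions an increase of about 25\% is what we would expect.
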
\begin { table} [ht]
\centering
\input { table_ 1_ 4}
\caption { New Linear Fit on Generated Data}
\label { tab::table_ 1_ 4}
\end { table}
\subsection { Question 1.5}
As in question 1.3, we estimate the parameters with a single independent variable.
\begin { table} [ht]
\centering
\input { table_ 1_ 5}
\caption { Linear Fit with 1 Variable}
\label { tab::table_ 1_ 5}
\end { table}
The OLS estimator of the slope coefficient is biased. In table \ref { tab::table_ 1_ 5} we see that the estimate of $\beta_1$ is about $-3$ instead of the true value of $-4$. We can explain this bias as follows. Let us start from the model
\[ y^{new} = \beta_0 + \beta_1 x_1 + \beta_2 x_2^{new} + u \]
We now have
\[ x_2^{new} = 0.5 x_1 + x_2' \]
where
\[ x_2' \sim \mathcal{N}(5, \, 16) \]
Substituting in the model:
\[ \Longrightarrow y^{new} = \beta_0 + \beta_1 x_1 + \beta_2 (0.5 x_1 + x_2') + u \]
Let us fill in the betas with the actual values:
\[ \Longrightarrow y^{new} = 3 - 4 x_1 + 2 (0.5 x_1 + x_2') + u \]
\[ \Leftrightarrow y^{new} = 3 - 4 x_1 + x_1 + 2 x_2' + u \]
\[ \Leftrightarrow y^{new} = 3 - 3 x_1 + 2 x_2' + u \]
Because $x_2'$ is independent of $x_1$, a regression of $y^{new}$ on $x_1$ alone has $-3$ as the coefficient of $x_1$, which is the estimate of $\beta_1$ that OLS finds.

As in question 1.3, the bias on the intercept comes from the nonzero expected value of the omitted term, here $2 x_2'$.
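
The same numbers follow from the omitted variable formula, which also gives the expected intercept. This is a sketch that assumes $x_2'$ is independent of $x_1$ and uses the true values $\beta_0 = 3$, $\beta_1 = -4$ and $\beta_2 = 2$. The tilde notation for the estimates of the one-variable regression is introduced here for illustration.
\begin{align*}
E(\tilde{\beta}_1) &\approx \beta_1 + \beta_2 \frac{\operatorname{Cov}(x_1, x_2^{new})}{\operatorname{Var}(x_1)} = -4 + 2 \cdot \frac{0.5 \cdot 36}{36} = -3 \\
E(\tilde{\beta}_0) &\approx \beta_0 + \beta_2 E(x_2') = 3 + 2 \cdot 5 = 13
\end{align*}
Under these assumptions the intercept estimate is expected to be about 10 above its true value of 3.
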
\subsection { Question 1.6}
Now we replace $ x _ 1 $ in the original model with
\[ x _ 1 \sim \mathcal { N } ( 3 , \, 1 ) \]
If we now estimate the parameters we find:
\begin { table} [ht]
\centering
\input { table_ 1_ 6}
\caption { Generated Data with Small Variance on $x_1$}
\label { tab::table_ 1_ 6}
\end { table}
We find in table \ref { tab::table_ 1_ 6} that the estimates are essentially unbiased but have a bigger standard error for the intercept and $\beta_1$. The standard error of $\beta_1$ is about 6 times bigger (from 0.016 to 0.10). We see no difference in the estimate of $\beta_2$, because nothing has changed in $x_2$.
\begin { figure} [ht]
\includegraphics [width=0.6\paperwidth] { ../figures/question_ 1_ 6}
\caption { Generated points for Question 1.6.}
\label { fig::plot_ 1_ 6}
\end { figure}
We expected a similar estimation result as in question 1.2 because nothing has changed except the standard deviation of $x_1$. This means that the OLS assumptions are equally valid and we expect unbiased estimates.

We can explain the difference in the standard error of the estimate of $\beta_1$ using the formula for $\operatorname{Var}(\hat{\beta}_1)$:
\[ \operatorname{Var}(\hat{\beta}_1) = \sigma^2 (X^\top X)^{-1}_{11} \]
Because $x_1$ and $x_2$ are independent, this is approximately
\[ \operatorname{Var}(\hat{\beta}_1) \approx \frac{\sigma^2}{n \operatorname{Var}(x_1)} \]
where $n$ is the number of observations. This means that $\operatorname{Var}(\hat{\beta}_1) \propto 1 / \operatorname{Var}(x_1)$.
Because $\operatorname{Var}(x_1)$ changed from 36 to 1, we expect the standard error to be $\sqrt{36} = 6$ times bigger, which is what we found.

If the standard deviation of $x_1$ becomes 0, $\beta_1$ cannot be calculated: $x_1$ is then a constant and perfectly collinear with the intercept, which violates the no multicollinearity assumption.
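
The step from the matrix expression to the scalar one can be made explicit with the standard textbook result for a regression with two regressors. The symbols $SST_1$ and $R_1^2$ are notation introduced for this sketch: $SST_1$ is the total sample variation in $x_1$ and $R_1^2$ is the $R^2$ of a regression of $x_1$ on $x_2$, which is close to 0 here if $x_1$ and $x_2$ are independent.
\[ \operatorname{Var}(\hat{\beta}_1) = \frac{\sigma^2}{SST_1 (1 - R_1^2)}, \qquad SST_1 = \sum_{i=1}^{n} (x_{1i} - \bar{x}_1)^2 \approx n \operatorname{Var}(x_1) \]
Holding $\sigma^2$, $n$ and $R_1^2$ fixed, the ratio of the standard errors after and before the change is approximately
\[ \sqrt{\frac{36}{1}} = 6 \]
As $\operatorname{Var}(x_1)$ goes to 0, $SST_1$ goes to 0 and the variance of the estimate grows without bound, which is the limiting case described above.
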
\section { examples}
Some greek letters:
$ \alpha $
$ \beta $
$ \gamma $
$ \theta $
$ \varepsilon $
$ \pi $
$ \lambda $
$ \tau $
$ x = x + 27 $
x=x+27
$ A \Longrightarrow B $
$ \underbrace { abs } _ { test } $
sub and superscript
$ \beta _ 0 $
$ \sum _ { i = 1 } ^ { n } i $
In an equation:
\begin { equation}
\sum_{j=1}^{n} j^2 \beta
\end { equation}
Equation without number
\begin { equation*}
A \Rightarrow B
\end { equation*}
\section { Empirical Investigation}
\subsection { Question 2.1}
We retain 2510 observations.
\begin { table} [ht]
\centering
\input { summary_ stats}
\caption { Summary Statistics}
\label { tab::summary_ stats}
\end { table}
\subsection { Question 2.2}
\begin { figure}
\includegraphics [width=0.6\paperwidth] { ../figures/question_ 2_ 2_ wage}
\caption { Histogram wage}
\label { fig::question_ 2_ 2_ wage}
\end { figure}
\begin { figure}
\includegraphics [width=0.6\paperwidth] { ../figures/question_ 2_ 2_ lwage}
\caption { Histogram lwage}
\label { fig::question_ 2_ 2_ lwage}
\end { figure}
The lwage histogram in figure \ref { fig::question_ 2_ 2_ lwage} is nicely centered, so there is no need to remove any outliers. It is also close to a normal distribution. The wage histogram in figure \ref { fig::question_ 2_ 2_ wage} is not symmetric but leans to the left, and is clearly not normally distributed.
\subsection { Question 2.3}
\begin { table} [ht]
\centering
\input { table_ 2_ 3}
\caption { Correlation matrix}
\label { tab::table_ 2_ 3}
\end { table}
In table \ref { tab::table_ 2_ 3} we can see that there is a positive correlation between wage and school: people with more years of schooling tend to earn a higher wage. There is a negative correlation between age and school: the younger generation is more highly educated than the older generation. Chinese citizens tend to be better paid than Malay citizens, and being Indian is negatively correlated with wage.
\subsection { Question 2.4}
\end { document}