simulation code done

This commit is contained in:
Hendrik Tillemans 2024-12-30 19:59:18 +01:00
parent f6c77af6e4
commit 7ccb2ea4e6
6 changed files with 99 additions and 18 deletions

BIN
data/assignment2025.dta Normal file

Binary file not shown.

BIN
figures/question_1_1.png Normal file

Binary file not shown.


BIN
figures/question_1_4.png Normal file

Binary file not shown.


BIN
figures/question_1_6.png Normal file

Binary file not shown.



@ -112,9 +112,7 @@ Hendrik Marcel W Tillemans\\
\section{Simulation Study} \section{Simulation Study}
\subsection{Question 1.2} \subsection{Question 1.1}
Are the estimates of $\beta_0$, $\beta_1$ and $\beta_2$ close to their true values? Why (not)?
We investigate a linear model with noise We investigate a linear model with noise
@ -122,9 +120,9 @@ We investigate a linear model with noise
where where
\[x1 \sim \mathcal{N}(3,\,6)\] \[x1 \sim \mathcal{N}(3,\,36)\]
\[x2 \sim \mathcal{N}(3,\,6)\] \[x2 \sim \mathcal{N}(2,\,25)\]
\[u \sim \mathcal{N}(0,\,3)\] \[u \sim \mathcal{N}(0,\,9)\]
In figure \ref{fig::plot_1_1} we have a 3D representation of the generated model. In figure \ref{fig::plot_1_1} we have a 3D representation of the generated model.
@ -135,59 +133,130 @@ In figure \ref{fig::plot_1_1} we have a 3D representation of the generated model
\end{figure} \end{figure}
\subsection{Question 1.2}
\subsection{1.2: Linear Fit on Generated Data} We now estimate the parameters $\beta_0$, $\beta_1$ and $\beta_2$ using the \textbf{Ordinary Least Squares} (OLS) method, with the model:
\[y_i=\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\]
\begin{table}[h] \begin{table}[h]
\centering
\input{table_1_2} \input{table_1_2}
\caption{Linear Fit on Generated Data} \caption{Linear Fit on Generated Data}
\label{tab::table_1_2} \label{tab::table_1_2}
\end{table} \end{table}
In table \ref{tab::table_1_2} we can see that the estimates are within 2\% of their true values. This is expected: our estimation model is the same as the model used to generate the data, we have a sufficient number of points, and the assumptions of OLS are satisfied.
\subsection{Question 1.3} \subsection{Question 1.3}
Compare your estimates with those of question 1.2. Which model do you choose? Discuss in terms of $\beta_1$ and model prediction. If we compare the estimates with those of question 1.2, we see that the estimate of the intercept is not close to the true value: it differs by about 4.
We can explain the bias of $\beta_0$ because this new model has a new error term: $\beta_2x_{2i} + u_i$. This error term no longer has an expected value of 0, but in fact $\beta_2E(x_{2i}) + E(u_i)= 4$, which is very close to the bias we find. For $\beta_1$, there is little to no bias. This can be explained because $x_2$ and $u$ are stochastically independent of $x_1$. The standard error is bigger because $\beta_2x_{2i} + u_i$ has a bigger variance than $u_i$ alone.
Which model would you choose? If I have sufficient computing power, I would choose the model of question 1.2, as it is much more accurate. However, in a very resource-constrained situation the model of question 1.3 might give acceptable estimates.
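As a rough check of the intercept bias and the larger standard error discussed above, assume the true values $\beta_0 = 3$ and $\beta_2 = 2$ (the values used in the substitution of question 1.5), the distributions $x_2 \sim \mathcal{N}(2,\,25)$ and $u \sim \mathcal{N}(0,\,9)$ from question 1.1, and independence of $x_2$ and $u$. The symbol $v_i$ is introduced here only for illustration. The short model can then be rewritten with a zero-mean error term:
\[y_i = (\beta_0 + \beta_2 E(x_{2i})) + \beta_1 x_{1i} + v_i, \qquad v_i = \beta_2 (x_{2i} - E(x_{2i})) + u_i\]
\[\beta_0 + \beta_2 E(x_{2i}) = 3 + 2 \cdot 2 = 7\]
\[Var(v_i) = \beta_2^2 Var(x_{2i}) + Var(u_i) = 4 \cdot 25 + 9 = 109\]
Under these assumptions the intercept estimate should be close to 7 rather than 3, and the standard deviation of the error term grows from 3 to $\sqrt{109} \approx 10.4$, which is in line with the larger standard errors.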
\begin{table}[h] \begin{table}[h]
\centering
\input{table_1_3} \input{table_1_3}
\caption{Linear Fit with 1 Variable} \caption{Linear Fit with 1 Variable}
\label{tab::table_1_3} \label{tab::table_1_3}
\end{table} \end{table}
\subsection{Question 1.4} \subsection{Question 1.4}
Do the results confirm what you would have expected to change in your estimation results compared to the results in question 1.2? Why (not)? How about the standard errors of the estimates of $\beta_1$ and $\beta_2$?
In figure \ref{fig::plot_1_4} we have a 3D representation of the generated model.
\begin{figure}[h]
\includegraphics[width=0.6\paperwidth]{../figures/question_1_4}
\caption{Generated points for Question 1.4.}
\label{fig::plot_1_4}
\end{figure}
The estimation results are similar to those in question 1.2: there is very little bias. It appears that $x_2^{new}$ is sufficiently independent of $x_1$. We expected very little bias because $x_2^{new}$ has a large independent part compared to $x_1$. The standard errors of the estimates of $\beta_1$ and $\beta_2$ are about 25\% higher, which can be explained partly by the lower standard deviation in $x_2^{new}$.
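One possible way to quantify the 25\% is sketched here. It assumes that $x_2^{new} = 0.5 * x_1 + x_2^{'}$ with $x_2^{'}\sim \mathcal{N}(5,\,16)$ independent of $x_1$ (the definition given in question 1.5), that $Var(x_1) = 36$ is unchanged, and it uses the textbook formula $Var(\hat{\beta}_j) = \sigma^2 / (SST_j (1 - R_j^2))$. The symbols $SST_j$ (total sample variation of $x_j$) and $R_j^2$ (the $R^2$ of regressing $x_j$ on the other regressor) are introduced here for illustration.
\[Var(x_2^{new}) = 0.5^2 \cdot 36 + 16 = 25, \qquad Cov(x_1, x_2^{new}) = 0.5 \cdot 36 = 18\]
\[R_j^2 \approx \frac{18^2}{36 \cdot 25} = 0.36\]
\[\frac{se^{new}(\hat{\beta}_j)}{se(\hat{\beta}_j)} \approx \sqrt{\frac{1}{1 - 0.36}} = 1.25\]
Under these assumptions both standard errors would be about 25\% higher than in question 1.2. For $\beta_2$ this is equivalent to noting that only the independent part $x_2^{'}$, with standard deviation 4 instead of 5, identifies $\beta_2$.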
\begin{table}[h] \begin{table}[h]
\centering
\input{table_1_4} \input{table_1_4}
\caption{New Linear Fit on Generated Data} \caption{New Linear Fit on Generated Data}
\label{tab::table_1_4} \label{tab::table_1_4}
\end{table} \end{table}
\subsection{Question 1.5} \subsection{Question 1.5}
Are the OLS estimators for the slope coefficients biased? Why (not)?
As in question 1.3, we estimate the parameters with a single independent variable.
\begin{table}[h] \begin{table}[h]
\centering
\input{table_1_5} \input{table_1_5}
\caption{Linear Fit with 1 Variable} \caption{Linear Fit with 1 Variable}
\label{tab::table_1_5} \label{tab::table_1_5}
\end{table} \end{table}
The OLS estimators for the slope coefficients are biased. We see that the estimate of $\beta_1$ is about $-3$ instead of the true value of $-4$. We can explain this bias in the following way. Let us start from the model.
\[y^{new}=\beta_0 + \beta_1 x_1 + \beta_2 x_2^{new} + u_i\]
We now have:
\[x_2^{new} = 0.5 * x_1 + x_2^{'}\]
Where:
\[x_2^{'}\sim \mathcal{N}(5,\,16)\]
Substituting in the model:
\[\Longrightarrow y^{new}=\beta_0 + \beta_1 x_1 + \beta_2(0.5 * x_1 + x_2^{'}) + u_i\]
Let us fill in the betas with the actual values:
\[\Longrightarrow y^{new}= 3 - 4 x_1 + 2(0.5 * x_1 + x_2^{'}) + u_i\]
\[\Leftrightarrow y^{new}= 3 - 4 x_1 + x_1 + 2x_2^{'} + u_i\]
\[\Leftrightarrow y^{new}= 3 - 3 x_1 + 2x_2^{'} + u_i\]
From this we can easily see why the OLS estimator in table \ref{tab::table_1_5} finds $-3$ as the estimate for $\beta_1$.
As in question 1.3, the bias on the intercept can be explained by the nonzero mean of the omitted term.
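The same numbers follow from the general omitted-variable formula. This is a sketch that assumes $x_2^{'}$ is independent of $x_1$, so that $Cov(x_1, x_2^{new}) = 0.5\,Var(x_1)$:
\[E(\hat{\beta}_1) \approx \beta_1 + \beta_2 \frac{Cov(x_1, x_2^{new})}{Var(x_1)} = -4 + 2 \cdot 0.5 = -3\]
\[E(\hat{\beta}_0) \approx \beta_0 + \beta_2 E(x_2^{'}) = 3 + 2 \cdot 5 = 13\]
The second line gives the value that the intercept estimate is expected to approach under the same assumption.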
\subsection{Question 1.6} \subsection{Question 1.6}
Do the results confirm what you would have
expected to change in your estimation results compared to the results in question 1.2? Now we replace $x_1$ in the original model with
Why (not)? How about the standard errors of the estimates of $\beta_1$ ? Use the formula \[x_1 \sim \mathcal{N}(3,\,1)\]
Var$\beta_1$ to motivate your answer. What would happen if the standard deviation of x1
is equal to 0 instead of equals 1? Discuss in terms of the assumptions of the Multiple If we now estimate the parameters we find:
Linear Regression mode.
\begin{table}[h] \begin{table}[h]
\centering
\input{table_1_6} \input{table_1_6}
\caption{Generate Data with Small Variance on x1} \caption{Generate Data with Small Variance on x1}
\label{tab::table_1_6} \label{tab::table_1_6}
\end{table} \end{table}
\begin{figure}[hb] We find in table \ref{tab::table_1_6} that the parameters are essentially unbiased but have a bigger standard error for the intercept and $\beta_1$. The standard error of $\beta_1$ is about 6 times bigger (from 0.016 to 0.10). We see no difference in the estimate of $\beta_2$, because nothing has changed in $x_2$.
\begin{figure}[h]
\includegraphics[width=0.6\paperwidth]{../figures/question_1_6} \includegraphics[width=0.6\paperwidth]{../figures/question_1_6}
\caption{Generated points for Question 1.6.} \caption{Generated points for Question 1.6.}
\label{fig::plot_1_6} \label{fig::plot_1_6}
\end{figure} \end{figure}
We expected a similar estimation result as in 1.2 because there are no changes except for the standard deviation of $x_1$. This means that the OLS assumptions are equally valid and we expect unbiased estimates.
We can explain the difference in standard error of the estimates of $\beta_1$ using the formula for $Var(\beta_1)$.
\[Var(\beta_1) = \sigma^2(X^tX)_{11}^{-1}\]
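Because $x_1$ and $x_2$ are generated independently, we can write this approximately as
\[Var(\beta_1) \approx \frac{\sigma^2}{n\,Var(x_1)}\]
This means that $Var(\beta_1) \propto 1/Var(x_1)$ for a fixed sample size $n$ and error variance $\sigma^2$.
Because $Var(x_1)$ changed from 36 to 1, we expect the standard error to be $\sqrt{36} = 6$ times bigger, which is what we found.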
If the standard deviation of $x_1$ changes to 0, $\beta_1$ cannot be calculated: $x_1$ is then constant and perfectly collinear with the intercept, which violates the no perfect multicollinearity assumption.
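A short sketch of why the estimate breaks down in that case, assuming the design matrix has the columns $(1, x_1, x_2)$ and $x_{1i} = 3$ for every observation:
\[X = \left(\begin{array}{ccc} 1 & 3 & x_{21} \\ \vdots & \vdots & \vdots \\ 1 & 3 & x_{2n} \end{array}\right)\]
The second column is 3 times the first, so $X$ does not have full column rank, $X^tX$ is singular and the inverse $(X^tX)^{-1}$ in the formula for $Var(\beta_1)$ does not exist.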
\section{examples} \section{examples}
Some greek letters: Some greek letters:


@ -165,6 +165,18 @@ m = sm.OLS(y_new, X)
results = m.fit() results = m.fit()
results_to_latex_table_file('table_1_4.tex', results, beta) results_to_latex_table_file('table_1_4.tex', results, beta)
# plot the resulting data
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(x1, x2_new, y_new, marker='o')
ax.set_xlabel('x1')
ax.set_ylabel('x2_new')
ax.set_zlabel('y_new')
plt.savefig(FIGURE_DIR + "question_1_4.png")
plt.show()
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------
# 1.5 # 1.5
# ----------------------------------------------------------------------------- # -----------------------------------------------------------------------------