question 2.1 issue
parent 7ccb2ea4e6
commit 9d8f09b035
2 changed files with 30 additions and 42 deletions
@@ -138,7 +138,7 @@ In figure \ref{fig::plot_1_1} we have a 3D representation of the generated model
 We now estimate the parameters $\beta_0$, $\beta_1$, and $\beta_2$ using the \textbf{Ordinary Least Squares} (OLS) method, with the model:
 \[y_i=\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\]
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_2}
 \caption{Linear Fit on Generated Data}
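An aside for the reader: the OLS estimation discussed in the hunk above can be reproduced with a short Python sketch. The coefficients, sample size, and standard deviations below are illustrative stand-ins, not the assignment's actual values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# illustrative data for y_i = b0 + b1*x_1i + b2*x_2i + u_i
x1 = rng.normal(3.0, 6.0, n)
x2 = rng.normal(0.0, 2.0, n)
u = rng.normal(0.0, 1.0, n)
y = 1.0 + 0.5 * x1 - 0.3 * x2 + u

# OLS: regress y on a constant, x1, and x2
X = np.column_stack([np.ones(n), x1, x2])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# classical standard errors: sqrt(diag(s^2 * (X'X)^-1))
resid = y - X @ beta_hat
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))

print(beta_hat)  # estimates of b0, b1, b2
print(se)        # their standard errors
```

With a sample this size the estimates land close to the true coefficients, which is the (near-)unbiasedness the report discusses.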
@@ -154,7 +154,7 @@ We can explain the bias of $\beta_0$ because this new model has a new error term
 Which model would you choose? If I had sufficient computing power, I would choose model 1.2, as it is much more accurate. However, in a very resource-constrained situation, model 1.3 might give acceptable estimates.
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_3}
 \caption{Linear Fit with 1 Variable}
@@ -165,7 +165,7 @@ Which model would you choose? If I have sufficient calculation power, I would cho
 In figure \ref{fig::plot_1_4} we have a 3D representation of the generated model.
-\begin{figure}[h]
+\begin{figure}[ht]
 \includegraphics[width=0.6\paperwidth]{../figures/question_1_4}
 \caption{Generated points for Question 1.4.}
 \label{fig::plot_1_4}
@@ -173,7 +173,7 @@ In figure \ref{fig::plot_1_4} we have a 3D representation of the generated model
 The estimation results are similar to those in question 1.2; there is very little bias. It appears that $x_2^{new}$ is sufficiently independent of $x_1$. We expected very little bias because $x_2^{new}$ has a large independent part compared to $x_1$. The standard errors of the estimates of $\beta_1$ and $\beta_2$ are about 25\% higher, which can be explained partly by the lower standard deviation of $x_2^{new}$.
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_4}
 \caption{New Linear Fit on Generated Data}
@@ -185,7 +185,7 @@ The estimation results compared to the results in question 1.2 are similar, ther
 As in question 1.3, we estimated the parameters with a single independent variable.
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_5}
 \caption{Linear Fit with 1 Variable}
@@ -225,7 +225,7 @@ Now we replace $x_1$ in the original model with
 \[x_1 \sim \mathcal{N}(3,\,1)\]

 If we now estimate the parameters we find:
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_6}
 \caption{Generated Data with Small Variance in $x_1$}
@@ -234,7 +234,7 @@ If we now estimate the parameters we find:
 In table \ref{tab::table_1_6} we find that the parameters are essentially unbiased but have a larger standard error for the intercept and $\beta_1$. The standard error of $\beta_1$ is about 6 times larger (from 0.016 to 0.10). We see no difference in the estimates of $\beta_2$, because nothing has changed in $x_2$.
-\begin{figure}[h]
+\begin{figure}[ht]
 \includegraphics[width=0.6\paperwidth]{../figures/question_1_6}
 \caption{Generated points for Question 1.6.}
 \label{fig::plot_1_6}
@@ -250,7 +250,7 @@ We can explain the difference in standard error of the estimates of $\beta_1$ us
 \[Var(\beta_1) = \sigma^2/Var(x_1)\]

-This means that $Var(\(beta_1) \sim 1/Var(x_1)$.
+This means that $Var(\beta_1) \sim 1/Var(x_1)$.

 Because $Var(x_1)$ changed from 36 to 1, we expect the standard error to be $\sqrt{36} = 6$ times larger, which is exactly what we found.
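As an aside, the $\sqrt{36} = 6$ claim in the hunk above is easy to check by simulation. The helper below is hypothetical; the coefficients and sample size are illustrative, not the assignment's values.

```python
import numpy as np

def slope_se(sd_x1, n=200_000, seed=1):
    """Classical OLS standard error of the slope in y = b0 + b1*x1 + u."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(3.0, sd_x1, n)
    y = 1.0 + 0.5 * x1 + rng.normal(0.0, 1.0, n)
    X = np.column_stack([np.ones(n), x1])
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    s2 = ((y - X @ beta) ** 2).sum() / (n - 2)
    return float(np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1]))

# shrinking Var(x1) from 36 to 1 inflates the slope's standard error ~6x
ratio = slope_se(1.0) / slope_se(6.0)
print(round(ratio, 2))
```

The ratio comes out very close to 6, matching the $Var(\beta_1) \sim 1/Var(x_1)$ argument.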
@@ -299,24 +299,21 @@ A \Rightarrow B
 \section{Empirical Investigation}

 Here is some example code to create tables and graphs from the
 Python script. In order for this to work you would first need
 to run the script non\_linear\_models\_example\_report.py. Running that
 file (using the recommended file structure) creates some figures
 in the figures folder and some tables in .tex files in the report folder.

-\subsection{Question 3}
+\subsection{Question 2.1}

 For instance, here the file df\_table.tex is used to print the actual numbers
 in the table.

-\begin{table}[h]
-\input{df_table}
-\caption{This tables has the estimates for ...}
-\label{tab::estimation_results}
+\begin{table}[ht]
+\centering
+\input{summary_stats}
+\caption{Summary Statistics}
+\label{tab::table_2_1}
 \end{table}

 \subsection{Question 4: Some graphs}

 \begin{figure}
@@ -325,24 +322,6 @@ in the table.
 \label{fig::example_data}
 \end{figure}

 In Figure \ref{fig::example_data} we see the data.

-\begin{figure}
-\includegraphics[width=0.6\paperwidth]{../figures/quadratic_model_linear}
-\caption{This is a linear fit on a quadratic model.}
-\label{fig::example_quadratic_linear}
-\end{figure}
-
-In Figure \ref{fig::example_quadratic_linear} we see a linear fit.
-
-\begin{figure}
-\includegraphics[width=0.6\paperwidth]{../figures/quadratic_model_quadratic}
-\caption{This is quadratic fit on a quadratic model.}
-\label{fig::example_quadratic_quadratic}
-\end{figure}
-
-In Figure \ref{fig::example_quadratic_quadratic} we see that
-
-\subsection{Question 5}

@@ -45,9 +45,9 @@ if not os.path.exists(report_dir):
 # first birthday
-bd_1 = 3112
+bd_1 = 303
 # second birthday
-bd_2 = 3112
+bd_2 = 309

 group_seed = bd_1 * bd_2
@@ -89,14 +89,23 @@ data = data_full.iloc[observations , :].copy()
 print_question('Question 2.1: Descriptive Statistics')

 # compute the summary statistics
-# data_summary = TODO
+# drop columns that are not needed for the summary statistics
+data.drop(['fail', 'urban', 'unearn', 'househ', 'amtland', 'unearnx'],
+          axis='columns',
+          inplace=True)
+
+# keep only respondents in paid work
+data = data[data['paidwork'] == 1]
+
+# total years of schooling, and the wage in levels
+data['school'] = data['yprim'] + data['ysec']
+data['wage'] = np.exp(data['lwage'])
+data_summary = data.describe()

 # print to screen
-# print(data_summary.T) [uncomment]
+print(data_summary.T)

 # export the summary statistics to a file
-# data_frame_to_latex_table_file(report_dir + 'summmary_stats.tex',
-#                                data_summary.T) [uncomment]
+data_frame_to_latex_table_file(report_dir + 'summary_stats.tex',
+                               data_summary.T)

 # -----------------------------------------------------------------------------
 # Question 2.2
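To see what the added snippet produces, here is the same filter-derive-describe pattern on a tiny hypothetical frame. The column names mimic the script, but the numbers are made up, so this only illustrates the mechanics.

```python
import numpy as np
import pandas as pd

# made-up stand-in for the assignment's data
data = pd.DataFrame({
    'yprim':    [6, 6, 5, 6],
    'ysec':     [3, 0, 2, 4],
    'lwage':    [1.2, 0.9, 1.0, 1.5],
    'paidwork': [1, 1, 0, 1],
})

data = data[data['paidwork'] == 1].copy()      # keep paid workers only
data['school'] = data['yprim'] + data['ysec']  # total years of schooling
data['wage'] = np.exp(data['lwage'])           # wage in levels
data_summary = data.describe()                 # count/mean/std/quartiles per column

print(data_summary.T)  # transposed: one row per variable
```

Transposing `describe()` gives one row per variable, which is the shape the report's summary-statistics table expects.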