diff --git a/report/Assignment.tex b/report/Assignment.tex
index 8441b0d..ef4250c 100644
--- a/report/Assignment.tex
+++ b/report/Assignment.tex
@@ -138,7 +138,7 @@ In figure \ref{fig::plot_1_1} we have a 3D representation of the generated model
 We now estimate the parameters $\beta_0$, $\beta_1$, and $\beta_2$ using the \textbf{Ordinary Least Squares} (OLS) method. With model:
 \[y_i=\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + u_i\]
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_2}
 \caption{Linear Fit on Generated Data}
@@ -154,7 +154,7 @@ We can explain the bias of $\beta_0$ because this new model has a new error term
 Which model would you choose? If I had sufficient computational power, I would choose model 1.2, as it is much more accurate. However, in a very resource-constrained situation model 1.3 might give acceptable estimates.
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_3}
 \caption{Linear Fit with 1 Variable}
@@ -165,7 +165,7 @@
 In figure \ref{fig::plot_1_4} we have a 3D representation of the generated model.
-\begin{figure}[h]
+\begin{figure}[ht]
 \includegraphics[width=0.6\paperwidth]{../figures/question_1_4}
 \caption{Generated points for Question 1.4.}
 \label{fig::plot_1_4}
@@ -173,7 +173,7 @@ In figure \ref{fig::plot_1_4} we have a 3D representation of the generated model
 Compared to the results in question 1.2, the estimation results are similar: there is very little bias. It appears that $x_2^{new}$ is sufficiently independent of $x_1$. We expected very little bias because $x_2^{new}$ has a large independent part compared to $x_1$. The standard errors of the estimates of $\beta_1$ and $\beta_2$ are about 25\% higher, which can be explained partly by the lower standard deviation in $x_2^{new}$.
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_4}
 \caption{New Linear Fit on Generated Data}
@@ -185,7 +185,7 @@ The estimation results compared to the results in question 1.2 are similar, ther
 Similarly to question 1.3, we estimated the parameters with a single independent variable.
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_5}
 \caption{Linear Fit with 1 Variable}
@@ -225,7 +225,7 @@ Now we replace $x_1$ in the original model with
 \[x_1 \sim \mathcal{N}(3,\,1)\]
 If we now estimate the parameters we find:
-\begin{table}[h]
+\begin{table}[ht]
 \centering
 \input{table_1_6}
 \caption{Generated Data with Small Variance on $x_1$}
@@ -234,7 +234,7 @@
 We find in table \ref{tab::table_1_6} that the parameters are essentially unbiased but have a bigger standard error for the intercept and $\beta_1$. The standard error of $\beta_1$ is 6 times bigger (from 0.016 to 0.10). We see no difference in the estimates of $\beta_2$, because nothing has changed in $x_2$.
-\begin{figure}[h]
+\begin{figure}[ht]
 \includegraphics[width=0.6\paperwidth]{../figures/question_1_6}
 \caption{Generated points for Question 1.6.}
 \label{fig::plot_1_6}
@@ -250,7 +250,7 @@ We can explain the difference in standard error of the estimates of $\beta_1$ us
 \[Var(\beta_1) = \frac{\sigma^2}{n \, Var(x_1)}\]
-This means that $Var(\(beta_1) \sim 1/Var(x_1)$.
+This means that $Var(\beta_1) \sim 1/Var(x_1)$.
 Because $Var(x_1)$ changed from 36 to 1, we expect the standard error to be $\sqrt{36} = 6$ times bigger, which is exactly what we found.
@@ -299,24 +299,21 @@ A \Rightarrow B
 
 \section{Empirical Investigation}
 
-Here is some example code to create tables and graphs from the
-Python script. In order for this to work you would first need
-to run the script non\_linear\_models\_example\_report.py. Running that
-file (using the recommended file structure) creates some figures
-in the figures folder and some tables in .tex files in the report folder.
-\subsection{Question 3}
+\subsection{Question 2.1}
 
 For instance, here the file df\_table.tex is used to print the actual numbers
 in the table.
 
-\begin{table}[h]
-\input{df_table}
-\caption{This tables has the estimates for ...}
-\label{tab::estimation_results}
+\begin{table}[ht]
+\centering
+\input{summary_stats}
+\caption{Summary Statistics for Question 2.1}
+\label{tab::table_2_1}
 \end{table}
+
 \subsection{Question 4: Some graphs}
 
 \begin{figure}
@@ -325,24 +322,6 @@ in the table.
 \label{fig::example_data}
 \end{figure}
 
-In Figure \ref{fig::example_data} we see the data.
-
-\begin{figure}
-\includegraphics[width=0.6\paperwidth]{../figures/quadratic_model_linear}
-\caption{This is a linear fit on a quadratic model.}
-\label{fig::example_quadratic_linear}
-\end{figure}
-
-In Figure \ref{fig::example_quadratic_linear} we see a linear fit.
-
-
-\begin{figure}
-\includegraphics[width=0.6\paperwidth]{../figures/quadratic_model_quadratic}
-\caption{This is quadratic fit on a quadratic model.}
-\label{fig::example_quadratic_quadratic}
-\end{figure}
-
-In Figure \ref{fig::example_quadratic_quadratic} we see that
 
 \subsection{Question 5}
diff --git a/scripts/empirical.py b/scripts/empirical.py
index 48aaca8..fb73ebb 100644
--- a/scripts/empirical.py
+++ b/scripts/empirical.py
@@ -45,9 +45,9 @@ if not os.path.exists(report_dir):
 
 # first birthday
-bd_1 = 3112
+bd_1 = 303
 # second birthday
-bd_2 = 3112
+bd_2 = 309
 
 group_seed = bd_1 * bd_2
@@ -89,14 +89,23 @@ data = data_full.iloc[observations , :].copy()
 
 print_question('Question 2.1: Descriptive Statistics')
 
 # compute the summary statistics
-# data_summary = TODO
+
+data.drop(['fail', 'urban', 'unearn', 'househ', 'amtland', 'unearnx'],
+          axis='columns',
+          inplace=True)
+
+data = data[data['paidwork'] == 1]
+
+data['school'] = data['yprim'] + data['ysec']
+data['wage'] = np.exp(data['lwage'])
+
+data_summary = data.describe()
 
 # print to screen
-# print(data_summary.T) [uncomment]
+print(data_summary.T)
 
 # export the summary statistics to a file
-# data_frame_to_latex_table_file(report_dir + 'summmary_stats.tex',
-#                                data_summary.T) [uncomment]
+data_frame_to_latex_table_file(report_dir + 'summary_stats.tex',
+                               data_summary.T)
 
 # -----------------------------------------------------------------------------
 # Question 2.2
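Reviewer's note (outside the diff): the Assignment.tex hunks above claim that shrinking $Var(x_1)$ from 36 to 1 makes the standard error of $\hat{\beta}_1$ about $\sqrt{36} = 6$ times bigger. That claim is easy to sanity-check with a quick Monte Carlo. The sketch below assumes only NumPy; the sample size, number of replications, and coefficient values are illustrative choices, not the ones from the assignment's data-generating process:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 2000

def ols(X, y):
    """OLS coefficients via least squares on the design matrix X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def slope_se(var_x1):
    """Empirical std. dev. of the OLS slope across simulated samples."""
    slopes = []
    for _ in range(reps):
        x1 = rng.normal(3.0, np.sqrt(var_x1), n)
        y = 1.0 + 2.0 * x1 + rng.normal(0.0, 1.0, n)
        slopes.append(ols(np.column_stack([np.ones(n), x1]), y)[1])
    return float(np.std(slopes))

# Shrinking Var(x1) from 36 to 1 should inflate the slope's standard
# error by roughly sqrt(36) = 6, matching the report's observation.
ratio = slope_se(1.0) / slope_se(36.0)
```

Since $Var(\hat{\beta}_1) \sim 1/Var(x_1)$ while everything else is held fixed, `ratio` should come out close to 6, in line with the change the report observes between tables 1.2 and 1.6.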