# ECON 490: Panel Data Regressions (17)

## Prerequisites:
---
1. Run OLS Regressions.

## Learning objectives:
---

1. Prepare your data for time series analysis. 
2. Run panel data regressions.
3. Conduct post-regression tests for panel data regressions.
4. Correct for heteroscedasticity and serial correlation.

This module uses the [Penn World Tables](https://www.rug.nl/ggdc/productivity/pwt/?lang=en), which measure income, output, input and productivity for 183 countries between 1950 and 2019. Before beginning this module, you should download the data in Stata format.

## 17.1 When do we need Panel Data Regressions?

Panel data regressions allow us to answer empirical questions that cannot be answered with other types of data sets, such as cross-sectional data or time-series data.

Cross-sectional data sets contain observations that are only measured at one point in time. There may be several different versions of the data set that are collected over time (monthly, annually, etc.), but each different version includes an entirely different set of individuals.

Cross-sectional data allow us to explore variations between individuals at one point in time, but do not allow us to explore variations over time for those same individuals.

Time-series data sets contain observations on a single unit (one country, state, province, etc.) measured over several time periods. For example, national measures of income, output, unemployment, and fertility rates are time-series data.

Time-series data allow us to explore variations over time for one individual country (for example), but do not allow us to explore variations between individual countries at one point in time.

Panel data sets include observations on the same variables from the same cross-sectional sample in two or more different time periods. These data sets allow us to answer questions that we cannot answer with time-series or cross-sectional data alone; they allow us to simultaneously explore variations over time for individual countries (for example) and variations between countries at one point in time.
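To make the three data structures concrete, here is a minimal sketch in Stata using a toy data set. The variable name `gdp` and all of the values are made up for illustration and are not taken from the Penn World Tables.

```stata
* Toy panel in long format: two countries, each observed in three years
* (all values are invented for illustration)
clear
input str6 country int year double gdp
"Canada" 2017 1.70
"Canada" 2018 1.75
"Canada" 2019 1.78
"Mexico" 2017 2.30
"Mexico" 2018 2.35
"Mexico" 2019 2.34
end

* A cross-sectional slice: every country in a single year
list country gdp if year == 2019

* A time-series slice: a single country across all years
list year gdp if country == "Canada"
```

The full data set is the panel; each `list` command pulls out one of the two simpler structures that the panel contains.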

This approach is extremely productive for two reasons:

1. The data sets are large, much larger than if we were to use data collected at a single point in time.
2. Panel data regressions can control for variables that do not change over time and are difficult to measure, such as geography and culture.
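One common way that panel regressions remove time-invariant variables is the fixed-effects (within) transformation. The sketch below is a standard textbook illustration, not necessarily the estimator used later in this module; the symbols $y_{it}$, $x_{it}$, $\alpha_i$ and $\varepsilon_{it}$ are introduced here only for this example. Suppose the outcome for country $i$ in year $t$ is

$$y_{it} = \beta x_{it} + \alpha_i + \varepsilon_{it},$$

where $\alpha_i$ collects everything about country $i$ that does not change over time (geography, culture). Averaging over time within each country gives

$$\bar{y}_i = \beta \bar{x}_i + \alpha_i + \bar{\varepsilon}_i,$$

and subtracting the second equation from the first yields

$$y_{it} - \bar{y}_i = \beta\,(x_{it} - \bar{x}_i) + (\varepsilon_{it} - \bar{\varepsilon}_i).$$

The term $\alpha_i$ has dropped out, so $\beta$ can be estimated without ever measuring the time-invariant characteristics.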
 

## 17.2 Time Series Variables



To run a panel regression, we have to specify both the panel and time variables. In our Penn World data set (from the previous module), the panel variable is “country” and the time variable is “year”. Specifying the panel and time variables requires that both variables are coded as numeric variables. This is not a problem for the year variable, since year is simply coded as a number (e.g. 2019). It is a problem for country, however, since it is coded as a string variable, that is, a series of letters such as Canada.

This is a simple problem to fix. We use the encode command to create a new variable that is a numeric version of the variable country:

In [None]:
encode country, generate(code)

Now we can specify both the panel and time variables using the xtset command, listing first the panel variable and then the time variable:

In [None]:
xtset code year, yearly
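The following sketch shows one way to check that these two steps worked. It runs on a toy data set rather than the Penn World Tables, so the variable `gdp`, the new variable `gdp_lag`, and all values are hypothetical.

```stata
* Toy panel (all values are invented for illustration)
clear
input str6 country int year double gdp
"Canada" 2017 1.70
"Canada" 2018 1.75
"Canada" 2019 1.78
"Mexico" 2017 2.30
"Mexico" 2018 2.35
"Mexico" 2019 2.34
end

* country is stored as a string, so it cannot be used in xtset directly
describe country year

* encode numbers the country names in alphabetical order (1, 2, ...)
* and stores the names as a value label called code
encode country, generate(code)
label list code

* Declare the panel structure, then summarize it
xtset code year, yearly
xtdescribe

* Once the data are xtset, time-series operators such as L. (lag)
* are applied within each country, never across countries
generate gdp_lag = L.gdp
list country year gdp gdp_lag, sepby(country)
```

In the final listing, `gdp_lag` should be missing for the first year of each country, which confirms that Stata treats each country as a separate series.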