# Production Technology

The dataset contains `N = 441` firms observed over `T = 12` years, 1968-1979. The variables are:
* `lcap`: log of capital stock, $k_{it}$
* `lemp`: log of employment, $\ell_{it}$
* `ldsa`: log of deflated sales, $y_{it}$
* `year`: the calendar year of the observation, `year` $= 1968, \ldots, 1979$
* `firmid`: anonymized identifier for the firm, $i = 1, \ldots, N$, with $N = 441$

In [1]:
import pandas as pd 
import numpy as np
import seaborn as sns

In [2]:
dat = pd.read_csv('firms.csv')

In [3]:
dat.sample(5)

Unnamed: 0,firmid,year,lcap,lemp,ldsa
2419,202,1975,-1.23397,-0.106743,-0.518984
2650,221,1978,1.270197,0.315139,0.825031
5101,426,1969,0.779941,0.524042,0.176016
3140,262,1976,-0.38715,-0.806097,-0.769552
795,67,1971,0.879103,1.306345,0.973246


In [4]:
dat.year.unique()

array([1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978,
       1979], dtype=int64)

# Descriptives

In [None]:
dat.describe()

In [None]:
dat[['lcap','lemp','ldsa']].hist();

In [None]:
sns.scatterplot(x='lemp', y='ldsa', data=dat); 

# Converting data to numpy format 

In [None]:
dat.ldsa.values.shape

In [None]:
N = dat.firmid.unique().size
T = dat.year.unique().size
assert dat.shape[0] == N*T, 'Error: data is not a balanced panel'
print(f'Data has N={N} and T={T}')
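The row-count assertion above is necessary but not sufficient for a balanced panel: a frame with a duplicated firm-year and a missing one can still have exactly `N*T` rows. A stricter check could look like the sketch below. The helper `is_balanced` and the two toy frames are made up for illustration and are not part of the original data or notebook.

```python
import pandas as pd

def is_balanced(df, unit='firmid', time='year'):
    """Return True if every unit is observed exactly once in every period."""
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    # No (unit, time) pair may appear twice ...
    no_duplicates = not df.duplicated([unit, time]).any()
    # ... and the row count must match the full unit-by-period grid.
    return no_duplicates and len(df) == n_units * n_periods

# Hypothetical toy data: 2 firms, 2 years.
toy = pd.DataFrame({'firmid': [1, 1, 2, 2],
                    'year':   [1968, 1969, 1968, 1969]})
# Same row count, but firm 1 has 1968 twice and 1969 missing.
unbalanced = pd.DataFrame({'firmid': [1, 1, 2, 2],
                           'year':   [1968, 1968, 1968, 1969]})

print(is_balanced(toy))
print(is_balanced(unbalanced))
```

With no duplicated firm-years and exactly `N*T` rows, every firm-year combination must appear once, so the two conditions together imply balance.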

Extract the data from the `pandas` DataFrame into `numpy` arrays.

In [None]:
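# Dependent variable: log deflated sales, as an (N*T, 1) column vector
y = dat.ldsa.values.reshape((N*T,1))

# Regressors: constant, log employment, log capital
ones = np.ones((N*T,1))
l = dat.lemp.values.reshape((N*T,1))
k = dat.lcap.values.reshape((N*T,1))
X = np.hstack([ones, l, k])  # shape (N*T, 3)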
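The stacked arrays inherit whatever row order `dat` has. If later steps rely on each firm's `T` observations sitting in consecutive rows (an assumption here, since the notebook does not say so), it may be safer to sort by `firmid` and `year` before extracting. A minimal sketch on made-up numbers, where `toy`, `N_toy`, `T_toy`, `y_toy` and `y_wide` are illustrative names only:

```python
import numpy as np
import pandas as pd

# Toy panel (made-up numbers): 2 firms, 3 years, deliberately shuffled.
toy = pd.DataFrame({
    'firmid': [2, 1, 2, 1, 1, 2],
    'year':   [1969, 1968, 1968, 1970, 1969, 1970],
    'ldsa':   [0.5, 0.1, 0.4, 0.3, 0.2, 0.6],
})
N_toy, T_toy = toy.firmid.nunique(), toy.year.nunique()

# Sort so that rows run firm by firm, and year by year within each firm.
toy = toy.sort_values(['firmid', 'year']).reset_index(drop=True)

# Stacked (N*T, 1) column, same layout as y in the cell above.
y_toy = toy.ldsa.values.reshape((N_toy * T_toy, 1))

# With sorted, balanced data, row i of the (N, T) view is firm i's time series.
y_wide = y_toy.reshape((N_toy, T_toy))

print(y_toy.shape, y_wide.shape)
print(y_wide)
```

The `(N, T)` view is only meaningful when the panel is balanced and sorted this way; on unsorted data the reshape would silently mix observations from different firms.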