# ECON 490: Locals and Globals (4)

## Prerequisites 

1. View the characteristics of any dataset using the command `describe`.
2. Use `help` to learn how to run commands.
3. Understand the Stata command syntax using the command `summarize`.
4. Create loops using the commands `for`, `while`, `forvalues` and `foreach`.

## Learning Outcomes

1. Recognize the difference between data set variables and Stata variables.
2. Recognize the difference between local and global Stata variables.
3. Use the command `local` to create temporary macros.
4. Use the command `global` to create permanent macros.
5. Forecast how you will use macros in your own research.

## 4.1 Stata Variables

In ECON 325 and ECON 326, you learned that "variables" are characteristics of a data set. For example, if we had a data set that included all of the countries in the world, we might have a variable which indicates each country's population. As another example, if we had a data set that included a sample of persons in Canada, we might have a variable which indicates each person's marital status. These are data set variables, and they can be qualitative (strings) or quantitative (numeric). 

In Stata, there is a separate category of variables available for use which we call "macros". Macros work as placeholders for values that we want to store either temporarily or for the whole session. Locals are macros that store data temporarily: they exist only within the do-file, program, or block of code in which they are created. Globals are macros that store data permanently, or at least for as long as we have Stata open on our computer. We can think of Stata macros as analogous to workspace objects in Python or R. Below, you are going to learn how to use these macros in your own research.
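
As a minimal sketch of the two kinds of macro, the block below creates one local and one global and then uses both in a single expression. The names _temp_value_ and _perm_value_ are hypothetical and chosen only for illustration. The lines should be run together, since a local disappears once the block of code that created it has finished.

```stata
* Create a local (temporary) and a global (lasts for the Stata session)
local  temp_value 10
global perm_value 20

* Locals are referenced as `name', globals as $name or ${name}
display `temp_value' + ${perm_value}
```

Running these lines together should display 30.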

## 4.2 Locals 

The first use of local macros is to store results of your code. To help you understand how powerful this is, you should be aware that most Stata commands have hidden results stored after they are run. Consider the following example:

In [1]:
sysuse auto, clear

summarize price


(1978 automobile data)


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       price |         74    6165.257    2949.496       3291      15906



When we ran `summarize` above, Stata not only displayed the output but also stored several results behind the scenes. We can access those stored results with the command `return list` (for regular commands) or `ereturn list` (for estimation commands, which we'll cover later in Module 12). Since `summarize` is not an estimation command, we can run the following:

In [2]:
return list


scalars:
                r(sum) =  456229
                r(max) =  15906
                r(min) =  3291
                 r(sd) =  2949.495884768919
                r(Var) =  8699525.974268788
               r(mean) =  6165.256756756757
              r(sum_w) =  74
                  r(N) =  74


Notice that Stata reports that these results have been stored as scalars, where a scalar is simply a single number. 

If we want Stata to tell us the mean price from the automobile data set that was just calculated using `summarize`, we can use the following:

In [3]:
display r(mean)

6165.2568
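
Stored results can also be combined in expressions. The sketch below is an illustration rather than part of the original exercise: it computes the coefficient of variation of _price_ (standard deviation divided by mean), assuming `summarize price` is the most recent command that stored results.

```stata
sysuse auto, clear

* quietly suppresses the table but still stores r(mean), r(sd), etc.
quietly summarize price

* Coefficient of variation, built from two stored results
display r(sd) / r(mean)
```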


We can now store that scalar as a local, and use that local in other Stata commands:

In [4]:
local price_mean = r(mean)
display "The mean of price variable is `price_mean'." 



The mean of price variable is 6165.256756756757.
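
The equals sign in `local price_mean = r(mean)` matters. With `=`, Stata evaluates the expression and stores the result; without it, Stata stores the text exactly as typed. A minimal sketch, using the hypothetical local names _a_ and _b_:

```stata
* With "=", the expression is evaluated first
local a = 2 + 2

* Without "=", the text itself is stored
local b 2 + 2

display "`a'"   // shows 4
display "`b'"   // shows 2 + 2
```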


Imagine that we wanted to create a new variable that is equal to the price minus the mean of that same variable. We would do this if we wanted to de-mean that variable or, in other words, create a new price variable that has a mean of zero. To do this, we could use the `generate` command along with the local we just created to do exactly that:

In [7]:
local price_mean = r(mean)
generate price_demean = price - `price_mean'

Note that there is no output when we run this command. 

If we try to run this command a second time, we will get an error because Stata doesn't want us to accidentally overwrite an existing variable. In order to correct this problem, we need to use the command `replace` instead of the command `generate`. Try it yourself above!
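
One way to handle the second run is sketched below, assuming the lines are run together in a fresh session so that the local is still defined when it is used. The first `generate` creates the variable and `replace` overwrites it without an error.

```stata
sysuse auto, clear
quietly summarize price
local price_mean = r(mean)

* First run: create the variable
generate price_demean = price - `price_mean'

* Later runs: overwrite the existing variable instead of creating it
replace price_demean = price - `price_mean'
```

An alternative is to put `capture drop price_demean` before `generate`, which removes the variable if it already exists and does nothing otherwise.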

Let's take a look at the mean of our new variable using `summarize` again. 

In [8]:
su price_demean


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
price_demean |         74   -.0000154    2949.496  -2874.257   9740.743


We can see that the mean is roughly zero, just as we expected. The tiny deviation from zero is rounding error.

Locals are automatically generated whenever we use loops (as discussed in [Module 3](econometrics/econ490-stata/3_Stata_Essentials.ipynb)). 
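
A minimal sketch of a loop-generated local: in the `forvalues` loop below, Stata creates the local _i_ (a name chosen here only for illustration) and updates its value on each pass through the loop.

```stata
* i takes the values 1, 2 and 3 in turn
forvalues i = 1/3 {
    display "This is iteration `i'"
}
```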

Consider another common application here involving a categorical variable that can take on 5 possible values. 

In [9]:
su rep78


    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       rep78 |         69    3.405797    .9899323          1          5


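Note that if we now run `display r(mean)`, the command we used above to display the mean of price, we will get a different value: the stored results are overwritten each time a command like `summarize` is run, so `r(mean)` now holds the mean of _rep78_. Try it yourself!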

There are times when we might want to save all the possible categorical values in a local. When we use the `levelsof` command as is done below, we can create a new local with a name that we choose. Here, that name is _levels_rep_.

In [10]:
levelsof rep78, local(levels_rep)

1 2 3 4 5


We can do different things with this new list of values. For instance, we can now summarize a variable based on every distinct value of _rep78_, by creating a loop using `foreach` and looping through values of the newly created local. 

In [11]:
foreach x in `levels_rep' {
    su price if rep78 == `x'
}



    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       price |          2      4564.5    522.5519       4195       4934

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       price |          8    5967.625    3579.357       3667      14500

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       price |         30    6429.233     3525.14       3291      15906

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       price |         18      6071.5    1709.608       3829       9735

    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+--------------------------------------------

## 4.3 Globals

Globals are used to store lists of variable names, paths and/or directories that we need for our research project. 
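
As a sketch of the path use case, the block below stores a folder location in a global and then builds a file path from it. The folder and file names are hypothetical placeholders, and `display` is used only to show how the global expands; in practice the same expression would follow a command such as `use`.

```stata
* Hypothetical project folder; replace with your own path
global datadir "C:/Users/me/econ490/data"

* The global expands in place, giving the full path to a hypothetical file
display "${datadir}/mydata.dta"
```

Storing the path once means that only one line needs to change if the project folder moves.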

Consider the following example where we create a global called *covariates* that is simply a list of two variable names:

In [16]:
global covariates "rep78 foreign"

We can now use this global anywhere we want to invoke the two variables specified. When we want to indicate that we are using a global, we refer to this type of macro with the dollar sign symbol `$`.

Here we `summarize` these two variables. 

In [13]:
su ${covariates}



    Variable |        Obs        Mean    Std. dev.       Min        Max
-------------+---------------------------------------------------------
       rep78 |         69    3.405797    .9899323          1          5
     foreign |         74    .2972973    .4601885          0          1


In the empty cell below, `describe` these two variables using the macro we have just created. 

## 4.4 Wrap Up

In this module we learned that Stata has its own set of variables, called macros, that have some very useful applications. We will see these macros throughout the following modules. You will also use them in your own research project.  

To demonstrate how useful macros can be, we can use our _covariates_ global to run a very simple regression in which _price_ is the dependent variable and the explanatory variables are _rep78_ and _foreign_. That command using our macro would be:

In [17]:
regress price ${covariates}



      Source |       SS           df       MS      Number of obs   =        69
-------------+----------------------------------   F(2, 66)        =      0.02
       Model |  425748.824         2  212874.412   Prob > F        =    0.9759
    Residual |   576371210        66  8732897.12   R-squared       =    0.0007
-------------+----------------------------------   Adj R-squared   =   -0.0295
       Total |   576796959        68  8482308.22   Root MSE        =    2955.1

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |   76.29497   449.2741     0.17   0.866    -820.7098    973.2997
     foreign |  -205.6112   959.5456    -0.21   0.831    -2121.406    1710.183
       _cons |   5948.776   1422.631     4.18   0.000     3108.401     8789.15
------------------------------------------------------------------------------

If we only wanted to include observations where price is above average, then, using the local we created earlier in this module, the regression would be:

In [18]:
regress price ${covariates} if price > `price_mean'



      Source |       SS           df       MS      Number of obs   =        20
-------------+----------------------------------   F(2, 17)        =      2.03
       Model |  32961080.9         2  16480540.5   Prob > F        =    0.1615
    Residual |   137775181        17  8104422.42   R-squared       =    0.1931
-------------+----------------------------------   Adj R-squared   =    0.0981
       Total |   170736262        19  8986119.05   Root MSE        =    2846.8

------------------------------------------------------------------------------
       price | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       rep78 |  -209.2781   1019.872    -0.21   0.840    -2361.021    1942.465
     foreign |  -2388.877   1678.406    -1.42   0.173    -5930.003    1152.249
       _cons |   11510.02   3250.218     3.54   0.003     4652.663    18367.39
------------------------------------------------------------------------------

You can see for yourself that Stata ran the regression on only a subset of the data.

In the next module, we will work on importing data sets in various formats.