# 3. Stata Essentials


Before we proceed, let us open a dataset that ships with every Stata installation. It contains data on automobiles and their characteristics.

In [1]:
sysuse auto.dta, clear

(1978 Automobile Data)


## 3.1 Describing the Dataset

We'll begin this exploration by running a command that does not require any arguments to work.

In [2]:
describe


Contains data from C:\Program Files (x86)\Stata16\ado\base/a/auto.dta
  obs:            74                          1978 Automobile Data
 vars:            12                          13 Apr 2018 17:45
                                              (_dta has notes)
--------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
--------------------------------------------------------------------------------
make            str18   %-18s                 Make and Model
price           int     %8.0gc                Price
mpg             int     %8.0g                 Mileage (mpg)
rep78           int     %8.0g                 Repair Record 1978
headroom        float   %6.1f                 Headroom (in.)
trunk           int     %8.0g                 Trunk space (cu. ft.)
weight          int     %8.0gc                Weight (lbs.)
length          int     %8.0g               

We observe that our dataset consists of 12 variables and 74 observations, and the output gives a brief description of each variable. For instance, some of these variables are numeric (byte, int, long, float, double) and some hold text (string).

The numeric variables can store numbers of different sizes depending on their sub-type. A brief description is provided here:

![](img/data_type_num.png)

The string variables can also store text of different lengths depending on their sub-type. A brief description is provided here:

![](img/data_type_str.png)


With this knowledge we can infer that the variable `make` probably contains the model name written as text, and that the variable `foreign` probably takes the values 0 or 1 depending on whether the car is foreign made (i.e. it is a dummy variable).
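One way to check this guess is to look at the two variables directly. The lines below are a minimal sketch using the standard commands `describe`, `list`, and `tabulate`; none of them appear in the text above, so treat the choice of commands as an illustration rather than the only way to inspect a variable. Because `tabulate` displays value labels when a variable has them, the `nolabel` option is used to reveal the stored numbers.

```stata
* Reload the example dataset so the block runs on its own
sysuse auto, clear

* Storage type and label for just the two variables of interest
describe make foreign

* Show the first five values of each variable
list make foreign in 1/5

* Frequency table: a dummy variable should take only two distinct values
tabulate foreign

* Same table with the underlying numeric codes instead of value labels
tabulate foreign, nolabel
```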



## 3.2 Stata Syntax 

Let's look at a very useful command for getting summary statistics on our variables.

In [3]:
summarize foreign


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |         74    .2972973    .4601885          0          1


We can do the same for multiple variables at the same time.

In [4]:
summarize foreign length


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
     foreign |         74    .2972973    .4601885          0          1
      length |         74    187.9324    22.26634        142        233


We can open a new page that shows the Stata documentation for a particular command. For instance, we may run:

In [5]:
help summarize

Notice what the output says here:

![](img/syntax_summarize.png)


This tells us that we can write the `summarize` command in full or shorten it to as little as the abbreviation `su`, which is underlined. You will also notice some blue names within square brackets: these are optional arguments of the command. Finally, the documentation lists the available options, which must be written after a comma.

This means that if we write `summarize` (or the abbreviation) and nothing else, it should work fine!

In [6]:
su


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        make |          0
       price |         74    6165.257    2949.496       3291      15906
         mpg |         74     21.2973    5.785503         12         41
       rep78 |         69    3.405797    .9899323          1          5
    headroom |         74    2.993243    .8459948        1.5          5
-------------+---------------------------------------------------------
       trunk |         74    13.75676    4.277404          5         23
      weight |         74    3019.459    777.1936       1760       4840
      length |         74    187.9324    22.26634        142        233
        turn |         74    39.64865    4.399354         31         51
displacement |         74    197.2973    91.83722         79        425
-------------+---------------------------------------------------------
  gear_ratio |         74    3.014865

### 3.2.1 If Conditions

When the syntax of the command allows for `[if]`, it means that we can run the command on a subset of the data that satisfies the condition. The list of conditional operators is the following:

1. Equal sign: ==
2. Greater and Less than: > and <
3. Greater than or equal and Less than or equal: >= and <= 
4. Not Equal: != 

We can also compound different conditions using the list of logical operators:

1. And: & 
2. Or: | 
3. Not: ! or ~ 

Let's look at some examples using this new knowledge.

In [7]:
su price if foreign==0 


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         52    6072.423    3097.104       3291      15906


In [8]:
su price if foreign==0  & mpg<25


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         44    6354.568    3273.345       3291      15906
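The two examples above use only `==`, `<`, and `&`. The remaining operators from the lists follow the same pattern. The conditions below are made up for illustration, and the thresholds (`30`, `10000`) are arbitrary assumptions, not values with any special meaning in this dataset.

```stata
* Reload the example dataset so the block runs on its own
sysuse auto, clear

* Not equal: every car whose foreign indicator is not 0
count if foreign != 0

* Or: cars that are foreign OR have mileage above 30
summarize price if foreign == 1 | mpg > 30

* Not: negate a whole condition by wrapping it in parentheses
summarize price if !(foreign == 0 & mpg < 25)

* Parentheses make the grouping explicit when mixing & and |
summarize price if (foreign == 1 | mpg > 30) & price < 10000
```

Since 52 of the 74 cars have `foreign==0`, the first `count` reports 22. Likewise, the negated condition selects the 30 cars left out of the 44 matched by `foreign==0 & mpg<25`.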


We can also make use of the functions `inlist()` and `inrange()` when we want to restrict to a particular list of values or to a particular range.

In [9]:
su price if inlist(mpg,10,15,25,40)


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |          7    6507.857     1838.25       4482       9735


This works exactly the same way as the following command.

In [10]:
su price if mpg == 10 | mpg == 15 | mpg == 25 | mpg == 40


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |          7    6507.857     1838.25       4482       9735


In [11]:
su price if inrange(mpg,5,25) 


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         60    6577.083    3117.013       3291      15906


This works exactly the same way as the following command.

In [12]:
su price if mpg>=5 & mpg<=25


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         60    6577.083    3117.013       3291      15906
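Both functions return 1 or 0, so they can be negated with `!` like any other condition, and `inlist()` also accepts string arguments. The sketch below illustrates both. The two model names in the last line are assumed to be present in `auto.dta`, so check them with `list make` if the command returns nothing.

```stata
* Reload the example dataset so the block runs on its own
sysuse auto, clear

* Complement of a list: every car whose mpg is NOT one of these values
summarize price if !inlist(mpg, 10, 15, 25, 40)

* Complement of a range: every car with mpg outside [5, 25]
summarize price if !inrange(mpg, 5, 25)

* inlist() with strings (model names are assumed to exist in the data)
list make price if inlist(make, "AMC Concord", "Buick Regal")
```

Because `mpg` has no missing values, the first two commands use the 67 and 14 observations left out by the corresponding examples above.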


### 3.2.2 Missing Values

Some observations will have no information recorded for a particular variable. A missing value in a string variable shows as `""` (empty text), and a missing value in a numeric variable shows as `.` (a single dot).

Stata treats numeric missing values as larger than any number. If you write `su price if mpg>5`, the result will include any observations where `mpg` is missing! Consider the following example, which uses `rep78` because it is missing for 5 of the 74 cars.


In [13]:
su price if rep78>2


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         64    6239.984    2925.843       3291      15906


In [14]:
su price if rep78>2 & rep78!=.


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         59    6223.847    2880.454       3291      15906


The easiest way to exclude missing values is to use the function `missing()`. You will notice that the following line gives exactly the same result.

In [15]:
su price if rep78>2 & !missing(rep78)


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         59    6223.847    2880.454       3291      15906
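A few equivalent ways of handling missing values are sketched below. They rely on two facts about Stata that the text above does not state: `.` sorts above every number, so `rep78 < .` is true only for nonmissing values, and `inrange()` returns 0 when its first argument is missing. Using `missing()` or `< .` also catches the extended missing values `.a` to `.z`, which `rep78 != .` alone would not.

```stata
* Reload the example dataset so the block runs on its own
sysuse auto, clear

* How many observations have no repair record?
count if missing(rep78)

* "rep78 < ." keeps only nonmissing values
count if rep78 > 2 & rep78 < .

* inrange() is 0 for a missing first argument, so no extra check is needed
count if inrange(rep78, 3, 5)
```

The first `count` reports 5, and the other two report 59, matching the output of `su price if rep78>2 & !missing(rep78)`.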


### 3.2.3 In Conditions 

We can also subset the data in terms of the observation number. 

In [16]:
su price in 1/10


    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         10      5517.4    2063.518       3799      10372


Using this type of condition is generally not recommended because it is sensitive to the way the data is sorted. Suppose we now order the data from lowest to highest price and run the same command again.

In [17]:
sort price 
su price in 1/10




    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       price |         10      3726.5    245.9007       3291       3984


And you can see that the result changes. This is why you should avoid using `in` whenever you can use an `if` condition instead. 
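When `in` is really what you want, for example to peek at the cheapest or most expensive cars, it helps to sort explicitly right before using it. The sketch below is one way to do that. It uses `gsort` and the negative-index form of `in`, neither of which appears above, so take it as an illustration rather than the recommended workflow.

```stata
* Reload the example dataset so the block runs on its own
sysuse auto, clear

* Sort ascending by price: observation 1 is now the cheapest car
sort price
list make price in 1/3

* Negative numbers count from the end; l (lowercase L) is the last observation
list make price in -3/l

* gsort with a minus sign sorts in descending order,
* so "in 1/3" now selects the three most expensive cars
gsort -price
list make price in 1/3
```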

### 3.2.4 Options

From the documentation file, we observed that we can introduce some optional arguments after a comma. 

In [18]:
su price , detail


                            Price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3291           3291
 5%         3748           3299
10%         3895           3667       Obs                  74
25%         4195           3748       Sum of Wgt.          74

50%       5006.5                      Mean           6165.257
                        Largest       Std. Dev.      2949.496
75%         6342          13466
90%        11385          13594       Variance        8699526
95%        13466          14500       Skewness       1.653434
99%        15906          15906       Kurtosis       4.819188


And the options can have abbreviations as well!

In [19]:
su price , d


                            Price
-------------------------------------------------------------
      Percentiles      Smallest
 1%         3291           3291
 5%         3748           3299
10%         3895           3667       Obs                  74
25%         4195           3748       Sum of Wgt.          74

50%       5006.5                      Mean           6165.257
                        Largest       Std. Dev.      2949.496
75%         6342          13466
90%        11385          13594       Variance        8699526
95%        13466          14500       Skewness       1.653434
99%        15906          15906       Kurtosis       4.819188
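Options combine with `if` conditions: the condition goes before the comma and the options after it. The sketch below shows this together with two other `summarize` options, `separator()` and `meanonly`. Which options are worth using is a matter of taste, so these are only examples.

```stata
* Reload the example dataset so the block runs on its own
sysuse auto, clear

* Conditions go before the comma, options after it
summarize price if foreign == 0, detail

* separator(0) suppresses the divider line drawn after every five variables
summarize price mpg rep78 headroom trunk weight, separator(0)

* meanonly skips the display and just stores results such as r(N) and r(mean)
summarize price if foreign == 0, meanonly
display "N = " r(N) "   mean = " r(mean)
```

The stored results `r(N)` and `r(mean)` hold the same observation count (52) and mean that `su price if foreign==0` displays.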


### 3.2.5 Wrapping Up


In this lecture we learned how Stata commands work and how to read their syntax. In general, a standard Stata command follows this structure:

```
  name_of_command [varlist] [if] [in] [weight] [, options]
```
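To make the template concrete, the sketch below fills in each slot with pieces used earlier in this lecture. The `tabulate` and `regress` lines, and the weight expression, are additions for illustration. They show that other commands accept the same pattern, but their options are specific to each command and should be checked with `help`.

```stata
* Reload the example dataset so the block runs on its own
sysuse auto, clear

* command    varlist     if                in       options
  summarize  price mpg   if foreign == 0   in 1/60, detail

* Weights go in square brackets before the comma
summarize price [aweight = weight]

* The same structure applies to other commands
tabulate rep78 if foreign == 0, missing
regress price mpg if foreign == 0, robust
```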

At this point, you should feel more comfortable reading a documentation file for a Stata command. The question that remains is how to find new commands!

You are encouraged to look for commands using the `search` command. For example, if you are interested in running a regression you can write:

In [20]:
search regress 

A new window will pop up, and you can click on the different entries it shows to open the documentation for each of these commands. The new window should look like this:

![](img/search_regress.png)

In any of the following lectures, whenever a command confuses you, feel free to write `search command` or `help command` to go to its documentation.