# ECON 490: Opening Datasets (5)

## Prerequisites:
---
1. Understand how to effectively use Stata do files and know how to generate log files.
2. Run basic Stata commands such as `help`, `describe`, `summarize`, `forvalues` and `while`.
3. Know how to use macros in writing Stata commands.

## Learning objectives:
---

1. Understand how to use `clear` at the beginning of our do-files.
2. Know how to change your directory so that Stata can find your files.
3. Import datasets in CSV and Excel formats.
4. Import datasets in Stata (`.dta`) format.
5. Save data files.

In this repository you will find a folder named "data", with a sub-folder named "raw". In that sub-folder you will find two different versions of the same dataset: `fake_data.csv` and `fake_data.dta`. The dataset simulates information on workers in the years 1982-2012 in a fictional country where, in 2003, a policy was enacted that allowed some workers to enter a training program intended to boost their earnings. We will use this dataset to learn how to explore and manipulate real-world datasets.

## 5.1 Clearing the Workspace

Do-files should begin with a command that clears any previous work left open in Stata. This ensures that:

1. We do not waste computer memory on things other than the current project.
2. Whatever result we obtain in the current session truly belongs to that session.


We can clear many different things from the workspace (see `help clear` for the full list). For the purposes of this lecture, the easiest way to clear everything at once is to write the following:

In [None]:
clear *
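
To see what this command does, here is a minimal sketch that loads Stata's built-in `auto` example dataset (a stand-in, not part of this module's data), defines a scalar, and then clears everything. The scalar name `my_number` is made up for illustration.

```stata
* Load an example dataset that ships with Stata (stand-in for real data)
sysuse auto, clear

* Hypothetical scalar, just to have something else in memory
scalar my_number = 5

* Number of observations currently in memory
count

* Clear data, scalars, matrices, programs, and more in one step
clear *

* No observations remain
count

* Asking for the scalar now fails; capture stores the error code in _rc
capture scalar list my_number
display _rc
```

A nonzero value displayed on the last line indicates that the scalar was removed along with the data.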

## 5.2 Changing Directories 

Before we get started on importing data into Stata, it is useful to know how to change the folder that Stata accesses whenever you run a command that either opens or saves a file. Once you instruct Stata to change the directory to a specific folder, from that point onward it will look in that folder for any file you open and will save any file you write there, including data files, do-files, and log files, unless you give a full path. Stata will continue to do this until either the program is closed or you change to another directory. This means that every time you open Stata you will need to change the directory to the one you want to use.

<div class="alert alert-info">

**Note:**  We write the directory path within quotation marks to make sure Stata interprets this as a single string of words. If we didn't do this, we may encounter issues with folders that include blank spaces. 

</div>

Let's change the directory to the specific location where you saved the fake_data file using the command below. You can change your workspace to a directory named "some_folder/some_sub_folder" by writing `cd "some_folder/some_sub_folder"`. 

Use the space below to do this on your own computer.

In [None]:
cd " " 
* type your file path to the folder containing the data between the quotation marks in the line above

Notice that once we change directories, Stata outputs the full path of the directory where we are currently working.
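
At any point you can also ask Stata where it is currently working, without changing anything. A minimal sketch using the built-in `pwd` and `dir` commands:

```stata
* Print the current working directory
pwd

* The same path is stored in the system value c(pwd)
display "`c(pwd)'"

* List the files in the current directory, to confirm the data files are there
dir
```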

One trick when using `cd` is that two periods (`..`) stand for the folder one level above the current one, so `cd ..` moves up one folder and `cd ../..` moves up two. Try the command below and compare the folder Stata reports with the one reported by the command above. Remember to change back to the folder containing the data before you continue.

In [None]:
cd ..
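
If you move around between folders, it can help to remember where you started. The sketch below stores the current directory in a local macro (the name `start_dir` is an arbitrary choice) and returns to it afterwards. Because local macros only live for the duration of a single run, these lines are meant to be run together, for example from one do-file.

```stata
* Remember the directory we are in right now
local start_dir "`c(pwd)'"

* Move up one folder and show where we are
cd ..
display "Now in: `c(pwd)'"

* Return to the starting directory
cd "`start_dir'"
display "Back in: `c(pwd)'"
```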

## 5.3 Opening Datasets 

### 5.3.1 Excel and CSV files 
When looking for data for your research, you will find that many datasets are not formatted for Stata. Often they are stored as Excel or CSV files. Not surprisingly, the command for this job is called `import`, and it has two main versions: `import excel` and `import delimited`.

Let's import the dataset called `fake_data.csv`. Because CSV is a delimited text format, we use `import delimited` to bring this data into Stata. The syntax of this command is `import delimited [using] filename [, import_delimited_options]`.

We always include the option `clear` when we use `import` to make sure we clear any dataset that was previously open. Recall that to use an option, we add a comma (`,`) after the main command and then write the option name. You are welcome to read the documentation for these commands by writing `help import delimited`.

Note that the command below will not import the data unless you have changed your directory (above) to the folder that contains this file.

In [None]:
import delimited using "fake_data.csv", clear

When you run this command, Stata prints a message saying that 9 variables were found, with almost 3 million observations. When we open datasets that are not in Stata format, it is very important to check whether the first row of the data contains the variable names.

You can use the command `list` to look at your data. It is better to limit the number of observations shown, since you don't want to see all 3 million! Here we use `in` to restrict the list to the first 3 observations. You can choose any range you would like to view.

In [None]:
list in 1/3 
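
Besides `list`, a few other commands give a quick overview of whatever is in memory. The sketch below uses Stata's built-in `auto` dataset as a stand-in so that it runs anywhere; note that loading it replaces the data currently in memory. The same three commands work unchanged on the imported data.

```stata
* Stand-in data: the auto dataset that ships with Stata (replaces data in memory)
sysuse auto, clear

* Variable names, storage types, and labels
describe

* Number of observations in memory
count

* Last three observations (negative numbers count from the end; l means last)
list in -3/l
```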

By default, Stata tries to detect whether the first row of the file holds the variable names, and in this case it got it right. If it does not, we need to include the `import delimited` option `varnames(#|nonames)`, where we replace `#` with the number of the row that contains the names. If the data has no names, the option is `varnames(nonames)`. Don't forget that you can always check the documentation by writing `help import delimited`.
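
To see the two forms of `varnames()` side by side, here is a minimal sketch that writes a tiny CSV file and imports it twice. The file name `tiny_example.csv` and its contents are made up for illustration; the file is written to the current directory and deleted at the end.

```stata
* Write a three-line CSV file (hypothetical data) to the current directory
file open fh using "tiny_example.csv", write text replace
file write fh "worker_id,earnings" _n
file write fh "1,100" _n
file write fh "2,250" _n
file close fh

* Tell Stata that row 1 holds the variable names
import delimited using "tiny_example.csv", varnames(1) clear
list

* Treat every row as data; Stata names the variables v1, v2, ...
import delimited using "tiny_example.csv", varnames(nonames) clear
list

* Remove the example file
erase "tiny_example.csv"
```

In the second import the header row becomes an ordinary observation, which is why checking the first few rows after importing matters.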

### 5.3.2 Stata files
To open datasets in Stata format we use the command `use`. As the example below shows, we can recognize that a dataset is stored in Stata format because the file name ends with `.dta`.

In [None]:
use "fake_data.dta", clear

In [None]:
list in 1/3 
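
With large `.dta` files it can be handy to inspect a file before loading it, or to load only part of it. The sketch below illustrates this with a temporary copy of Stata's built-in `auto` dataset as a stand-in; the macro name `auto_copy` is arbitrary. Run the lines together, since `tempfile` stores the file name in a local macro.

```stata
* Build a small .dta file to work with (stand-in for a large dataset)
sysuse auto, clear
tempfile auto_copy
save "`auto_copy'"

* Look at the variables in the file without loading it
describe using "`auto_copy'"

* Load only two variables and the first five observations
use make price in 1/5 using "`auto_copy'", clear
list
```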

### 5.3.3 Other files

Stata can open a number of other data file formats without issue (see `help import`). If you are struggling, one option at UBC is to use the program StatTransfer to convert your file to dta format. This program is available in the library on the UBC Vancouver Campus at one of the [Digital Scholarship workstations](https://researchcommons.library.ubc.ca/digital-scholarship-lab-use-policy-and-guideline/).

<div class="alert alert-info">

**Note:** UBC has research support available for any student who needs help with data, including anyone who needs help getting the data into a format that can be imported into Stata. You can find the contact information for the Economics Librarian on the [UBC Library ECON 490 Research Guide](https://guides.library.ubc.ca/ECON490).

</div>


## 5.4 Saving Datasets 

You can save any opened dataset in Stata format by writing `save "some_directory/dataset_name.dta", replace`. The `replace` option overwrites any previous version of the file; without it, Stata refuses to overwrite an existing file.
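
A minimal sketch of saving and reopening a file, again using the built-in `auto` dataset as a stand-in. The file name `auto_copy.dta` is hypothetical; it is written to the current directory and deleted at the end.

```stata
* Load stand-in data and save it in Stata format
sysuse auto, clear
save "auto_copy.dta", replace

* A second save without replace fails because the file already exists;
* capture stores the error code in _rc instead of stopping the do-file
capture save "auto_copy.dta"
display _rc

* Reopen the saved file, then delete the example file
use "auto_copy.dta", clear
erase "auto_copy.dta"
```

A nonzero value from `display _rc` shows that Stata declined to overwrite the file.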

We can also save files into different formats with the `export excel` and `export delimited` commands. Look at the help documentation for more details.
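
As a rough illustration, the sketch below exports the built-in `auto` dataset (a stand-in) to CSV and Excel, then reads the Excel copy back with `import excel`. The file names `auto_example.csv` and `auto_example.xlsx` are made up; both files are written to the current directory and deleted at the end.

```stata
* Load stand-in data
sysuse auto, clear

* Write a CSV copy and an Excel copy (variable names go in the first row)
export delimited using "auto_example.csv", replace
export excel using "auto_example.xlsx", firstrow(variables) replace

* Read the Excel copy back; firstrow treats row 1 as variable names
import excel using "auto_example.xlsx", firstrow clear
describe

* Remove the example files
erase "auto_example.csv"
erase "auto_example.xlsx"
```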


## Wrapping Up

Now that you are able to import data into Stata, you can start doing your own analysis! Try finding a dataset that interests you and practice some of the commands that you have already learned in the first few modules.

