# **Lab 6 — Inputs and Outputs**
---

## Introduction

A proper understanding of how to get information into and out of your programs is essential for making them useful! This concept is often called "I/O" (short for "input/output"), and it encompasses reading and writing files as well as printing information to the console for immediate viewing. In this lab, we will cover several types of I/O: Python's built-in `print()` function with string formatting, and file I/O with NumPy and pandas.

**New this week:** Your deliverable for this lab will be a ZIP file containing this notebook, with "deliverables" completed as requested below, as well as two additional files required for Deliverable 3. Please rename your ZIP file to `<last_name>_lab_06.zip` prior to submission. Submit your ZIP file to Canvas under the Lab 6 assignment no later than **midnight Thursday, October 7th**.

## Resources

[Python output formatting](https://docs.python.org/3/tutorial/inputoutput.html)  
[NumPy I/O](https://numpy.org/doc/stable/user/basics.io.genfromtxt.html)  
[pandas I/O](https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html)

## Exercise I: Python's `print()` function and string formatting

In many previous labs you have made use of the `print()` function to show the contents of variables. You've also displayed the values of variables or calculations by simply entering them on their own line at the end of a cell:

In [None]:
# Notice that the first value below is printed, while the second is displayed next to 'Out:'
x = 5
print(x)
x ** 2

You've probably noticed that this occurs only if the variable is on the **last** line of the cell. So this doesn't display anything:

In [None]:
x
y = 6

And this only displays the value of the variable placed on the **last** line:

In [None]:
x
y

In general, you should use the `print()` function explicitly to reliably show information to the user. Displaying a variable by leaving it on the last line of a code cell is a convenience for interactive work only. It isn't as robust, and it doesn't work everywhere: once you switch from notebooks to scripts, nothing is displayed implicitly, so you will have to use `print()`.

Note that you do not need to provide a string to `print()`. For example, the following all work:

In [None]:
print(5)  # An integer

print(6.893)  # A float

print({'cat': 'liquid'})  # A dictionary

import numpy as np
print(np.array([1, 2, 3]))  # A NumPy array

This works because `print()` converts these objects to strings (using `str()`) before displaying them. For more control, however, you'll want to format values into strings yourself before printing them. You can do this using the [`format()` method](https://www.w3schools.com/python/ref_string_format.asp) of strings, which inserts values into a string and lets you control how those inserted values look. You place curly braces `{}` where you want values to be inserted in the string; the values themselves are passed, in order, as arguments to `format()`. This is useful when you want to print the value stored in a variable. Here are some examples:

In [None]:
print('My dog is {} years old.'.format(5))

lat_lon = (64.8378, 147.7164)
print('Fairbanks is located at: {}°N, {}°W'.format(lat_lon[0], lat_lon[1]))  # Multiple inputs are OK, as long as there is one pair of braces per value

print('{:.2f} is short for {}.'.format(np.pi, np.pi))

The last line formatted `np.pi` (a float holding the digits of $\pi$) by rounding it to two digits after the decimal point. This was accomplished by placing a format code, `:.2f` in this case, inside the braces.

A breakdown of the format code:

* `:` signals that a format code follows; you always need this
* `.2` sets the precision: 2 digits after the decimal point
* `f` displays the number as a float in fixed-point (decimal) notation

There is a lot you can do with these, but the syntax is tricky, so it's best to learn by example. Take a look at more examples [here](https://docs.python.org/3/library/string.html#format-examples) if you're curious, but we won't dive into these further.
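A few more format codes are sketched below, using `math.pi` and a couple of arbitrary numbers as stand-in values. Each result is stored in a variable so you can compare the code with what it produces; all of these codes come from Python's format specification mini-language linked above.

```python
import math

# Precision: digits after the decimal point (the value is rounded, not truncated)
two_places = '{:.2f}'.format(math.pi)
four_places = '{:.4f}'.format(math.pi)

# Width and alignment: pad to 10 characters, right-aligned (>) or left-aligned (<)
right_aligned = '[{:>10.2f}]'.format(math.pi)
left_aligned = '[{:<10.2f}]'.format(math.pi)

# Exponent (scientific) notation with 3 digits after the decimal point
scientific = '{:.3e}'.format(123456.789)

# Comma as a thousands separator, and percentage formatting (multiplies by 100)
thousands = '{:,}'.format(1234567)
percent = '{:.1%}'.format(0.256)

for s in [two_places, four_places, right_aligned, left_aligned,
          scientific, thousands, percent]:
    print(s)
```

This prints `3.14`, `3.1416`, `[      3.14]`, `[3.14      ]`, `1.235e+05`, `1,234,567`, and `25.6%`. The square brackets are only there to make the padding visible.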

## Deliverable 1: String formatting

In a **new code cell** below, write **5 print statements** that use the string `format()` method. There is plenty of room for creativity here!

## Exercise II: File I/O with NumPy

Documentation:
* [`loadtxt()`](https://numpy.org/doc/stable/reference/generated/numpy.loadtxt.html)
* [`genfromtxt()`](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html)
* [`savetxt()`](https://numpy.org/doc/stable/reference/generated/numpy.savetxt.html)

### Reading files

For reading in files using NumPy, there are two main options: `loadtxt()` and `genfromtxt()`. `genfromtxt()` is the more flexible of the two, particularly for handling missing data and columns with different data types.
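To make the difference concrete, here is a minimal sketch using a tiny made-up dataset (not `station.txt`) with one missing value. Both functions accept a file-like object as well as a file name, so `io.StringIO` stands in for a file on disk.

```python
import io
import numpy as np

# A tiny made-up comma-separated dataset; the second row has an empty (missing) field
text = "1.0,2.0,3.0\n4.0,,6.0\n"

# genfromtxt fills missing float values with NaN by default
data = np.genfromtxt(io.StringIO(text), delimiter=',')
print(data)

# loadtxt expects every field to hold a value, so the empty field raises a ValueError
loadtxt_failed = False
try:
    np.loadtxt(io.StringIO(text), delimiter=',')
except ValueError as err:
    loadtxt_failed = True
    print('loadtxt failed:', err)
```

If you'd rather substitute a specific value than NaN, `genfromtxt()` also accepts a `filling_values` argument (for example, `filling_values=-999`).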

First, let's download a text file called `station.txt` and view its contents:

In [None]:
!curl -O -s http://www.grapenthin.org/teaching/geop501/download/lab07/station.txt  # Download the file
!cat station.txt  # Display the contents

The output you see above is what you'd see if you opened `station.txt` in a text editor. This gives you an idea of how the text file is organized. Here are two examples that read this file in using NumPy's `genfromtxt()`:

In [None]:
# This approach skips the first line of the file, which contains column names, using "skip_header=1"
example_array = np.genfromtxt('station.txt', encoding='utf-8', dtype=None, delimiter=' ', skip_header=1)

# This approach actually uses the header line to name the output variables, using "names=True"
example_array = np.genfromtxt('station.txt', encoding='utf-8', dtype=None, delimiter=' ', names=True)

# Now this gives you a NumPy array of the values of the "Name" column
print(example_array['Name'])

Note that `delimiter=' '` specifies what is separating the columns — in this case, a [blank space](https://www.youtube.com/watch?v=e-ORhEE9VVg&ab_channel=TaylorSwiftVEVO). A file with commas as delimiters instead would have lines like this:
```
ANMO,34.9500,-106.4600,1820,1,2
```
This is commonly known as a "comma-separated values" file or CSV file. Sound familiar?
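Reading a CSV with `genfromtxt()` only requires changing the delimiter. The sketch below is an illustration under stated assumptions: the header names (`name`, `lat`, `lon`, `elev`, `col5`, `col6`) and the second `DEMO` row are hypothetical, while the `ANMO` row is the example line above.

```python
import io
import numpy as np

# Hypothetical header and DEMO row; the ANMO row is the example line shown above
csv_text = (
    "name,lat,lon,elev,col5,col6\n"
    "ANMO,34.9500,-106.4600,1820,1,2\n"
    "DEMO,0.0000,0.0000,0,0,0\n"
)

# delimiter=',' is the only change needed to read comma-separated values
stations = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=None,
                         encoding='utf-8', names=True)
print(stations['name'])  # the text column
print(stations['lat'])   # a float column
```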

### Writing files

To write a file using NumPy, you can use `savetxt()`, which saves an array using the formats you define. The example below writes `example_array` to a new file called `file_out.txt`. Here, `fmt` gives one format specifier for each column of `example_array` (`%s` = string, `%f` = float, `%i` = integer).

In [None]:
np.savetxt('file_out.txt', example_array, fmt='%s %f %f %i %i %i')

!cat file_out.txt  # View the file contents
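`savetxt()` can also write a header line and use a different delimiter. Below is a minimal sketch with a small made-up array, written to a hypothetical file named `example_out.csv` and then read back in to check the round trip.

```python
import numpy as np

# A small made-up 2x2 array of floats
arr = np.array([[1.0, 2.5], [3.0, 4.25]])

# A single fmt is applied to every column, and delimiter joins the columns.
# header= writes a first line; comments='' stops NumPy from prefixing it with '# '
np.savetxt('example_out.csv', arr, fmt='%.2f', delimiter=',',
           header='x,y', comments='')

with open('example_out.csv') as f:
    contents = f.read()
print(contents)

# Read the file back in, skipping the header line
round_trip = np.loadtxt('example_out.csv', delimiter=',', skiprows=1)
print(round_trip)
```

Without `comments=''`, the header would be written as `# x,y`, which NumPy's readers treat as a comment line.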

## Exercise III: File I/O with pandas

Documentation:
* [`read_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html)
* [`read_excel()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html)
* [`to_csv()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html)
* [`to_excel()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html)

### Reading files

We spent time in the last lab working with DataFrames in the pandas library, so it's worth knowing how to bring data from files directly into a DataFrame. This works much like the NumPy approach in the previous section, but pandas can read many more file formats, including delimited text files and Excel spreadsheets. Try the following:

In [None]:
# Run these installs before running the cells below (openpyxl is needed to read and write Excel files)
!pip install pandas
!pip install openpyxl

In [None]:
import pandas as pd
station_df = pd.read_csv('station.txt', sep=' ', header=0)
station_df

You now have a DataFrame called `station_df` that contains all the information from the file `station.txt`. You can then work with the DataFrame as we discussed last week, pulling out values in the named columns as needed, using indexing, labels, etc. Note how similar the structure of the DataFrame is to the file structure (compare to the `!cat station.txt` cell above). This is very handy!
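Because `read_csv()` accepts any file-like object, you can experiment without a file on disk by wrapping a string in `io.StringIO`. In this sketch, the column names and the `DEMO` row are hypothetical, chosen only to resemble a small station table.

```python
import io
import pandas as pd

# Made-up space-delimited text; the column names and DEMO row are hypothetical
text = (
    "Name Lat Lon Elev\n"
    "ANMO 34.9500 -106.4600 1820\n"
    "DEMO 0.0000 0.0000 0\n"
)

df = pd.read_csv(io.StringIO(text), sep=' ', header=0)
print(df)
print(df.dtypes)            # pandas infers a data type for each column
print(df['Name'].tolist())  # select one column by its header label
print(df.loc[0, 'Elev'])    # select one value by row label and column label
```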

Importing data from Excel files is similarly easy:

In [None]:
!curl -O -s http://www.grapenthin.org/teaching/geop501/download/lab07/station.xlsx  # Download an Excel file

station_df_excel = pd.read_excel('station.xlsx')
station_df_excel

### Writing files

pandas DataFrames have `to_csv()` and `to_excel()` methods built in. To write `station_df` to a CSV file, for example, all we have to do is:

In [None]:
station_df.to_csv('station.csv', index=False)
!cat station.csv  # View the resulting file

Note that we used `index=False` above because, by default, `to_csv()` writes the DataFrame's index (the row labels 0, 1, 2, ...) as an extra first column.
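To see what `index=False` changes, the sketch below uses a tiny made-up DataFrame. When `to_csv()` is called without a file path, it returns the CSV text as a string, which makes the two outputs easy to compare.

```python
import io
import pandas as pd

# A tiny made-up DataFrame
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Called without a file path, to_csv() returns the CSV text as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)
print(with_index)
print(without_index)

# Reading the first version back shows the index as an extra, unnamed column
reread = pd.read_csv(io.StringIO(with_index))
print(reread.columns.tolist())
```

The `Unnamed: 0` column that appears on re-reading is a common sign that a file was written without `index=False`.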

## Deliverable 2: Practice with CSV file I/O

For this deliverable, you'll need the file `Bogoslof_SO2_per_event.csv`. To download it, execute the following code cell:

In [None]:
!curl -O -s https://raw.githubusercontent.com/uafgeoteach/GEOS636_PAG/master/labs/Bogoslof_SO2_per_event.csv

To verify that the file made it, execute


```
!ls
```

in a new code cell — you should see the filename listed. (We'll talk way more about commands like this one later in the semester!) You should also see the file in the Jupyter file browser.

### About the file

This CSV file contains information about the volcanic gases emitted by Bogoslof volcano during a series of more than 30 explosive eruptive events occurring in 2016–2017. Bogoslof Island is located in the southern Bering Sea (north of the Aleutian volcanic arc). From the Alaska Volcano Observatory (AVO) website:

> Bogoslof Island is the largest of a cluster of small, low-lying islands comprising the emergent summit of a large submarine stratovolcano.

The island itself is highly dynamic due to the eruptive and erosional processes constantly reshaping it. Here's a view from August 2017, courtesy of Dave Withrow (NOAA/Fisheries).

![Oblique view of Bogoslof Island](https://www.avo.alaska.edu/images/dbimages/display/1503799877.jpg)

For each eruptive event, AVO calculated the mass of sulfur dioxide (SO<sub>2</sub>) emitted using satellite measurements. The provided file has columns of event number, eruption onset time, mass of SO<sub>2</sub> emitted (in kt), time of the satellite SO<sub>2</sub> measurement, and volcanic plume height. Note that 1 kt (kiloton) = 1000 metric tons, and 1 metric ton = 1000 kg.
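Chaining the two definitions above gives the conversion factor between the units:

$$1\ \text{kt} = 1000\ \text{t} = 1000 \times 1000\ \text{kg} = 10^{6}\ \text{kg}$$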

### Your task

In a **new code cell below**, read this CSV into Python using pandas, modify the SO<sub>2</sub> mass column so that the units are in kg (and rename the column header to match!), and write out a new CSV file that reflects this modification.

**Notes:**

* Demonstrate that your output file reflects the requested modifications by typing `!head <filename>` where `<filename>` is the name of your output CSV file. This will show the first few lines of the CSV file.
* To rename a column of a pandas DataFrame, you may use the syntax: `df = df.rename(columns=<rename_dict>)` where `df` is your DataFrame and `<rename_dict>` is a dictionary with keys (strings) specifying the current column names and values (strings) specifying the desired names.
* Remember to use `index=False` when writing the new CSV to avoid adding the index column!
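The renaming syntax from the notes can be sketched on a small made-up DataFrame; the column names `temp`, `site`, `temperature`, and `site_id` are hypothetical placeholders, not columns from the Bogoslof file.

```python
import pandas as pd

# A made-up DataFrame; the column names are hypothetical placeholders
df = pd.DataFrame({'temp': [12.5, 14.0], 'site': ['A', 'B']})

# Keys are the current column names; values are the desired new names
rename_dict = {'temp': 'temperature', 'site': 'site_id'}
df = df.rename(columns=rename_dict)
print(df.columns.tolist())
```

By default, `rename()` silently ignores keys that don't match an existing column, so a typo in the dictionary won't raise an error. Printing `df.columns` afterward is a quick check; passing `errors='raise'` makes a mismatch raise a `KeyError` instead.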

## Deliverable 3: I/O with your own file

For this last deliverable, we want you to practice on a file relevant to your own research. Find a text/CSV/Excel file associated with your research, and do the following:

1. Upload the file to Opensarlabs (see instructions in D2)
2. Read the file in using NumPy or pandas
3. Modify the file somehow, like we did in D2 (change the units, add or remove columns or rows, etc.) using NumPy and/or pandas tools
4. Write out the modified contents to a new file **of the same type as the input file** — that means Excel in, Excel out, for example!
5. Download the new file to your local machine (check the box next to the file in the Opensarlabs file browser and click **Download** in the top right)
6. Once the three files to submit (this .ipynb file, the original input file, and your modified output file) are saved on your local computer, place them in a folder and compress ("zip") that folder into a .zip file. On most computers, you can do this by right-clicking the folder and choosing "Compress". If you have trouble with this, ask for help.


Steps 2–4 should take place in a **new code cell** below.

---

🚨 **For this lab, you MUST submit a ZIP file containing three items: This notebook, your original file for D3, and your modified file for D3!** 🚨

**Note:** If you can't find any candidate file on your own computer, find an interesting text/CSV/Excel file on the internet, download it, and use it here. But we prefer for you to use something relevant to your research if at all possible!