Create README.md
Yelun Bao committed Nov 5, 2019
1 parent 2b7bbaf commit 06acaf5
Showing 1 changed file with 287 additions and 0 deletions.
287 changes: 287 additions & 0 deletions fall19/lab-p9a/README.md
@@ -0,0 +1,287 @@
# Lab P9a: Files and Formats

In this lab, you'll get practice with files and formats, in
preparation for P9.

## File Vocabulary

For P9, you'll need to be familiar with the following three
file-related terms to know what we're asking you to do. Before we get
started with the assignment, let's talk about the distinction between
them, which will become important as we go along.

* **Directory:** a collection of files. "Folder" is a less-technical synonym you've doubtless heard frequently.

* **File Name:** a name you can use for a file if you know what directory you're in. For example, `movies.csv`, `test.py`, and `main.ipynb` are all file names. Note that different files can have the same name, as long as those files are in different directories.

* **Path:** a more-complete name that tells you the file name AND what directory it is in. For example, `p8/main.ipynb` and `p9/main.ipynb` are paths on a Mac, referring to a file named `main.ipynb` in the `p8` directory and a second file with the same name in the `p9` directory, respectively. Windows uses backslashes instead of forward slashes, so on a Windows laptop the paths would be `p8\main.ipynb` and `p9\main.ipynb`. A path may have more levels to represent more levels of directories. For example, `courses\cs301\p8\test.py` refers to the `test.py` file in the `p8` directory, which is in the `cs301` directory, which is in the `courses` directory.

In Python, there's not a special type for file names or paths; we just
use regular strings instead.
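
To make that concrete, here is a minimal sketch showing that a path is an ordinary `str`, so the string methods you already know work on it. The variable name `example_path` is only illustrative, and the value is the Mac-style path from the definition above.

```python
# A path is just a str, so the usual string operations apply.
example_path = "p8/main.ipynb"   # Mac-style path from the example above

print(type(example_path))               # <class 'str'>
print(example_path.endswith(".ipynb"))  # True
print(example_path.split("/"))          # ['p8', 'main.ipynb']
```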

## Practice

Create a new directory named `Labp9` and create a `main.ipynb` file there.

Let's start by doing some imports we'll need:

```python
import os, json, csv
```

### Files and Directories

Try running this cell to see the files and directories available
alongside your notebook (remember that "." is a shorthand referring to
the current directory):

```python
# cell 1
os.listdir(".")
```

Let's try creating a new directory to experiment in by running this cell:

```python
# cell 2
os.mkdir("fruit")
```

Now go back and manually rerun `cell 1` (where you called `listdir`).
Do you see the `fruit` directory this time?

Now click `Restart & Run All` from the `Kernel` menu. Do you notice
that there's an exception in the cell where you created the `fruit`
directory? This is because the directory already exists, and it is
not possible to create another with the same name.

There are two options for doing the `mkdir` in a way that won't cause
your notebook to fail in the case that the directory already exists.
To get familiar with them, replace the code with option 1 below, then
do a "Restart & Run All".

#### Option 1: try/except

```python
try:
    os.mkdir("fruit")
except FileExistsError:
    print("tried to create fruit, but it already existed")
```

Next, try option 2 as well.

#### Option 2: check beforehand

```python
if not os.path.exists("fruit"):
    os.mkdir("fruit")
else:
    print("did not try to create fruit because it already existed")
```
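
If you find yourself repeating this pattern, either option can be wrapped in a function. The sketch below does that for the try/except version. The helper name `ensure_dir` is hypothetical (it is not part of the `os` module), and the code works inside a temporary directory, an assumption made so that it leaves your real `fruit` directory alone.

```python
import os
import tempfile

def ensure_dir(path):
    """Create path if needed; return True if it was created (hypothetical helper)."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False

# Work inside a throwaway directory so your real "fruit" directory is untouched.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "fruit")
    first = ensure_dir(target)    # directory is new, so it gets created
    second = ensure_dir(target)   # already exists, so no exception is raised
    print(first, second)          # True False
```

Calling the helper twice mirrors what happens when you do a "Restart & Run All": the second call finds the directory already there and carries on.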

After making this directory, let's try creating a couple of files in it. We'll need to
specify a path to each file. Run this cell to get a path:

```python
path = os.path.join("fruit", "apple.txt")
path
```

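If you're on a Mac, you'll see `fruit/apple.txt`; on Windows, you'll
see `fruit\apple.txt` (Jupyter displays the string as
`'fruit\\apple.txt'`, with the backslash escaped). Be careful! Always
build paths with `os.path.join`. Never hardcode the slashes yourself
with regular string concatenation or the string `join` method, because
the result will not work on everybody's computer.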
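
The sketch below illustrates the difference. It compares `os.path.join` against a hardcoded forward slash; the variable names `joined` and `hardcoded` are only illustrative. `os.sep` holds the separator for whichever computer runs the code.

```python
import os

joined = os.path.join("fruit", "apple.txt")
hardcoded = "fruit" + "/" + "apple.txt"   # bakes in the Mac-style separator

print(os.sep)                                    # separator on this computer
print(joined == "fruit" + os.sep + "apple.txt")  # True everywhere
print(joined == hardcoded)                       # True on a Mac, False on Windows
```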

Sometimes we also need to find the file name of a given path. The `os.path` module has a function for that, too. Try using `os.path.basename` to get the file name from the path we just generated:

```python
basename = os.path.basename(path)
basename
```

You will see that `basename` is the name of the file. There is also a function that finds the directory part of a path. Try it!

```python
dirname = os.path.dirname(path)
dirname
```
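
These two functions also work on longer paths. As a rough illustration (the `nested_*` names and the directory names are made up for this example), the sketch below splits a multi-level path and then shows that joining the two pieces gives back the original path.

```python
import os

nested_path = os.path.join("courses", "cs301", "p8", "test.py")

nested_dir = os.path.dirname(nested_path)     # everything before the last separator
nested_base = os.path.basename(nested_path)   # everything after it

print(nested_dir)
print(nested_base)                                          # test.py
print(os.path.join(nested_dir, nested_base) == nested_path)  # True
```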

### Reading and Writing files

Now let's create a file with that path:

```python
f = open(path, "w", encoding="utf-8")
f.write("apples are red\n")
f.close()
```

Did it work? Let's check:

```python
os.listdir("fruit")
```

Also, try using `idle` to find and open the `apple.txt` file.

Now copy and adapt the code you used for `apple.txt` to create `banana.txt` and
`orange.txt` files. You can decide what to write to these files.
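
If you want to confirm what ended up in a file without leaving Python, you can read it back. The sketch below is one way to do that. It uses the same `open`/`write`/`close` pattern as above, but works in a temporary directory (an assumption made so that it doesn't touch your real `fruit` directory); `tmp_path` and `contents` are illustrative names.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tmp_path = os.path.join(tmp, "apple.txt")

    # "w" creates the file, or empties it first if it already exists
    f = open(tmp_path, "w", encoding="utf-8")
    f.write("apples are red\n")
    f.close()

    # with no mode given, open() defaults to reading
    f = open(tmp_path, encoding="utf-8")
    contents = f.read()
    f.close()

print(repr(contents))   # 'apples are red\n'
```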

Paste this code to a cell:

```python
def fruit_message(name):
    f = open(os.path.join("fruit", name+".txt"), encoding="utf-8")
    msg = f.read()
    f.close()
    return msg
```

What does `fruit_message("apple")` return? (try it!)

Try the other fruits too. What if you try getting the message for a
fruit that doesn't exist? Modify `fruit_message` so it returns "bad
fruit" in that scenario. Use the `mkdir` example from earlier for
inspiration.

### JSON

JSON allows us to represent various Python structures (e.g., dicts) as
strings. It is possible to save a string containing JSON data to a
file (one might call such a file a JSON file, even though there is
nothing special about the file except for its contents).

Saving Python data to a JSON file is a two step process (we'll soon
see how to make this a one-step process):

1. convert the dict (or other structure) to a string
2. write that string to a file

Let's try it:

```python
# Python structures
fruits = [
    {"name": "apple", "count": 50, "tasty": True},
    {"name": "watermelon", "count": 60, "tasty": False},
    {"name": "kiwi", "count": 55, "tasty": True},
]
print("Python structs:", fruits)

# JSON string
json_str = json.dumps(fruits)
print("JSON string:", json_str)

# save to file
f = open(os.path.join("fruit", "summary.json"), "w", encoding="utf-8")
f.write(json_str)
f.close()
```

Open `summary.json` in `idle`. How many differences can you see
between the JSON and the Python code we wrote to create the `fruits` list?

Notice we had to call both `json_str = json.dumps(fruits)` and
`f.write(json_str)`. The `json.dump` function combines these two.
Try it! Replace `????` below to save some fruits of your choosing to
a file of your choosing.

```python
# Python structures
fruits = [
????
]
print("Python structs:", fruits)

# save to file
f = open(os.path.join("fruit", ????), "w", encoding="utf-8")
json.dump(fruits, f)
f.close()
```

Reading data back is also a two step process:

1. read from file to string
2. convert that JSON string to Python structures

Try it:

```python
f = open(os.path.join("fruit", "summary.json"), encoding="utf-8")
json_str = f.read()
f.close()

data = json.loads(json_str)
print(data)
```
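
One way to convince yourself that nothing is lost in these conversions is to round-trip some data through a string, without touching any file. This is only a sketch; the names `sample`, `sample_str`, and `restored` are illustrative.

```python
import json

sample = [
    {"name": "apple", "count": 50, "tasty": True},
    {"name": "kiwi", "count": 55, "tasty": True},
]

sample_str = json.dumps(sample)     # Python structures -> str
restored = json.loads(sample_str)   # str -> Python structures

print(type(sample_str))     # <class 'str'>
print(restored == sample)   # True
```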

Just like `json.dump(data, f)` is a shortcut for `json.dumps` and
`f.write`, `data = json.load(f)` is a shortcut for `f.read` and
`json.loads`. Try simplifying the above code by using this shortcut.

### CSV

Create a couple of CSV files by running the following:

```python
f = open(os.path.join("fruit", "good.csv"), "w", encoding="utf-8")
f.write("fruit,count\n")
f.write("apple,10\n")
f.write("banana,3\n")
f.write("orange,0\n")
f.close()

f = open(os.path.join("fruit", "rotten.csv"), "w", encoding="utf-8")
f.write("fruit,count\n")
f.write("apple,10\n")
f.write("banana,3\n")
f.write("orange\n")
f.close()
```

There are different ways to read CSV files. Perhaps one of the
easiest is with a `csv.DictReader` object. A DictReader is created
based on a file object. A DictReader is an iterator object; it
produces a dictionary for each row of a CSV file, automatically using
the header of the CSV to determine the keys for the dicts.

Try it:

```python
f = open(os.path.join("fruit", "good.csv"), encoding="utf-8")
reader = csv.DictReader(f)
for row in reader:
    print(row)
f.close()
```

You should see something like this:

```python
OrderedDict([('fruit', 'apple'), ('count', '10')])
OrderedDict([('fruit', 'banana'), ('count', '3')])
OrderedDict([('fruit', 'orange'), ('count', '0')])
```

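What is an `OrderedDict`? It behaves just like the normal `dict` with
which you are familiar, but it keeps keys in a fixed order. (In Python
3.8 and later, `DictReader` produces regular `dict` objects instead, so
your output may show plain dictionaries.) The important thing for now
is that you can use each row like a regular dictionary.

For example, try looking up specific cells and printing them: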

```python
f = open(os.path.join("fruit", "good.csv"), encoding="utf-8")
reader = csv.DictReader(f)
for row in reader:
    print(row["fruit"], row["count"])
f.close()
```

Try changing the above code to read "rotten.csv" instead of
"good.csv". In "rotten.csv", there is a missing value for the count
in the orange row. How does `DictReader` handle this? For the
project, you'll need to write some code to skip CSV rows with missing
values.
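
One more thing to keep in mind when working with `DictReader` rows: every value comes back as a string, even when it looks like a number. The sketch below totals the counts from the same rows as `good.csv`. It is only an illustration; `io.StringIO` stands in for an open file here (an assumption made so that the example runs without creating anything on disk).

```python
import csv
import io

# io.StringIO stands in for an open file, with the same contents as good.csv
f = io.StringIO("fruit,count\napple,10\nbanana,3\norange,0\n")

total = 0
for row in csv.DictReader(f):
    # each value is a str, so convert before doing arithmetic
    total += int(row["count"])
f.close()

print(total)   # 13
```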
