# Module 5: File system operations

In this module we will explore how your code interacts with the file system you're working with. We will learn to perform all the file operations you would normally do by hand, such as copying, moving, renaming, creating and deleting files or folders. We will do this programmatically, making sure the implementation is generic and not OS-dependent.


In [1]:
import requests
import json
import os
import sys
import pathlib
import shutil

### Section 1: open

The built-in [```open```](https://docs.python.org/3/library/functions.html#open) function is used to interact with the content of a file. This could be any file, from human readable JSON data to compiled executables. In most cases, we are dealing with non-compiled data in some text format, containing data or configuration for the program.

In this section we will discover how we use the ```open``` function. [Here](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files) is more reference if needed.

##### Exercise 1
We want to have some files on our system to interact with. Of course these files need some content, so we're going to use an API to load some JSON data and save it to the disk.

The data we are working with comes from a simple cocktail recipe website, which allows you to get a recipe for a cocktail based on a search term. This will return a list of recipes for all cocktails that match this search term. In this example, we will collect all cocktails that are either a fizz, a sour, or punch.

Run the next cell and check out the contents of `cocktail_data` to get an idea of what the data looks like.

In [2]:
url = "https://www.thecocktaildb.com/api/json/v1/1/search.php"

cocktails = ["sour", "fizz", "punch"]

cocktail_data = {}

for cocktail in cocktails:

    params = {"s": cocktail}

    response = requests.get(url=url, params=params)
    
    cocktail_data[cocktail] = response.json()['drinks']

cocktail_data

{'sour': [{'idDrink': '11417',
   'strDrink': 'Gin Sour',
   'strDrinkAlternate': None,
   'strTags': 'Sour',
   'strVideo': None,
   'strCategory': 'Ordinary Drink',
   'strIBA': None,
   'strAlcoholic': 'Alcoholic',
   'strGlass': 'Whiskey sour glass',
   'strInstructions': 'In a shaker half-filled with ice cubes, combine the gin, lemon juice, and sugar. Shake well. Strain into a sour glass and garnish with the orange slice and the cherry.',
   'strInstructionsES': None,
   'strInstructionsDE': 'In einem Shaker, der halb mit Eiswürfeln gefüllt ist, Gin, Zitronensaft und Zucker mischen. Gut schütteln. In ein Sour Glas abseihen und mit der Orangenscheibe und der Kirsche garnieren.',
   'strInstructionsFR': None,
   'strInstructionsIT': "In uno shaker riempito a metà con cubetti di ghiaccio, unisci il gin, il succo di limone e lo zucchero.Filtrare in un bicchiere e guarnire con la fetta d'arancia e la ciliegia.Agitare bene.",
   'strInstructionsZH-HANS': None,
   'strInstructionsZH-HANT

As you can see, this is quite a large amount of data to read through all at once. We are going to take steps to save this data to the drive in an organized structure.

The first step is to simply save this whole response into a JSON file. To do so we can use the `open` function, which allows us to create and open a file.

[Find the `open` functionality online](https://www.google.com/search?hl=en&q=python%20open) and fill in the arguments for the function. You can name the file whatever you want. Then, using the `write` method of the opened file object, write the raw json data to the file.

In [None]:
## Fill in ___

with open(___) as cocktails_json_file:
    raw_cocktail_data = json.dumps(cocktail_data)
    
    # write raw json data to file
    ___
    
del cocktail_data

In [3]:
# Answer
with open('cocktails.json', 'w') as cocktails_json_file:
    raw_cocktail_data = json.dumps(cocktail_data)
    
    cocktails_json_file.write(raw_cocktail_data)
    
del cocktail_data

You can see the use of the `with` keyword in this code cell. The `with` statement runs a block of code inside a context, in this case an opened file, and guarantees that the file is closed when the block ends, even if an exception is raised inside it. You can also do file I/O using the following structure:

```python

file = open('filename.txt', 'r')

text = file.read()

file.close()

```

If the `file.read()` line in this code throws an exception, the `file.close()` line is never reached, so the file is not closed properly. This leaves the file handle open and, for files opened for writing, can leave buffered data unwritten and corrupt the file. The `with` approach always takes care of closing the file, no matter what happens in the block.
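
To make the difference concrete, here is a minimal sketch that compares the manual approach, made safe with `try`/`finally`, against `with`. The scratch file and the simulated `ValueError` are assumptions made purely for illustration; they are not part of the cocktail exercise.

```python
import os
import tempfile

# Hypothetical scratch file in a temporary folder, so nothing in your
# working directory is touched.
scratch_path = os.path.join(tempfile.mkdtemp(), 'example.txt')

with open(scratch_path, 'w') as scratch_file:
    scratch_file.write('hello')

# Manual approach made safe: `finally` runs close() even if read() raises.
manual_file = open(scratch_path, 'r')
try:
    text = manual_file.read()
finally:
    manual_file.close()

# The `with` approach: the file is closed even though the block raises.
try:
    with open(scratch_path, 'r') as failing_file:
        raise ValueError('simulated failure inside the block')
except ValueError:
    pass

print(manual_file.closed, failing_file.closed)
```

Both file objects report `closed` as `True`, but the `with` version needs no explicit `close()` call.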

##### Exercise 2

Now we've successfully saved some JSON data to the disk. But let's say we made a mistake in what data we actually want: we also want to include 'flip' cocktails in the data.

Run the next cell to get this new cocktail data into `new_cocktail_data`

In [4]:
params = {"s": "flip"}

response = requests.get(url=url, params=params)

new_cocktail_data = {'flip': response.json()['drinks']}

We want to add this data to the already existing file, which means we need to _change_ the contents of the file. We first _read_ the current contents of the file, then _merge_ them with the new content, and finally _write_ the result back to that file.

This means we have to do two separate file operations.

In [None]:
## Fill in ___

# Open the file in read mode:
with open(___) as cocktails_json_file:
    # read raw json data:
    raw_cocktail_data = ___
    
    cocktail_data = json.loads(raw_cocktail_data)

cocktail_data.update(new_cocktail_data)

# Open the file in write mode:
with open('cocktails.json', 'w') as cocktails_json_file:
    raw_cocktail_data = json.dumps(cocktail_data)
    
    # write update json to file
    ___


In [5]:
## Answer

# Open the file in read mode:
with open('cocktails.json', 'r') as cocktails_json_file:
    raw_cocktail_data = cocktails_json_file.read()
    
    cocktail_data = json.loads(raw_cocktail_data)

cocktail_data.update(new_cocktail_data)

# Open the file in write mode:
with open('cocktails.json', 'w') as cocktails_json_file:
    raw_cocktail_data = json.dumps(cocktail_data)
    
    cocktails_json_file.write(raw_cocktail_data)
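
As a side note, the same read, merge and write cycle can also be sketched with `json.load` and `json.dump`, which work directly on an open file object instead of on a string. The data and the file name below are hypothetical stand-ins, written to a temporary folder so that `cocktails.json` is left alone.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for the data used in this module.
existing_data = {'sour': [{'strDrink': 'Gin Sour'}]}
extra_data = {'flip': [{'strDrink': 'Porto Flip'}]}

example_path = os.path.join(tempfile.mkdtemp(), 'cocktails_example.json')

# Create the initial file.
with open(example_path, 'w') as example_file:
    json.dump(existing_data, example_file)

# Read: json.load parses the file object directly.
with open(example_path, 'r') as example_file:
    merged_data = json.load(example_file)

# Merge: add the new top-level key.
merged_data.update(extra_data)

# Write: json.dump serializes straight into the file object.
with open(example_path, 'w') as example_file:
    json.dump(merged_data, example_file, indent=2)

print(list(merged_data))
```

The `indent=2` argument is optional; it only makes the saved file easier to read by hand.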

### Section 2: os

In this section we are going to use the hierarchical structure of the file system. To do so, we need the built-in [`os`](https://docs.python.org/3/library/os.html) library. This library offers functionality for listing files, removing and creating files and folders, navigating folders, and much more.



##### Exercise 1

In the last section we pulled some data from an API and saved it to the disk. Of course, we want to get closer to a nicely structured way of saving our data, and not just dump everything in one file. Let's delete this file for now. To do so we need the `remove` function from the [`os`](https://docs.python.org/3/library/os.html) library. Find out how to use this function, and delete the file we've created before:

In [None]:
## Your code here:


In [6]:
## Answer
os.remove('cocktails.json')
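
Keep in mind that `os.remove` raises a `FileNotFoundError` when the file does not exist, for example when the cell above is run twice. A minimal sketch of that behaviour, using a hypothetical scratch file in a temporary folder:

```python
import os
import tempfile

# Hypothetical scratch file, created only to be deleted again.
example_path = os.path.join(tempfile.mkdtemp(), 'to_delete.json')
with open(example_path, 'w') as example_file:
    example_file.write('{}')

os.remove(example_path)
exists_after_remove = os.path.exists(example_path)

# Removing the same file a second time fails, because it is already gone.
try:
    os.remove(example_path)
    second_remove_failed = False
except FileNotFoundError:
    second_remove_failed = True

print(exists_after_remove, second_remove_failed)
```

Checking with `os.path.exists` first, or catching the exception as shown, makes a cleanup cell safe to rerun.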

#### Exercise 2

Let's say we want to save each type of cocktail (sour, fizz, punch and flip) to its own separate folder. Think about how you would do that manually first, and then take a look at the approach we will take:

- Step 1: Pull apart the Cocktail data into the 4 categories
- Step 2: Create the 4 folders (think about doing this dynamically; all the information we need is in the JSON. Look up how to create folders in the Python docs)
- Step 3: Save the correct data to a new file in the correct folder

The resulting file hierarchy will need to look something like this:

```
- sour
    - sour_cocktails.json
- fizz
    - fizz_cocktails.json
- punch
    - punch_cocktails.json
- flip
    - flip_cocktails.json
```


In [None]:
## Your code here:


In [7]:
## Answer

for cocktail_type in cocktail_data:
    # step 1: get correct part from json data
    cocktail_type_data_raw = json.dumps(cocktail_data[cocktail_type])
    
    # step 2: create folder
    os.mkdir(cocktail_type)
    
    # step 3: write data to file
    with open(f'{cocktail_type}/{cocktail_type}_cocktails.json', 'w') as cocktail_file:
        cocktail_file.write(cocktail_type_data_raw)
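
One thing to be aware of: `os.mkdir` raises a `FileExistsError` when the folder already exists, so the answer above fails if you run it a second time. A minimal sketch of a rerun-safe alternative, `os.makedirs` with `exist_ok=True`, shown in a temporary folder (an assumption made here to keep your working directory clean):

```python
import os
import tempfile

# Work inside a throwaway base folder.
base_dir = tempfile.mkdtemp()
folder = os.path.join(base_dir, 'sour')

os.mkdir(folder)

# A second os.mkdir on the same path fails.
try:
    os.mkdir(folder)
    mkdir_failed = False
except FileExistsError:
    mkdir_failed = True

# os.makedirs with exist_ok=True silently accepts an existing folder.
os.makedirs(folder, exist_ok=True)

print(mkdir_failed, os.path.isdir(folder))
```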

#### Exercise 3

In the previous exercise, you might have run into something unexpected: if you simply pass the name of the output file as an argument to the `open` function, the file will not end up in the correct folder. After all, the program has no way to determine in which folder you want to save the file. By default, the `open` function will always look for files in the current working directory. When we saved the JSON file, we saved it in the working directory, and when we read from the file, we opened it from the current working directory.

You can check what directory that is with the following code:

In [8]:
os.getcwd()

'/Users/youngmavericks/de_fundamentals/DE_fundamentals/Module 5; File system operations'

To read or write a file that's not in the current working directory, we give the whole path as an argument instead of just the filename. In our case this would look like:

```python
open('fizz/fizz_cocktails.json', 'w')
```

Of course, you would want to implement this path dynamically. We could use simple string formatting using [f-strings](https://peps.python.org/pep-0498/) for that:

```python
open(f'{cocktail_type}/{cocktail_type}_cocktails.json', 'w')
```

In case your code from the previous exercise does not put the files in the correct place, go back and fix it.

You can use the following code to check if all the files are there:

In [9]:
for cocktail_type in cocktail_data:
    print(cocktail_type, os.listdir(cocktail_type))

sour ['sour_cocktails.json']
fizz ['fizz_cocktails.json']
punch ['punch_cocktails.json']
flip ['flip_cocktails.json']


The `open` function accepts _absolute_ as well as _relative_ paths.

Relative paths are what we have worked with until now. They define the locations of files and folders _relative_ to the current working directory, denoted by `./`.

An example (in our case) of a relative path would be: `./sour/sour_cocktails.json` (same as `sour/sour_cocktails.json`)

Absolute paths are always defined starting from the _root_ of your filesystem, denoted by `/` on Linux and macOS. For example, `os.getcwd()` returns an absolute path.
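
The distinction can be checked in code. The sketch below uses `os.path.isabs` to test whether a path is absolute and `os.path.abspath` to turn a relative path into an absolute one, based on the current working directory. The file does not need to exist for this to work; the path is only used as an example.

```python
import os

relative_path = os.path.join('sour', 'sour_cocktails.json')

# abspath resolves the relative path against the current working directory.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
print(absolute_path)
```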

To make sure you understand the difference, play around with the [`os.listdir`](https://docs.python.org/3/library/os.html#os.listdir) function in the cell below and explore your own filesystem.

In [10]:
## Relative
print(os.listdir())

print(os.listdir('./'))

print(os.listdir('../'))

print(os.listdir('./sour/'))

## Absolute

print(os.listdir('/'))

print(os.listdir(os.getcwd()))

['sour', 'flip', 'Module 5; File system operations.ipynb', 'punch', '.ipynb_checkpoints', 'fizz']
['sour', 'flip', 'Module 5; File system operations.ipynb', 'punch', '.ipynb_checkpoints', 'fizz']
['.DS_Store', 'Module 3; XML', 'README.md', '.gitignore', 'Module 5; File system operations', 'templates', 'Module 2; API', '.git', 'Module 1; JSON', '.idea']
['sour_cocktails.json']
['home', 'usr', 'bin', 'sbin', '.file', 'etc', 'var', 'Library', 'System', '.VolumeIcon.icns', 'private', '.vol', 'Users', 'Applications', 'opt', 'dev', 'Volumes', 'tmp', 'cores']
['sour', 'flip', 'Module 5; File system operations.ipynb', 'punch', '.ipynb_checkpoints', 'fizz']


#### Exercise 4


In theory, it's always good practice to use absolute paths when reading or writing files, since this way you can use your code from anywhere. Say you are developing a module that does some file I/O with config files. This module could be called from anywhere meaning the current working directory could be anywhere, but the location of the config files should be a fixed location. That's why you would use absolute paths.

A relatively easy way to make sure your paths are always absolute is by either defining a fixed directory, or by dynamically constructing the absolute path.

Say we want the exact same functionality as before, saving 4 JSON files in 4 different folders, but this time using only absolute paths. We can achieve this by concatenating the output of `os.getcwd()` with the relative path:

```python
open(f'{os.getcwd()}/{cocktail_type}/{cocktail_type}_cocktails.json', 'w')
```

This is beginning to look a bit messy, and it's also not universal. For example, on Windows the path separator is `\`, not `/`. The `os` library has functionality to construct these paths dynamically, namely the [`os.path`](https://docs.python.org/3/library/os.path.html#module-os.path) module.

For example, we can build the path of one of our files with `os.path.join('sour', 'sour_cocktails.json')`. This function will always build the correct path, independent of the operating system.
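
A minimal sketch of what `os.path.join` produces. The separator it inserts is available as `os.sep`, so the printed result depends on the operating system you run this on.

```python
import os

# Join path components with the separator of the current operating system.
relative_path = os.path.join('sour', 'sour_cocktails.json')

# Prepending the working directory gives an absolute path.
absolute_path = os.path.join(os.getcwd(), 'sour', 'sour_cocktails.json')

print(os.sep)         # '/' on Linux and macOS, '\' on Windows
print(relative_path)  # 'sour/sour_cocktails.json' on Linux and macOS
print(absolute_path)
```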

In the next step, we again want to delete all the files and folders we've created so we can move on to the next stage. To do so, we can use the [`shutil.rmtree`](https://docs.python.org/3/library/shutil.html#shutil.rmtree) function, which removes a folder and all its contents. It accepts relative as well as absolute paths, but for practice we will pass it an absolute path.

In the cell below, find a way to get the absolute paths of the folders we've created using `os.path.join` (remember that `os.getcwd()` returns an absolute path).

In [None]:
## Fill in here
for cocktail_type in cocktail_data:
    shutil.rmtree(___)


In [11]:
## Answer
for cocktail_type in cocktail_data:
    shutil.rmtree(os.path.join(os.getcwd(), cocktail_type))
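
To see why `shutil.rmtree` is used here rather than `os.rmdir`, the sketch below builds a small folder with one file in it inside a temporary directory (an assumption made to avoid deleting anything real). `os.rmdir` only removes empty folders, while `shutil.rmtree` removes the folder together with its contents.

```python
import os
import shutil
import tempfile

# Throwaway folder containing a single file.
base_dir = tempfile.mkdtemp()
folder = os.path.join(base_dir, 'fizz')
os.mkdir(folder)
with open(os.path.join(folder, 'fizz_cocktails.json'), 'w') as example_file:
    example_file.write('[]')

# os.rmdir refuses to remove a folder that is not empty.
try:
    os.rmdir(folder)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree removes the folder and everything inside it.
shutil.rmtree(folder)

print(rmdir_failed, os.path.exists(folder))
```

Because `shutil.rmtree` deletes everything below the given path without asking, it is worth double-checking the path before running it.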

### Section 3: paths

Now we want to make the final step in getting all of our data into a nice and clear format, where we want each cocktail to be in its own file. Our goal is to achieve the following file structure:

```
- sour
    - gin_sour.json
    - rum_sour.json
    - ...
- fizz
    - gin_fizz.json
    - royal_fizz.json
    - ...
- punch
    - wine_punch.json
    - aztec_punch.json
    - ...
- flip
    - porto_flip.json
    - brandy_flip.json
    - ...
```


#### Exercise 1

Using only absolute paths and `os.path.join` to build those paths, create this file structure and save the correct data into the correct files.

In [None]:
## Your code here

In [16]:
## Answer

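for cocktail_type in cocktail_data:
    # build the absolute path of the folder for this cocktail type
    folder_path = os.path.join(os.getcwd(), cocktail_type)

    # exist_ok=True avoids a FileExistsError when the folder already exists,
    # for example when this cell is run a second time
    os.makedirs(folder_path, exist_ok=True)

    for cocktail in cocktail_data[cocktail_type]:
        # e.g. 'Gin Sour' -> 'gin_sour.json'
        cocktail_filename = cocktail['strDrink'].lower().replace(' ', '_') + '.json'

        path = os.path.join(folder_path, cocktail_filename)

        with open(path, 'w') as cocktail_file:
            cocktail_file.write(json.dumps(cocktail))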

### Section 4: pathlib

Although the functionality of `os.path` is very nice, 