# Files and directories

In [3]:
# import statements
import os
import random
import json
from json import JSONDecodeError

In [4]:
def read_json(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f) # dict, list, etc

# data is a dict, list, etc
def write_json(path, data):
    with open(path, 'w', encoding="utf-8") as f:
        json.dump(data, f, indent=2)

### Review 1: What does sorted() return? 

In [3]:
d = {"Andy": [850, 955], "Meena": [1100, 1320], "Peyman": ["online"]}
t = (45, 32, 29)
sentence = "Meet me at the Sett"
my_list = sentence.split(" ")

# Predict the output, run the cell and confirm your answers
sorted_d = sorted(d)
print(sorted_d, type(sorted(d)))

sorted_t = sorted(t)
print(sorted_t, type(sorted(t)))

sorted_sentence = sorted(sentence)
print(sorted_sentence, type(sorted(sentence)))

sorted_my_list = sorted(my_list)
print(sorted_my_list, type(sorted(my_list)))

['Andy', 'Meena', 'Peyman'] <class 'list'>
[29, 32, 45] <class 'list'>
[' ', ' ', ' ', ' ', 'M', 'S', 'a', 'e', 'e', 'e', 'e', 'e', 'h', 'm', 't', 't', 't', 't', 't'] <class 'list'>
['Meet', 'Sett', 'at', 'me', 'the'] <class 'list'>


### Review 2: Does sorted return a new object instance or modify the existing object instance?

In [4]:
# sorted() returns a brand new list, whereas the list method sort()
# modifies the existing list in place and returns None
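
A minimal sketch of this difference, using a hypothetical list `nums` that is not part of the lecture data:

```python
nums = [3, 1, 2]          # hypothetical example list

new_list = sorted(nums)   # builds and returns a new list
print(new_list)           # [1, 2, 3]
print(nums)               # [3, 1, 2]  (the original is unchanged)
print(new_list is nums)   # False: two distinct objects

result = nums.sort()      # sorts the existing list in place
print(nums)               # [1, 2, 3]
print(result)             # None: sort() does not return the list
```

A common mistake is writing `nums = nums.sort()`, which replaces the list with `None`.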

### Review 3: Difference between + and append() method on lists 

In [5]:
listA = ["Wisconsin", "Madison"]
listB = ["Data" ,  "Science"]

print(listA + listB)        # + operator creates a brand new object instance
print(listA[1] + listB[1])  # just like + operator on strings creates a brand new string object instance
                            # recall that strings are immutable, so you don't have a choice there
listA.append(listB)         # append() method modifies the existing list object instance
print(listA)
print(listB)

['Wisconsin', 'Madison', 'Data', 'Science']
MadisonScience
['Wisconsin', 'Madison', ['Data', 'Science']]
['Data', 'Science']


## Learning Objectives

- Use built-in functions with files: `open`, `close`, `read`, `write`
- Alternative ways of reading a file's contents: `for` loop, `list` conversion
- Use these file-specific os module functions:
    - os.listdir
    - os.mkdir
    - os.path.exists 
    - os.path.isfile
    - os.path.isdir
    - os.path.join
- Use `try`/`except` blocks or a `with` statement to handle errors that may occur when using files

# File processing
- open(...) function call:
    - **IMPORTANT: pass a relative path as the argument to open, and do not hardcode "/" or "\\" in that path**
- processing: either read or write contents
- file_object.close() function call:
    - **you should always close your files!**

<div>
<img src="attachment:File_sandwich.png" width="600"/>
</div>

## Reading data from a file

3 options:
- using read() function call:
    - returns file contents as one big string
- convert file object into a list
    - each line becomes an item within the list
- using for loop to iterate over every line

Using read() function call ...

In [6]:
sample_file = open("hello.txt")
data = sample_file.read()
sample_file.close()

data

'Hello CS220 / CS319 students :)\nHope you are having a wonderful day!\n'

What is the type of the value returned by read()?

In [7]:
type(data)

str

Suppose "hello.txt" is inside a directory called "somedir", what will the relative path be?

In [8]:
# Windows: "somedir\hello.txt" <- by the end of today's lecture, we'll learn how not to hardcode this "\"
# Mac: "somedir/hello.txt" <- by the end of today's lecture, we'll learn how not to hardcode this "/"

Converting file objects into a list

In [9]:
sample_file = open("hello.txt")
data_list = list(sample_file)
sample_file.close()

data_list

['Hello CS220 / CS319 students :)\n', 'Hope you are having a wonderful day!\n']

What is the type of data_list?

In [10]:
type(data_list)

list

Using for loop to iterate over file object

In [11]:
sample_file = open("hello.txt")
for line in sample_file:
    print(line)
    print(type(line))
sample_file.close()

data_list

Hello CS220 / CS319 students :)

<class 'str'>
Hope you are having a wonderful day!

<class 'str'>


['Hello CS220 / CS319 students :)\n', 'Hope you are having a wonderful day!\n']

### Writing data into a file
- "w" mode in open(...) function call
    - **BE CAREFUL: every time you open an existing file in "w" mode, you overwrite the file's contents**
    - "\n" needs to be explicitly written

In [12]:
print("hello world!")
print("hello cs220!")

hello world!
hello cs220!


In [13]:
hello_file = open("cs220.txt", "w")
hello_file.write("Hi folks.")
hello_file.close()

In [14]:
hello_file = open("cs220.txt", "w")
hello_file.write("Hello CS220 / CS319 students.\n")
hello_file.write("Good luck with exam 2 preparation.")
hello_file.write("Ooops forgot newline\n")
hello_file.write("That's all folks!")
hello_file.close()

TODO: Let's read the contents of the file we just wrote, using the three different styles of reading. **You will not see** `Hi folks.` Why? Remember that the "w" mode overwrites the contents of an existing file.
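
The three reading styles can be compared side by side. The sketch below is only an illustration: it writes a small scratch file (the name `demo.txt` and the temporary directory are assumptions, not part of the lecture files) so that it runs without `cs220.txt` being present.

```python
import os
import tempfile

# Hypothetical setup: a scratch file in a temporary directory
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, "demo.txt")

f = open(demo_path, "w")
f.write("line one\n")
f.write("line two\n")
f.close()

# Style 1: read() returns the whole file as one string
f = open(demo_path)
as_string = f.read()
f.close()

# Style 2: list(...) returns one string per line, newline included
f = open(demo_path)
as_list = list(f)
f.close()

# Style 3: a for loop visits one line at a time
from_loop = []
f = open(demo_path)
for line in f:
    from_loop.append(line)
f.close()

print(repr(as_string))
print(as_list)
print(from_loop)
```

Styles 2 and 3 produce the same lines. The for loop simply lets you process each line as soon as it is read.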

In [15]:
hello_file = open("cs220.txt")
data = hello_file.read()
hello_file.close()
data

"Hello CS220 / CS319 students.\nGood luck with exam 2 preparation.Ooops forgot newline\nThat's all folks!"

## os module functions

- os.mkdir 
- os.path.join
- os.listdir
- os.path.exists 
- os.path.isfile
- os.path.isdir 

In [16]:
os.mkdir("test_dir") 
# creates a new directory called "test_dir" inside current working directory

### os.path.join is a very important function, which enables portability of code
- portability enables you to write code in one OS platform and run it on another OS platform
- it enables you to avoid hard-coding "\\" or "/" in the relative path

In [17]:
path = os.path.join("test_dir", "file1.txt")
print(path)  
# what do you get? 

test_dir/file1.txt


Now let's use this to create a few files and directories for learning about other os module functions.

In [18]:
# creating file1.txt inside test_dir

f = open(path, "w")
f.write("file1\n")
f.close()

# creating file2.txt inside test_dir
f = open(os.path.join("test_dir", "file2.txt"), "w")
f.write("file2\n")
f.close()

# creating "test_dir_depth2" inside test_dir
os.mkdir(os.path.join("test_dir", "test_dir_depth2"))

# creating "file1_depth2.txt" and "file2_depth2.txt" inside "test_dir_depth2"
f = open(os.path.join("test_dir", "test_dir_depth2", "file1_depth2.txt"), "w")
f.write("file1_depth2\n")
f.close()

f = open(os.path.join("test_dir", "test_dir_depth2", "file2_depth2.txt"), "w")
f.write("file2_depth2\n")
f.close()

In [19]:
os.listdir(".") # recall that "." refers to the current working directory

['6.json',
 'nums.txt',
 'gen.py',
 'short.txt',
 'files_and_directories_template.ipynb',
 '1.json',
 '2.json',
 'cs220.txt',
 'test_dir',
 '3.json',
 '.ipynb_checkpoints',
 '4.json',
 'files_and_directories.ipynb',
 'hello.txt',
 '5.json']

Now, use listdir to look at contents of test_dir and test_dir_depth2.

In [20]:
os.listdir("test_dir")

['file2.txt', 'file1.txt', 'test_dir_depth2']

In [21]:
os.listdir(os.path.join("test_dir", "test_dir_depth2"))

['file2_depth2.txt', 'file1_depth2.txt']

Let's learn about os.path functions. os.path is a sub-module of os, so it does not need to be imported separately.

In [22]:
print(os.path.exists("hello.txt")) # yes
print(os.path.isfile("test_dir"))  # no
print(os.path.isdir("test_dir"))   # yes 

True
False
True


Try your own examples of `exists`, `isfile`, and `isdir`.

In [23]:
print(os.path.exists(os.path.join("test_dir", "file1.txt"))) # yes
print(os.path.exists("file1.txt")) # no, there isn't a file1.txt inside the current working directory
print(os.path.isdir(os.path.join("test_dir", "test_dir_depth2"))) # yes
print(os.path.isfile(os.path.join("test_dir", "test_dir_depth2"))) # no
print(os.path.isdir(os.path.join("test_dir", "test_dir_depth2", "file1_depth2.txt"))) # no
print(os.path.isfile(os.path.join("test_dir", "test_dir_depth2", "file1_depth2.txt"))) # yes

True
False
True
False
False
True


## Exception handling
- FileNotFoundError
- FileExistsError
    - despite the name, also raised for directories, for example when `os.mkdir()` is called on a directory that already exists
- JSONDecodeError
    - when json file has incorrect format

In [24]:
# let's figure out how to handle a command to open a file that does not exist

path = input("enter the name of the file to open:")
try:
    f = open(path, "r")  # "r" is for reading, but is the default
    d = f.read()
    print(d)
    f.close()
except FileNotFoundError as e:
    print(type(e))
    print(path, "could not be opened")

enter the name of the file to open:some_file.txt
<class 'FileNotFoundError'>
some_file.txt could not be opened


In [25]:
# Handling accidental re-execution of a cell that calls mkdir

try:
    os.mkdir('new_test_dir')
except FileExistsError:
    print("Directory already exists!")

f = open(os.path.join('new_test_dir', 'out.txt'), 'w')
f.write('hi')
f.close()
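
An alternative to catching `FileExistsError` is to check with `os.path.exists` before calling `os.mkdir`. The sketch below is an illustration under assumptions: the scratch directory from `tempfile.mkdtemp()` and the `messages` list are hypothetical names used only to keep the example self-contained.

```python
import os
import tempfile

base = tempfile.mkdtemp()                     # hypothetical scratch location
new_dir = os.path.join(base, "new_test_dir")

messages = []
for attempt in range(2):                      # simulate running the cell twice
    if os.path.exists(new_dir):
        messages.append("already exists, skipping mkdir")
    else:
        os.mkdir(new_dir)
        messages.append("created")

print(messages)
print(os.path.isdir(new_dir))
```

The check-then-create style leaves a small window in which another program could create the directory between the check and the `mkdir` call, so the `try`/`except` version above is the more robust of the two.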

In [26]:
# JSONDecodeError - requires import

# Steps:
# Get output of listdir
# Check for files with json extension
# Read each file's contents

files = os.listdir(".")

for some_file in files:
    if some_file.endswith(".json"):
        print("Trying to parse:", some_file, end = ": ")
        try:
            read_json(some_file)
            print("Parsing successful.") # you won't need this for your project solution
        except JSONDecodeError as e:
            print("Parsing failed.") # you won't need this for your project solution
            continue # move on to reading next file

Trying to parse: 6.json: Parsing failed.
Trying to parse: 1.json: Parsing successful.
Trying to parse: 2.json: Parsing failed.
Trying to parse: 3.json: Parsing failed.
Trying to parse: 4.json: Parsing successful.
Trying to parse: 5.json: Parsing failed.


### Python is all about shortening code. Is there a way to shorten the process of:
- opening a file
- handling any Errors while reading/writing
- closing the file

In [27]:
# we can use a 'with' statement to shorten our code
# no need to call close(): the file is closed automatically when the block ends,
# even if an error occurs inside the block

with open("some_numbers.txt", "w") as f: 
    for i in range(10):
        f.write(str(random.randint(1, 100)) + "\n")
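
A `with` statement does not catch exceptions by itself. What it guarantees is that the file is closed when the block ends. The sketch below illustrates this by combining `with` and `try`/`except`. The scratch path and the deliberately bad line `oops` are assumptions made for the example, and `ValueError` is the built-in exception that `int` raises on text that is not a number.

```python
import os
import tempfile

# Hypothetical scratch file, so the example does not touch lecture files
demo_path = os.path.join(tempfile.mkdtemp(), "some_numbers.txt")

with open(demo_path, "w") as f:
    f.write("1\n2\noops\n")

print(f.closed)   # True: closed when the with block ended

total = 0
try:
    with open(demo_path) as f:
        for line in f:
            total += int(line)   # raises ValueError on "oops\n"
except ValueError as e:
    print("bad line:", e)

print(f.closed)   # True: closed even though an exception was raised
print(total)      # 3
```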

## Case studies

### Sum example

In [28]:
path = "nums.txt"  
f = open(path, "w")

for i in range(1000000):
    f.write(str(random.randint(1,100)) + "\n")
f.close()

In [29]:
# Solution 1: bad solution because the whole file is read into a list before
# any addition happens, so the bug only surfaces after all the reading work
f = open("nums.txt")
nums = list(f)
f.close()

total = 0
for num in nums:
    total += num

print(total)

TypeError: unsupported operand type(s) for +=: 'int' and 'str'

In [30]:
# Solution 2: better solution because we start adding as soon as each line is read
# (bugs are caught quickly)
f = open("nums.txt")

total = 0
for num in f:
    total += num

print(total)
f.close()

TypeError: unsupported operand type(s) for +=: 'int' and 'str'

In [31]:
# Solution 2, with the bug fixed: convert each line to int before adding
f = open("nums.txt")

total = 0
for num in f:
    total += int(num)

print(total)
f.close()

50546440


### Self-study: Recursive file search

In [32]:
# program recursive file searcher

def recursive_dir_search(search_dir, search_file): 
    """
    find relative path to a target file
    """
    for curr in os.listdir(search_dir):   
        # build a path to this current thing
        curr = os.path.join(search_dir, curr) 
        
        # check if curr is a file
        if os.path.isfile(curr):
            # check if the file name (not the whole path) contains the search name
            if search_file in os.path.basename(curr):   # base case...no recursive call
                return curr
        elif os.path.isdir(curr):          # recursive case!! (skip anything that is not a directory)
            return_value = recursive_dir_search(curr, search_file)
            if return_value is not None:   # we found the file
                return return_value
            # if not, we still need to keep looking within other sub directories
            
    # finished all recursive searching and never found it   
    return None       

# this function is like our main program
def dir_search(dir_name, file_name):
    if not os.path.exists(dir_name):
        print("Unable to find search_dir!")
    else:
        search_file = recursive_dir_search(dir_name, file_name)
        if search_file is not None:
            print("Relative path to file:", search_file)
            f = open(search_file)
            data = f.read()
            f.close()
            print("File contents:")
            print(data)
        else:
            print(file_name, "not found!")

In [33]:
# Positive test case - file exists
dir_search(".", "file1.txt")

Relative path to file: ./test_dir/file1.txt
File contents:
file1



In [34]:
# Negative test case - file doesn't exist
dir_search(".", "some_file.txt")

some_file.txt not found!
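
For comparison, the standard library's `os.walk` performs the directory traversal for you. The lecture does not cover it, so the sketch below is an alternative and not the lecture's method. The function name `walk_search` and the scratch directory tree are hypothetical.

```python
import os
import tempfile

def walk_search(search_dir, search_file):
    """Return the path of the first file whose name contains search_file, else None."""
    for dir_path, dir_names, file_names in os.walk(search_dir):
        for name in file_names:
            if search_file in name:          # substring match on the file name
                return os.path.join(dir_path, name)
    return None

# Hypothetical scratch tree that mirrors test_dir from the lecture
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "test_dir"))
with open(os.path.join(base, "test_dir", "file1.txt"), "w") as f:
    f.write("file1\n")

found = walk_search(base, "file1.txt")
print(found)
print(walk_search(base, "some_file.txt"))    # None
```

Writing the recursion by hand, as in `recursive_dir_search`, is still the better exercise for practicing base cases and recursive cases.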
