# CS 122 Lecture 7: Using the file system

Learning Objectives:
By the end of this lesson, you should be able to:
1. Use the `os` module to write platform-independent scripts to access information about the file system
2. Copy files using the `shutil` module
3. Use the `glob` module to write Unix-type file search commands

**Note: To follow along, make a copy of this notebook in the CS_122 folder created for Assignment 1**

### Import Modules for this Notebook
In the previous notebook introducing modules, we imported modules as we needed them. However, it is good practice to import all of the modules your notebook (or script) needs in one import block near the top of the file. For this notebook, we will use three modules:

In [None]:
# import the os, shutil, and glob modules
import os
import shutil
import glob

## Mounting Google Drive

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
root_path = "./drive/My Drive/CS_122"
(os.path.exists(root_path))

True

# Part 1: The `os` module and paths

Python's built-in `os` module is a useful tool to access information about the file system. Many of the `os` functions mimic common shell commands but return Python objects that can be used in Python code. Let's take a look at a few examples:

### The Current Working Directory

In [None]:
# check your current working directory
print(os.getcwd())

/content


Questions:
1. What kind of path is this (absolute or relative)?
> Absolute


2. What is the equivalent command in your terminal?
> pwd
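
The difference between absolute and relative paths can also be checked in code. The following is a minimal sketch using `os.path.isabs` and `os.path.abspath`; the path `drive/My Drive/CS_122` is only an example and does not need to exist for these functions to work.

```python
import os

# A relative path is interpreted starting from the current working directory.
relative_path = os.path.join("drive", "My Drive", "CS_122")  # example path

# os.path.isabs() reports whether a path is absolute.
print(os.path.isabs(relative_path))  # a relative path
print(os.path.isabs(os.getcwd()))    # the working directory is always absolute

# os.path.abspath() prepends the current working directory to a relative path.
absolute_path = os.path.abspath(relative_path)
print(absolute_path)
```
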

### Contents of the Current Working Directory

In [None]:
# check the contents of your current working directory
print(os.listdir())

# check the contents in root path
print(os.listdir(root_path))

['.config', 'drive', 'sample_data']
['CS_122_Class', '.ipynb_checkpoints', 'CS122_HW_1.ipynb', 'Copy of  7_FileSystem.ipynb']


### Paths on your machine
Paths on your machine provide the address where certain data is stored. For example, in the list above, we find that there is a directory called `CS_122_Class` inside `root_path`. To build a path to this folder, we just need to append `CS_122_Class` to `root_path`. However, different operating systems use different formats for the string representation of paths (for example, Unix-like systems separate folders with `/`, while Windows uses `\`). The `os` module gives us a convenient way to write platform-independent paths.

In [None]:
# create a path to the class folder by joining its name onto root_path
class_folder = os.path.join(root_path, 'CS_122_Class')

# print out the data folder path
print(class_folder)

# print the number of entries (files and folders) in the class folder
print(len(os.listdir(class_folder)))

./drive/My Drive/CS_122/CS_122_Class
4
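
To see why `os.path.join` matters, it can help to compare the two path flavors side by side. The sketch below uses the standard-library modules `posixpath` and `ntpath`, which are the Unix-style and Windows-style implementations behind `os.path`; the folder names are just the ones used in this notebook.

```python
import ntpath      # Windows-style implementation of os.path
import os
import posixpath   # Unix-style implementation of os.path

parts = ("drive", "My Drive", "CS_122")

# The same pieces produce different strings under each convention.
unix_style = posixpath.join(*parts)
windows_style = ntpath.join(*parts)
print(unix_style)      # drive/My Drive/CS_122
print(windows_style)   # drive\My Drive\CS_122

# os.path picks the convention of the machine running the code,
# and os.sep holds that machine's separator character.
print(os.path.join(*parts))
print(os.sep)
```

Writing `os.path.join(...)` instead of typing `/` or `\` by hand is what lets the same script run unchanged on either kind of system.
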


### &#x1F914; Mini-Exercise
Goal: Get a list of all Jupyter Notebooks we've written in CS 122 so far using the `os` module.

In [None]:
notebooks = []
for file_name in os.listdir(os.path.join(class_folder, 'Lectures')):
    if file_name.endswith('.ipynb'):
        notebooks.append(file_name)

for notebook in notebooks:
    print(notebook)

1_a_Python_Basics.ipynb
2_Data_Types.ipynb
1_b_Strings.ipynb
3_Dictionaries.ipynb
4_Control_Flow.ipynb
5_Functions.ipynb
6_Modules.ipynb
7_FileSystem.ipynb


### Making new directories
The `os` module gives us the functionality to modify our file system. For example, we can make a new directory given an absolute (or relative) path.

In [None]:
# define a path for a new organized_data directory
organized_data = os.path.join(root_path, 'organized_data')

# make the organized_data directory inside root_path, but only if it does not already exist
if not os.path.exists(organized_data):
    os.mkdir(organized_data)

Question: What is the equivalent command in the terminal?
> mkdir
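
As an alternative to checking `os.path.exists` first, `os.makedirs` with `exist_ok=True` creates a directory (and any missing parent directories) and does nothing if it is already there, similar to `mkdir -p` in the terminal. The sketch below runs in a temporary scratch directory so it does not touch your Drive; the folder names are only examples.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    nested = os.path.join(scratch, "organized_data", "2023_01")

    # Creates organized_data and 2023_01 in one call.
    os.makedirs(nested, exist_ok=True)
    # Calling it again is safe because of exist_ok=True.
    os.makedirs(nested, exist_ok=True)
    created = os.path.isdir(nested)

    # By contrast, os.mkdir raises an error if the directory already exists.
    try:
        os.mkdir(nested)
        mkdir_failed = False
    except FileExistsError:
        mkdir_failed = True

print(created, mkdir_failed)
```
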

In [None]:
# Initialize and check data folder
data_folder = os.path.join(class_folder, 'data')

print(os.path.exists(data_folder))

True


# Part 2: The `shutil` module
The `shutil` module provides utilities for making copies of files on your file system. There are three main functions used for copying files, as follows:

|  | copyfile | copy | copy2 |
| -- | -------- | ---- | ----- |
| Destination can be a directory | N | Y | Y |
| Copies permissions | N | Y | Y |
| Copies metadata | N | N | Y |
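
The differences in the table can be observed directly. The following sketch works in a temporary scratch directory with a made-up file, gives the source file an old modification time, and then compares what each function does; the file and folder names are illustrative only.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Create a small source file and give it an old modification time.
    src = os.path.join(scratch, "2023_0101.txt")
    with open(src, "w") as f:
        f.write("sample data\n")
    old_time = 1_000_000_000  # seconds since 1970, a date in 2001
    os.utime(src, (old_time, old_time))

    dst_dir = os.path.join(scratch, "copies")
    os.mkdir(dst_dir)

    # copyfile needs a full destination file name; a directory raises an error.
    try:
        shutil.copyfile(src, dst_dir)
        copyfile_accepts_dir = True
    except OSError:
        copyfile_accepts_dir = False

    # copy accepts a directory and keeps the source file name.
    copy_path = shutil.copy(src, dst_dir)
    # copy2 also tries to preserve metadata such as the modification time.
    copy2_path = shutil.copy2(src, os.path.join(dst_dir, "with_metadata.txt"))

    copy_mtime = os.path.getmtime(copy_path)
    copy2_mtime = os.path.getmtime(copy2_path)

print(copyfile_accepts_dir)
print(os.path.basename(copy_path))
print(copy_mtime > old_time, copy2_mtime == old_time)
```
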

In [None]:
# define a path to the source data file 2023_0101.txt in data
src_path = os.path.join(data_folder, '2023_0101.txt')

# define a destination path in the organized_data directory, including the file name
dst_path = os.path.join(organized_data,'2023_0101.txt')

# try the copyfile method with the dst path
print(shutil.copyfile(src_path, dst_path))

# try the copy method with the dst_path or organized_data path
print(shutil.copy(src_path, organized_data))

# try the copy2 method with the dst path or organized_data path
print(shutil.copy2(src_path, organized_data))

./drive/My Drive/CS_122/organized_data/2023_0101.txt
./drive/My Drive/CS_122/organized_data/2023_0101.txt
./drive/My Drive/CS_122/organized_data/2023_0101.txt


In [None]:
os.path.exists('./drive/My Drive/CS_122/organized_data/2023_0101.txt')

True

### &#x1F914; Mini-Exercise

The given code creates monthly directories in the `organized_data` folder.

Modify the code below to copy each 2023 data file into its monthly directory inside the `organized_data` directory.

In [None]:
# make a new folder in the organized_data folder for each month in 2023
for file_name in os.listdir(data_folder):

    # check that the file is from 2023
    if file_name[:4]=='2023':

        # define the name of a new folder in the format YYYY_MM
        year_month = file_name[:7]

        # if this year_month is not yet in the organized_data directory, then make it
        if year_month not in os.listdir(organized_data):
            os.mkdir(os.path.join(organized_data, year_month))

        # make a copy of the file in the year_month folder
        # define the src_path and the dest_path
        # then, copy the file using one of the shutil functions

        src_path = os.path.join(data_folder, file_name)
        dest_path = os.path.join(organized_data, year_month, file_name)
        # Make a copy here
        print(shutil.copyfile(src_path, dest_path))

## Overview: Python Commands vs Unix Shell Commands

| Python | Unix | Purpose |
| ------ | ---- | ------- |
| os.getcwd() | pwd | Determine the current/present working directory |
| os.chdir() | cd | Change directory |
| os.mkdir() | mkdir | Make a directory |
| os.rename() | mv | Rename a file or move to a new location |
| os.listdir() | ls | List the files and folders in a directory |
| shutil.copy() | cp | Copy a file to a new location |
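
`os.rename` is the one file-changing function in the table that has not been demonstrated yet. The sketch below, run in a temporary scratch directory with made-up file names, shows it moving and renaming a file in one step, with the rough shell equivalent noted in the comments.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Create a file to move.
    old_path = os.path.join(scratch, "draft.txt")
    with open(old_path, "w") as f:
        f.write("notes\n")

    archive = os.path.join(scratch, "archive")
    os.mkdir(archive)                          # mkdir archive

    new_path = os.path.join(archive, "final.txt")
    os.rename(old_path, new_path)              # mv draft.txt archive/final.txt

    top_level = sorted(os.listdir(scratch))    # ls
    archived = os.listdir(archive)             # ls archive

print(top_level)   # ['archive']
print(archived)    # ['final.txt']
```
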

# Part 3: The `glob` module
When using Unix-type shell commands, wildcard symbols are extremely useful for finding and accessing subsets of files. There are 2 main wildcard symbols:

| symbol | use |
| ------ | --- |
| `?`    | Wildcard for a single symbol |
| `*`    | Wildcard symbol for any number of symbols |
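
The two wildcards can be tried out on plain strings before touching any files. This sketch uses the standard-library `fnmatch` module, which applies the same matching rules to a list of names; the file names here are invented for illustration.

```python
import fnmatch

names = [
    "2023_0101.txt",
    "2023_0109.txt",
    "2023_0110.txt",
    "2022_1201.txt",
    "2023_1201.txt",
]

# ? stands for exactly one character.
single_digit = fnmatch.filter(names, "2023_010?.txt")
print(single_digit)   # ['2023_0101.txt', '2023_0109.txt']

# * stands for any number of characters (including none).
dec_first = fnmatch.filter(names, "*_1201.txt")
print(dec_first)      # ['2022_1201.txt', '2023_1201.txt']
```
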

Try these in the `data` directory in your shell:
1. How would you determine the names of files that correspond to the first day of each month in 2023?
2. How would you determine the names of all files that correspond to December of 2023?

The `glob` module lets us perform Unix-style wildcard searches of the file system from Python.

In [None]:
# find all file names that correspond to the first day of each month in 2023
search_path = os.path.join(data_folder,'2023_??01.txt')
glob.glob(search_path)

['./drive/My Drive/CS_122/CS_122_Class/data/2023_0501.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0701.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0301.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0101.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0401.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0201.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1101.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0801.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1201.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1001.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0901.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_0601.txt']
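
Notice that the results above are not in chronological order: `glob.glob` returns matches in whatever order the file system provides them. If order matters, one option is to wrap the call in `sorted`. The sketch below demonstrates this in a temporary scratch directory with a few invented file names, and uses `os.path.basename` to strip the directory part of each match.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch:
    # Create a few empty example files.
    for name in ["2023_0301.txt", "2023_0101.txt", "2023_0102.txt", "2023_0201.txt"]:
        open(os.path.join(scratch, name), "w").close()

    # Sort the matches so they come back in a predictable order.
    search_path = os.path.join(scratch, "2023_??01.txt")
    matches = sorted(glob.glob(search_path))

    # Keep only the file names, without the directory part.
    base_names = [os.path.basename(path) for path in matches]

print(base_names)   # ['2023_0101.txt', '2023_0201.txt', '2023_0301.txt']
```
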

In [None]:
# find all files in December 2023
search_path = os.path.join(data_folder,'2023_12*.txt')
glob.glob(search_path)

['./drive/My Drive/CS_122/CS_122_Class/data/2023_1227.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1226.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1218.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1224.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1230.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1231.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1225.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1219.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1221.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1209.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1208.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1220.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1222.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1223.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1212.txt',
 './drive/My Drive/CS_122/CS_122_Class/data/2023_1206.txt',
 './drive/My Drive/CS_122/CS_122_Class/d