<a href="https://github.com/theonaunheim">
    <img style="border-radius: 100%; float: right;" src="static/strawberry_thief_square.png" width=10% alt="Theo Naunheim's Github">
</a>
<br style="clear: both">
<hr>
<br>


<h1 align='center'>Files</h1>

<br>

<div style="display: table; width: 100%">
    <div style="display: table-row; width: 100%;">
        <div style="display: table-cell; width: 50%; vertical-align: middle;">
            <img src="static/file.png" width="200">
        </div>
        <div style="display: table-cell; width: 10%">
        </div>
        <div style="display: table-cell; width: 40%; vertical-align: top;">
            <blockquote>
                <p style="font-style: italic;">"A place for everything and everything in its place."</p>
                <br>
                <p>-Benjamin Franklin</p>
            </blockquote>
        </div>
    </div>
</div>

<br>

<div align='left'>
    Image courtesy of <a href='https://commons.wikimedia.org/wiki/File:Document_sans_information.svg'>PICOL</a> under the <a href='https://creativecommons.org/licenses/by/3.0/deed.en'>CC BY 3.0</a>
</div>

<hr>

# Generally

Files and folders are a pain to deal with manually. Luckily, Python provides ample facilities for moving, copying, renaming, reading, and otherwise mangling files. This notebook covers some of the high-level facilities for interacting with a file system. For simple IO operations, please see the other presentations or [Reading and Writing Files](https://docs.python.org/3/tutorial/inputoutput.html#reading-and-writing-files).

---

# Modules covered

### Standard Library
* [os](https://docs.python.org/3/library/os.html)
* [pathlib](https://docs.python.org/3/library/pathlib.html)
* [shutil](https://docs.python.org/3/library/shutil.html)

### Third Party Libraries
* None


# Modules not covered

### Standard Library
* None

### Third Party Libraries
* None

---

```python
# Python stdlib imports
import os
import pathlib
import shutil

# Third party imports
```


# os module

The os module is where you will start your journey, though pathlib generally offers a cleaner, object-oriented interface for the same tasks.

### File and Folder Navigation / Examination

```python
# Get the current working directory
orig_dir = os.getcwd()              # pathlib: orig_dir = pathlib.Path.cwd()

# List the contents of that directory (not recursive)
orig_dir_contents = os.listdir()    # pathlib: [p.name for p in pathlib.Path.cwd().iterdir()]

print(
    'We are currently in "' +
    orig_dir +
    '", the contents of which are:\n' +
    '\n'.join(orig_dir_contents)
)
```

```
We are currently in "C:\Users\theon\Documents\devel\tutorials\journeys_in_jupyter\202_automation_basics", the contents of which are:
.ipynb_checkpoints
1_introduction.ipynb
2_web.ipynb
3_database.ipynb
4_files.ipynb
5_subprocesses.ipynb
6_other.ipynb
7_automation_exercises.ipynb
8_automation_solutions.ipynb
data
static
```

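For comparison, here is a minimal pathlib sketch of the same two steps. Path.cwd() mirrors os.getcwd(), and Path.iterdir() mirrors os.listdir(): both list only the immediate contents of a folder, whereas rglob('*') would recurse into every subfolder. The helper name list_directory and the throwaway folder layout are illustrative assumptions, not part of the notebook.

```python
import pathlib
import tempfile


def list_directory(folder):
    """Return the sorted names of the entries directly inside folder (non-recursive)."""
    return sorted(entry.name for entry in pathlib.Path(folder).iterdir())


# Build a small throwaway folder so the example does not depend on your machine
with tempfile.TemporaryDirectory() as tmp:
    tmp_path = pathlib.Path(tmp)
    (tmp_path / 'data').mkdir()
    (tmp_path / 'notes.txt').write_text('hello')
    (tmp_path / 'data' / 'nested.csv').write_text('a,b')

    print(pathlib.Path.cwd())           # the current working directory, like os.getcwd()
    print(list_directory(tmp_path))     # ['data', 'notes.txt']; nested.csv is not listed
```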

```python
# Let's change directories!
sub_dir = 'data'
os.chdir(sub_dir)

# List the contents of this directory
sub_dir_contents = os.listdir()    # pathlib: [p.name for p in pathlib.Path.cwd().iterdir()]

print(
    'We are now in "' +
    sub_dir +
    '", the contents of which are:\n' +
    '\n'.join(sub_dir_contents)
)
```

```
We are now in "data", the contents of which are:
greenway_analysis.xlsx
iris_dataset.csv
sub1
sub2
test.sqlite3
user_file.csv
```

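Changing directories with os.chdir() means remembering to change back, even if something fails halfway. One way to make that automatic is a small context manager; the sketch below assumes you are happy to write your own helper (working_directory is a hypothetical name). On Python 3.11 and later, the standard library ships contextlib.chdir, which does the same job.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring the old one even on error."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs whether the with-block finished normally or raised
        os.chdir(previous)


with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        print('Inside:', os.getcwd())
    print('Back in:', os.getcwd())
```

Restoring the directory before the temporary folder is cleaned up matters on Windows, where a folder that is still the working directory cannot be deleted.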

### File and Folder Creation / Deletion

```python
# Now create a new folder (os.makedirs also creates any missing parent folders)
os.makedirs('subsub1')          # pathlib: pathlib.Path('subsub1').mkdir(parents=True)
print(os.listdir(), end='\n\n')

# Now get rid of the subfolder
# PYTHON WILL HAPPILY LET YOU SHOOT YOURSELF IN THE FOOT
# BE SURE YOU WANT TO DELETE THIS FOREVER IF YOU REMOVE IT
# os.removedirs() deletes the empty folder, then any parent folders it leaves empty
os.removedirs('subsub1')        # pathlib: pathlib.Path('subsub1').rmdir()
print(os.listdir(), end='\n\n')

# You can also remove files with os.remove(); first create an empty file to delete
with open('my_file', 'w') as f:
    pass
print(os.listdir(), end='\n\n')

os.remove('my_file')            # pathlib: pathlib.Path('my_file').unlink()
print(os.listdir(), end='\n\n')

# And go back to our original directory
os.chdir(orig_dir)
```

```
['greenway_analysis.xlsx', 'iris_dataset.csv', 'sub1', 'sub2', 'subsub1', 'test.sqlite3', 'user_file.csv']

['greenway_analysis.xlsx', 'iris_dataset.csv', 'sub1', 'sub2', 'test.sqlite3', 'user_file.csv']

['greenway_analysis.xlsx', 'iris_dataset.csv', 'my_file', 'sub1', 'sub2', 'test.sqlite3', 'user_file.csv']

['greenway_analysis.xlsx', 'iris_dataset.csv', 'sub1', 'sub2', 'test.sqlite3', 'user_file.csv']
```

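The same create-and-delete round trip can be written with pathlib methods: mkdir(parents=True) plays the role of os.makedirs(), touch() creates an empty file, unlink() replaces os.remove(), and rmdir() removes a single empty folder. This is a minimal sketch run inside a temporary folder; the function name create_and_delete and the folder names are illustrative assumptions.

```python
import pathlib
import tempfile


def create_and_delete(base):
    """Create a nested folder and a file under base, remove them, and return both listings."""
    base = pathlib.Path(base)

    # parents=True creates missing parents; exist_ok=True makes a repeat call harmless
    nested = base / 'subsub1' / 'deeper'
    nested.mkdir(parents=True, exist_ok=True)

    # touch() creates an empty file, like the open(..., 'w') trick
    my_file = base / 'my_file'
    my_file.touch()
    before = sorted(p.name for p in base.iterdir())

    my_file.unlink()            # like os.remove()
    nested.rmdir()              # rmdir() only removes empty folders, innermost first
    nested.parent.rmdir()
    after = sorted(p.name for p in base.iterdir())
    return before, after


with tempfile.TemporaryDirectory() as tmp:
    print(create_and_delete(tmp))   # (['my_file', 'subsub1'], [])
```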


# os.path

Work is hard. Let Python do it. Better yet, do it in pathlib.

```python
directory_contents = os.listdir()
directory_contents.append('nonexistent_file')

# os.listdir() gives us back a list, so we can do all sorts of handy things
for item in directory_contents[-3:]:
    # Such as check for existence
    if os.path.exists(item):
        print(item + ' exists!')
        absolute_path = os.path.abspath(item)
        print('Its absolute path is ' + absolute_path + '!')
    else:
        print('Tricksy coder. ' + item + ' does not exist.')
        continue
    # Check whether it is a regular file
    if os.path.isfile(item):
        print('It is a file!')
    # Or a directory
    if os.path.isdir(item):
        print('It is a directory!')
    print()
```

```
data exists!
Its absolute path is C:\Users\theon\Documents\devel\tutorials\journeys_in_jupyter\202_automation_basics\data!
It is a directory!

static exists!
Its absolute path is C:\Users\theon\Documents\devel\tutorials\journeys_in_jupyter\202_automation_basics\static!
It is a directory!

Tricksy coder. nonexistent_file does not exist.
```

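In pathlib the same checks are methods on the path object: exists(), is_file(), is_dir(), and resolve() for an absolute path. Note that resolve() also follows symlinks, which os.path.abspath() does not. The describe helper below is a hypothetical name used only for this sketch, and the folder it inspects is a temporary one.

```python
import pathlib
import tempfile


def describe(path):
    """Describe path like the os.path loop: existence, kind, and absolute path."""
    path = pathlib.Path(path)
    if not path.exists():                      # like os.path.exists()
        return f'{path.name} does not exist'
    if path.is_dir():                          # like os.path.isdir()
        kind = 'directory'
    elif path.is_file():                       # like os.path.isfile()
        kind = 'file'
    else:
        kind = 'something else'
    return f'{path.name} is a {kind} at {path.resolve()}'


# Usage against a throwaway folder
with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    (folder / 'data').mkdir()
    (folder / 'notes.txt').write_text('hello')
    for name in ['data', 'notes.txt', 'nonexistent_file']:
        print(describe(folder / name))
```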

### It also lets us avoid building paths from scratch

```python
# This path will work on Linux, Mac, and Windows; os.path.join inserts the right separator
path = os.path.join('.', 'sub_folder_1', 'sub_folder_2')
print(os.path.abspath(path))
print()
print('Our separator is "' + os.sep + '" !')
```

```
C:\Users\theon\Documents\devel\tutorials\journeys_in_jupyter\202_automation_basics\sub_folder_1\sub_folder_2

Our separator is "\" !
```

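pathlib builds paths with the / operator instead of os.path.join(). Its "pure" path classes only manipulate strings and never touch the disk, so you can see both the POSIX and the Windows separator from any operating system. A small sketch, reusing the made-up folder names from the cell above:

```python
import pathlib

# Pure paths never touch the disk, so both flavours work on any OS
posix = pathlib.PurePosixPath('sub_folder_1') / 'sub_folder_2'
windows = pathlib.PureWindowsPath('sub_folder_1') / 'sub_folder_2'

print(posix)    # sub_folder_1/sub_folder_2
print(windows)  # sub_folder_1\sub_folder_2

# pathlib.Path picks the flavour for the current OS; the leading '.' is collapsed
native = pathlib.Path('.', 'sub_folder_1', 'sub_folder_2')
print(native.resolve())   # absolute path; the folders need not exist
```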

# os.stat

os.stat gives us metadata about a file: its size, permission bits, owner, and timestamps, among other things.

```python
iris_path = os.path.join(orig_dir, 'data', 'iris_dataset.csv')

stat_obj = os.stat(iris_path)

# Print every st_* field of the stat result
for attr in dir(stat_obj):
    if attr.startswith('st_'):
        print(attr + ': ' + str(getattr(stat_obj, attr)))
```

```
st_atime: 1540778866.0382464
st_atime_ns: 1540778866038246400
st_ctime: 1540777562.297617
st_ctime_ns: 1540777562297616900
st_dev: 954603072
st_file_attributes: 32
st_gid: 0
st_ino: 1688849860384413
st_mode: 33206
st_mtime: 1540777562.297617
st_mtime_ns: 1540777562297616900
st_nlink: 1
st_size: 4775
st_uid: 0
```

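Raw stat fields are not very readable. The standard stat module can decode st_mode (for example, 33206 above is a regular file with rw-rw-rw- permissions), and datetime can turn the float timestamps into dates. Be aware that st_ctime means creation time on Windows but last metadata change on Unix. The summarize_stat helper and the sample file are assumptions made for this sketch.

```python
import datetime
import os
import stat
import tempfile


def summarize_stat(path):
    """Turn a few os.stat() fields into human-readable values."""
    info = os.stat(path)
    return {
        'size_bytes': info.st_size,
        'modified': datetime.datetime.fromtimestamp(info.st_mtime),
        'mode': stat.filemode(info.st_mode),   # e.g. 33206 -> '-rw-rw-rw-'
        'is_regular_file': stat.S_ISREG(info.st_mode),
    }


with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.csv')
    with open(sample, 'wb') as f:
        f.write(b'a,b\n1,2\n')
    for key, value in summarize_stat(sample).items():
        print(f'{key}: {value}')
```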

# os.walk

os.walk lets us navigate the file system and perform arbitrary operations.

Anything walk can do, glob can do better. Glob can do anything better than walk. (Here "glob" means pathlib's glob() and rglob() methods.)

```python
for root, folders, files in os.walk(orig_dir):
    print('Examining ' + os.path.basename(root) + ':')
    for file_name in files:
        if file_name.endswith('.csv'):
            # Build the full path instead of changing directories
            with open(os.path.join(root, file_name)) as f:
                # In text mode, read(10) returns 10 characters, not bytes
                data = f.read(10)
            print('\tThe first 10 characters of {}: {}'.format(file_name, data))
```

```
Examining 202_automation_basics:
Examining .ipynb_checkpoints:
Examining data:
	The first 10 characters of iris_dataset.csv: sepal_leng
	The first 10 characters of user_file.csv: Forename,S
Examining sub1:
	The first 10 characters of root_vegetable_inventory_00.csv: destinatio
	The first 10 characters of root_vegetable_inventory_01.csv: destinatio
	The first 10 characters of root_vegetable_inventory_02.csv: destinatio
	The first 10 characters of root_vegetable_inventory_03.csv: destinatio
	The first 10 characters of root_vegetable_inventory_04.csv: destinatio
	The first 10 characters of root_vegetable_inventory_05.csv: destinatio
	The first 10 characters of root_vegetable_inventory_06.csv: destinatio
	The first 10 characters of root_vegetable_inventory_07.csv: destinatio
	The first 10 characters of root_vegetable_inventory_08.csv: destinatio
	The first 10 characters of root_vegetable_inventory_09.csv: destinatio
Examining sub2:
Examining static:
```

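To back up the claim above, here is the same CSV preview done with pathlib's rglob(), which walks every subfolder for you and filters by pattern in one call. This is a minimal sketch; the function name csv_previews, the UTF-8 encoding, and the sample files are assumptions for illustration.

```python
import pathlib
import tempfile


def csv_previews(folder, n_chars=10):
    """Map each .csv file under folder (recursively) to its first n_chars characters."""
    previews = {}
    for csv_path in sorted(pathlib.Path(folder).rglob('*.csv')):
        with csv_path.open(encoding='utf-8') as f:
            # Key by the path relative to folder, with forward slashes on every OS
            previews[csv_path.relative_to(folder).as_posix()] = f.read(n_chars)
    return previews


with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    (root / 'sub1').mkdir()
    (root / 'iris_dataset.csv').write_text('sepal_length,sepal_width\n', encoding='utf-8')
    (root / 'sub1' / 'inventory_00.csv').write_text('destination,item\n', encoding='utf-8')
    (root / 'notes.txt').write_text('not a csv', encoding='utf-8')
    for name, preview in csv_previews(root).items():
        print(f'{name}: {preview}')
```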

# pathlib

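Pathlib covers much of the above, but more simply: paths are objects that you join with the / operator and query through attributes and methods.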

```python
# Get your Documents folder (Path.home() works on Windows, macOS, and Linux)
documents_folder = pathlib.Path.home() / 'Documents'

# Get a list of all Excel files in that folder and its subfolders (rglob is already recursive).
excel_paths = list(documents_folder.rglob('*.xlsx'))

try:
    # Examine the first file found
    f = excel_paths[0]
    f_info = {
        'drive'      : f.drive,
        'parent'     : f.parent,
        'name'       : f.name,
        'stem'       : f.stem,
        'suffix'     : f.suffix,
        'uri'        : f.as_uri(),
        'parts'      : f.parts,
        'suffixes'   : f.suffixes,
        'is_file'    : f.is_file(),
        'is_dir'     : f.is_dir(),
        'first_bytes': f.read_bytes()[:10],
        'first_text' : f.read_text(encoding='latin-1')[:10],
    }
    # Print the attributes of the first Excel file
    for key, value in f_info.items():
        print(f'{key}: {value}')

except IndexError:
    print("You don't appear to have any .xlsx files in your Documents folder.")
```

```
drive: C:
parent: C:\Users\theon\Documents\devel\tutorials\journeys_in_jupyter\104_basic_pandas_part_1\data
name: common_surnames.xlsx
stem: common_surnames
suffix: .xlsx
uri: file:///C:/Users/theon/Documents/devel/tutorials/journeys_in_jupyter/104_basic_pandas_part_1/data/common_surnames.xlsx
parts: ('C:\\', 'Users', 'theon', 'Documents', 'devel', 'tutorials', 'journeys_in_jupyter', '104_basic_pandas_part_1', 'data', 'common_surnames.xlsx')
suffixes: ['.xlsx']
is_file: True
is_dir: False
first_bytes: b'PK\x03\x04\x14\x00\x06\x00\x08\x00'
first_text: PK  
```

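A file with a single extension makes stem, suffix, and suffixes look interchangeable. A name with two extensions shows how they differ, and with_suffix() and with_name() build related paths without any string slicing. The path below is made up, and a pure path is used so nothing needs to exist on disk.

```python
import pathlib

# A pure path needs no real file, so the output is the same on every machine
archive = pathlib.PurePosixPath('/home/user/reports/summary.tar.gz')

print(archive.name)        # summary.tar.gz
print(archive.stem)        # summary.tar   (only the last suffix is stripped)
print(archive.suffix)      # .gz
print(archive.suffixes)    # ['.tar', '.gz']
print(archive.parent)      # /home/user/reports
print(archive.parts)       # ('/', 'home', 'user', 'reports', 'summary.tar.gz')
print(archive.with_suffix('.zip'))       # /home/user/reports/summary.tar.zip
print(archive.with_name('summary.csv'))  # /home/user/reports/summary.csv
```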

### Copying, moving, and deleting

Shutil and os have got you covered.

```python
# Change to the data folder for simplicity.
os.chdir('data')
os.listdir()
```

```
['greenway_analysis.xlsx',
 'iris_dataset.csv',
 'sub1',
 'sub2',
 'test.sqlite3',
 'user_file.csv']
```

```python
# Let's copy test.sqlite3 to sub2
shutil.copy('test.sqlite3', 'sub2/test.sqlite3')
os.listdir('sub2')
```

```
['foundations_of_data_science.pdf',
 'JPM Big Data and AI Strategies.pdf',
 'test.sqlite3']
```

```python
# Let's move (rename) it to a new filename
shutil.move('sub2/test.sqlite3', 'sub2/renamed.sqlite3')
os.listdir('sub2')
```

```
['foundations_of_data_science.pdf',
 'JPM Big Data and AI Strategies.pdf',
 'renamed.sqlite3']
```

```python
# shutil has no single-file remove function (why is above my pay grade), so use os.remove()
os.remove('sub2/renamed.sqlite3')
os.listdir('sub2')
```

```
['foundations_of_data_science.pdf', 'JPM Big Data and AI Strategies.pdf']
```

```python
# Let's copy a directory tree (the destination folder must not already exist)
shutil.copytree('sub1', 'sub2/subsub1')
os.listdir('sub2/subsub1')
```

```
['root_vegetable_inventory_00.csv',
 'root_vegetable_inventory_01.csv',
 'root_vegetable_inventory_02.csv',
 'root_vegetable_inventory_03.csv',
 'root_vegetable_inventory_04.csv',
 'root_vegetable_inventory_05.csv',
 'root_vegetable_inventory_06.csv',
 'root_vegetable_inventory_07.csv',
 'root_vegetable_inventory_08.csv',
 'root_vegetable_inventory_09.csv']
```

```python
# Now delete that directory tree: shutil.rmtree removes it and everything in it, permanently!
shutil.rmtree('sub2/subsub1')
os.listdir('sub2')
```

```
['foundations_of_data_science.pdf', 'JPM Big Data and AI Strategies.pdf']
```

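The copy, rename, delete, copytree, and rmtree steps above can be replayed end to end inside a throwaway folder, which is a safer way to experiment than your real data. This sketch swaps shutil.copy for shutil.copy2 (which also tries to preserve timestamps) and uses Path.unlink() in place of os.remove(); the function name and the placeholder file contents are hypothetical.

```python
import pathlib
import shutil
import tempfile


def copy_move_delete(base):
    """Replay copy -> rename -> delete -> copytree -> rmtree inside base."""
    base = pathlib.Path(base)
    src_dir, dst_dir = base / 'sub1', base / 'sub2'
    src_dir.mkdir()
    dst_dir.mkdir()
    (src_dir / 'inventory.csv').write_text('destination,item\n')
    db = base / 'test.sqlite3'
    db.write_bytes(b'placeholder')   # hypothetical stand-in, not a real SQLite file

    copied = shutil.copy2(db, dst_dir / db.name)     # copy plus metadata such as mtime
    renamed = pathlib.Path(shutil.move(copied, dst_dir / 'renamed.sqlite3'))
    renamed.unlink()                                  # single files: Path.unlink() / os.remove()
    shutil.copytree(src_dir, dst_dir / 'subsub1')     # destination must not already exist
    after_copytree = sorted(p.name for p in (dst_dir / 'subsub1').iterdir())
    shutil.rmtree(dst_dir / 'subsub1')                # deletes the whole tree permanently
    return after_copytree, sorted(p.name for p in dst_dir.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    print(copy_move_delete(tmp))   # (['inventory.csv'], [])
```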
# Additional Learning Resources

* ### [Python Tutorial OS Interface](https://docs.python.org/3/tutorial/stdlib.html#operating-system-interface)

---

# Next Up: [Subprocesses](5_subprocesses.ipynb)

<img style="margin-left: 0;" src="static/subprocess.png" width="200">

<div align='left'>
    <br>
    Image courtesy of <a href='https://commons.wikimedia.org/w/index.php?search=split+lane&title=Special:Search&profile=default&fulltext=1&searchToken=dmt3fqeomz3cl82rr4p82nmwh#/media/File:Singapore_Road_Signs_-_Regulatory_Sign_-_Split_Way.svg'>Woodennature</a> under the <a href='https://creativecommons.org/licenses/by/3.0/'>CC BY 3.0</a>
</div>


---