# File Handling in Python

## Introduction to the built-in `pathlib` library

`pathlib` is an object-oriented interface to file system paths. Practically,
this means it offers a flexible, reusable, and cross-OS way to handle file paths
in Python. Let's dive right in.

- You can see the dense documentation [here](https://docs.python.org/3/library/pathlib.html)

- And a less dense tutorial [here](https://www.geeksforgeeks.org/python/pathlib-module-in-python/)

First, import the `Path` class, which is the main way to interact with the `pathlib` library.

In [2]:
from pathlib import Path

### Relative vs. Absolute File Paths
An important decision when coding with file paths is whether to use relative or absolute paths. An absolute path starts from the base of your file system (on Windows, a drive letter such as `C:\...`; on macOS and Linux, the root `/`). A relative path is interpreted from your current working directory. Note that `~` is shorthand for your home directory, not the root. Which kind of path you use depends on your particular use case.

In [3]:
# Start by defining a relative path to the example files directory in this repository.
# A single dot indicates "here" and a double dot indicates "go back one level".
fp1 = Path("./example_files")
fp2 = Path("../extras/example_files")

# fp1 and fp2 point to the same place. Use the .resolve() method to convert each relative path to an absolute path and check this.
print(fp1, fp2)
print(fp1.resolve())
print(fp1.resolve() == fp2.resolve())

example_files ..\extras\example_files
C:\Users\zvig\python_code\python_workshop_code\umdgeopy\extras\example_files
True
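
As a small sketch of the distinction above, `Path.is_absolute()` reports which kind of path you have, and `Path.cwd()` and `Path.home()` give you absolute starting points. The directory names `example_files` and `data` are placeholders for illustration only.

```python
from pathlib import Path

rel = Path("example_files")      # hypothetical relative path
print(rel.is_absolute())         # False: it has no root or drive

anchored = Path.cwd() / rel      # join it onto the current working directory
print(anchored.is_absolute())    # True

home = Path.home()               # the directory that "~" stands for
print(Path("~/data").expanduser() == home / "data")  # True
```

Joining onto `Path.cwd()` only builds the path; it does not check that the location exists.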


### Cross-OS compatibility

The most painful thing about hard-coding file paths is suddenly having to change operating systems, because the way file paths are written changes. Luckily, `pathlib` accounts for this! There are two main path styles: Windows and POSIX. When you create a `Path` object, `pathlib` detects your operating system and gives you the matching flavor (`WindowsPath` or `PosixPath`). The `.as_posix()` method returns the path as a string with forward slashes, which is handy when a Windows path needs to be written in POSIX style.

In [4]:
print(fp1.resolve())  # Backslashes, the native separator on Windows
print(fp1.resolve().as_posix())  # A string with forward slashes

C:\Users\zvig\python_code\python_workshop_code\umdgeopy\extras\example_files
C:/Users/zvig/python_code/python_workshop_code/umdgeopy/extras/example_files
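
If you need to work with a path style that differs from the machine you are on, the "pure" path classes are one option. The sketch below uses `PureWindowsPath` and `PurePosixPath`, which only manipulate the path text and never touch the disk, so it runs the same on any OS. The paths themselves are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

win = PureWindowsPath(r"C:\Users\someone\data\file.txt")  # hypothetical path
print(win.as_posix())  # C:/Users/someone/data/file.txt
print(win.drive)       # C:
print(win.name)        # file.txt

posix = PurePosixPath("/home/someone/data/file.txt")      # hypothetical path
print(posix.parts)     # ('/', 'home', 'someone', 'data', 'file.txt')
```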


### File search capabilities

Searching for files within a file tree is super easy! Just use the `.walk()` and `.glob()` methods! Note that `Path.walk()` was added in Python 3.12.

In [5]:
for r, d, f in fp1.walk():  # yields (directory path, subdirectory names, file names) for every directory in the tree.
    for i in r.glob("*.txt"):  # searches that directory for files with the extension .txt.
        print(i)  # prints each file path that matches the .glob pattern.

example_files\data0.txt
example_files\data1.txt
example_files\data2.txt
example_files\data3.txt
example_files\data4.txt
example_files\data5.txt
example_files\data6.txt
example_files\data7.txt
example_files\data8.txt
example_files\data9.txt
example_files\folder1\subdata1.txt
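
A possible alternative that also works on Python versions older than 3.12 is `.rglob()`, which applies a glob pattern recursively in a single call. The sketch below builds a small throwaway tree in a temporary directory; the file names are placeholders that mimic the example folder, not the real files.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny, hypothetical file tree to search.
    (root / "folder1").mkdir()
    (root / "data0.txt").touch()
    (root / "notes.csv").touch()
    (root / "folder1" / "subdata1.txt").touch()

    # rglob("*.txt") searches root and every subdirectory below it.
    found = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.txt"))
    print(found)  # ['data0.txt', 'folder1/subdata1.txt']
```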


### Modular File Path Building

When you need to make new file paths, just use existing `Path` objects like so!

In [None]:
fp3 = Path(fp1, "newdata0")
fp3_abs = fp3.resolve()
print(fp3_abs)
# You can also see all of the parts of your path separately
print(fp3_abs.parts)  # Prints all parts in a tuple of strings
print(fp3_abs.name)  # Prints just the very last part of the path
print(fp3_abs.parent)  # Prints the path one directory level above the end
print(fp3_abs.parent.parent.parent)  # And now three above
print(fp3_abs.drive)  # Prints the drive letter (for windows at least)

C:\Users\zvig\python_code\python_workshop_code\umdgeopy\extras\example_files\newdata0
('C:\\', 'Users', 'zvig', 'python_code', 'python_workshop_code', 'umdgeopy', 'extras', 'example_files', 'newdata0')
newdata0
C:\Users\zvig\python_code\python_workshop_code\umdgeopy\extras\example_files
C:\Users\zvig\python_code\python_workshop_code\umdgeopy
C:
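
Another common way to build paths is the `/` operator, which joins path segments just like passing several arguments to `Path(...)`. The sketch below uses `PurePosixPath` so the printed separators are the same on every OS; the directory and file names are hypothetical.

```python
from pathlib import PurePosixPath

base = PurePosixPath("project/example_files")  # hypothetical directory
new_file = base / "run1" / "newdata0.txt"      # join segments with /

print(new_file)                            # project/example_files/run1/newdata0.txt
print(new_file.stem)                       # newdata0
print(new_file.suffix)                     # .txt
print(new_file.with_name("newdata1.txt"))  # project/example_files/run1/newdata1.txt
```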


### File and directory creation

You can also create files and directories using the `.touch()` and `.mkdir()` methods respectively.

In [None]:
fp3.mkdir(exist_ok=True)  # exist_ok=True means no error is raised if the directory already exists; existing contents are left untouched.
fp4 = fp3.with_suffix(".txt")  # Adds an extension to the path (or replaces one that is already there).
fp4.touch()  # Creates an empty file; if the file already exists, only its modification time is updated.
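
To see how these methods behave when things already exist, here is a minimal sketch that works inside a temporary directory so nothing in your repository is touched. The names `a`, `b`, and `note.txt` are placeholders.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    new_dir = Path(tmp, "a", "b")
    new_dir.mkdir(parents=True)        # parents=True also creates the missing "a"

    note = new_dir / "note.txt"
    note.write_text("keep me")

    new_dir.mkdir(exist_ok=True)       # directory exists: no error, nothing removed
    note.touch()                       # file exists: contents are left alone
    contents = note.read_text()
    print(contents)                    # keep me

    try:
        new_dir.mkdir()                # without exist_ok=True this raises
        raised = False
    except FileExistsError:
        raised = True
    print(raised)                      # True
```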

### Regular Expressions

Going along with the theme of file handling are regular expressions, or regexes. These nifty little tools allow you to flexibly and reliably search through strings of data. Regexes in Python are implemented in the `re` library. The learning curve is steep, but once you get the hang of them, they are super helpful! [See this helpful cheat sheet.](https://www.rexegg.com/regex-quickstart.php)

In [65]:
import re
mylongstr = "23085jfbaASeuhfp982hdhTHINGTHATIWANT2sdfawed8143ASDERF2095834"
myregex = re.compile(r'([0-9a-zA-Z]+)(THINGTHATIWANT)([0-9a-zA-Z]+)')
match = myregex.match(mylongstr)  # .match() only matches at the start of the string
if match is not None:
    print(match.groups())

('23085jfbaASeuhfp982hdh', 'THINGTHATIWANT', '2sdfawed8143ASDERF2095834')
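
One detail that often trips people up is that `.match()` only looks at the beginning of the string, while `.search()` scans the whole string. The sketch below shows the difference and uses a named group to pull out one piece; the sample string and the group name `name` are made up for illustration.

```python
import re

text = "id=42; name=sample_A; temp=21.5"  # hypothetical string
pattern = re.compile(r"name=(?P<name>\w+)")

print(pattern.match(text))              # None: the string does not start with "name="
found = pattern.search(text)            # search() looks anywhere in the string
print(found.group("name"))              # sample_A

numbers = re.findall(r"\d+\.?\d*", text)  # every integer or decimal number
print(numbers)                          # ['42', '21.5']
```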


### Practical Regex Example: FTIR .MAP File Parsing

In [66]:
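import re
base_dir = Path("D:/misc_data/")
# Read the .map file as raw bytes.
with open(Path(base_dir, "2025_10_6_10_42_17_nmnhlabradorite_arraymap1.map"), "rb") as f:
    b = f.read()

# Match the XPos/YPos/X/Y entries; the decimal points are escaped so they only match a literal ".".
patt = re.compile(r'XPos=\s?-?\d+\.?\d+,\sYPos=\s?\d+\.?\d+,\sX=-?\d+\.?\d+, Y=\s\d+\.?\d+')
result = patt.findall(str(b))  # str(b) is the printable repr of the bytes, which a text pattern can search
# Write one match per line.
with open("./FTIR_parsing.txt", "w") as f:
    for i in result:
        f.write(f"{i}\n")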
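
An alternative to converting the bytes with `str()` is to search the raw bytes directly with a bytes pattern (`rb'...'`). The sketch below is not the real `.map` layout: the `blob` value is a made-up stand-in with binary padding around two ASCII entries, and the pattern is a simplified version that captures only `XPos` and `YPos`.

```python
import re

# Hypothetical stand-in for file contents: binary padding around ASCII metadata.
blob = b"\x00\x01XPos= 12.50, YPos= 3.25\x00\xffXPos= -7.00, YPos= 10.75\x00"

# A bytes pattern (rb"...") searches bytes without any str() conversion.
patt = re.compile(rb"XPos=\s?(-?\d+\.\d+),\sYPos=\s?(-?\d+\.\d+)")

# Each match is a tuple of bytes groups; decode them before converting to float.
coords = [
    (float(x.decode("ascii")), float(y.decode("ascii")))
    for x, y in patt.findall(blob)
]
print(coords)  # [(12.5, 3.25), (-7.0, 10.75)]
```

Capturing the numbers in groups gives you values you can compute with, rather than whole matched strings that need a second parsing step.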