# Using Python to Interact with the Operating System

## Module 1 Getting Your Python On
- **Kernel**: the core of a computer’s operating system. It talks directly to the hardware and manages the system’s resources.
- **User space**: everything outside the kernel
- **Operating system**: kernel and user space
- **Interpreted** vs. **Compiled**
- **Shebang**: tells the operating system what command we want to use to execute a script
```py
#! /usr/bin/env python3
```
1. `__init__.py` files are required to make Python treat directories containing the file as packages.
    1. It marks the directory as a Python Package.
    2. It can contain initialization code for the Package.
2. Key points and pitfalls about **Automating**
    1. Scalability
    2. Pareto Principle: 80% of consequences come from 20% of causes
    3. bit rot: a process falls out of step with its environment as the environment changes
    4. try to avoid silent failures
    5. periodic tests
    6. log the actions (system log)
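
The points about silent failures and logging can be sketched as follows. This is a minimal illustration, not a prescribed pattern: the job name `cleanup_job`, the helper `remove_temp_files`, and the stand-in `fake_remove` are all hypothetical names invented for the example.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("cleanup_job")  # hypothetical job name


def remove_temp_files(paths, remove):
    """Try to remove each path, log every action, and return the failures."""
    failed = []
    for path in paths:
        try:
            remove(path)
            logger.info("removed %s", path)
        except OSError as err:
            # Log and record the failure instead of silently ignoring it.
            logger.error("could not remove %s: %s", path, err)
            failed.append(path)
    return failed


def fake_remove(path):
    """Stand-in for os.remove so the sketch touches no real files."""
    if path.endswith(".lock"):
        raise PermissionError("file is locked")


failures = remove_temp_files(["a.tmp", "b.lock"], fake_remove)
print(failures)  # ['b.lock']
```

Returning the list of failures lets the caller decide whether to retry, alert, or exit with a non-zero status.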



## Module 2 Managing Files with Python
1. `open()` `read()` `write()` `close()`

In [None]:
file_path = "./my_file"

f = open(file_path, 'r+')
print(f.read())
f.write("Hello, world!\n")
f.close()

# same as
with open(file_path, 'r+') as f:
    print(f.read())
    f.write("Hello, world!\n")
    # automatically close file after
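
As a small sketch of how the open modes differ, the example below writes, appends, and reads back a file inside a temporary directory. The file name `notes.txt` is made up for the illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "notes.txt")  # hypothetical file name

    with open(path, "w") as f:  # "w" creates the file or truncates it
        f.write("first line\n")
    with open(path, "a") as f:  # "a" appends to the end
        f.write("second line\n")
    with open(path) as f:  # the default mode is "r"
        # iterating over a file object yields one line at a time
        lines = [line.rstrip("\n") for line in f]

print(lines)  # ['first line', 'second line']
```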

2. os methods

In [None]:
import os

file_name = "sample.md"

# interact with files
os.remove(file_name)
os.rename("source", "destination")
os.path.exists(file_name) # return Boolean
os.path.getsize(file_name) # return in bytes
os.path.abspath(file_name) # return absolute path
os.path.getmtime(file_name) # return the last-modified timestamp, in seconds since the Unix epoch (1970-01-01 UTC)

import datetime
timestamp = os.path.getmtime(file_name)
datetime.datetime.fromtimestamp(timestamp) #return datetime.datetime


dir_path = "./"
new_dir_name = "new_dir"
# interact with directories
os.getcwd() # current working directory
os.mkdir(new_dir_name)
os.chdir(new_dir_name)
os.chdir("../")
os.rmdir(new_dir_name) # if it's empty
os.listdir() # return a list with all sub-directories and files
os.path.join(new_dir_name, file_name) # create a file path
os.path.isdir(dir_path) # return True or False
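
The calls listed above can be tried safely inside a temporary directory. The following is only a sketch; the names `new_dir` and `sample.md` are reused from the notes, and everything is deleted when the `with` block ends.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    sub_dir = os.path.join(base, "new_dir")
    os.mkdir(sub_dir)

    file_path = os.path.join(sub_dir, "sample.md")
    with open(file_path, "w") as f:
        f.write("# title\n")

    listing = os.listdir(base)                 # names directly inside base
    size = os.path.getsize(file_path)          # size in bytes
    dir_is_dir = os.path.isdir(sub_dir)        # True: it is a directory
    file_is_dir = os.path.isdir(file_path)     # False: it is a regular file

    os.remove(file_path)                       # empty the directory first
    os.rmdir(sub_dir)                          # rmdir only works on empty directories
    still_exists = os.path.exists(sub_dir)

print(listing, dir_is_dir, file_is_dir, still_exists)
```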

- **Parsing**: analyzing a file content to correctly structure the data.
- **CSV**: comma-separated values
3. csv methods

In [None]:
import csv

file_path = "./movies.csv"


# read as lists
with open(file_path, newline='') as csvfile:
    reader = csv.reader(csvfile)
    count = 0
    for row in reader:
        print(f"{count}: {'; '.join(row)}")
        count = count + 1

# read as dict
with open(file_path, newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    count = 0
    for row in reader:
        print(f"{count}: {row}")
        count = count + 1


# write with list
# a list of movie data
movie_data = [
    ['Movie Title', 'Year', 'Gross Revenue'],
    ['Avengers: Endgame', 2019, 27902000000],
    ['Avatar', 2009, 27897000000],
    ['Titanic', 1997, 22082000000],
    ['Star Wars: The Force Awakens', 2015, 20533000000],
    ['Avengers: Infinity War', 2018, 20481000000]
]
# open a CSV file and write the data
with open(file_path, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for row in movie_data:
        writer.writerow(row)


# write with dict
# a dict of movie data
movie_data = [
    {'Movie Title': 'Avengers: Endgame', 'Year': 2019, 'Gross Revenue': 27902000000},
    {'Movie Title': 'Avatar', 'Year': 2009, 'Gross Revenue': 27897000000},
]

with open(file_path, 'w', newline='') as csvfile:
    fieldnames = ['Movie Title', 'Year', 'Gross Revenue']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    # Write the header row
    writer.writeheader()

    # Write the data rows
    for row in movie_data:
        writer.writerow(row)
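
To check that the writer and reader agree without touching the disk, a round trip through an in-memory buffer can help. This is a sketch with made-up sample rows; `io.StringIO` stands in for the file object.

```python
import csv
import io

# hypothetical sample rows; all values are strings so they survive the round trip unchanged
rows = [
    {"name": "alice", "department": "IT"},
    {"name": "bob", "department": "Sales"},
]

buffer = io.StringIO(newline="")  # in-memory stand-in for open(..., newline='')
writer = csv.DictWriter(buffer, fieldnames=["name", "department"])
writer.writeheader()
writer.writerows(rows)  # writes every row in one call

buffer.seek(0)  # rewind before reading
read_back = list(csv.DictReader(buffer))
print(read_back == rows)  # True
```

Note that `csv` always reads values back as strings, so numeric columns need an explicit `int()` or `float()` conversion after reading.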

## Module 3 Regular Expression
1. Regular Expression/regex/regexp: a text pattern
2. Use in bash
```bash
# search for lines matching a pattern; -i ignores case (add -r to search a directory recursively)
grep -i string file_or_dir

# reserved characters
"." # matches any single character
grep 'l.rts' file_or_dir # matches alerts, blurts, flirts
"^" # start of the line ("^" is called a caret)
grep '^fruit' file_or_dir # matches fruit, fruitcake, fruited
"$" # end of the line
grep 'cat$' file_or_dir # matches bobcat, cat, ducat
```
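
The same filtering can be approximated in Python with the `re` module. The helper `grep` below is a hypothetical name for this sketch, not a standard-library function, and the word list is made up.

```python
import re


def grep(pattern, lines, ignore_case=False):
    """Return the lines that contain a match for pattern, like a tiny grep."""
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    return [line for line in lines if regex.search(line)]


words = ["alerts", "blurts", "fruitcake", "Bobcat", "catalog"]
print(grep(r"l.rts", words))                    # ['alerts', 'blurts']
print(grep(r"^fruit", words))                   # ['fruitcake']
print(grep(r"cat$", words))                     # ['Bobcat']
print(grep(r"^bob", words, ignore_case=True))   # ['Bobcat']
```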

3. Use in Python


In [18]:
import re

# re.search() returns only the first match
print("aza: ", re.search(r"aza", "plaza"))
print("p.ng: ", re.search(r"p.ng", "Pangaea", re.IGNORECASE))
print("[a-z]way: ", re.search(r"[a-z]way", "the highway"))

# if nothing matches, re.search() returns None
print("[a-zA-Z0-9]way: ", re.search(r"[a-zA-Z0-9]way", "the way"))

# [^h]: any character except h
print("[^h]ay: ", re.search(r"[^h]ay", "the hay"))
# [h|b]: h, b, or a literal "|" (inside [], "|" does not mean "or")
print("[h|b]ay: ", re.search(r"[h|b]ay", "the way"))

# re.findall() returns a list of all the matched strings

# repetition qualifiers
# *: repeat any number of times, including zero
print("py.*n: ", re.search(r"py.*n", "python in space"))
print("py[a-z]*n: ", re.search(r"py[a-z]*n", "python in space"))
print("py[a-z]*n: ", re.search(r"py[a-z]*n", "pyn"))

# +: repeat one or more times
# ?: optional (zero or one time)

# {x}: repeat exactly x times
print("[a-zA-Z]{5}: ", re.findall(r"[a-zA-Z]{5}", "a scary ghost appeared"))
print("\\b[a-zA-Z]{5}\\b: ", re.findall(r"\b[a-zA-Z]{5}\b", "a scary ghost appeared"))
# "{5,10}" # 5 to 10 times
# "{5,}" # 5 or more times
# "{,10}" # up to 10 times


# capturing groups
result = re.search(r"^(\w*), (\w*)$", "Hatsune, Miku")
print(result)
print(result.groups())
print(result[0])
print(result[1])
print(result[2])

# other methods
result = re.split(r"[,]", "Hatsune, Miku, is, cute!")
print(result)

result = re.sub(r"^([\w.-]*), ([\w.-]*)$", r"\2 \1", "Hatsune, Miku")
print(result)

aza:  <re.Match object; span=(2, 5), match='aza'>
p.ng:  <re.Match object; span=(0, 4), match='Pang'>
[a-z]way:  <re.Match object; span=(7, 11), match='hway'>
[a-zA-Z0-9]way:  None
[^h]ay:  None
[h|b]ay:  None
py.*n:  <re.Match object; span=(0, 9), match='python in'>
py[a-z]*n:  <re.Match object; span=(0, 6), match='python'>
py[a-z]*n:  <re.Match object; span=(0, 3), match='pyn'>
[a-zA-Z]{5}:  ['scary', 'ghost', 'appea']
\b[a-zA-Z]{5}\b:  ['scary', 'ghost']
<re.Match object; span=(0, 13), match='Hatsune, Miku'>
('Hatsune', 'Miku')
Hatsune, Miku
Hatsune
Miku
['Hatsune', ' Miku', ' is', ' cute!']
Miku Hatsune


4. escaping characters
```bash
"\n" # newline (special string character)
"\t" # tab (special string character)
"\w" # word characters (letters, digits, and underscores)
"\d" # digits
"\s" # whitespace (space, tab, or newline)
"\b" # word boundary
```
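
A short sketch of these escapes in use. The log line is an invented sample, so its exact layout is an assumption rather than a real log format.

```python
import re

# hypothetical log line, made up for this example
log_line = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"

# \d+ inside escaped brackets captures the process id
pid = re.search(r"\[(\d+)\]", log_line)
# \w+ inside escaped parentheses captures the user name (underscore counts as \w)
user = re.search(r"USER \((\w+)\)", log_line)
print(pid[1], user[1])  # 29440 good_user

# \b...\b keeps only whole words of exactly four word characters
print(re.findall(r"\b\w{4}\b", log_line))  # ['name', 'CRON', 'USER']

# \s+ splits on any run of spaces, tabs, or newlines
print(re.split(r"\s+", "a  b\tc\nd"))  # ['a', 'b', 'c', 'd']
```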

## Module 4 Managing Data and Process
1. **shell**: a command-line interface, such as Bash. Programs are executed inside a shell environment.
2. **command prompt**: the characters a command-line interface displays to indicate that it is ready to accept commands.
3. **PATH** variable: the list of directories where the shell looks for programs.
4. data streams

In [27]:
# input()
print("Your input is: ", input("Please input and press Enter to confirm: "))

# get environment variable PATH
import os
print(os.environ.get("PATH", ""))

# get command-line arguments
import sys
argument = sys.argv # a list of strings
# the first element (argument[0]) is the script itself
# for example: C:\Windows\System32>"C:\Users\danny\Desktop\print_arg.py" cat
# argument == ['C:\\Users\\danny\\Desktop\\print_arg.py', 'cat']

Your input is:  a
d:\3_Work\2_repos\2024_Google_Python_Recap\.venv\Scripts;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;E:\0_Work\flutter\bin;C:\Program Files\dotnet\;C:\ProgramData\chocolatey\bin;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;D:\3_Work\1_coding tool\Git\cmd;C:\Program Files\Microsoft SQL Server\150\Tools\Binn\;C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\;D:\3_Work\1_coding tool\Python312\Scripts\;D:\3_Work\1_coding tool\Python312\;C:\Users\danny\AppData\Local\Microsoft\WindowsApps;D:\3_Work\1_coding tool\Microsoft VS Code\bin;C:\Users\danny\.dotnet\tools
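
To see how the PATH lookup works, the value can be split into its directories, and `shutil.which()` can perform the same search the shell does. This is a sketch: it looks for the running interpreter in its own directory only so that the result does not depend on how a particular machine is configured.

```python
import os
import shutil
import sys

# PATH is one string; os.pathsep is ";" on Windows and ":" elsewhere
path_dirs = os.environ.get("PATH", "").split(os.pathsep)
print(len(path_dirs), "directories on PATH")

# shutil.which() searches the given directories for an executable, like the shell does
python_dir = os.path.dirname(sys.executable)
python_name = os.path.basename(sys.executable)
found = shutil.which(python_name, path=python_dir)
print(found)

# a name that exists nowhere in the searched directories gives None
missing = shutil.which("surely_not_a_real_command_123", path=python_dir)
print(missing)  # None
```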


5. exit status/return code: the value returned by a program to the shell. By convention, 0 means success and any other value means failure.

in Bash:
```bash
echo $?
```

6. set the exit status

in Python:
```py
import sys
sys.exit(exit_status) # exit_status is an integer; 0 means success
```

7. **subprocesses**: run a child process from Python. With `subprocess.run()`, the parent process is blocked until the child process finishes executing.

in Python:
```py
import subprocess
result = subprocess.run(["python", "--version"], capture_output=True, text=True)
print(result.returncode) # 0
print(result.stdout) # Python 3.12.4
```
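
Exit statuses and subprocesses fit together: the parent can read whatever status the child passed to `sys.exit()`. The sketch below starts the child with `sys.executable` (the interpreter currently running) rather than assuming a `python` command is on PATH; the child's one-line program is made up for the example.

```python
import subprocess
import sys

# a tiny child program that prints something and exits with status 3
child_code = "import sys; print('working'); sys.exit(3)"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output as str instead of bytes
)

print(result.returncode)      # 3
print(result.stdout.strip())  # working
```

Passing `check=True` to `subprocess.run()` would instead raise `subprocess.CalledProcessError` on a non-zero status.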

8. word counter
```py
names = {}
for name in ["Miku", "Rin", "Miku"]: # sample data
    names[name] = names.get(name, 0) + 1 # get() returns 0 if name is not in names yet
```
9. ```if __name__ == "__main__":```
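
A sketch of how the guard is typically used: the code under it runs only when the file is executed as a script, not when it is imported as a module. The function names `count_names` and `main` are illustrative.

```python
def count_names(names):
    """Count how many times each name appears."""
    counts = {}
    for name in names:
        counts[name] = counts.get(name, 0) + 1
    return counts


def main():
    print(count_names(["ann", "bob", "ann"]))


# True only when this file is run directly (python this_file.py);
# when another module imports it, __name__ is the module's name instead.
if __name__ == "__main__":
    main()
```

This keeps `count_names` importable from tests or other scripts without triggering the printing side effect.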

## Module 5 Testing in Python
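1. Edge cases: inputs at the extremes of what the code expects, which tend to produce errors or unexpected results. For example, an empty string.
2. unittest module (`unittest.main()` is meant to be run from a script, not from an ipynb cell)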
```py
import unittest
def add(a, b):
    return a + b

class TestAddFunction(unittest.TestCase):
    def test_add_positive_numbers(self):
        self.assertEqual(add(1, 2), 3)

    def test_add_negative_numbers(self):
        self.assertEqual(add(-1, -2), -3)

    def test_add_mixed_numbers(self):
        self.assertEqual(add(1, -2), -1)
        self.assertEqual(add(-1, 2), 1)

if __name__ == "__main__":
    unittest.main()

```
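
One way to run test cases inside a notebook cell is to build the suite explicitly and hand it to a runner, which avoids the command-line parsing and the exit call that `unittest.main()` performs by default. The function `first_word` is a hypothetical example chosen to show an edge-case test.

```python
import unittest


def first_word(text):
    """Return the first whitespace-separated word, or '' if there is none."""
    words = text.split()
    return words[0] if words else ""


class TestFirstWord(unittest.TestCase):
    def test_basic(self):
        self.assertEqual(first_word("hello world"), "hello")

    def test_empty_string(self):
        # edge case: an empty input must not raise IndexError
        self.assertEqual(first_word(""), "")


# load the tests from the class and run them without calling sys.exit()
suite = unittest.TestLoader().loadTestsFromTestCase(TestFirstWord)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(result.wasSuccessful())  # True
```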
3. Test Concepts terms:
    1. white-box test: the code is transparent to the tester. It is usually created alongside the code under development.
    2. black-box test: the code is opaque to the tester. It can be created before the code.
    3. integration test: verifies the interactions between the code and the rest of the system (in a test environment).
    4. regression test: a test written after identifying a bug, to verify the fix and keep the bug from returning.
    5. smoke test: typically performed first to ensure that the most critical functions work correctly.
    6. load test: verifies that the system behaves well under heavy load.
    7. test suite: a collection of tests run together.
    8. test-driven development (TDD): writing the test before the code.

4. Errors and Exceptions

In [None]:
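try:
  f = open("demofile.txt") # opened read-only, so the write below fails
  try:
    f.write("Lorem Ipsum")
  except OSError: # catch a specific exception instead of a bare except
    print("Something went wrong when writing to the file")
  finally:
    f.close() # runs whether or not the write succeeded
except OSError:
  print("Something went wrong when opening the file")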

if "something" == "wrong":
   raise Exception("Sorry, something went wrong") # Raise an error and stop the program


# if the condition is False, raise the AssertionError and stop the program
assert "something" != "wrong", "Sorry, something went wrong"
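
Raising specific exception types makes the caller's `except` clauses more precise than a generic `Exception`. The sketch below uses a hypothetical `validate_username` function, and the rule it enforces (a minimum length, alphanumeric characters) is an assumption made for the example.

```python
def validate_username(username, min_length=3):
    """Return True if username is alphanumeric; raise on unusable input."""
    if not isinstance(username, str):
        raise TypeError("username must be a string")
    if len(username) < min_length:
        raise ValueError(f"username must have at least {min_length} characters")
    return username.isalnum()


results = []
for candidate in ["miku01", "ab", 42]:
    try:
        results.append((candidate, validate_username(candidate)))
    except (TypeError, ValueError) as err:
        # handle only the errors we expect; anything else still propagates
        results.append((candidate, type(err).__name__))

print(results)  # [('miku01', True), ('ab', 'ValueError'), (42, 'TypeError')]
```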


## Module 6 Bash Scripting
1. common commands
```bash
# shebang
#! /bin/bash 
.. # parent directory
. # current directory
* # all the files in the current directory


cd dir_path # changes the current working directory
pwd # prints the current working directory
ls # lists the contents of the current directory
ls dir_path # lists the contents of the received directory
ls -l # lists the additional information for the contents of the directory
ls -a # lists all files, including those hidden
mkdir dir_name # creates the directory
rmdir dir_name # deletes the directory (if empty)
cp file_name target_name # copies file_name into target_name
mv file_name target_name # moves (or renames) file_name to target_name
rm file_name # removes file_name
touch file_name # creates an empty file or updates the modified time if it exists
chmod modifiers file_name # change the permissions for the files according to the provided modifiers
chown new_owner file_name #  changes the owner of the files to the given user
chgrp new_group file_name # changes the group of the files to the given group
man command # shows the manual page of the given command


# Operating with the content of files
cat file_name # shows the content of the file through standard output
wc file_name # counts the number of characters, words, and lines in the given file
file file_name # prints the type of the given file, as recognized by the operating system
head file_name # shows the first 10 lines of the given file
tail file_name # shows the last 10 lines of the given file
less file_name # scrolls through the contents of the given file (press “q” to quit)
sort file_name # sorts the lines of the file alphabetically
cut option file_name # cutting out the sections from each line of files


# Managing streams
command > file # redirects standard output, overwrites the file
command >> file # redirects standard output, appends to file
command < file # redirects standard input from the file
command 2> file # redirects standard error to file
command1 | command2 # connects the output of command1 to the input of command2, "|" call pipe


# Operating with processes
Ctrl-c # interrupts the process with SIGINT
Ctrl-z # suspends the process with SIGTSTP; run fg to resume it
ps # lists the processes executing in the current terminal for the current user (use for searching PID)
ps ax # lists all processes currently executing for all users
ps e # shows the environment for the processes listed
kill PID # sends the SIGTERM signal to the process identified by PID (terminate the process)
fg # causes a job that was stopped or in the background to return to the foreground
bg # causes a job that was stopped to go to the background
jobs # lists the jobs currently running or stopped
top # shows the processes currently using the most CPU time (press “q” to quit)


echo # prints an empty line
echo $variable # prints the variable
echo "This is variable: ${variable}" # prints a string containing the variable
variable=value # no spaces around "="

# globs; also available in Python through the glob module
echo *.py # prints all .py files in the cwd
echo c* # prints all files starting with c in the cwd
echo ???.py # prints all .py files with a three-character name

# condition
# if grep exit status == 0
if grep "the world" /etc/txt_file ; then 
    echo "found it"
else
    echo "not found"
fi
# must end with fi

if test -n "$PATH"; then echo "Path is not empty"; fi
# same as
if [ -n "$PATH" ]; then echo "Path is not empty"; fi
# the spaces around [ and ] matter


# loop
n=0
command=$1 # get the first command-line argument (no spaces around "=")

while ! $command && [ $n -le 5 ]; do # loop while the command fails (non-zero exit status) and n <= 5
  sleep $n # wait a moment before trying again
  ((n+=1))
  echo "Retry #$n"
done
# must end with done


basename file_name.HTM .HTM # prints file_name (the name without the .HTM suffix)
# rename files
for file in *.HTM; do
  name=$(basename "$file" .HTM) # quoting "$file" keeps file names with spaces intact
  echo mv "$file" "$name.html" # echo prints the command first; remove echo to actually rename the files
done


cut -d' ' -f5- file_name | sort | uniq -c | sort -nr | head -5
# cut: cuts sections out of each line
# -d' ': use a space as the field delimiter
# -f5-: print field (column) 5 and everything after it
# sort: sort alphabetically
# uniq: collapse adjacent duplicate lines into one
# -c: prefix each line with the number of times it occurred
# sort -nr: sort numerically, in reverse order
# head: print the first lines (10 by default)
# -5: print 5 lines instead
```
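
The `cut | sort | uniq -c | sort -nr | head -5` pipeline at the end of the block above has a rough Python counterpart built on `collections.Counter`. The log lines and the helper name `top_messages` are made up for this sketch, and ties may come out in a different order than with `sort -nr`.

```python
from collections import Counter

# hypothetical log lines: four space-separated prefix fields, then the message
log_lines = [
    "Jan 1 10:00:00 host kernel: disk error",
    "Jan 1 10:00:05 host sshd: login failed",
    "Jan 1 10:00:09 host kernel: disk error",
]


def top_messages(lines, start_field=5, limit=5):
    """Count the text from start_field onward and return the most common entries."""
    # like cut -d' ' -f5-: keep field 5 and everything after it
    messages = [" ".join(line.split(" ")[start_field - 1:]) for line in lines]
    # like sort | uniq -c | sort -nr | head -5
    return Counter(messages).most_common(limit)


print(top_messages(log_lines))
# [('kernel: disk error', 2), ('sshd: login failed', 1)]
```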

## Module 7 Final Project
1. Steps for coding projects
    1. understand the problem statement
    2. research
    3. planning
    4. writing
### Project: Log Analysis Using Regular Expressions
1. Use regex to parse a log file
2. Append and modify values in a dictionary
3. Write to a file in CSV format
4. Move files to the appropriate directory for use with the CSV->HTML converter
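
The first three project steps might be sketched as below. The log format, the regular expression, and the column names are assumptions made for illustration and are not taken from the actual project files; the CSV is written to an in-memory buffer instead of a real file.

```python
import csv
import io
import re

# hypothetical log lines; the real project's format may differ
log_lines = [
    "Jan 31 00:09:39 host app: INFO Created ticket (alice)",
    "Jan 31 00:16:25 host app: ERROR Timeout while saving (bob)",
    "Jan 31 00:21:30 host app: ERROR Timeout while saving (alice)",
]

# groups: level, message, user name in parentheses at the end of the line
pattern = re.compile(r"(INFO|ERROR) (.+) \((\w+)\)$")

# step 1 and 2: parse each line and count error messages in a dictionary
error_counts = {}
for line in log_lines:
    match = pattern.search(line)
    if match is None:
        continue  # skip lines that do not fit the assumed format
    level, message, user = match.groups()
    if level == "ERROR":
        error_counts[message] = error_counts.get(message, 0) + 1

# step 3: write the counts in CSV format, most frequent error first
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(["Error", "Count"])
for message, count in sorted(error_counts.items(), key=lambda item: item[1], reverse=True):
    writer.writerow([message, count])

print(buffer.getvalue())
```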