# IPython shell commands

The 3 Laws of Automation:

1. Any task that is talked about being automated, will be automated
2. If it isn't, it's broken
3. If a human is doing it, a machine will eventually do it better

Learning Objectives:
- IPython shell commands
- Shell commands with subprocess
    - e.g. capturing the output of shell commands and sending as input to processes
- Walking the file system
    - e.g. find files matching a pattern or look for a specific file type
- Command-line functions
    - e.g. automate tasks using a library, run scripts in cron
    
## Using IPython with shell commands
    
To run a shell command, precede it with `!`.

In [None]:
# prints the output; when assigned to a variable it is stored as an SList
!df -h

In [None]:
# we can assign the output to a python variable
ls = !ls

In [None]:
# the type is SList
type(ls)

> `!` only works in IPython and Jupyter; in plain Python it raises a syntax error.

## Passing python programs to the interpreter

There are two ways:
1. Passing a script to the Python interpreter

In [None]:
# Create a simple script 
!echo "print('hello world!')" > hello_world.py

In [None]:
!python hello_world.py

2. Passing a program to the Python interpreter via `-c`

In [None]:
!python -c "import datetime; print(datetime.datetime.now())"

## Using python and shell together

We can assign the output of a shell command to a Python variable.

In [None]:
# How many csvs exist in the previous course?
csvs = !ls -h ../../6_data_processing_in_shell/notes/*.csv
len(csvs)

In [None]:
# How many txts exist in the previous course?
txts = !ls -h ../../6_data_processing_in_shell/notes/*.txt
len(txts)

## Capture IPython Shell output

One of the most important principles of UNIX is that the OS should provide simple tools which can be combined to create sophisticated solutions.

1. Grab the output with `!`

In [None]:
# Grab the 5th column (file size) of each line and print the total at the end
total_size = !ls -l | awk '{ SUM+=$5} END {print SUM}'

In [None]:
total_size

2. Grab the output with `%%bash --out output`

In [None]:
%%bash --out output
ls -l | awk '{ SUM+=$5} END {print SUM}'

In [None]:
type(output)

In [None]:
output

### Comparison

They are pretty similar, but the first option returns an `SList`, which is very useful, while the second returns a plain string.

In [None]:
type(total_size)

### Capturing the STDERR

We might want to capture the standard error stream to debug errors later.

`--out` saves standard output into the variable `output`, and `--err` saves standard error into the variable `error`.

In [None]:
%%bash --out output --err error
ls -l | awk '{ SUM+=$5} END {print SUM}'
echo "no error so far" >&2

We have now captured the output and the error in different variables.

In [None]:
error

In [None]:
output

## Automate with SList

The SList format comes from the need to interface python with IPython shell commands. An SList object comes by default with three methods:
- `fields`
- `grep`
- `sort`

### `fields`

`fields` simulates the `awk` command. 

In [None]:
ls = !ls -l /usr/bin

In [None]:
# Confirming it's an SList
type(ls)

In [None]:
# Grab fields 1 and 5 (0-indexed, whitespace-separated) for a few ls entries:
# in `ls -l` output these are the link count and the modification month
ls.fields(1,5)[1:4]

### `grep`

`grep`-like operations on the output of a shell command.

In [None]:
ls = !ls -l /usr/bin

In [None]:
# Find utilities that will kill UNIX processes
ls.grep("kill")

### `sort`

Performs sorting on the output of a shell command.
- first argument is which column to sort on
- second argument is whether to sort by alphabetical or numerical values

In [None]:
disk_usage = !df -h

In [None]:
disk_usage.sort(5, nums = True)

### Python lists and SLists

We can use some methods from lists on SLists, like `pop`. It's also very easy to convert SLists to python lists with `list()`

In [None]:
list(disk_usage)

## Find our jupyter notebooks with `grep`


In [None]:
files = !ls ~/dev/stuff/sandbox/miguel

In [None]:
files.grep(".ipynb")

# Shell commands with subprocess

One of Python's strengths is the ability to glue itself to other languages and systems. There's a Python API for almost everything, including one to interact with the UNIX shell. We can for example:
- send data to UNIX processes
- listen to output
- kill processes

## subprocess.run

This is the simplest way to run shell commands in Python 3.5+. Takes a list of strings and runs the command without capturing the output.

In [None]:
import subprocess

subprocess.run(["ls", "-l"])

Dealing with Unicode in Python 3+ is more powerful but also more complex. Subprocess output arrives as byte strings, which need to be decoded (usually as UTF-8) into regular strings before further processing. This is accomplished with:

`regular_string = res.decode("utf-8")`
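As a minimal sketch of that decoding step: the name `res` is not defined in these notes, so here it is assumed to hold the captured `stdout` bytes of a completed process. `sys.executable` is used as the child command only so the example does not depend on a particular shell utility.

```python
import subprocess
import sys

# Run a tiny Python one-liner as the child process and capture its stdout
completed = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    stdout=subprocess.PIPE,
)

res = completed.stdout                 # raw bytes from the pipe
regular_string = res.decode("utf-8")   # decoded into a regular str
print(type(res).__name__, type(regular_string).__name__)
```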

## Status codes

UNIX commands return a status code which represents the status of their completion. 
- `0` means successful
- non-zero means unsuccessful

In [None]:
# Printing the status code of the last run command
# Notice the 0 at the end: was successful
!ls -l; echo $?

In [None]:
# A non-successful example: use `;` rather than `|` so that
# `echo $?` runs after `ls` finishes and reports its non-zero status
!ls --bogus; echo $?

### Capturing the status code with the subprocess

In [None]:
## Notice how returncode is part of the CompletedProcess object
subprocess.run(["ls", "-l"])

In [None]:
## Successful example
subprocess.run(["ls", "-l"]).returncode

In [None]:
## Non-successful example
subprocess.run(["ls", "--lame"]).returncode

### Control flow for status codes

We can check for status codes in a control flow structure to account for possible errors.

In [None]:
bad_user_input = "--lame"
out = subprocess.run(["ls", bad_user_input])

In [None]:
out

In [None]:
if out.returncode == 0:
    print("Success")
else:
    print("Unsuccessful")
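An alternative to checking `returncode` by hand is to pass `check=True`, which makes `subprocess.run` raise `CalledProcessError` on a non-zero status. The sketch below assumes a Python child process that exits with status 3, a value chosen purely for illustration.

```python
import subprocess
import sys

# Child process that deliberately fails with exit status 3 (illustrative value)
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]

try:
    subprocess.run(failing_cmd, check=True)
    status = 0
except subprocess.CalledProcessError as exc:
    # exc.returncode holds the non-zero status of the child process
    status = exc.returncode

print("Success" if status == 0 else f"Unsuccessful (status {status})")
```

Raising on failure tends to suit automation scripts, where a silently ignored status code is usually the worse outcome.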

In [None]:
# Running two subprocesses from Python
import subprocess

# Execute Unix command `head` safely as items in a list
with subprocess.Popen(["head", "test.txt"], stdout=subprocess.PIPE) as head:

    # Print each line of list returned by `stdout.readlines()`
    for line in head.stdout.readlines():
        print(line)

# Execute Unix command `wc -w` safely as items in a list
with subprocess.Popen(["wc", "-w", "test.txt"], stdout=subprocess.PIPE) as word_count:

    # Print the bytes output of standard out of `wc -w`
    print(word_count.stdout.read())

In [None]:
import subprocess

# Use subprocess to run the `ps aux` command that lists running processes
with subprocess.Popen(["ps", "aux"], stdout=subprocess.PIPE) as proc:
    process_output = proc.stdout.readlines()
    
# Look through each line in the output and skip it if it contains "python"
for line in process_output:
    if b"python" in line:
        continue
    print(line)

In [None]:
!ps aux

## Capturing the output of shell commands

The `subprocess.Popen` class is used to capture the output of a process. We can run the shell command in Python and capture its output.

In [11]:
from subprocess import Popen, PIPE, TimeoutExpired

with Popen(["ls"], stdout=PIPE) as proc:
    out = proc.stdout.readlines()
    
print(out)

[b'7_command_line_automation_in_python.ipynb\n', b'hello_world.py\n', b'test.txt\n']


### `with` statement

On exit, the `with` statement automatically waits for the process to finish and closes its file descriptors.
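A small sketch of what that means in practice, assuming a trivial Python child process as a stand-in for a real command: once the block ends, the return code is already set and the pipe is closed.

```python
import subprocess
import sys

with subprocess.Popen(
    [sys.executable, "-c", "print('done')"], stdout=subprocess.PIPE
) as proc:
    out = proc.stdout.read()

# After the block: the process has been waited on and its stdout pipe closed
print(proc.returncode, proc.stdout.closed)
```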

### `communicate` method

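The `communicate` method is another commonly used way to interact with a process: it sends optional input, reads `stdout` and `stderr` until the process ends, and returns both. If a `timeout` is given and the process runs longer than that, it raises `TimeoutExpired`, which can be caught.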

In [13]:
# Attempt to communicate for up to 30 seconds

# try:
#     out, err = proc.communicate(timeout=30)
# except TimeoutExpired:
#     # kill the process since a timeout was triggered
#     proc.kill()
#     # capture both standard output and standard error
#     out, err = proc.communicate()

### Using `PIPE`

To execute a shell command and capture its output there are two required components:
1. `PIPE`
    - `PIPE` operates just like the UNIX pipe operator
    - Allows python to communicate with both the input and output of the process
2. `stdout`
    - Actual output of the command
    - Can be consumed in two ways:
        - `stdout.read()`: returns the whole output as one `bytes` object (a `str` in text mode)
        - `stdout.readlines()`: returns a list of lines

Always keep `shell=False`!
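The return types of the two ways of consuming `stdout` can be checked with a short sketch. It assumes a Python child process that prints two lines, and it also shows the `text=True` parameter (Python 3.7+), which decodes the stream for us.

```python
import subprocess
import sys

cmd = [sys.executable, "-c", "print('one'); print('two')"]

# Default (binary) mode: read() gives a single bytes object
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
    raw = proc.stdout.read()

# readlines() gives a list with one bytes object per line
with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
    raw_lines = proc.stdout.readlines()

# text=True decodes the stream, so read() gives a str
with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
    text = proc.stdout.read()

print(type(raw).__name__, type(raw_lines).__name__, type(text).__name__)
```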

In [None]:
# This is unsafe!
# Most engineers pass a variable as input to Popen and this
# would allow for arbitrary code to be executed
with Popen("ls -l /tmp", shell=True, stdout=PIPE) as proc:
    out = proc.stdout.read()

# The correct way is this
# shell=False is default
with Popen(["ls", "-l", "/tmp"], shell=False, stdout=PIPE) as proc:
    out = proc.stdout.read()

There are other parameters which are commonly used with the Popen class:
- `stderr`: can be used to capture the output of errors

In [14]:
# Capture the error of an invalid command
with Popen(["ls", "--noflag"], shell=False, stdout=PIPE, stderr=PIPE) as proc: 
    print(proc.stderr.read())

b'ls: illegal option -- -\nusage: ls [-ABCFGHLOPRSTUWabcdefghiklmnopqrstuwx1] [file ...]\n'


### Getting a list of pip installs

In [1]:
from subprocess import Popen, PIPE
import json
import pprint

# Use the with context manager to run subprocess.Popen()
with Popen(["pip", "list", "--format=json"], stdout=PIPE) as proc:
    # Read the JSON payload from the process's stdout
    result = proc.stdout.readlines()
    
# Convert the JSON payload to a Python list of dictionaries
# (a JSON object is a data structure similar to a Python dictionary)
converted_result = json.loads(result[0])

# Display the result in the IPython terminal
pprint.pprint(converted_result)

[{'name': 'agate', 'version': '1.6.1'},
 {'name': 'agate-dbf', 'version': '0.2.1'},
 {'name': 'agate-excel', 'version': '0.2.3'},
 {'name': 'agate-sql', 'version': '0.5.4'},
 {'name': 'ansiwrap', 'version': '0.8.4'},
 {'name': 'appnope', 'version': '0.1.0'},
 {'name': 'attrs', 'version': '19.3.0'},
 {'name': 'Babel', 'version': '2.8.0'},
 {'name': 'backcall', 'version': '0.1.0'},
 {'name': 'beautifulsoup4', 'version': '4.8.0'},
 {'name': 'bleach', 'version': '3.1.0'},
 {'name': 'cachetools', 'version': '4.0.0'},
 {'name': 'certifi', 'version': '2019.11.28'},
 {'name': 'chardet', 'version': '3.0.4'},
 {'name': 'Click', 'version': '7.0'},
 {'name': 'colorama', 'version': '0.4.3'},
 {'name': 'configparser', 'version': '4.0.2'},
 {'name': 'crayons', 'version': '0.3.0'},
 {'name': 'csvkit', 'version': '1.0.4'},
 {'name': 'cycler', 'version': '0.10.0'},
 {'name': 'dbfread', 'version': '2.0.7'},
 {'name': 'decorator', 'version': '4.4.1'},
 {'name': 'defusedxml', 'version': '0.6.0'},
 {'name':

### Dealing with a long-running process

In [19]:
# Start a long running process using subprocess.Popen()
proc = Popen(["sleep", "6"], stdout=PIPE, stderr=PIPE)

# Use Popen.communicate() with a timeout
try:
    output, error = proc.communicate(timeout=5)
                                     
except TimeoutExpired:

    # Cleanup the process if it takes longer than the timeout
    proc.kill()

    # Read standard out and standard error streams and print
    output, error = proc.communicate()
    print(f"Process timed out with output: {output}, error: {error}")

Process timed out with output: b'', error: b''


### Finding duplicate files

We can calculate the hash of files to see if there are duplicates.

In [29]:
files = !ls

In [31]:
checksums = {}
duplicates = []

# Iterate over the list of filenames
for filename in files:
    # Use Popen to call the md5sum utility
    with Popen(["md5sum", filename], stdout=PIPE) as proc:
        checksum, _ = proc.stdout.read().split()

        # Append duplicate to a list if the checksum is found
        if checksum in checksums:
            duplicates.append(filename)
        checksums[checksum] = filename

# Print the result once, after every file has been checked
print(f"Found Duplicates: {duplicates}")

Found Duplicates: []
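The same idea can be sketched without shelling out, by hashing file contents with the standard-library `hashlib` module. This swaps in a different technique from the `md5sum` call above (and uses SHA-256 rather than MD5); the helper name `find_duplicates` and the sample files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def find_duplicates(directory):
    """Return names of files whose content matches an earlier file."""
    checksums = {}
    duplicates = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        checksum = hashlib.sha256(path.read_bytes()).hexdigest()
        if checksum in checksums:
            duplicates.append(path.name)
        else:
            checksums[checksum] = path.name
    return duplicates

# Hypothetical sample files: a.txt and c.txt share the same content
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, text in [("a.txt", "same"), ("b.txt", "different"), ("c.txt", "same")]:
        Path(tmp_dir, name).write_text(text)
    duplicates = find_duplicates(tmp_dir)

print(f"Found Duplicates: {duplicates}")
```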


## Sending input to processes

We can also send input to shell commands from python. There are two common ways of sending the output of a process as input to another from python:
1. `Popen`

In [33]:
proc1 = Popen(["process_one.sh"], stdout=PIPE)
Popen(["process_two.sh"], stdin=proc1.stdout)

2. `run` method (higher level abstraction)

In [None]:
# Simplifies boilerplate code
from subprocess import run, PIPE

proc1 = run(["process_one.sh"], stdout=PIPE)
run(["process_two.sh"], input=proc1.stdout)

Passing input from one command to another is a classic UNIX paradigm.
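Since `process_one.sh` and `process_two.sh` are placeholders, here is a runnable sketch of the `run` variant that uses two small Python child processes as stand-ins (an assumption made only so the example executes anywhere).

```python
import subprocess
import sys

# Stand-ins for the placeholder scripts process_one.sh and process_two.sh
produce = [sys.executable, "-c", "print('never odd or even')"]
reverse = [
    sys.executable,
    "-c",
    "import sys; print(sys.stdin.read().strip()[::-1])",
]

proc1 = subprocess.run(produce, stdout=subprocess.PIPE)
# proc1.stdout is a bytes object; it becomes the stdin of the second process
proc2 = subprocess.run(reverse, input=proc1.stdout, stdout=subprocess.PIPE)

piped_result = proc2.stdout.decode("utf-8").strip()
print(piped_result)
```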

### The string language of UNIX pipes

- Strings are the language of shell pipes
- Pass strings via STDOUT

In [34]:
!echo "never odd or even" | rev

neve ro ddo reven


### Translating between objects and strings

Shell returns strings for everything whereas python uses objects. We need a way of converting between one thing and the other.

To recap:
- Python objects contain:
    - data
    - methods
- UNIX strings are:
    - data only
    - often columnar

### User input

- Bash uses `read`
- Python uses `input`
- Python can also accept input from command-line libraries
- Subprocess can pipe input to scripts that wait for user input

In [36]:
import subprocess

# runs find command to search for files
find = subprocess.Popen(
    ["find", ".", "-type", "f", "-print"], stdout=subprocess.PIPE)

# runs wc and counts the number of lines
# this is how we pipe in subprocess
word_count = subprocess.Popen(
    ["wc", "-l"], stdin=find.stdout, stdout=subprocess.PIPE)

# print the decoded and formatted output
output = word_count.stdout.read()
print(output.decode("utf-8").strip())

4
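For a script that waits on user input, `subprocess.run` can supply that input through its `input` parameter. A minimal sketch, assuming an inline child script that calls `input()` (a stand-in for a real interactive script):

```python
import subprocess
import sys

# Child script that waits for one line of user input, like `read` in Bash
child_code = "name = input(); print(f'From a user: {name}')"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    input="miguel\n",        # text sent to the child's stdin
    stdout=subprocess.PIPE,
    text=True,               # Python 3.7+: work with str instead of bytes
)
print(completed.stdout.strip())
```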


## Passing arguments safely to shell commands

1. User input should never be trusted!
2. We should always assume that user input can be malicious.

In [37]:
# expected input to a script
# /some/dir

In [38]:
# actual input from malicious user
# some/dir && rm -rf /all/your/dirs

Recall:

In [None]:
# This is unsafe!
# Most engineers pass a variable as input to Popen and this
# would allow for arbitrary code to be executed
with Popen("ls -l /tmp", shell=True, stdout=PIPE) as proc:
    out = proc.stdout.read()

# The correct way is this
# shell=False is default
with Popen(["ls", "-l", "/tmp"], shell=False, stdout=PIPE) as proc:
    out = proc.stdout.read()

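However, if we *must* start from a command string, `shlex.split` from the `shlex` module breaks it into a list of arguments that can be run with `shell=False`. If `shell=True` is truly unavoidable, every piece of user input should be escaped with `shlex.quote`.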

In [40]:
import shlex

shlex.split("/tmp && rm -rf /all/my/dirs")

['/tmp', '&&', 'rm', '-rf', '/all/my/dirs']

In [43]:
# here we pass the input as a list with the default shell=False,
# so it is treated as plain arguments and never interpreted by a shell
directory = shlex.split("/tmp")
cmd = ["ls"]
cmd.extend(directory)
subprocess.run(cmd)

CompletedProcess(args=['ls', '/tmp'], returncode=0)

Best practice is to always use a list. It limits the mistakes you can make.
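For the rare case where a shell command string has to be built, `shlex.quote` escapes a value so that a POSIX shell treats it as one literal argument. The sketch below only builds and inspects the string; it does not execute anything.

```python
import shlex

malicious = "/tmp && rm -rf /all/my/dirs"

# shlex.quote wraps the value so a POSIX shell sees a single literal argument
quoted = shlex.quote(malicious)
command = f"ls {quoted}"
print(command)

# Splitting the command back shows the input stayed one argument
parts = shlex.split(command)
print(parts)
```

Note that `shlex.quote` targets POSIX shells only, which is one more reason to prefer a list with `shell=False`.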

### Best practices for security

- always use `shell=False`
- assume all users are malicious
- never use security by obscurity
- always use the principle of least privilege (postman only needs access to front-yard)
- reduce complexity

# Dealing with file systems

- Learning objectives:
    - learn how to walk a filesystem
    
In a file system, files are typically created by:
- computer
    - log files
    - build artifacts
    - directory trees
    - structured data
    - unstructured data
    - ML models
- humans    
    - config files
    - user profile data
    - business documents
    - code
    - data science projects
    - ML models

The file system is a hierarchy and the `tree` command is good to visualise this.

In [46]:
!tree ../

../
├── datasets
├── notes
│   ├── 7_command_line_automation_in_python.ipynb
│   ├── hello_world.py
│   └── test.txt
└── slides

3 directories, 3 files


## `os.walk`

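`os.walk` returns a python generator. On each iteration it yields a tuple for one directory in the tree:
- `root`: the path of the directory being visited
- `dirs`: the names of its subdirectories
- `files`: the names of its files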
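A minimal sketch of one traversal, assuming a throwaway directory tree built with `tempfile` (the layout and file names are hypothetical):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Hypothetical layout: base/data/sales.csv and base/notes.txt
    os.makedirs(os.path.join(base, "data"))
    open(os.path.join(base, "data", "sales.csv"), "w").close()
    open(os.path.join(base, "notes.txt"), "w").close()

    walked = []
    for root, dirs, files in os.walk(base):
        # root is the directory being visited; dirs and files are name lists
        walked.append((os.path.relpath(root, base), sorted(dirs), sorted(files)))

for entry in walked:
    print(entry)
```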

## Finding file extensions

To get just the file extension, we can use `os.path.splitext`:

In [54]:
import os

fullpath = "7_command_line_automation_in_python.ipynb"
_, ext = os.path.splitext(fullpath)

In [55]:
ext

'.ipynb'

In [56]:
os.path.splitext(fullpath)

('7_command_line_automation_in_python', '.ipynb')

In [57]:
matches = []
# Walk the filesystem starting at the test_dir
for root, _, files in os.walk('../../../datacamp/'):
    for name in files:
        # Create the full path to the file by using os.path.join()
        fullpath = os.path.join(root, name)
        print(f"Processing file: {fullpath}")
        # Split off the extension and discard the rest of the path
        _, ext = os.path.splitext(fullpath)
        # Match the extension pattern .csv
        if ext == ".csv":
            matches.append(fullpath)
            
# Print the matches you find          
print(matches)

Processing file: ../../../datacamp/.DS_Store
Processing file: ../../../datacamp/README.md
Processing file: ../../../datacamp/1_introduction_to_data_engineering-1_introduction_to_data_engineering.ipynb
Processing file: ../../../datacamp/3_software_engineering_for_data_scientists_in_python/.DS_Store
Processing file: ../../../datacamp/3_software_engineering_for_data_scientists_in_python/datasets/yay_pep8.py
Processing file: ../../../datacamp/3_software_engineering_for_data_scientists_in_python/datasets/.DS_Store
Processing file: ../../../datacamp/3_software_engineering_for_data_scientists_in_python/datasets/nay_pep8.py
Processing file: ../../../datacamp/3_software_engineering_for_data_scientists_in_python/slides/.DS_Store
Processing file: ../../../datacamp/3_software_engineering_for_data_scientists_in_python/notes/.DS_Store
Processing file: ../../../datacamp/3_software_engineering_for_data_scientists_in_python/notes/3_software_engineering_for_data_scientists_in_python.ipynb
Processing fil

## Find files matching a pattern

- `Path.glob()`'s main capabilities:
    - finds patterns in directories
    - yields matches
    - can recursively search
    
It is intuitive to write patterns in `glob`.

In [66]:
from pathlib import Path

# it does not search recursively inside
path = Path("../../5_introduction_to_shell/datasets/")
result = path.glob("*.csv")
result

<generator object Path.glob at 0x11cdacf50>

In [68]:
# note it returns a generator which is a key performance optimisation
[*result]

[PosixPath('../../5_introduction_to_shell/datasets/cities.csv'),
 PosixPath('../../5_introduction_to_shell/datasets/sales.csv'),
 PosixPath('../../5_introduction_to_shell/datasets/sales_2.csv')]

### Recursive glob patterns

In [71]:
from pathlib import Path

path = Path("../../../datacamp/")
# by using ** we are recursively searching
# one line of code traverses the whole directory tree
[*path.glob("**/*.csv")]

[PosixPath('../../../datacamp/5_introduction_to_shell/datasets/cities.csv'),
 PosixPath('../../../datacamp/5_introduction_to_shell/datasets/sales.csv'),
 PosixPath('../../../datacamp/5_introduction_to_shell/datasets/sales_2.csv'),
 PosixPath('../../../datacamp/5_introduction_to_shell/datasets/seasonal/autumn.csv'),
 PosixPath('../../../datacamp/5_introduction_to_shell/datasets/seasonal/spring.csv'),
 PosixPath('../../../datacamp/5_introduction_to_shell/datasets/seasonal/winter.csv'),
 PosixPath('../../../datacamp/5_introduction_to_shell/datasets/seasonal/summer.csv'),
 PosixPath('../../../datacamp/12_introduction_to_pyspark/datasets/airports.csv'),
 PosixPath('../../../datacamp/12_introduction_to_pyspark/datasets/planes.csv'),
 PosixPath('../../../datacamp/12_introduction_to_pyspark/datasets/flights.csv'),
 PosixPath('../../../datacamp/6_data_processing_in_shell/datasets/Spotify_Popularity_1.csv'),
 PosixPath('../../../datacamp/6_data_processing_in_shell/datasets/Spotify_Popularity.csv

### Using `os.walk` to find patterns

- `os.walk` pattern matching:
    - more explicit
    - can explicitly look at dirs or files
    - doesn't return `Path` objects
    
It is a more low-level way of traversing.

In [76]:
import os

result = os.walk("/tmp")
# consume the generator
next(result)
# find the pattern here

('/tmp',
 ['com.epsecurity.dmg',
  'com.google.Keystone',
  'powerlog',
  'KSDownloadAction.sPCkxeJKy7',
  'tmp000013b4',
  'com.apple.launchd.AUUwN25ZrI',
  'com.apple.launchd.7zZ6Nw6FWy',
  'com.apple.launchd.wNS2e1pTx0',
  'com.apple.launchd.l2VcmXU8ED'],
 ['05CBD2EF-2465-460F-8461-2CAA90FE3300',
  '8D0C7BB0-4EE4-448D-B750-52C501E76DB0',
  '58380972-590B-4C06-9AB6-1A0EFDB17D0A',
  '7454626B-9AA6-44A2-BB44-6800A2A45DF1',
  'A8F1F488-5569-4783-B4A7-739172F57301',
  'com.adobe.AdobeIPCBroker.ctrl-miguel.carvalho',
  'AlTest1.out',
  'dlm_message_server_out#1',
  'E7E328B0-6CCC-4F6C-9A23-6752759A3DEE',
  'com.googlecode.munki.installatlogout',
  '96DA018E-92E4-4A97-B9E4-19A58C52C3B2',
  'CA80DBCD-063E-4107-9ECA-D540B2F132E6',
  'BC88B4FF-4F5C-4592-BE12-E9DFE3A66398',
  'DEA990DB-9D84-4A8E-99C3-857AC053C8A4',
  'freshservice_agent_status',
  '628D1BE6-1958-4D1F-9662-0FBF07FD2604',
  '92E1D3A0-0BF3-4559-A226-236598844EC9',
  'self_protect_comm',
  '08148B41-AAB7-4056-BA3E-67656B0E7F8A',
 

### Using `fnmatch`

`fnmatch.fnmatch` tests whether a file name matches a pattern and returns `True` or `False`, which can be used to build simple UNIX wildcard matches.

- Supports UNIX shell wildcard matches
- Can be converted to a regular expression

In [80]:
import fnmatch

file = "7_command_line_automation_in_python.ipynb"
if fnmatch.fnmatch(file, "*.ipynb"):
    print(f"Found match {file}")

`fnmatch.translate` converts a UNIX wildcard pattern to a regular expression.

In [82]:
import re, fnmatch

regex = fnmatch.translate("*.csv")
pattern = re.compile(regex)
print(pattern)

re.compile('(?s:.*\\.csv)\\Z')


In [83]:
pattern.match("titanic.csv")

<re.Match object; span=(0, 11), match='titanic.csv'>
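Combining `os.walk` with `fnmatch.filter` gives a small, explicit file finder. This is a sketch under assumptions: the helper name `find_files` and the temporary file layout are hypothetical.

```python
import fnmatch
import os
import tempfile

def find_files(start_dir, pattern):
    """Yield paths under start_dir whose file name matches the wildcard."""
    for root, _, files in os.walk(start_dir):
        for name in fnmatch.filter(files, pattern):
            yield os.path.join(root, name)

with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "nested"))
    for rel in ("titanic.csv", "readme.md", os.path.join("nested", "sales.csv")):
        open(os.path.join(base, rel), "w").close()
    matches = sorted(os.path.basename(p) for p in find_files(base, "*.csv"))

print(matches)
```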

## High-level file and directory operations

There are two main modules to assist with high-level file and directory operations:

- `shutil`: high-level operations
    - copy tree
    - delete tree
    - archive tree
- `tempfile`: generates temporary files and directories

### `shutil.copytree`

Can recursively copy a tree of files and folders

In [84]:
from shutil import copytree, ignore_patterns

In [88]:
# create dummy tree to copy
!mkdir sometree && touch sometree/somefile.txt && touch sometree/somefile.csv

In [91]:
copytree("sometree", "newtree", ignore=ignore_patterns("*.csv"))

'newtree'

In [92]:
!ls

7_command_line_automation_in_python.ipynb sometree
hello_world.py                            test.txt
newtree


In [93]:
!ls newtree/

somefile.txt


### `shutil.rmtree`

- can recursively delete tree of files and folders

In [94]:
from shutil import rmtree

# rmtree("newtree")  # takes the single path of the tree to delete

### `shutil.make_archive`

In [95]:
from shutil import make_archive

In [96]:
# make_archive("somearchive", "gztar", "inside_tmp_dir")
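`tempfile` and `shutil.make_archive` work well together: build the tree in a temporary directory, archive it, and let the directory clean itself up. A minimal sketch, assuming a hypothetical one-file tree named `inside_tmp_dir`:

```python
import os
import tarfile
import tempfile
from pathlib import Path
from shutil import make_archive

with tempfile.TemporaryDirectory() as work_dir:
    # Hypothetical tree to archive
    source = Path(work_dir, "inside_tmp_dir")
    source.mkdir()
    (source / "somefile.txt").write_text("hello")

    # make_archive(base_name, format, root_dir) returns the archive's path
    archive_path = make_archive(
        os.path.join(work_dir, "somearchive"), "gztar", root_dir=str(source)
    )
    with tarfile.open(archive_path) as archive:
        members = archive.getnames()
    archive_name = os.path.basename(archive_path)

# The temporary directory and everything in it has been removed at this point
print(archive_name, members, os.path.exists(work_dir))
```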

## Using pathlib

`pathlib` lets us work with file system paths as objects.

In [100]:
from pathlib import Path

path = Path("/tmp")
files = [*path.glob("*")][0:4]

# notice they are PosixPath objects
files

[PosixPath('/tmp/05CBD2EF-2465-460F-8461-2CAA90FE3300'),
 PosixPath('/tmp/8D0C7BB0-4EE4-448D-B750-52C501E76DB0'),
 PosixPath('/tmp/58380972-590B-4C06-9AB6-1A0EFDB17D0A'),
 PosixPath('/tmp/7454626B-9AA6-44A2-BB44-6800A2A45DF1')]

In [104]:
first_file = files[0]

In [105]:
# check the current working directory
first_file.cwd()

PosixPath('/Users/miguel.carvalho/dev/datacamp/7_command_line_automation_in_python/notes')

In [106]:
# check path exists
first_file.exists()

True

In [108]:
# return the path as a string with forward slashes
first_file.as_posix()

'/tmp/05CBD2EF-2465-460F-8461-2CAA90FE3300'

In [110]:
# open a file directly from the object
some_file = Path("7_command_line_automation_in_python.ipynb")

with some_file.open() as file_to_read:
    print(file_to_read.readlines()[-1:])

['}\n']


In [111]:
# creating a directory with pathlib
tmp = Path("/Users/miguel.carvalho/dev/datacamp/7_command_line_automation_in_python/test")
tmp.mkdir()

In [113]:
!ls ../

datasets notes    slides   test


In [117]:
# write text to files
# note how the file didn't even exist at this point
write_path = Path("../some_random_file.txt")
write_path.write_text("Wow")

3

In [118]:
print(write_path.read_text())

Wow


In [119]:
# renaming files
modify_file = Path("../some_random_file.txt")
modify_file.rename("../some_random_file_2.txt")

In [120]:
!ls ../

datasets               notes                  slides                 some_random_file_2.txt test


# Command line functions

Python functions can be very helpful when automating.

## Decorators

Decorators are functions which wrap other functions and make them more powerful.

Decorators are incredibly powerful syntactic sugar for automation 

In [126]:
from functools import wraps
import time

def instrument(f):
    # this decorator prints out the time the function it wraps takes to
    # execute as well as the arguments it took and the wrapped function's
    # name
    @wraps(f)
    def wrap(*args, **kw):
        ts = time.time()
        result = f(*args, **kw)
        te = time.time()
        print(f"function: {f.__name__}, args: [{args}, {kw}] took: {te-ts} sec")
        return result
    return wrap

## How does a decorator work?

An important part of writing a decorator is to use the `wraps` function from the `functools` module to preserve the docstring and name of the function being wrapped.

In [130]:
from functools import wraps

def do_nothing_decorator(f):
    @wraps(f)
    def wrapper(*args, **kwds):
        print("INSIDE DECORATOR: This is called before the function")
        return f(*args, **kwds)
    return wrapper

@do_nothing_decorator
def hello_world():
    print("""This is a hello world function""")

In [131]:
hello_world()

INSIDE DECORATOR: This is called before the function
This is a hello world function


In [132]:
# note how the name of the function is preserved
print(f"Function name: {hello_world.__name__}")

Function name: hello_world
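To see what `wraps` actually buys us, here is a small sketch comparing a decorator written without it to one written with it. The decorator and function names are made up for the comparison.

```python
from functools import wraps

def without_wraps(f):
    def wrapper(*args, **kwds):
        return f(*args, **kwds)
    return wrapper

def with_wraps(f):
    @wraps(f)
    def wrapper(*args, **kwds):
        return f(*args, **kwds)
    return wrapper

@without_wraps
def plain():
    """Docstring of plain"""

@with_wraps
def preserved():
    """Docstring of preserved"""

print(plain.__name__, plain.__doc__)          # metadata of the inner wrapper
print(preserved.__name__, preserved.__doc__)  # metadata of the original function
```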


In [133]:
@instrument
def lazy_work(x, y, sleep=2):
    """Sleeps then works"""
    time.sleep(sleep)
    return x + y

In [136]:
lazy_work(1, 3, sleep=4)

function: lazy_work, args: [(1, 3), {'sleep': 4}] took: 4.004442930221558 sec


4

Many automation tasks involve functions and decorators:
- `flask` web framework
- `click` command line tool framework
- `numba` open source JIT compiler
- custom profiling, tracing and timing

> Remember a decorator must return the wrapper function it defines. This is the last line of a decorator.

## Understand script input

`sys.argv` captures input to a script as a list

In [140]:
import sys

args = sys.argv
args

['/Users/miguel.carvalho/.pyenv/versions/3.7.5/envs/myenv/lib/python3.7/site-packages/ipykernel_launcher.py',
 '-f',
 '/Users/miguel.carvalho/Library/Jupyter/runtime/kernel-eb9bde70-85e5-46c4-8ba5-1e2ccfb38e86.json']

In [139]:
# grabbing the first element, which is the path of the running script
args[0]

'/Users/miguel.carvalho/.pyenv/versions/3.7.5/envs/myenv/lib/python3.7/site-packages/ipykernel_launcher.py'

In [148]:
!cat script.py


import sys

def hello(user_input):
	print(f"From a user: {user_input}")


if __name__ == "__main__":
	arg1 = sys.argv[1]
	hello(arg1)

In [147]:
!python script.py miguel

From a user: miguel


In [16]:
import subprocess

# runs a python script that reverses the string passed to it
run_script = subprocess.Popen(
    # assuming we had the reverseit.py script
    ["python", "reverseit.py", "i will be reversed"], stdout=subprocess.PIPE)

# print out the script output
for line in run_script.stdout.readlines():
    print(line)

b'desrever eb lliw i\n'
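The notes do not include `reverseit.py` itself, so the following is only a guess at what such a script might look like. The `reverse` helper and the fallback phrase are assumptions.

```python
import sys

def reverse(text):
    """Return the characters of text in reverse order."""
    return text[::-1]

if __name__ == "__main__":
    # Fall back to a sample phrase when no argument is given
    phrase = sys.argv[1] if len(sys.argv) > 1 else "i will be reversed"
    print(reverse(phrase))
```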


## Introduction to Click

`click` automates the difficult parts of writing command-line tools.

- Python package for creating beautiful command line interfaces
- Three main features:
    - arbitrary nesting of commands
    - automatic help page generation (saves a lot of time!)
    - lazy loading of subcommands at runtime

### Basic click structure

In [19]:
# run this from the command line
!python click_example.py

Enter a phrase: ^C
Aborted!


In [22]:
import click
import random

# Create a list of values to choose from
values = ["Nashville", "Austin", "Denver", "Cleveland"]

# Select a random choice
result = random.choice(values)

# Print the random choice using click echo
# We can add color formatting as well! 
click.echo(f"My choice is: {result}")

My choice is: Cleveland


## Mapping functions to subcommands

In the example, an initial `click` application is created with the `cli` function. This is designated by the `click.group()` decorator.

In [26]:
# import click

# @click.group()
# def cli():
#     pass

# @cli.command()
# def one():
#     click.echo("One-1")

# @cli.command()
# def two():
#     click.echo("Two-2")
    
# if __name__ == "__main__":
#     cli()

### `click` utilities

- `click` utilities can:
    - generate colored output
    - generate paginated output
    - clear the screen
    - wait for key press
    - launch editors
    - write files

In [28]:
# write with click
with click.open_file("test.txt", 'w') as f:
    f.write("jazz flute")

`click` prints messages with `click.echo`. With it we can:
- generate colored output
- generate blinking or bold text
- print both unicode and binary data

### Testing `click` applications



In [29]:
import click
from click.testing import CliRunner

In [32]:
@click.command()
@click.argument('phrase')
def echo_phrase(phrase):
    click.echo(f"You said: {phrase}")

In [34]:
runner = CliRunner()
result = runner.invoke(echo_phrase, ["Have data will camp"])

assert result.output == "You said: Have data will camp\n"