> **Note:** Please copy and save each code block below in a .py file. Then execute on command line using syntax:
> ``` python <your_file>.py argument1 argument2 . . . ```


# The CLI
A command line interface (CLI) provides a way for a user to interact with a program running in a text-based shell interpreter. Some examples of shell interpreters are Bash on Linux or Command Prompt on Windows. A command line interface is enabled by the shell interpreter that exposes a command prompt. It can be characterized by the following elements:

* A command or program
* Zero or more command line arguments (which comprise options, arguments, and subcommands)
* An output representing the result of the command
* Textual documentation referred to as usage or help

In the command line shell of the operating system, the arguments that are given after the name of the program are known as Command Line Arguments. The complexity of the command line ranges from the ability to pass a single argument, to numerous arguments and options, much like a [Domain Specific Language](https://en.wikipedia.org/wiki/Domain-specific_language). For example, some programs may launch web documentation from the command line or start an [interactive shell interpreter](https://docs.python.org/tutorial/interpreter.html#interactive-mode) like Python.

The two following examples with the Python command illustrate the description of a command line interface:
```console
 $ python -c "print('I Like Python')"
 I Like Python
```
In this first example, the Python interpreter takes option -c for **command**, which says to execute the Python command line arguments following the option -c as a Python program.

Another example shows how to invoke Python with -h to display the help:
```console
$ python -h
usage: python3 [option] ... [-c cmd | -m mod | file | -] [arg] ...
Options and arguments (and corresponding environment variables):
-b     : issue warnings about str(bytes_instance), str(bytearray_instance)
         and comparing bytes/bytearray with str. (-bb: issue errors)
[ ... complete help text not shown ... ]
```

# Command Line Arguments in Python
Python provides several ways to capture and extract these arguments. Their values can then be used to modify the behavior of a Python program.

For example, we can implement a Python program, main.py:

```python
# main.py
import sys

if __name__ == "__main__":
    # The command line arguments are stored in a list called sys.argv.
    print(f"Arguments count: {len(sys.argv)}")

    # enumerate() yields (index, value) pairs for the items of sys.argv, so the
    # loop needs no separate counter for the index in the list.
    for i, arg in enumerate(sys.argv):
        print(f"Argument {i:>6}: {arg}")
```

Execute main.py as follows:

```console
$ python main.py Python Command Line Arguments
Arguments count: 5
Argument      0: main.py
Argument      1: Python
Argument      2: Command
Argument      3: Line
Argument      4: Arguments
```

In the above execution, you are passing arguments, *Python*, *Command*, *Line*, and *Arguments* to *main.py* which is executed using the command *python*.

The most common ways to read CLI arguments in Python are:

* Using sys.argv directly
* Using standard library modules
    * getopt
    * argparse
* Using third-party libraries (for example, click)

## sys.argv
The sys module provides functions and variables used to manipulate different parts of the Python runtime environment. This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter.

The underlying support for all Python command line arguments is provided by the variable ```sys.argv```, which is a plain list. Its main characteristics are:

* It is a list of command line arguments.
* len(sys.argv) provides the number of command line arguments.
* argv[0] contains the name of the current Python program.
* argv[1:], the rest of the list, contains any and all Python command line arguments passed to the program.

The following example demonstrates the content of sys.argv:

```python
# argv.py
import sys  # imports the standard library module sys

# sys.argv[0] holds the name of the program.
print(f"Name of the script      : {sys.argv[0]}")
# The remaining elements, sys.argv[1:], are the command line arguments.
print(f"Arguments of the script : {sys.argv[1:]}")
```

Executed as a notebook cell, the script shows the Jupyter kernel launcher and its arguments instead:

```console
Name of the script      : /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py
Arguments of the script : ['-f', '/Users/saquib/Library/Jupyter/runtime/kernel-744c2c72-cac3-4ab3-98aa-8979305e7609.json']
```


Run the above script in console:
```console
$ python argv.py un deux trois quatre
Name of the script      : argv.py
Arguments of the script : ['un', 'deux', 'trois', 'quatre']
```
To summarize, sys.argv contains all the argv.py Python command line arguments. When the Python interpreter executes a Python program, it parses the command line and populates sys.argv with the arguments.

**Example 1:** Let’s suppose there is a Python script for adding two numbers and the numbers are passed as command-line arguments.

```python
# Python program to demonstrate
# command line arguments

import sys

# total arguments
n = len(sys.argv)
print("Total arguments passed:", n)

# Arguments passed
print("\nName of Python script:", sys.argv[0])

print("\nArguments passed:", end=" ")
for i in range(1, n):
    print(sys.argv[i], end=" ")

# Addition of numbers
total = 0
for i in range(1, n):
    # isdigit() is True only for strings made of digits (non-negative integers)
    if sys.argv[i].isdigit():
        total += int(sys.argv[i])

print("\n\nResult:", total)
```

Executed as a notebook cell, the script receives the kernel's own arguments, none of which is a number, so the result is 0:

```console
Total arguments passed: 3

Name of Python script: /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py

Arguments passed: -f /Users/saquib/Library/Jupyter/runtime/kernel-21ba57d7-e9ad-4413-893d-5bd0aaa2a0b7.json 

Result: 0
```
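
`str.isdigit()` only accepts strings made of digits, so arguments such as `-4` or `2.5` are silently skipped by the script above. A minimal sketch of a more tolerant variant follows. The helper name `sum_numeric` is hypothetical, and it takes an explicit list rather than reading `sys.argv` directly. Note that `float()` also accepts strings such as `inf` and `nan`, which may or may not be what you want.

```python
from typing import List


def sum_numeric(args: List[str]) -> float:
    """Add up every argument that float() can parse and skip the rest."""
    total = 0.0
    for arg in args:
        try:
            total += float(arg)
        except ValueError:
            # Not a number (for example "-f" or a file path): ignore it.
            continue
    return total


# In a script you would pass sys.argv[1:] instead of a literal list.
print(sum_numeric(["3", "-4", "2.5", "abc"]))  # prints 1.5
```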


**Example 2:** The example reverse.py reverses the first argument passed at the command line:

```python
# reverse.py

import sys

arg = sys.argv[1]  # first argument of the program; index 0 holds the program name
print(arg[::-1])  # arg[::-1] is a slice operation that returns the string reversed
```

Executed as a notebook cell, sys.argv[1] is the kernel's own option -f, so the cell prints:

```console
f-
```


To execute:
```console
$ python reverse.py "Real Python"
nohtyP laeR
```

> **Note:** Surrounding the multi-word string "Real Python" with quotes ensures that the shell passes it as a single argument instead of two. Argument separators are covered in a later section.

### Mutating sys.argv
sys.argv is globally available to your running Python program and is mutable, i.e. any part of the program or any imported module can change its contents.

Observe what happens if you tamper with sys.argv:

```python
# argv_pop.py

import sys

print(sys.argv)
sys.argv.pop()  # .pop() removes and returns the last item in sys.argv.
print(sys.argv)
```

Executed as a notebook cell, the script prints the kernel's arguments before and after the call to .pop():

```console
['/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py', '-f', '/Users/saquib/Library/Jupyter/runtime/kernel-744c2c72-cac3-4ab3-98aa-8979305e7609.json']
['/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/ipykernel_launcher.py', '-f']
```


Execute the script above:
```console
$ python argv_pop.py un deux trois quatre
['argv_pop.py', 'un', 'deux', 'trois', 'quatre']
['argv_pop.py', 'un', 'deux', 'trois']
```
Notice that the fourth argument is no longer included in sys.argv. Therefore, it is safer to copy the sys.argv arguments into a separate variable at the beginning of your program, as follows:

```python
# argv_var_pop.py

import sys

print(sys.argv)
args = sys.argv[1:]  # slicing creates a new, independent list
print(args)
sys.argv.pop()
print(sys.argv)
print(args)
```

Above, even though *sys.argv* lost its last element, *args* has been safely preserved. Since *args* isn’t global, you can pass it around to parse the arguments per the logic of your program.
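
One way to pass the arguments around is to give the entry point an explicit parameter. The sketch below assumes a hypothetical `main()` function that falls back to `sys.argv[1:]` only when no list is supplied. It works on a private copy, so neither `sys.argv` nor the caller's list is modified.

```python
import sys
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> List[str]:
    """Work on a private copy of the arguments and return what is left."""
    if argv is None:
        argv = sys.argv[1:]  # slicing creates a new list
    args = list(argv)        # copy, so the caller's list is never mutated
    if args:
        args.pop()           # a local change that does not touch sys.argv
    return args


original = ["un", "deux", "trois", "quatre"]
print(main(original))  # ['un', 'deux', 'trois']
print(original)        # ['un', 'deux', 'trois', 'quatre']
```

Passing the list explicitly also makes the function easy to call from tests without touching the real command line.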

### Escaping Whitespace Characters
In the reverse.py example you saw earlier, the first and only argument is "Real Python", and the result is "nohtyP laeR". The argument includes a whitespace separator between "Real" and "Python", but it is treated as a single argument because we enclosed it in double quotes to escape the space.

Different operating system CLIs escape arguments in different ways, as discussed below:

**Escape in Linux CLI**

In Linux, whitespaces can be escaped by doing one of the following:

* Surrounding the arguments with single quotes (')
* Surrounding the arguments with double quotes (")
* Prefixing each space with a backslash (\)

Without one of the escape solutions, reverse.py stores two arguments, "Real" in sys.argv[1] and "Python" in sys.argv[2]:
```console
$ python reverse.py Real Python
laeR
```
The output above shows that the script only reverses "Real" and that "Python" is ignored. To pass both words as a single argument, you’d need to surround the overall string with double quotes (").

You can also use a backslash (\) to escape the whitespace:
```console
$ python reverse.py Real\ Python
nohtyP laeR
```
With the backslash (\), the command shell exposes a single argument to Python, and then to reverse.py.
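
The three escaping styles can be explored from Python itself. As a small sketch, the standard library function `shlex.split()` splits a string using POSIX shell quoting rules, which approximates what the shell does before Python ever sees the arguments. The command lines below are just the examples from this section written as strings.

```python
import shlex

command_lines = [
    'reverse.py Real Python',    # no escaping
    'reverse.py "Real Python"',  # double quotes
    "reverse.py 'Real Python'",  # single quotes
    r'reverse.py Real\ Python',  # backslash before the space
]

for line in command_lines:
    # shlex.split() applies POSIX shell quoting rules to the string.
    print(shlex.split(line))

# Output:
# ['reverse.py', 'Real', 'Python']
# ['reverse.py', 'Real Python']
# ['reverse.py', 'Real Python']
# ['reverse.py', 'Real Python']
```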

**Escape in Unix CLI**

In Unix shells, the internal field separator (IFS) shell variable defines the characters that the shell treats as delimiters when it splits text into words. You can display the content of IFS by running the following command:
```console
$ printf "%q\n" "$IFS"
$' \t\n'
```
From the result above, ' \t\n', we can identify three delimiters:
* Space (' ')
* Tab (\t)
* Newline (\n)

Therefore, a Unix shell treats the space in Real Python as a delimiter and, as a result, passes two separate arguments to reverse.py:

```console
$ python reverse.py Real Python
```

Prefixing a space with a backslash (\) bypasses the default behavior of the space as a delimiter in the string "Real Python". This results in one block of text as intended, instead of two.

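We can change the value of IFS and define our own delimiter. The new delimiter applies when the shell splits the value of an unquoted variable:
```console
$ IFS=";"
$ words="Real;Python"
$ python reverse.py $words
laeR
```
Because ; is now a delimiter, the shell splits the value of words into two separate arguments, Real and Python, and reverse.py reverses only the first one. Typing Real;Python directly on the command line would not have the same effect, because an unquoted ; ends the command.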


**Escape in Windows CLI**

In Windows, the whitespace interpretation can be managed by using a combination of double quotes. It’s slightly counterintuitive because, in the Windows terminal, a double quote (") is interpreted as a switch to disable and subsequently to enable special characters like **space, tab, or pipe (|)**.

As a result, when you surround more than one string with double quotes, the Windows terminal interprets the first double quote as a command to **ignore special characters** and the second double quote as one to **interpret special characters**.

With this information in mind, it’s safe to assume that surrounding more than one string with double quotes will give you the expected behavior, which is to expose the group of strings as a single argument. To confirm this peculiar effect of the double quote on the Windows command line, observe the following two examples:

```console
C:\>python reverse.py "Real Python"
nohtyP laeR
```

In the example above, you can intuitively deduce that "Real Python" is interpreted as a single argument. However, notice what happens when you use only one double quote:

```console
C:\>python reverse.py "Real Python
nohtyP laeR
```

The command prompt passes the whole string "Real Python" as a single argument, in the same manner as if the argument was "Real Python". In reality, the Windows command prompt sees the lone double quote as a switch to disable the behavior of the whitespaces as separators and passes anything following the double quote as a single argument.

For more information on the effects of double quotes in the Windows terminal, check out [A Better Way To Understand Quoting and Escaping of Windows Command Line Arguments](http://www.windowsinspired.com/understanding-the-command-line-string-and-arguments-received-by-a-windows-program/).
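
Going in the opposite direction can make the quoting rules easier to see. As a rough sketch, the `subprocess` module ships a helper, `list2cmdline()`, that builds a Windows-style command line from a list of arguments and adds double quotes only around arguments that contain whitespace. The argument values below are illustrative.

```python
import subprocess

# An argument containing a space is wrapped in double quotes ...
quoted = subprocess.list2cmdline(["reverse.py", "Real Python"])
print(quoted)  # reverse.py "Real Python"

# ... while arguments without whitespace are left untouched.
plain = subprocess.list2cmdline(["reverse.py", "Real", "Python"])
print(plain)  # reverse.py Real Python
```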

### Handling Errors
Python command line arguments are **loose strings**. Many things can go wrong, so it’s a good idea to provide the users of your program with some guidance in the event they pass incorrect arguments at the command line. For example, reverse.py expects one argument, and if you omit it, then you get an error:
```console
$ python reverse.py
Traceback (most recent call last):
  File "reverse.py", line 5, in <module>
    arg = sys.argv[1]
IndexError: list index out of range
```
The Python [exception](https://realpython.com/python-exceptions/) IndexError is raised, and the corresponding [traceback](https://realpython.com/python-traceback/) shows that the error is caused by the expression arg = sys.argv[1]. The message of the exception is list index out of range. You didn’t pass an argument at the command line, so there’s nothing in the list sys.argv at index 1.

This is a common pattern that can be addressed in a few different ways. For this initial example, you’ll keep it brief by including the expression arg = sys.argv[1] in a try block. Modify the code as follows:
```python
# reverse_exc.py

import sys

try:
    arg = sys.argv[1]  # The expression is included in a try block.
except IndexError:
    raise SystemExit(f"Usage: {sys.argv[0]} <string_to_reverse>")  # This line raises the built-in exception SystemExit.
print(arg[::-1])
```
If no argument is passed to reverse_exc.py, then the process exits with a status code of 1 after printing the usage. Note the integration of sys.argv[0] in the error message. It exposes the name of the program in the usage message. Now, when you execute the program without any Python command line arguments, you can see the following output:
```console
$ python reverse_exc.py
Usage: reverse_exc.py <string_to_reverse>

$ echo $?
1
```
reverse_exc.py didn’t have an argument passed at the command line. As a result, the program raises SystemExit with an error message. This causes the program to exit with a status of 1, which displays when you print the special variable $? with echo.
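
The try block reacts to a missing argument, but it silently accepts extra ones. An alternative sketch checks the argument count up front. The function name `get_string_to_reverse` is hypothetical, and the function assumes the list it receives always contains the program name at index 0, as sys.argv does.

```python
from typing import List


def get_string_to_reverse(argv: List[str]) -> str:
    """Return the single expected argument or exit with a usage message."""
    if len(argv) != 2:
        # Passing a string to SystemExit prints it to stderr and sets
        # the exit status to 1.
        raise SystemExit(f"Usage: {argv[0]} <string_to_reverse>")
    return argv[1]


# In a script you would call get_string_to_reverse(sys.argv).
print(get_string_to_reverse(["reverse_exc.py", "Real Python"])[::-1])  # nohtyP laeR
```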

## The Anatomy of Python Command Line Arguments
The syntax for command line arguments follows some standards that are regularly used by developers while implementing a command line interface.

Python command line arguments are a subset of the command line interface. They can be composed of different types of arguments:

1. **Options** modify the behavior of a particular command or program.
2. **Arguments** represent the source or destination to be processed.
3. **Subcommands** allow a program to define more than one command with the respective set of options and arguments.

Before we go deeper into the different types of arguments, we’ll get an overview of the accepted standards that have been guiding the design of the command line interface and arguments. These have been refined since the advent of the computer terminal in the mid-1960s.

### Standards
A few available standards provide some definitions and guidelines to promote consistency for implementing commands and their arguments. These are the main UNIX standards and references:

* [POSIX Utility Conventions](http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap12.html)
* [GNU Standards for Command Line Interfaces](https://www.gnu.org/prep/standards/standards.html#Command_002dLine-Interfaces)
* [docopt](http://docopt.org/)

The standards above define guidelines and nomenclatures for anything related to programs and Python command line arguments. The following points are examples taken from those references:

#### POSIX:
* A program or utility is followed by options, option-arguments, and operands.
* All options should be preceded with a hyphen or minus (-) delimiter character.
* Option-arguments should not be optional.

#### GNU:
The GNU standards are very similar to the POSIX standards but provide some modifications and extensions:

* All programs should support two standard options, which are --version and --help.
* Long-named options are equivalent to the single-letter Unix-style options. An example is --debug and -d.

#### docopt:
* Short options can be stacked, meaning that -abc is equivalent to -a -b -c.
* Long options can have arguments specified after a space or the equals sign (=). The long option --input=ARG is equivalent to --input ARG.
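
The two docopt conventions above can be sketched as a normalization step that rewrites the raw arguments into a uniform shape before any further parsing. The function name `normalize` is hypothetical, and the sketch deliberately ignores harder cases such as negative numbers (`-10`) or option-arguments attached to a short option (`-ofile`).

```python
from typing import List


def normalize(argv: List[str]) -> List[str]:
    """Expand -abc into -a -b -c and --name=value into --name value."""
    result: List[str] = []
    for token in argv:
        if token.startswith("--") and "=" in token:
            # Long option with an attached argument: split on the first "=".
            name, value = token.split("=", 1)
            result.extend([name, value])
        elif token.startswith("-") and not token.startswith("--") and len(token) > 2:
            # Stacked short options: one new option per letter.
            result.extend(f"-{letter}" for letter in token[1:])
        else:
            result.append(token)
    return result


print(normalize(["-abc", "--input=data.txt", "file"]))
# ['-a', '-b', '-c', '--input', 'data.txt', 'file']
```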

> **Note:** You don’t need to follow those standards rigorously. Instead, follow the conventions that have been used successfully for years since the advent of UNIX. If you write a set of utilities for you or your team, then ensure that you stay consistent across the different utilities.

In the following sections, you’ll learn more about each of the command line components, options, arguments, and sub-commands.

### Options
An **option**, sometimes called a **flag** or a **switch**, is intended to modify the behavior of the program. For example, the command ls on Linux lists the content of a given directory. Without any arguments, it lists the files and directories in the current directory:
```console
$ cd /dev
$ ls
autofs
block
bsg
btrfs-control
bus
char
console
```

Let’s add a few options. You can combine -l and -s into -ls, which changes the information displayed in the terminal:
```console
$ cd /dev
$ ls -ls
total 0
0 crw-r--r--  1 root root       10,   235 Jul 14 08:10 autofs
0 drwxr-xr-x  2 root root             260 Jul 14 08:10 block
0 drwxr-xr-x  2 root root              60 Jul 14 08:10 bsg
0 crw-------  1 root root       10,   234 Jul 14 08:10 btrfs-control
0 drwxr-xr-x  3 root root              60 Jul 14 08:10 bus
0 drwxr-xr-x  2 root root            4380 Jul 14 15:08 char
0 crw-------  1 root root        5,     1 Jul 14 08:10 console
```

An **option** can take an argument, which is called an **option-argument**. See an example in action with [od](https://en.wikipedia.org/wiki/Od_%28Unix%29) below:
```console
$ od -t x1z -N 16 main
0000000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  >.ELF............<
0000020
```
**od** stands for **octal dump**. This utility displays data in different printable representations, like octal (which is the default), hexadecimal, decimal, and ASCII. In the example above, it takes the binary file main and displays the first 16 bytes of the file in hexadecimal format. The option -t expects a type as an option-argument, and -N expects the number of input bytes.

In the example above, -t is given type x1, which stands for hexadecimal and one byte per integer. This is followed by z to display the printable characters at the end of the input line. -N takes 16 as an option-argument for limiting the number of input bytes to 16.
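
The pairing of an option with its option-argument, as od does with -t and -N, can be sketched in Python. The option table `TAKES_VALUE` and the function `parse_options()` are hypothetical names chosen for this illustration. They are not part of od or of any library.

```python
from typing import Dict, List, Tuple

# Hypothetical option table: option name -> whether it takes an option-argument.
TAKES_VALUE = {"-t": True, "-N": True, "-v": False}


def parse_options(argv: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Split argv into {option: option-argument} and a list of operands."""
    options: Dict[str, str] = {}
    operands: List[str] = []
    tokens = iter(argv)
    for token in tokens:
        if token not in TAKES_VALUE:
            operands.append(token)
        elif TAKES_VALUE[token]:
            try:
                # Consume the next token as the option-argument.
                options[token] = next(tokens)
            except StopIteration:
                raise SystemExit(f"option {token} requires an argument")
        else:
            options[token] = ""  # a flag without an option-argument
    return options, operands


print(parse_options(["-t", "x1z", "-N", "16", "main"]))
# ({'-t': 'x1z', '-N': '16'}, ['main'])
```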

### Arguments
**Arguments** are also called **operands** or [parameters](https://en.wikipedia.org/wiki/Parameter#Computing) in the POSIX standards. The arguments represent the source or the destination of the data that the command acts on. For example, the command [cp](https://en.wikipedia.org/wiki/Cp_%28Unix%29), which is used to copy one or more files to a file or a directory, takes at least one source and one target:
```console
$ ls main
main

$ cp main main2

$ ls -lt
main
main2
...
```
On the fourth line of the example, cp takes two arguments:

1. **main:** the source file
2. **main2:** the target file

It then copies the content of main to a new file named main2. Both main and main2 are arguments, or operands, of the program cp.

### Subcommands
The concept of **subcommands** isn’t documented in the POSIX or GNU standards, but it does appear in **docopt**. The standard Unix utilities are small tools adhering to the [Unix philosophy](https://en.wikipedia.org/wiki/Unix_philosophy). Unix programs are intended to be programs that [do one thing and do it well](https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well). This means no subcommands are necessary.

By contrast, a new generation of programs, including [git](https://git-scm.com/), [go](https://golang.org/), [docker](https://www.docker.com/), and [gcloud](https://cloud.google.com/sdk/gcloud/), come with a slightly different paradigm that embraces subcommands. They’re not necessarily part of the Unix landscape as they span several operating systems, and they’re deployed with a full ecosystem that requires several commands.

Take git as an example. It handles several commands, each possibly with their own set of options, option-arguments, and arguments. The following examples apply to the git subcommand branch:

* **git branch** displays the branches of the local git repository.
* **git branch custom_python** creates a local branch custom_python in a local repository.
* **git branch -d custom_python** deletes the local branch custom_python.
* **git branch --help** displays the help for the git branch subcommand.

In the Python ecosystem, pip has the concept of subcommands, too. Some [pip](https://en.wikipedia.org/wiki/Pip_%28package_manager%29) subcommands include *list*, *install*, *freeze*, or *uninstall*.
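
A common way to implement subcommands by hand is a dispatch table that maps the first argument to a function. The sketch below assumes a hypothetical tool with two subcommands, `list` and `install`. The handler functions only return strings and do not reflect how pip works internally.

```python
from typing import Callable, Dict, List


def cmd_list(args: List[str]) -> str:
    return "listing packages"


def cmd_install(args: List[str]) -> str:
    return f"installing {', '.join(args)}"


# The dispatch table: subcommand name -> handler function.
SUBCOMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "list": cmd_list,
    "install": cmd_install,
}


def dispatch(argv: List[str]) -> str:
    """Run the handler named by argv[0] with the remaining arguments."""
    if not argv or argv[0] not in SUBCOMMANDS:
        raise SystemExit(f"usage: tool ({' | '.join(SUBCOMMANDS)}) [args...]")
    name, *rest = argv
    return SUBCOMMANDS[name](rest)


# In a script you would call dispatch(sys.argv[1:]).
print(dispatch(["install", "requests", "flask"]))  # installing requests, flask
```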

### Windows
On Windows, the conventions regarding Python command line arguments are slightly different, in particular, those regarding [command line options](https://en.wikipedia.org/wiki/Command-line_interface#Option_conventions_in_DOS,_Windows,_OS/2). To validate this difference, take *tasklist*, which is a native Windows executable that displays a list of the currently running processes. It’s similar to ps on Linux or macOS systems. Below is an example of how to execute tasklist in a command prompt on Windows:

```console
C:\>tasklist /FI "IMAGENAME eq notepad.exe"

Image Name                     PID Session Name        Session#    Mem Usage
========================= ======== ================ =========== ============
notepad.exe                  13104 Console                    6     13,548 K
notepad.exe                   6584 Console                    6     13,696 K
```

Note that the separator for an option is a forward slash (/) instead of a hyphen (-) like the conventions for Unix systems. For readability, there’s a space between the program name, tasklist, and the option /FI, but it’s just as correct to type tasklist/FI.

The particular example above executes tasklist with a filter to only show the Notepad processes currently running. You can see that the system has two running instances of the Notepad process. Although it’s not equivalent, this is similar to executing the following command in a terminal on a Unix-like system:

```console
$ ps -ef | grep vi | grep -v grep
andre     2117     4  0 13:33 tty1     00:00:00 vi .gitignore
andre     2163  2134  0 13:34 tty3     00:00:00 vi main.c
```

The ps command above shows all the current running vi processes. The behavior is consistent with the [Unix Philosophy](https://en.wikipedia.org/wiki/Unix_philosophy), as the output of ps is transformed by two grep filters. The first grep command selects all the occurrences of vi, and the second grep filters out the occurrence of grep itself.

With the spread of Unix tools making their appearance in the Windows ecosystem, non-Windows-specific conventions are also accepted on Windows.

### Visuals
At the start of a Python process, Python command line arguments are split into two categories:

1. **Python options:** These influence the execution of the Python interpreter. For example, adding option [-O](https://docs.python.org/3/using/cmdline.html#cmdoption-o) is a means to optimize the execution of a Python program by removing assert statements and any code conditional on the value of `__debug__`. There are other [Python options](https://docs.python.org/using/cmdline.html#interface-options) available at the command line.

2. **Python program and its arguments:** Following the Python options (if there are any), you’ll find the Python program, which is a file name that usually has the extension .py, and its arguments. By convention, those can also be composed of options and arguments.

Take the following command that’s intended to execute the program main.py, which takes options and arguments. Note that, in this example, the Python interpreter also takes some options, which are [-B](https://docs.python.org/using/cmdline.html#id1) and [-v](https://docs.python.org/using/cmdline.html#id4).

```console
$ python -B -v main.py --verbose --debug un deux
```

In the command line above, the options are Python command line arguments and are organized as follows:

* **The option -B** tells Python not to write .pyc files on the import of source modules. For more details about .pyc files, check out the section [What Does a Compiler Do?](https://realpython.com/cpython-source-code-guide/#what-does-a-compiler-do) in [Your Guide to the CPython Source Code](https://realpython.com/cpython-source-code-guide).
* **The option -v** stands for **verbose** and tells Python to trace all import statements.
* **The arguments passed to main.py** are fictitious and represent two long options (--verbose and --debug) and two arguments (un and deux).

This example of Python command line arguments can be illustrated graphically as follows:

![alt text](./assets/017_CLI.png)

Within the Python program main.py, you only have access to the Python command line arguments inserted by Python in sys.argv. The Python options may influence the behavior of the program but are not accessible in main.py.
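
This split can be observed by launching a child interpreter. In the sketch below, the child runs a one-line program given with -c instead of a main.py file, which is an assumption made to keep the example self-contained. The interpreter option -B does not show up in the child's sys.argv, although its effect is visible through `sys.flags`.

```python
import subprocess
import sys

# The one-line child program prints its own arguments and one interpreter flag.
child_code = "import sys; print(sys.argv[1:]); print(sys.flags.dont_write_bytecode)"

result = subprocess.run(
    [sys.executable, "-B", "-c", child_code, "--verbose", "--debug", "un", "deux"],
    capture_output=True,
    text=True,
    check=True,
)
lines = result.stdout.splitlines()
print(lines[0])  # ['--verbose', '--debug', 'un', 'deux']
print(lines[1])  # 1
```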

## A Few Methods for Parsing Python Command Line Arguments
Now you’re going to explore a few approaches to handling options, option-arguments, and operands. This is done by parsing Python command line arguments. In this section, you’ll see some concrete aspects of Python command line arguments and techniques to handle them. First, you’ll see an example that introduces a straightforward approach relying on [list comprehensions](https://realpython.com/list-comprehension-python/) to collect and separate options from arguments. Then you will:

* Use regular expressions to extract elements of the command line
* Learn how to handle files passed at the command line
* Read the standard input in a way that’s compatible with the Unix tools
* Differentiate the regular output of the program from the errors
* Implement a custom parser to read Python command line arguments

This will serve as a preparation for options involving modules in the standard library or from external libraries that you’ll learn about later in this tutorial.

For something uncomplicated, the following pattern, which doesn’t enforce ordering and doesn’t handle option-arguments, may be enough:
```python
# cul.py

# The code collects and separates the different argument types using list comprehensions.

import sys

# Collect the options: any command line argument starting with a hyphen (-).
opts = [opt for opt in sys.argv[1:] if opt.startswith("-")]
# Collect the program arguments by filtering out the options.
args = [arg for arg in sys.argv[1:] if not arg.startswith("-")]

if "-c" in opts:
    print(" ".join(arg.capitalize() for arg in args))
elif "-u" in opts:
    print(" ".join(arg.upper() for arg in args))
elif "-l" in opts:
    print(" ".join(arg.lower() for arg in args))
else:
    raise SystemExit(f"Usage: {sys.argv[0]} (-c | -u | -l) <arguments>...")
```    
    
The intent of the program above is to modify the case of the Python command line arguments. Three options are available:

* -c to capitalize the arguments
* -u to convert the arguments to uppercase
* -l to convert the arguments to lowercase


When you execute the Python program above with a set of options and arguments, you get the following output:
```console
$ python cul.py -c un deux trois
Un Deux Trois
```

This approach might suffice in many situations, but it would fail in the following cases:

* If the order is important, and in particular, if options should appear before the arguments
* If support for option-arguments is needed
* If some arguments are prefixed with a hyphen (-)

You can leverage other options before you resort to a library like argparse or click.
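
One such option addresses the third limitation: the POSIX conventions reserve a double hyphen (--) as an end-of-options marker, after which every token counts as an argument even if it starts with a hyphen. A minimal sketch of that idea follows. The function name `split_opts_args` is hypothetical.

```python
from typing import List, Tuple


def split_opts_args(argv: List[str]) -> Tuple[List[str], List[str]]:
    """Separate options from arguments, honoring a '--' end-of-options marker."""
    opts: List[str] = []
    args: List[str] = []
    options_done = False
    for token in argv:
        if options_done:
            args.append(token)        # everything after "--" is an argument
        elif token == "--":
            options_done = True       # the marker itself is dropped
        elif token.startswith("-") and len(token) > 1:
            opts.append(token)
        else:
            args.append(token)        # includes a lone "-"
    return opts, args


print(split_opts_args(["-u", "--", "-hello", "world"]))
# (['-u'], ['-hello', 'world'])
```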

### Regular Expressions
You can use a regular expression to enforce a certain order, specific options and option-arguments, or even the type of arguments. To illustrate the usage of a regular expression to parse Python command line arguments, you’ll implement a Python version of seq, which is a program that prints a sequence of numbers. Following the docopt conventions, a specification for seq.py could be this:

```console
Print integers from <first> to <last>, in steps of <increment>.

Usage:
  python seq.py --help
  python seq.py [-s SEPARATOR] <last>
  python seq.py [-s SEPARATOR] <first> <last>
  python seq.py [-s SEPARATOR] <first> <increment> <last>

Mandatory arguments to long options are mandatory for short options too.
  -s, --separator=STRING use STRING to separate numbers (default: \n)
      --help             display this help and exit

If <first> or <increment> are omitted, they default to 1. When <first> is
larger than <last>, <increment>, if not set, defaults to -1.
The sequence of numbers ends when the sum of the current number and
<increment> reaches the limit imposed by <last>.
```

First, look at a regular expression that’s intended to capture the requirements above:
```python
import re

args_pattern = re.compile(
    r"""
    ^
    (
        (--(?P<HELP>help).*)|
        ((?:-s|--separator)\s(?P<SEP>.*?)\s)?
        ((?P<OP1>-?\d+))(\s(?P<OP2>-?\d+))?(\s(?P<OP3>-?\d+))?
    )
    $
""",
    re.VERBOSE,
)
```
To experiment with the regular expression above, you may use the snippet recorded on Regular Expression 101. The regular expression captures and enforces a few aspects of the requirements given for seq. In particular, the command may take:

* A help option, --help, captured as a named group called HELP
* A separator option, -s or --separator, taking an optional argument, and captured as a named group called SEP
* Up to three integer operands, respectively captured as OP1, OP2, and OP3

For clarity, the pattern args_pattern above uses the flag re.VERBOSE. This allows you to spread the regular expression over a few lines to enhance readability. The pattern validates the following:

* **Argument order:** Options and arguments are expected to be laid out in a given order. For example, options are expected before the arguments.
* **Option values:** Only --help, -s, or --separator are expected as options.
* **Argument mutual exclusivity:** The option --help isn’t compatible with other options or arguments.
* **Argument type:** Operands are expected to be positive or negative integers.

For the regular expression to be able to handle these things, it needs to see all Python command line arguments in one string. You can collect them using str.join():

```python
arg_line = " ".join(sys.argv[1:])
```
This makes arg_line a string that includes all arguments, except the program name, separated by a space.

Given the pattern args_pattern above, you can extract the Python command line arguments with the following function:

```python
from typing import Dict


def parse(arg_line: str) -> Dict[str, str]:
    args: Dict[str, str] = {}
    if match_object := args_pattern.match(arg_line):
        args = {k: v for k, v in match_object.groupdict().items()
                if v is not None}
    return args
```
The pattern already handles the order of the arguments, mutual exclusivity between options and arguments, and the type of the arguments. parse() applies the compiled pattern’s match() method to the argument line to extract the proper values and store the data in a dictionary.

The dictionary includes the names of each group as keys and their respective values. For example, if the arg_line value is --help, then the dictionary is {'HELP': 'help'}. If arg_line is -s T 10, then the dictionary becomes {'SEP': 'T', 'OP1': '10'}.
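
To show how the parsed dictionary might be consumed, here is a self-contained sketch that repeats the pattern and adds a hypothetical `seq()` helper turning the operands into a list of integers. The defaulting rules follow the usage text above. The helper is an illustration only and ignores the separator.

```python
import re
from typing import Dict, List

args_pattern = re.compile(
    r"""
    ^
    (
        (--(?P<HELP>help).*)|
        ((?:-s|--separator)\s(?P<SEP>.*?)\s)?
        ((?P<OP1>-?\d+))(\s(?P<OP2>-?\d+))?(\s(?P<OP3>-?\d+))?
    )
    $
""",
    re.VERBOSE,
)


def parse(arg_line: str) -> Dict[str, str]:
    """Return the named groups that matched, or an empty dict on no match."""
    match_object = args_pattern.match(arg_line)
    if match_object is None:
        return {}
    return {k: v for k, v in match_object.groupdict().items() if v is not None}


def seq(args: Dict[str, str]) -> List[int]:
    """Hypothetical helper: build the number sequence from parsed operands."""
    operands = [int(args[k]) for k in ("OP1", "OP2", "OP3") if k in args]
    if not operands:
        return []
    if len(operands) == 3:
        first, increment, last = operands
    else:
        first = operands[0] if len(operands) == 2 else 1
        last = operands[-1]
        increment = 1 if first <= last else -1  # default direction
    if increment == 0:
        raise SystemExit("increment must not be zero")
    # range() excludes its stop value, so step one past <last>.
    stop = last + (1 if increment > 0 else -1)
    return list(range(first, stop, increment))


print(parse("-s T 10"))      # {'SEP': 'T', 'OP1': '10'}
print(seq(parse("1 2 10")))  # [1, 3, 5, 7, 9]
print(seq(parse("5 2")))     # [5, 4, 3, 2]
```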


At this point, you know a few ways to extract options and arguments from the command line. So far, the Python command line arguments were only strings or integers. Next, you’ll learn how to handle files passed as arguments.

### File Handling

It’s time now to experiment with Python command line arguments that are expected to be file names. Modify sha1sum.py to handle one or more files as arguments. You’ll end up with a downgraded version of the original sha1sum utility, which takes one or more files as arguments and displays the hexadecimal SHA1 hash for each file, followed by the name of the file:

```python
# sha1sum_file.py

import hashlib
import sys

def sha1sum(filename: str) -> str:
    # Named sha1_hash rather than hash to avoid shadowing the built-in hash()
    sha1_hash = hashlib.sha1()
    with open(filename, mode="rb") as f:
        sha1_hash.update(f.read())
    return sha1_hash.hexdigest()

for arg in sys.argv[1:]:
    print(f"{sha1sum(arg)}  {arg}")
```

sha1sum() is applied to the data read from each file that you passed at the command line, rather than the string itself. Take note that the hash object’s update() method takes a bytes-like object as an argument, and that calling read() on a file opened with the mode rb returns a bytes object. For more information about handling file content, check out Reading and Writing Files in Python, and in particular, the section Working With Bytes.
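
Because update() can be called repeatedly, the file doesn’t have to be read in one go. The following is a minimal sketch of that idea, not part of the original script: sha1sum_chunked() and its chunk_size parameter are hypothetical names, and the temporary file only serves as test data.

```python
import hashlib
import os
import tempfile


def sha1sum_chunked(filename: str, chunk_size: int = 65536) -> str:
    """Hash a file piece by piece instead of loading it all into memory."""
    sha1_hash = hashlib.sha1()
    with open(filename, mode="rb") as f:
        # iter() with a sentinel keeps calling f.read(chunk_size)
        # until it returns b"", which signals the end of the file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1_hash.update(chunk)
    return sha1_hash.hexdigest()


# Self-check on a temporary file (illustrative data only).
data = b"Beautiful is better than ugly.\n" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    digest = sha1sum_chunked(tmp.name, chunk_size=1024)
    print(digest == hashlib.sha1(data).hexdigest())
finally:
    os.remove(tmp.name)
```

Feeding the data in chunks produces the same digest as hashing it all at once, so this variant mainly matters for files that are too large to hold comfortably in memory.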

The evolution of sha1sum_file.py from handling strings at the command line to manipulating the content of files is getting you closer to the original implementation of sha1sum:

```console
$ sha1sum main main.c
9a6f82c245f5980082dbf6faac47e5085083c07d  main
125a0f900ff6f164752600550879cbfabb098bc3  main.c
```

The execution of the Python program with the same Python command line arguments gives this:

```console
$ python sha1sum_file.py main main.c
9a6f82c245f5980082dbf6faac47e5085083c07d  main
125a0f900ff6f164752600550879cbfabb098bc3  main.c
```

Because a Unix-like shell expands wildcards before it starts your program, you also get the benefit of wildcard expansion without writing any code. To prove this, you can reuse main.py, which displays each argument with the argument number and its value:

```console
$ python main.py main.*
Arguments count: 5
Argument      0: main.py
Argument      1: main.c
Argument      2: main.exe
Argument      3: main.obj
Argument      4: main.py
```

You can see that the shell automatically performs wildcard expansion so that any file with a base name matching main, regardless of the extension, is part of sys.argv.

The Windows Command Prompt doesn’t perform this wildcard expansion. To obtain the same behavior there, you need to implement it in your code. To refactor main.py to work with wildcard expansion, you can use glob. The following example works on Windows and, though it isn’t as concise as the original main.py, the same code behaves similarly across platforms:

```python
# main_win.py

import sys
import glob
import itertools
from typing import List

def expand_args(args: List[str]) -> List[str]:
    arguments = args[:1]
    glob_args = [glob.glob(arg) for arg in args[1:]]
    arguments += itertools.chain.from_iterable(glob_args)
    return arguments

if __name__ == "__main__":
    args = expand_args(sys.argv)
    print(f"Arguments count: {len(args)}")
    for i, arg in enumerate(args):
        print(f"Argument {i:>6}: {arg}")
```

In main_win.py, expand_args() relies on glob.glob() to process the shell-style wildcards. You can verify the result on Windows and any other operating system:

```console
C:\>python main_win.py main.*
Arguments count: 5
Argument      0: main_win.py
Argument      1: main.c
Argument      2: main.exe
Argument      3: main.obj
Argument      4: main.py
```
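
One detail of expand_args() is worth keeping in mind: glob.glob() returns an empty list when a pattern matches nothing, so such an argument silently disappears, whereas Bash by default passes an unmatched pattern through unchanged. The sketch below is a hypothetical variant (expand_args_keep() isn’t part of the original script) that keeps unmatched arguments and sorts the matches for a predictable order. The temporary directory and file names are made up for the demonstration.

```python
import glob
import os
import tempfile
from typing import List


def expand_args_keep(args: List[str]) -> List[str]:
    """Expand wildcards in args[1:], keeping arguments that match nothing."""
    expanded = args[:1]  # the program name is never expanded
    for arg in args[1:]:
        matches = sorted(glob.glob(arg))
        # Fall back to the literal argument when no file matches.
        expanded.extend(matches if matches else [arg])
    return expanded


# Illustrative run in a temporary directory with two made-up files.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("main.c", "main.py"):
        open(os.path.join(tmp_dir, name), "w").close()
    result = expand_args_keep(
        ["prog", os.path.join(tmp_dir, "main.*"), os.path.join(tmp_dir, "missing.*")]
    )
    names = [os.path.basename(item) for item in result]

print(names)
```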
This addresses the problem of handling files using wildcards like the asterisk (*) or question mark (?), but how about stdin?

If you don’t pass any parameter to the original sha1sum utility, then it expects to read data from the standard input. This is the text you enter at the terminal that ends when you type Ctrl+D on Unix-like systems or Ctrl+Z on Windows. These control sequences send an end of file (EOF) to the terminal, which stops reading from stdin and returns the data that was entered.

In the next section, you’ll add to your code the ability to read from the standard input stream.

### Standard Input

When you modify the previous Python implementation of sha1sum to handle the standard input using sys.stdin, you’ll get closer to the original sha1sum:

```python
# sha1sum_stdin.py

from typing import List
import hashlib
import pathlib
import sys

def process_file(filename: str) -> bytes:
    return pathlib.Path(filename).read_bytes()

def process_stdin() -> bytes:
    return bytes("".join(sys.stdin), "utf-8")

def sha1sum(data: bytes) -> str:
    sha1_hash = hashlib.sha1()
    sha1_hash.update(data)
    return sha1_hash.hexdigest()

def output_sha1sum(data: bytes, filename: str = "-") -> None:
    print(f"{sha1sum(data)}  {filename}")

def main(args: List[str]) -> None:
    if not args:
        args = ["-"]
    for arg in args:
        if arg == "-":
            output_sha1sum(process_stdin(), "-")
        else:
            output_sha1sum(process_file(arg), arg)

if __name__ == "__main__":
    main(sys.argv[1:])
```

Two conventions are applied to this new sha1sum version:

* Without any arguments, the program expects the data to be provided in the standard input, sys.stdin, which is a readable file object.
* When a hyphen (-) is provided as a file argument at the command line, the program interprets it as reading the file from the standard input.

Try this new script without any arguments. Enter the first aphorism of The Zen of Python, then complete the entry with the keyboard shortcut Ctrl+D on Unix-like systems or Ctrl+Z on Windows:

```console
$ python sha1sum_stdin.py
Beautiful is better than ugly.
ae5705a3efd4488dfc2b4b80df85f60c67d998c4  -
```

You can also include one of the arguments as stdin mixed with the other file arguments like so:

```console
$ python sha1sum_stdin.py main.py - main.c
d84372fc77a90336b6bb7c5e959bcb1b24c608b4  main.py
Beautiful is better than ugly.
ae5705a3efd4488dfc2b4b80df85f60c67d998c4  -
125a0f900ff6f164752600550879cbfabb098bc3  main.c
```

Another approach on Unix-like systems is to provide /dev/stdin instead of - to handle the standard input:

```console
$ python sha1sum_stdin.py main.py /dev/stdin main.c
d84372fc77a90336b6bb7c5e959bcb1b24c608b4  main.py
Beautiful is better than ugly.
ae5705a3efd4488dfc2b4b80df85f60c67d998c4  /dev/stdin
125a0f900ff6f164752600550879cbfabb098bc3  main.c
```

On Windows there’s no equivalent to /dev/stdin, so there you need to use - as the file argument to read from the standard input.

The script sha1sum_stdin.py doesn’t cover all the necessary error handling, but you’ll cover some of the missing features later in this tutorial.
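
One limitation worth noting: process_stdin() reads text and re-encodes it as UTF-8, so piped input that isn’t plain text may not hash to the same value as the original bytes. A hedged alternative is to read from sys.stdin.buffer, the binary stream underneath sys.stdin. In the sketch below, read_all() is a hypothetical helper that accepts any binary stream, so it can be tried with io.BytesIO instead of a real terminal.

```python
import hashlib
import io
import sys
from typing import BinaryIO


def read_all(stream: BinaryIO) -> bytes:
    """Read every remaining byte from a binary stream."""
    return stream.read()


def process_stdin() -> bytes:
    # sys.stdin.buffer is the binary layer underneath the text-mode sys.stdin.
    return read_all(sys.stdin.buffer)


# Demonstration with an in-memory stream holding bytes that are not valid UTF-8.
raw = b"\xff\xfe binary payload\n"
digest = hashlib.sha1(read_all(io.BytesIO(raw))).hexdigest()
print(digest == hashlib.sha1(raw).hexdigest())
```

With this version of process_stdin(), the bytes that reach hashlib are exactly the bytes that were piped in, without a decoding step in between.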

### Standard Output and Standard Error

Command line processing may have a direct relationship with stdin to respect the conventions detailed in the previous section. The standard output, although not immediately relevant, is still a concern if you want to adhere to the Unix Philosophy. To allow small programs to be combined, you may have to take into account the three standard streams:

* stdin
* stdout
* stderr

The output of a program becomes the input of another one, allowing you to chain small utilities. For example, if you wanted to sort the aphorisms of the Zen of Python, then you could execute the following:

```console
$ python -c "import this" | sort
Although never is often better than *right* now.
Although practicality beats purity.
Although that way may not be obvious at first unless you're Dutch.
...
```

The output above is truncated for better readability. Now imagine that you have a program that outputs the same data but also prints some debugging information:

```python
# zen_sort_debug.py

print("DEBUG >>> About to print the Zen of Python")
import this
print("DEBUG >>> Done printing the Zen of Python")
```

Executing the Python script above gives:

```console
$ python zen_sort_debug.py
DEBUG >>> About to print the Zen of Python
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
...
DEBUG >>> Done printing the Zen of Python
```

The ellipsis (...) indicates that the output was truncated to improve readability.

Now, if you want to sort the list of aphorisms, then execute the command as follows:

```console
$ python zen_sort_debug.py | sort

Although never is often better than *right* now.
Although practicality beats purity.
Although that way may not be obvious at first unless you're Dutch.
Beautiful is better than ugly.
Complex is better than complicated.
DEBUG >>> About to print the Zen of Python
DEBUG >>> Done printing the Zen of Python
Errors should never pass silently.
...
```

You may realize that you didn’t intend to have the debug output as the input of the sort command. To address this issue, you want to send traces to the standard error stream, stderr, instead:

```python
# zen_sort_stderr.py
import sys

print("DEBUG >>> About to print the Zen of Python", file=sys.stderr)
import this
print("DEBUG >>> Done printing the Zen of Python", file=sys.stderr)
```

Execute zen_sort_stderr.py to observe the following:

```console
$ python zen_sort_stderr.py | sort
DEBUG >>> About to print the Zen of Python
DEBUG >>> Done printing the Zen of Python

Although never is often better than *right* now.
Although practicality beats purity.
Although that way may not be obvious at first unless you're Dutch.
...
```

Now, the traces are displayed to the terminal, but they aren’t used as input for the sort command.
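
To check this separation without a shell pipeline, the sketch below captures both streams in memory with contextlib.redirect_stdout() and contextlib.redirect_stderr(). The function emit_sorted() and its sample data are hypothetical and only serve the illustration.

```python
import contextlib
import io
import sys
from typing import List


def emit_sorted(lines: List[str]) -> None:
    """Write diagnostics to stderr and the actual data to stdout."""
    print("DEBUG >>> About to print sorted lines", file=sys.stderr)
    for line in sorted(lines):
        print(line)
    print("DEBUG >>> Done printing sorted lines", file=sys.stderr)


out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    emit_sorted(["Simple is better than complex.", "Beautiful is better than ugly."])

# Only the data reached stdout; the traces went to stderr.
print(out.getvalue(), end="")
print(err.getvalue(), end="", file=sys.stderr)
```

Anything that consumes stdout, such as sort in a pipeline, sees only the two sorted lines, while the DEBUG traces stay on the other stream.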

### Custom Parsers

You can implement seq by relying on a regular expression if the arguments aren’t too complex. Nevertheless, the regex pattern may quickly render the maintenance of the script difficult. Before you try getting help from specific libraries, another approach is to create a custom parser. The parser is a loop that fetches each argument one after another and applies a custom logic based on the semantics of your program.

A possible implementation for processing the arguments of seq_parse.py could be as follows:

```python
def parse(args: List[str]) -> Tuple[str, List[int]]:
    arguments = collections.deque(args)
    separator = "\n"
    operands: List[int] = []
    while arguments:
        arg = arguments.popleft()
        if not operands:
            if arg == "--help":
                print(USAGE)
                sys.exit(0)
            if arg in ("-s", "--separator"):
                separator = arguments.popleft()
                continue
        try:
            operands.append(int(arg))
        except ValueError:
            raise SystemExit(USAGE)
        if len(operands) > 3:
            raise SystemExit(USAGE)

    return separator, operands
```

parse() is given the list of arguments without the Python file name and uses collections.deque() to get the benefit of .popleft(), which removes the elements from the left of the collection. As the items of the arguments list unfold, you apply the logic that’s expected for your program. In parse() you can observe the following:

* The while loop is at the core of the function, and terminates when there are no more arguments to parse, when the help is invoked, or when an error occurs.
* If the separator option is detected, then the next argument is expected to be the separator.
* operands stores the integers that are used to calculate the sequence. There should be at least one operand and at most three.
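
The snippet above relies on imports and a USAGE string defined elsewhere. As a hedged, self-contained sketch, the version below adds them, with a made-up USAGE text, together with two checks that the snippet doesn’t show: a guard for a separator option given without a value, and a check that at least one operand was provided.

```python
import collections
import sys
from typing import List, Tuple

# Hypothetical usage text; the tutorial defines its own USAGE elsewhere.
USAGE = "Usage: seq_parse.py [--help] | [-s <sep>] [first [increment]] last"


def parse(args: List[str]) -> Tuple[str, List[int]]:
    arguments = collections.deque(args)
    separator = "\n"
    operands: List[int] = []
    while arguments:
        arg = arguments.popleft()
        if not operands:
            if arg == "--help":
                print(USAGE)
                sys.exit(0)
            if arg in ("-s", "--separator"):
                if not arguments:
                    # Added guard: the option was given without a value.
                    raise SystemExit(USAGE)
                separator = arguments.popleft()
                continue
        try:
            operands.append(int(arg))
        except ValueError:
            raise SystemExit(USAGE)
        if len(operands) > 3:
            raise SystemExit(USAGE)
    if not operands:
        # Added check: at least one operand is required.
        raise SystemExit(USAGE)
    return separator, operands


print(parse(["-s", ", ", "1", "5"]))
```

Passing an explicit list rather than sys.argv[1:] makes the function easy to try from an interactive session or a test.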

This manual approach of parsing the Python command line arguments may be sufficient for a simple set of arguments. However, it quickly becomes error-prone when complexity increases due to the following:

* A large number of arguments
* Complexity and interdependency between arguments
* Validation to perform against the arguments

The custom approach isn’t reusable and requires reinventing the wheel in each program. By the end of this tutorial, you’ll have improved on this hand-crafted solution and learned a few better methods.

## A Few Methods for Validating Python Command Line Arguments

You’ve already performed validation for Python command line arguments in a few examples like seq_regex.py and seq_parse.py. In the first example, you used a regular expression, and in the second example, a custom parser.

Both of these examples took the same aspects into account. They considered the expected options as short-form (-s) or long-form (--separator). They considered the order of the arguments so that options would not be placed after operands. Finally, they considered the type, integer for the operands, and the number of arguments, from one to three arguments.

### Type Validation With Python Data Classes

The following is a proof of concept that attempts to validate the type of the arguments passed at the command line. In the following example, you validate the number of arguments and their respective type:

```python
# val_type_dc.py

import dataclasses
import sys
from typing import List

USAGE = f"Usage: python {sys.argv[0]} [--help] | firstname lastname [age]"

@dataclasses.dataclass
class Arguments:
    firstname: str
    lastname: str
    age: int = 0

def check_type(obj):
    for field in dataclasses.fields(obj):
        value = getattr(obj, field.name)
        print(
            f"Value: {value}, "
            f"Expected type {field.type} for {field.name}, "
            f"got {type(value)}"
        )
        if type(value) != field.type:
            print("Type Error")
        else:
            print("Type Ok")

def validate(args: List[str]):
    # If passed to the command line, need to convert
    # the optional 3rd argument from string to int
    if len(args) > 2 and args[2].isdigit():
        args[2] = int(args[2])
    try:
        arguments = Arguments(*args)
    except TypeError:
        raise SystemExit(USAGE)
    check_type(arguments)

def main() -> None:
    args = sys.argv[1:]
    if not args:
        raise SystemExit(USAGE)

    if args[0] == "--help":
        print(USAGE)
    else:
        validate(args)

if __name__ == "__main__":
    main()
```
Unless you pass the --help option at the command line, this script expects two or three arguments:

* A mandatory string: firstname
* A mandatory string: lastname
* An optional integer: age

Because all the items in sys.argv are strings, you need to convert the optional third argument to an integer if it’s composed of digits. str.isdigit() checks whether all the characters in a string are digits. In addition, by constructing the data class Arguments with the converted values and passing the result to check_type(), you obtain two validations:

* If the number of arguments doesn’t fit the fields expected by Arguments, then you get an error. The data class accepts a minimum of two and a maximum of three values.
* If the types after conversion don’t match the types defined in the Arguments data class definition, then check_type() reports a type error.

You can see this in action with the following execution:

```console
$ python val_type_dc.py Guido "Van Rossum" 25
Value: Guido, Expected type <class 'str'> for firstname, got <class 'str'>
Type Ok
Value: Van Rossum, Expected type <class 'str'> for lastname, got <class 'str'>
Type Ok
Value: 25, Expected type <class 'int'> for age, got <class 'int'>
Type Ok
```

In the execution above, the number of arguments is correct and the type of each argument is also correct.

Now, execute the same command but omit the third argument:

```console
$ python val_type_dc.py Guido "Van Rossum"
Value: Guido, Expected type <class 'str'> for firstname, got <class 'str'>
Type Ok
Value: Van Rossum, Expected type <class 'str'> for lastname, got <class 'str'>
Type Ok
Value: 0, Expected type <class 'int'> for age, got <class 'int'>
Type Ok
```

The result is also successful because the field age is defined with a default value, 0, so the data class Arguments doesn’t require it.

In contrast, if the third argument isn’t of the proper type (say, a string instead of an integer), then you get an error:

```console
$ python val_type_dc.py Guido Van Rossum
Value: Guido, Expected type <class 'str'> for firstname, got <class 'str'>
Type Ok
Value: Van, Expected type <class 'str'> for lastname, got <class 'str'>
Type Ok
Value: Rossum, Expected type <class 'int'> for age, got <class 'str'>
Type Error
```

The expected value Van Rossum isn’t surrounded by quotes, so it’s split. The second word of the last name, Rossum, is a string that’s handled as the age, which is expected to be an int. The validation fails.
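
The conversion step in validate() is hard-coded for the third argument. As a hedged generalization, the sketch below converts each raw string with the type annotated on the matching field and exits with a message when the conversion fails. build_arguments() and the USAGE text are hypothetical, and the approach assumes that every annotation is a plain callable type such as str or int. It wouldn’t work if the annotations were stored as strings, for example with from __future__ import annotations.

```python
import dataclasses
from typing import List

# Hypothetical usage text for this sketch.
USAGE = "Usage: python val_type_dc.py [--help] | firstname lastname [age]"


@dataclasses.dataclass
class Arguments:
    firstname: str
    lastname: str
    age: int = 0


def build_arguments(raw: List[str]) -> Arguments:
    """Convert raw strings using the field annotations, then build Arguments."""
    fields = dataclasses.fields(Arguments)
    if len(raw) > len(fields):
        raise SystemExit(USAGE)
    converted = []
    for field, value in zip(fields, raw):
        try:
            # field.type is the annotated class (str or int here).
            converted.append(field.type(value))
        except ValueError:
            raise SystemExit(
                f"{field.name}: expected {field.type.__name__}, got {value!r}"
            )
    try:
        return Arguments(*converted)
    except TypeError:
        # Raised when a mandatory field is missing.
        raise SystemExit(USAGE)


print(build_arguments(["Guido", "Van Rossum", "25"]))
```

With this design, adding a field to Arguments is enough to have it converted and checked, at the cost of supporting only types that can be constructed from a single string.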

Note: For more details about the usage of data classes in Python, check out The Ultimate Guide to Data Classes in Python 3.7.

You could also use a NamedTuple to achieve a similar validation. You’d replace the data class with a class deriving from NamedTuple, and check_type() would change as follows:

```python
from typing import NamedTuple

class Arguments(NamedTuple):
    firstname: str
    lastname: str
    age: int = 0

def check_type(obj):
    for attr, value in obj._asdict().items():
        print(
            f"Value: {value}, "
            f"Expected type {obj.__annotations__[attr]} for {attr}, "
            f"got {type(value)}"
        )
        if type(value) != obj.__annotations__[attr]:
            print("Type Error")
        else:
            print("Type Ok")
```

A NamedTuple exposes methods like _asdict(), which transforms the object into a dictionary that can be used for data lookup. It also exposes attributes like __annotations__, which is a dictionary storing the type of each field. For more on __annotations__, check out Python Type Checking (Guide).

As highlighted in Python Type Checking (Guide), you could also leverage existing packages like Enforce, Pydantic, and Pytypes for advanced validation.

### Custom Validation

Not unlike what you’ve already explored earlier, detailed validation may require some custom approaches. For example, if you attempt to execute sha1sum_stdin.py with an incorrect file name as an argument, then you get the following:

```console
$ python sha1sum_stdin.py bad_file.txt
Traceback (most recent call last):
  File "sha1sum_stdin.py", line 32, in <module>
    main(sys.argv[1:])
  File "sha1sum_stdin.py", line 29, in main
    output_sha1sum(process_file(arg), arg)
  File "sha1sum_stdin.py", line 9, in process_file
    return pathlib.Path(filename).read_bytes()
  File "/usr/lib/python3.8/pathlib.py", line 1222, in read_bytes
    with self.open(mode='rb') as f:
  File "/usr/lib/python3.8/pathlib.py", line 1215, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/usr/lib/python3.8/pathlib.py", line 1071, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'bad_file.txt'
```

bad_file.txt doesn’t exist, but the program attempts to read it.

Revisit main() in sha1sum_stdin.py to handle non-existing files passed at the command line:

```python
def main(args):
    if not args:
        output_sha1sum(process_stdin())
    for arg in args:
        if arg == "-":
            output_sha1sum(process_stdin(), "-")
            continue
        try:
            output_sha1sum(process_file(arg), arg)
        except FileNotFoundError as err:
            print(f"{sys.argv[0]}: {arg}: {err.strerror}", file=sys.stderr)
```

When you save the modified script as sha1sum_val.py and execute it with a missing file, you get this:

```console
$ python sha1sum_val.py bad_file.txt
sha1sum_val.py: bad_file.txt: No such file or directory
```

Note that the error displayed to the terminal is written to stderr, so it doesn’t interfere with the data expected by a command that would read the output of sha1sum_val.py:

```console
$ python sha1sum_val.py bad_file.txt main.py | cut -d " " -f 1
sha1sum_val.py: bad_file.txt: No such file or directory
d84372fc77a90336b6bb7c5e959bcb1b24c608b4
```

This command pipes the output of sha1sum_val.py to cut to only include the first field. You can see that cut ignores the error message because it only receives the data sent to stdout.
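
Beyond the message on stderr, command line tools conventionally signal failure through a non-zero exit status, which lets a calling script detect the problem. The sketch below is one possible extension, not the tutorial’s code: hash_files() is a hypothetical helper that catches the broader OSError, so that directories or unreadable files are reported too, and returns a status that could be passed to sys.exit().

```python
import hashlib
import os
import pathlib
import sys
import tempfile
from typing import List


def hash_files(filenames: List[str], prog: str = "sha1sum_val.py") -> int:
    """Print one SHA-1 line per readable file and return an exit status."""
    status = 0
    for name in filenames:
        try:
            data = pathlib.Path(name).read_bytes()
        except OSError as err:
            # FileNotFoundError, IsADirectoryError, and PermissionError
            # are all subclasses of OSError.
            print(f"{prog}: {name}: {err.strerror}", file=sys.stderr)
            status = 1
            continue
        print(f"{hashlib.sha1(data).hexdigest()}  {name}")
    return status


# Illustrative run: one existing temporary file and one missing file.
with tempfile.TemporaryDirectory() as tmp_dir:
    good = os.path.join(tmp_dir, "good.txt")
    pathlib.Path(good).write_bytes(b"hello\n")
    missing = os.path.join(tmp_dir, "bad_file.txt")
    ok_status = hash_files([good])
    failed_status = hash_files([good, missing])

print(ok_status, failed_status)
```

In a script, the last line of the entry point would then be something like sys.exit(hash_files(sys.argv[1:])), so that the shell can test the result.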


## Using getopt module

The Python getopt module is similar to the getopt() function of C. Whereas sys.argv only provides the raw list of strings, the getopt module also validates the options it finds. It supports both short and long options, including options that take a value. The module still relies on the sys module to obtain the raw arguments, and the first element of that list (the program name) must be removed before the list is passed to getopt.

### Syntax: 
```python
getopt.getopt(args, options, [long_options])
```

where:
* **args:** List of arguments to be parsed. Usually sys.argv[1:], i.e., the sys.argv list without its first element.
* **options:** String of option letters that the script wants to recognize. Options that require an argument should be followed by a colon (:).
* **long_options:** List of strings with the names of the long options. Options that require an argument should be followed by an equal sign (=).

* **Return Type:** Returns a value consisting of two elements: the first is a list of (option, value) pairs, and the second is the list of program arguments left after the option list was stripped.

Example:

```python
# Python program to demonstrate
# command line arguments

import getopt, sys

# Remove 1st argument from the
# list of command line arguments
argumentList = sys.argv[1:]

# Options
options = "hmo:"

# Long options (no space before "=", otherwise --Output is not matched below)
long_options = ["Help", "My_file", "Output="]

try:
    # Parsing argument
    arguments, values = getopt.getopt(argumentList, options, long_options)

    # checking each argument
    for currentArgument, currentValue in arguments:

        if currentArgument in ("-h", "--Help"):
            print("Displaying Help")

        elif currentArgument in ("-m", "--My_file"):
            print("Displaying file_name:", sys.argv[0])

        elif currentArgument in ("-o", "--Output"):
            print("Enabling special output mode (%s)" % currentValue)

except getopt.error as err:
    # output error, and return with an error code
    print(str(err))
    sys.exit(2)
```

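
To see what getopt.getopt() returns without going through the command line, the call can be given an explicit list. The following is a small sketch with a hypothetical argument list standing in for sys.argv[1:]. It also shows the exception raised for an option that was not declared.

```python
import getopt

options = "hmo:"
long_options = ["Help", "My_file", "Output="]

# Hypothetical argument list, standing in for sys.argv[1:]
sample = ["-h", "-o", "out.txt", "--Output=mode", "input.txt"]
opts, rest = getopt.getopt(sample, options, long_options)
print(opts)  # list of (option, value) pairs
print(rest)  # arguments left after the options

try:
    # -f is not declared in the options string
    getopt.getopt(["-f"], options, long_options)
except getopt.GetoptError as err:
    message = str(err)
    print(message)
```

Options without a value are paired with an empty string, and parsing stops at the first argument that is not an option.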


## Using argparse module
The argparse module is usually a better choice than the two approaches above because it provides features such as positional arguments, default values for arguments, automatically generated help messages, and conversion of argument values to a given data type.

Note: As a default optional argument, it includes -h, along with its long version --help.

Example 1: Basic use of argparse module.

```python
# Python program to demonstrate
# command line arguments

import argparse

# Initialize parser
parser = argparse.ArgumentParser()
parser.parse_args()
```


Example 2: Adding description to the help message.

```python
# Python program to demonstrate
# command line arguments

import argparse

msg = "Adding description"

# Initialize parser
parser = argparse.ArgumentParser(description=msg)
parser.parse_args()
```



Example 3: Defining optional value

```python
# Python program to demonstrate
# command line arguments

import argparse

# Initialize parser
parser = argparse.ArgumentParser()

# Adding optional argument
parser.add_argument("-o", "--Output", help="Show Output")

# Read arguments from command line
args = parser.parse_args()

if args.Output:
    print("Displaying Output as: %s" % args.Output)
```


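
The features mentioned at the start of this section (positional arguments, default values and data types) can be combined as in the following sketch. The parser, its argument names and the sample values are hypothetical. Passing a list to parse_args() makes the example run the same way in a script or a notebook, because sys.argv is not read.

```python
import argparse

# Hypothetical parser used only for illustration
parser = argparse.ArgumentParser(description="Repeat a word several times")

# Positional argument (required, identified by its position)
parser.add_argument("word", help="Word to repeat")

# Optional argument with a data type and a default value
parser.add_argument("-c", "--count", type=int, default=1,
                    help="Number of repetitions")

# Passing a list to parse_args() avoids reading sys.argv
args = parser.parse_args(["hello", "--count", "3"])
print(" ".join([args.word] * args.count))

# When the option is omitted, the default value is used
defaults = parser.parse_args(["hello"])
print(defaults.count)
```

When the script is run from the command line, calling parser.parse_args() without a list reads the arguments from sys.argv[1:] instead.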