# Advanced Computational Physics 


## More about Python: Functions, Classes and Symbolic computing
### Subprocesses, concurrency and parallelism in Python


#### *X. Cid Vidal*
####  USC, October 2024 


In [1]:
import time
print(' Last revision ', time.asctime())

 Last revision  Tue Nov  5 13:19:31 2024


## 1. Working with Subprocesses in Python
The `subprocess` module in Python allows you to spawn new processes, connect to their input/output pipes, and retrieve their results. It's a powerful tool for running shell commands or integrating Python with external programs.

You can use `subprocess.run()`, `subprocess.Popen()`, and other functions depending on the level of control you need. Here are some common uses of `subprocess`:
- Running shell commands
- Communicating with a process (passing input and receiving output)
- Handling errors raised by subprocesses
- Running background tasks

In [2]:
import subprocess

# Example of running a shell command with subprocess
result = subprocess.run(['echo', 'Hello from subprocess'], stdout=subprocess.PIPE, text=True)

# Print the output
print(result.stdout)

Hello from subprocess



### Explanation: `subprocess.run`
- The `subprocess.run()` function runs a command, waits for it to finish, and returns a `CompletedProcess` object.
- `stdout=subprocess.PIPE` captures the output of the command.
- `text=True` ensures the output is in string format rather than bytes.
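
The points above can be checked by inspecting the returned `CompletedProcess` object. The following is a minimal sketch, not part of the original example. It assumes that the running Python interpreter (`sys.executable`) is an acceptable stand-in for `echo`, so that it does not depend on a particular shell. It also uses `capture_output=True`, which is shorthand for piping both `stdout` and `stderr`.

```python
import subprocess
import sys

# Run a tiny Python program as a child process (portable stand-in for `echo`)
result = subprocess.run(
    [sys.executable, "-c", "print('Hello from subprocess')"],
    capture_output=True,  # same as stdout=PIPE and stderr=PIPE
    text=True,            # decode the captured bytes to str
)

print(type(result).__name__)  # name of the returned class
print(result.returncode)      # 0 means the command succeeded
print(repr(result.stdout))    # captured standard output, including the newline
print(repr(result.stderr))    # captured standard error
```

The `returncode` attribute is the simplest way to check whether the command succeeded when `check=True` is not used.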


### Example: Run a Shell Command Using Subprocess
Write a Python script that uses the `subprocess` module to list all files and directories in the current directory using the shell command `ls` (or `dir` on Windows). Capture and print the output in Python.

In [3]:
# Use subprocess to list all files and directories
result = subprocess.run(['ls'], stdout=subprocess.PIPE, text=True)

# Print the output
print(result.stdout)

README.md
about_functions.ipynb
about_list_expressions.ipynb
about_testing.ipynb
classes.ipynb
classes_intro.ipynb
concurrency_parallelism.ipynb
concurrency_parallelism_sol.ipynb
cookbook_numpy.ipynb
cookbook_pandas.ipynb
environment.yml
img
index.ipynb
output.txt
python_intro.ipynb
setup.ipynb
shortcut_matplotlib1.ipynb
shortcut_matplotlib2.ipynb
sympy.ipynb
test_subprocess_dir
vector.py



### 1.1 Running a Subprocess with Input
You can also pass input to a subprocess. In the following example, we'll run the `bc` command (basic calculator) and pass input to it to perform a mathematical operation.

In [4]:
# Example of subprocess with input
result = subprocess.run(['bc'], input='5 + 10', stdout=subprocess.PIPE, text=True)

# Print the output
print('Result from bc calculator:', result.stdout)

Result from bc calculator: 15
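
`bc` is not installed on every system (for instance, it is usually missing on Windows). As a portable sketch of the same idea, the example below passes text to a child process through `input=`, using the Python interpreter itself as the child. The child program `child_code` is a hypothetical stand-in for `bc`: it simply sums the integers it reads from its standard input.

```python
import subprocess
import sys

# Hypothetical child program: sum the integers received on stdin, one per line
child_code = "import sys; print(sum(int(line) for line in sys.stdin))"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="5\n10\n",          # sent to the child's standard input
    stdout=subprocess.PIPE,
    text=True,
)

print("Result from child process:", result.stdout)
```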



### Example: Subprocess with Input
Use the `subprocess` module to call a calculator program (like `bc` on Unix-like systems) and pass multiple calculations to it. Capture and print the result of each calculation.

In [14]:
# Using subprocess to pass multiple inputs to bc calculator
calculations = ['5 + 510 * 2','3+3']
results = [subprocess.run(['bc'], input=calculation, stdout=subprocess.PIPE, text=True) for calculation in calculations]

# Print the result of each calculation
for i,result in enumerate(results):
    print('Results from bc calculator',i,":")
    print(result.stdout)

Results from bc calculator 0 :
1025

Results from bc calculator 1 :
6



### 1.2 Running a Subprocess with Error Handling and Timeout
Here’s an example of using `subprocess.run()` with error handling. This command will attempt to ping different addresses, and we’ll handle the error using `subprocess.CalledProcessError` and set a timeout in case the process hangs.

In [16]:
import subprocess

for ip in ['8.8.8.8','256.256.256.256']:
    try:
        # Run the command with timeout and check for errors
        result = subprocess.run(['ping', '-c', '4', ip], 
                                stdout=subprocess.PIPE, stderr=subprocess.PIPE, 
                                text=True, timeout=5, check=True)
    except subprocess.CalledProcessError as e:
        print(f"Error occurred: {e.stderr}")
    except subprocess.TimeoutExpired:
        print("The subprocess timed out!")
    else:
        print(result.stdout)

PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: icmp_seq=0 ttl=115 time=37.424 ms
64 bytes from 8.8.8.8: icmp_seq=1 ttl=115 time=31.267 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=115 time=37.087 ms
64 bytes from 8.8.8.8: icmp_seq=3 ttl=115 time=34.604 ms

--- 8.8.8.8 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 31.267/35.096/37.424/2.464 ms

Error occurred: ping: cannot resolve 256.256.256.256: Unknown host
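
The `ping` example depends on network access and on platform-specific flags. The sketch below triggers the same two exceptions in a reproducible way, again using the Python interpreter as the child process. The helper `run_checked` and the three snippets are illustrative names and values, not part of the original notebook.

```python
import subprocess
import sys

def run_checked(code, timeout):
    """Run a Python snippet in a child process and report how it ended."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True,
            timeout=timeout, check=True,
        )
    except subprocess.CalledProcessError as e:
        # Raised by check=True when the exit code is non-zero
        return f"failed with exit code {e.returncode}"
    except subprocess.TimeoutExpired:
        # Raised when the child runs longer than `timeout` seconds
        return "timed out"
    return f"ok: {result.stdout.strip()}"

outcomes = [
    run_checked("print('done')", timeout=10),            # succeeds
    run_checked("import sys; sys.exit(3)", timeout=10),  # non-zero exit code
    run_checked("import time; time.sleep(30)", timeout=1),  # hangs
]
for outcome in outcomes:
    print(outcome)
```

When the timeout expires, `subprocess.run()` kills the child process before raising `TimeoutExpired`, so no stray process is left behind.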



### 1.3 Using `subprocess.Popen` for Advanced Control
`subprocess.Popen` gives you more flexibility than `subprocess.run` as it allows you to interact with the process while it's running. You can send input and read output in real-time, making it useful for more complex tasks.

In [7]:
import subprocess

# Start a process with Popen and communicate with it
process = subprocess.Popen(['bc'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

# Send input to the process
output, error = process.communicate(input='5 * 5')

print('Output from bc:', output)

Output from bc: 25
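
Note that `communicate()` waits for the process to finish before returning its output. To read output while the process is still running, one option is to iterate over the `stdout` pipe. The sketch below assumes a hypothetical child program (`child_code`) that prints one line every 0.2 s; each line is received as soon as the child writes it.

```python
import subprocess
import sys

# Hypothetical child program: prints three lines, 0.2 s apart
child_code = (
    "import time\n"
    "for i in range(3):\n"
    "    print('step', i, flush=True)\n"
    "    time.sleep(0.2)\n"
)

lines = []
with subprocess.Popen([sys.executable, "-c", child_code],
                      stdout=subprocess.PIPE, text=True) as process:
    # Iterating over the pipe yields each line as soon as it is available
    for line in process.stdout:
        lines.append(line.rstrip())
        print("received:", lines[-1])

# Leaving the `with` block closes the pipe and waits for the child to exit
print("exit code:", process.returncode)
```

The `flush=True` in the child matters: without it, the child may buffer its output and deliver all lines at once when it exits.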



## 2. Concurrency and parallelism
We will now explore **concurrency** and **parallelism** in Python. These are powerful techniques for improving the performance of Python programs by allowing multiple operations to happen simultaneously or concurrently.

We'll cover:
- Multi-threading with the `threading` module
- Multi-processing with the `multiprocessing` module
- Asynchronous programming with `asyncio`

### What is Concurrency and Parallelism?
- **Concurrency**: Multiple tasks can start, run, and complete in overlapping time periods.
- **Parallelism**: Multiple tasks are executed at the same time on multiple processors or cores.
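
In standard CPython, the **GIL** (Global Interpreter Lock) allows only one thread at a time to execute Python bytecode, so threads cannot run CPU-bound Python code in parallel. Multi-processing sidesteps the GIL because each process has its own interpreter. `asyncio` does not bypass the GIL: it achieves concurrency within a single thread by switching between tasks while they wait for I/O.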

<center><img src="https://miro.medium.com/v2/resize:fit:4800/format:webp/0*zp67gXLSUUlcGeg4.png" width="700"></center>


## 2.1 Multi-threading with the `threading` Module
Multi-threading is useful for I/O-bound tasks like reading/writing files or handling network requests. Threads share the same memory space and can be used to handle tasks that do not require much CPU power.

In [17]:
import threading
import time

# Example of multi-threading
def print_numbers():
    for i in range(5):
        print(i)
        time.sleep(1)  # Simulate I/O-bound task with delay

# Create two threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_numbers)

print("start")
# Start the threads
thread1.start()
thread2.start()

print("\njoin")
# Wait for threads to complete
thread1.join()
thread2.join()

start
00


join
11

22

33

4
4


### Multi-threading example
Create two threads, one that prints numbers from 1 to 5 with a delay of 1 second, and another that prints letters from 'A' to 'E' with a delay of 1.5 seconds. Both should run concurrently.

In [36]:
def print_numbers():
    for i in range(1, 6):
        print(i)
        time.sleep(1)

def print_letters():
    for letter in 'ABCDE':
        print(letter)
        time.sleep(1.5)

# Create threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

print("start")
# Start threads
thread1.start()
thread2.start()

print("join")
# Wait for threads to finish
thread1.join()
thread2.join()

start
1
Ajoin

2
B
3
4
C
5
D
E


In [35]:
# Create threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

## COMPARE TO
print("FIRST")
# Start first thread
thread1.start()
thread1.join()

print("SECOND")
# don't start second till first finishes
thread2.start()
thread2.join()

FIRST
1
2
3
4
5
SECOND
A
B
C
D
E


In [22]:
# Create threads
thread1 = threading.Thread(target=print_numbers)
thread2 = threading.Thread(target=print_letters)

## OR
print("FIRST")
# Start first thread
thread1.start()

print("SECOND")
# start second right away, without waiting for the first (no join)
thread2.start()

FIRST
1SECOND
A

2
B
3
C
4
5
D
E
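
To quantify the gain from running I/O-bound work in threads, here is a minimal timing sketch. The function `wait` is a hypothetical stand-in for any operation that spends its time waiting (a file read, a network request), and the delay of 0.2 s is arbitrary.

```python
import threading
import time

def wait(seconds):
    time.sleep(seconds)  # stand-in for an I/O-bound operation

# Three waits, one after another
start = time.perf_counter()
for _ in range(3):
    wait(0.2)
sequential = time.perf_counter() - start

# The same three waits, each in its own thread
threads = [threading.Thread(target=wait, args=(0.2,)) for _ in range(3)]
start = time.perf_counter()
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f} s, threaded: {threaded:.2f} s")
```

The sequential version takes roughly the sum of the delays, while the threaded version takes roughly the longest single delay, because the waits overlap.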


## 2.2 Multi-processing with the `multiprocessing` Module
Multi-processing allows you to bypass Python's GIL by running separate processes, each with its own interpreter and memory space. This is ideal for CPU-bound tasks that need to run in parallel across multiple cores.
Note that the standard `multiprocessing` module **does not work reliably inside Jupyter notebooks, especially on Windows**. If you are using Windows, save the code as standalone Python files and run them, or move to a Unix machine.
Also, you'll need the idiom ``if __name__ == "__main__":`` to protect the code that creates the processes, so that it is not executed again when a child process imports the module.

In [38]:
#import multiprocessing as mp
import multiprocess as mp ## in ipython, use multiprocess instead (mostly the same)

# Example of multi-processing
def square_numbers(numbers):
    for number in numbers:
        print(f"Square of {number} is {number ** 2}")

numbers = [1, 2, 3, 4, 5]


# Create a process
process = mp.Process(target=square_numbers, args=(numbers,))

# Start and join the process
process.start()
process.join()

Square of 1 is 1
Square of 2 is 4
Square of 3 is 9
Square of 4 is 16
Square of 5 is 25
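
For a standalone script, the same example with the standard `multiprocessing` module and the `if __name__ == "__main__":` guard might look as follows. This is a sketch: the file name (say, `square_script.py`) and the small helper `square` are illustrative choices, not part of the original notebook.

```python
import multiprocessing as mp

def square(number):
    return number ** 2

def square_numbers(numbers):
    for number in numbers:
        print(f"Square of {number} is {square(number)}")

if __name__ == "__main__":
    # Only the parent process runs this block; a child process that
    # imports the module to find square_numbers() skips it
    numbers = [1, 2, 3, 4, 5]
    process = mp.Process(target=square_numbers, args=(numbers,))
    process.start()
    process.join()
```

Without the guard, on platforms where child processes are started by spawning a fresh interpreter (such as Windows), each child would re-run the process-creation code when importing the script.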


### Multi-processing example
Write a function that calculates the cube of numbers in a list. Use the `multiprocessing` module to run two processes simultaneously: one for squaring numbers and the other for cubing them.

In [27]:
def cube_numbers(numbers):
    for number in numbers:
        print(f"Cube of {number} is {number ** 3}")

# Create processes
process1 = mp.Process(target=square_numbers, args=(numbers,))
process2 = mp.Process(target=cube_numbers, args=(numbers,))

# Start processes
process1.start()
process2.start()

# Join processes
process1.join()
process2.join()

Square of 1 is 1
Square of 2 is 4
Cube of 1 is 1Square of 3 is 9

Square of 4 is 16Cube of 2 is 8

Square of 5 is 25Cube of 3 is 27

Cube of 4 is 64
Cube of 5 is 125


In [37]:
## this is to be combined with the process monitor in a mac
def infinite_loop():
    while 1: continue

# Create processes
processes = [mp.Process(target=infinite_loop) for i in range(7)]

# Start processes
for process in processes: process.start()
## kill the processes one at a time, waiting 10 s before each
for i,process in enumerate(processes):
    time.sleep(10)
    process.kill()
    print("process",i,"killed")

process 0 killed
process 1 killed
process 2 killed
process 3 killed
process 4 killed
process 5 killed
process 6 killed


## 2.3 Asynchronous programming with `asyncio`
The `asyncio` module allows for concurrent code execution without using multiple threads or processes. It is best suited for I/O-bound tasks, such as network operations or reading large files asynchronously.

In [31]:
import asyncio

# Example of asyncio for concurrent tasks
async def print_numbers_async():
    for i in range(1, 6):
        print(i)
        await asyncio.sleep(1)

async def print_letters_async():
    for letter in 'ABCDE':
        print(letter)
        await asyncio.sleep(1.5)

# Run asyncio tasks concurrently
async def main():
    await asyncio.gather(print_numbers_async(), print_letters_async())

# Run the asyncio program
await main()

1
A
2
B
3
C
4
5
D
E


### Async example
Imagine we have several tasks that represent downloading data from different URLs. Using `asyncio`, we can handle all these downloads concurrently rather than one after another.

In [33]:
import asyncio
import random

async def download_data(task_name, delay):
    print(f"{task_name} started downloading...")
    # Simulate a network delay
    await asyncio.sleep(delay)
    print(f"{task_name} finished downloading!")
    return f"Data from {task_name}"

async def main():
    # Create multiple download tasks with varying delays
    tasks = [
        download_data("Task 1", random.randint(1, 5)),
        download_data("Task 2", random.randint(1, 5)),
        download_data("Task 3", random.randint(1, 5)),
    ]
    
    # Run the tasks concurrently and gather results
    results = await asyncio.gather(*tasks)
    
    print("\nAll tasks completed!")
    for result in results:
        print(result)

# Run the main function
await main()

## asyncio.run(main()) won't work in Jupyter, where an event loop is already running;
## in a standalone script, use it instead of `await main()`

Task 1 started downloading...
Task 2 started downloading...
Task 3 started downloading...
Task 2 finished downloading!
Task 3 finished downloading!
Task 1 finished downloading!

All tasks completed!
Data from Task 1
Data from Task 2
Data from Task 3
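
Note that the results above are listed in the order in which the tasks were passed to `asyncio.gather`, not in the order in which they finished. The sketch below shows this with fixed delays (chosen here for illustration, instead of the random ones above). It is written for a standalone script, where `asyncio.run()` is the usual entry point.

```python
import asyncio

async def download_data(task_name, delay):
    await asyncio.sleep(delay)  # simulate a network delay
    return f"Data from {task_name}"

async def main():
    # Task 2 finishes first and Task 1 last
    return await asyncio.gather(
        download_data("Task 1", 0.3),
        download_data("Task 2", 0.1),
        download_data("Task 3", 0.2),
    )

# In a script no event loop is running yet, so asyncio.run() starts one
results = asyncio.run(main())
print(results)
```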


## 2.4 Best Practices for Concurrency and Parallelism
- **Use multi-threading** for I/O-bound tasks (e.g., file I/O, network requests).
- **Use multi-processing** for CPU-bound tasks that require significant computational power.
- **Use `asyncio`** for lightweight I/O-bound tasks, such as web scraping or reading files.
- Avoid shared state between threads or processes to prevent race conditions and deadlocks.
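
When shared state cannot be avoided, access to it should be serialized. The following is a minimal sketch using `threading.Lock`; the names `counter` and `add_many` are illustrative. Each thread increments a shared counter, and the lock ensures that only one thread at a time performs the read-modify-write step.

```python
import threading

counter = 0
lock = threading.Lock()

def add_many(n):
    global counter
    for _ in range(n):
        # Only one thread at a time may execute the update below
        with lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(counter)  # 40000: no increment is lost
```

Without the lock, two threads could read the same old value and both write back the same new one, losing an increment.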

### Exercises
1. Write a script using `subprocess.Popen` to run a shell command that lists files in a directory, then filters the results to only display files that contain a specific string (e.g., '.txt'). Capture and print the filtered output.
2. Write a script that uses the `subprocess` module to:
   - Create a directory called `test_subprocess_dir` if it doesn't already exist.
   - Inside that directory, create three empty text files named `file1.txt`, `file2.txt`, and `file3.txt`.
   - Use the `ls` (or `dir` on Windows) command to list the contents of the directory and print the results.
3. Create a script that uses multiple threads to perform both I/O-bound and CPU-bound tasks concurrently. One thread should write to a file while the other calculates the factorial of a number.
4. Write a script that uses multi-processing to perform two CPU-intensive tasks: calculating the factorial of a number and generating Fibonacci numbers. Run both tasks in parallel using the `multiprocessing` module.

### ++ Exercise
Write a Python script that:
- Uses `multiprocessing` to spawn three parallel processes
- Each process will use `subprocess` to perform a task (e.g., pinging a website, checking a directory for files, or calling a shell command)
- Combine the results and print the output from each process.