# Program, Process and Threads

## Program
A program is a set of instructions written in a programming language that is executed by a computer to perform a specific task. It is a passive entity stored on disk (e.g., an executable file) that becomes a process when loaded into memory and executed.

### Example of a Program
- A Python script that calculates the sum of two numbers
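
As a minimal sketch of such a program, the script below adds two numbers. The file name and the function name `add` are illustrative choices, not taken from any particular source. While the file sits on disk it is only a program; it becomes a process once the Python interpreter loads and runs it.

```python
# sum_two_numbers.py (hypothetical file name)
def add(a, b):
    """Return the sum of two numbers."""
    return a + b

if __name__ == "__main__":
    print(add(2, 3))  # prints 5
```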

## Process
A process is an instance of a program that is being executed. It contains the program code and its current activity, and it has its own memory space and resources. Processes are independent of each other and do not share memory by default; they exchange data through inter-process communication (IPC) mechanisms provided by the operating system.

### Examples of Processes
- A web browser running on your computer
- A word processor such as Microsoft Word
- A media player playing a video
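
The independence of processes can be observed from Python itself. The sketch below, which assumes only the standard library, starts a second Python interpreter as a child process. The variable name `counter` and the child's one-line program are made up for illustration. The child has its own process ID and its own `counter`; changing it does not touch the parent's variable of the same name.

```python
import os
import subprocess
import sys

counter = 0  # lives in the parent process's memory

# Program run by the child: it owns a separate `counter` in separate memory
child_code = "import os; counter = 0; counter += 100; print(os.getpid(), counter)"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    capture_output=True,
    text=True,
    check=True,
)
child_pid, child_counter = (int(part) for part in completed.stdout.split())

print(f"parent: pid={os.getpid()} counter={counter}")
print(f"child : pid={child_pid} counter={child_counter}")
```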

## Thread
A thread is the smallest unit of execution within a process. Threads share the memory space and resources of the process they belong to. Multiple threads within the same process can run concurrently, and on multi-core hardware they may also run in parallel.

### Examples of Threads
- A web browser handling multiple tabs in separate threads (many modern browsers go further and use a separate process per tab)
- A text editor performing spell check while you type
- A media player downloading a video while playing it
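
Because threads live inside one process, they all see the same objects. The following sketch illustrates this with three threads appending to a single list; the names `shared_log` and `record` are hypothetical.

```python
import threading

shared_log = []  # one list object, visible to every thread in this process

def record(name):
    # Every thread appends to the same list; a single list.append call
    # is thread-safe in CPython
    shared_log.append(name)

threads = [threading.Thread(target=record, args=(f"worker-{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared_log))  # ['worker-0', 'worker-1', 'worker-2']
```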



# Process Structure and Its Usage in the Operating System

## Process Structure
A process is an instance of a program that is being executed. It consists of several components, including:

1. **Program Code**: The set of instructions that the process executes.
2. **Program Counter (PC)**: A register that holds the address of the next instruction to be executed.
3. **Stack**: Contains temporary data such as function parameters, return addresses, and local variables.
4. **Heap**: Dynamically allocated memory used during the process's runtime.
5. **Data Section**: Contains global and static variables.
6. **Process Control Block (PCB)**: A data structure maintained by the operating system that contains important information about the process, such as process state, process ID, CPU registers, memory management information, and I/O status.
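
To make the PCB more concrete, here is a rough model of it as a Python dataclass. This is only an illustration of the fields listed above: the class names, the field names, and the five-state model are assumptions, and real kernels keep far more detailed structures.

```python
from dataclasses import dataclass, field
from enum import Enum

class ProcessState(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"

@dataclass
class ProcessControlBlock:
    # Illustrative model only, not an actual operating system structure
    pid: int
    state: ProcessState = ProcessState.NEW
    program_counter: int = 0                         # address of the next instruction
    registers: dict = field(default_factory=dict)    # saved CPU registers
    memory_limits: tuple = (0, 0)                    # (base, limit) of the address space
    open_files: list = field(default_factory=list)   # I/O status information

pcb = ProcessControlBlock(pid=42)
pcb.state = ProcessState.READY      # the OS admits the process to the ready queue
pcb.program_counter = 0x400000      # where execution will start
print(pcb)
```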

## Usage in the Operating System
The operating system (OS) is responsible for managing processes. It performs several key functions related to process management:

1. **Process Creation and Termination**: The OS creates new processes and terminates existing ones.
2. **Process Scheduling**: The OS decides the order in which processes are executed, ensuring efficient use of CPU resources.
3. **Process Synchronization**: The OS coordinates the execution of processes to ensure they do not interfere with each other.
4. **Process Communication**: The OS provides mechanisms for processes to communicate with each other, such as inter-process communication (IPC).
5. **Memory Management**: The OS allocates and deallocates memory for processes, ensuring each process has its own memory space.

## Features of Processes
Processes have several important features that ensure the stability and security of the operating system:

1. **Isolation**: Each process runs in its own memory space, preventing one process from accessing or corrupting the memory of another process.
2. **Resource Allocation**: The OS allocates resources such as CPU time, memory, and I/O devices to processes, ensuring fair and efficient use of resources.
3. **Concurrency**: Multiple processes can run concurrently, allowing for multitasking and better utilization of system resources.
4. **Protection**: The OS enforces access control policies to protect processes from unauthorized access and ensure system security.
5. **Inter-Process Communication (IPC)**: Processes can communicate with each other using mechanisms provided by the OS, such as message passing, shared memory, and pipes.
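
Pipes are one of the simplest IPC mechanisms to try from Python. In the sketch below the parent sends text to a child process through the child's standard input and reads the reply from its standard output; both are pipes set up by `subprocess.run`. The child's one-line program is invented for the example.

```python
import subprocess
import sys

# Program run by the child: read everything from stdin, write it back upper-cased
child_code = "import sys; sys.stdout.write(sys.stdin.read().upper())"

completed = subprocess.run(
    [sys.executable, "-c", child_code],
    input="hello from the parent",  # sent to the child through a pipe
    capture_output=True,            # the child's stdout comes back through a pipe
    text=True,
    check=True,
)
reply = completed.stdout
print(reply)  # HELLO FROM THE PARENT
```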

By managing processes effectively, the operating system ensures that multiple programs can run simultaneously without interfering with each other, providing a stable and efficient computing environment.

# Understanding Threads and Processes

## What is a Thread?

A thread is the smallest unit of a process that can be scheduled and executed by the operating system. It is a sequence of executable instructions that can run concurrently with other threads within the same process. Threads share the same memory space and resources of the process they belong to, which makes communication between threads more efficient.

## How Threads Work in the OS

In an operating system, threads are managed by a thread scheduler, which is responsible for allocating CPU time to each thread. The scheduler uses various algorithms to determine the order in which threads are executed. A thread is typically in one of a few states: ready (waiting for CPU time), running, or blocked (waiting for I/O or another resource).
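
Round-robin is one well-known scheduling algorithm, used here purely as an illustration; real schedulers are considerably more elaborate. The sketch simulates it with a queue of runnable threads: each thread runs for at most one time slice (`quantum`) and, if it still has work left, goes back to the end of the queue. The function name and the sample workloads are hypothetical.

```python
from collections import deque

def round_robin(burst_times, quantum):
    """Return the order in which threads finish under round-robin scheduling.

    burst_times maps a thread name to the CPU time it still needs.
    """
    ready = deque(burst_times.items())       # queue of runnable threads
    finished = []
    while ready:
        name, remaining = ready.popleft()    # this thread gets the CPU
        remaining -= min(quantum, remaining)
        if remaining > 0:
            ready.append((name, remaining))  # preempted: back to the queue
        else:
            finished.append(name)
    return finished

print(round_robin({"A": 5, "B": 2, "C": 4}, quantum=2))  # ['B', 'C', 'A']
```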

## Difference Between Threads and Processes

- **Memory Space**: Threads share the same memory space within a process, while processes have separate memory spaces.
- **Resource Sharing**: Threads can easily share resources like variables and files, whereas processes require inter-process communication (IPC) mechanisms to share resources.
- **Overhead**: Creating and managing threads is generally less resource-intensive compared to processes.
- **Isolation**: Processes are isolated from each other, which enhances security and stability. Threads, on the other hand, can affect each other since they share the same memory space.

## Benefits of Using Threads

- **Concurrency**: Threads allow multiple tasks to run concurrently, improving the efficiency and responsiveness of applications.
- **Resource Sharing**: Since threads share the same memory space, they can communicate and share data more easily and quickly.
- **Responsiveness**: In GUI applications, threads can keep the interface responsive by handling background tasks without freezing the main application.

## Types of Threads

1. **User Threads**: Managed by a user-level thread library rather than the operating system. They are lightweight and fast to create and switch, but the kernel is unaware of them, so a blocking system call made by one thread can block the whole process.
2. **Kernel Threads**: Managed directly by the operating system. They are more powerful and can take advantage of multiple processors, but they are heavier and slower to create and manage compared to user threads.
3. **Hybrid Threads**: Combine the advantages of user and kernel threads. They are managed by both the user-level library and the operating system, providing a balance between performance and flexibility.

Understanding the differences and benefits of threads and processes is crucial for designing efficient and responsive applications. By leveraging multithreading, developers can improve the performance and user experience of their software.

# Multithreading in Python

## What is Multithreading?

Multithreading is a technique that allows multiple threads to run concurrently within a single process. Each thread represents a separate path of execution, which improves the efficiency and responsiveness of applications. Multithreading is particularly useful for I/O-bound tasks, such as reading and writing files, network operations, and handling user input, where threads spend most of their time waiting.

## How Multithreading is Done in Python

Python provides several ways to implement multithreading, with the most common being the `threading` module. The `threading` module allows you to create and manage threads easily.

### Key Concepts

1. **Thread**: The smallest unit of execution within a process.
2. **GIL (Global Interpreter Lock)**: A mutex in standard builds of CPython, the reference implementation of Python, that allows only one thread at a time to execute Python bytecode. It limits CPU-bound threads, which cannot run Python code in parallel, but has little effect on I/O-bound threads because the GIL is released while a thread waits on blocking I/O.
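
The effect of the GIL on CPU-bound work can be checked with a small timing experiment. The sketch below runs the same pure-Python loop twice sequentially and then in two threads. On a standard CPython build the two timings are usually similar, since only one thread executes bytecode at a time. Exact numbers depend on the machine and Python version, so none are shown here. The function name `count_down` and the loop size are arbitrary choices.

```python
import threading
import time

def count_down(n):
    # Pure-Python loop: CPU-bound, so the thread needs the GIL the whole time
    while n > 0:
        n -= 1

N = 2_000_000

# Run the work twice, one call after the other
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same two calls in two threads
start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, two threads: {threaded:.2f}s")
```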

### Creating and Managing Threads

To create and manage threads in Python, you can use the `threading` module. Here are the basic steps:

1. **Import the `threading` module**:
    ```python
    import threading
    ```

2. **Define a function to be executed by the thread**:
    ```python
    def print_numbers():
        for i in range(10):
            print(i)
    ```

3. **Create a Thread object and start the thread**:
    ```python
    thread = threading.Thread(target=print_numbers)
    thread.start()
    ```

4. **Wait for the thread to complete (optional)**:
    ```python
    thread.join()
    ```
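
The steps above use a function without parameters. As a small extension, the sketch below passes arguments through the `args` and `kwargs` parameters of `threading.Thread`. Since `Thread` discards the target's return value, the function stores its result in a shared dictionary instead; the names `square` and `results` are illustrative.

```python
import threading

results = {}

def square(n, label="result"):
    # Thread ignores the return value, so keep the result in shared state
    results[n] = f"{label}: {n * n}"

threads = [
    threading.Thread(target=square, args=(n,), kwargs={"label": "square"})
    for n in (2, 3, 4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results[3])  # square: 9
```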

### Example

Here is a complete example of creating and running multiple threads in Python:

```python
import threading

def print_numbers():
    for i in range(10):
        print(i)

# Create multiple threads
threads = []
for _ in range(5):
    thread = threading.Thread(target=print_numbers)
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```

### Synchronization

When multiple threads access shared resources, synchronization is necessary to prevent data corruption and ensure consistency. Python provides several synchronization primitives, such as `Lock`, `RLock`, `Semaphore`, `Event`, and `Condition`.
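
As one example of these primitives, a `Semaphore` limits how many threads may be inside a section of code at once. The sketch below lets at most two of six threads work at the same time and records the highest concurrency actually observed. The names `slots`, `stats`, and `limited_task`, as well as the sleep duration, are assumptions made for the demonstration.

```python
import threading
import time

slots = threading.Semaphore(2)     # at most 2 threads inside the guarded block
stats_lock = threading.Lock()      # protects the stats dictionary
stats = {"active": 0, "peak": 0}

def limited_task():
    with slots:
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.05)           # stand-in for real work
        with stats_lock:
            stats["active"] -= 1

threads = [threading.Thread(target=limited_task) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"peak concurrency: {stats['peak']}")
```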

#### Using Lock

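A `Lock` is the simplest synchronization primitive. Only one thread at a time can hold it, so the code it guards is executed by one thread at a time. In the example below each thread holds the lock for its whole loop, so every thread prints 0 to 9 without interleaving with the others.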

```python
import threading

lock = threading.Lock()

def print_numbers():
    with lock:
        for i in range(10):
            print(i)

# Create and start threads
threads = []
for _ in range(5):
    thread = threading.Thread(target=print_numbers)
    threads.append(thread)
    thread.start()

# Wait for all threads to complete
for thread in threads:
    thread.join()
```
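
Locks matter most when threads update shared data. The following sketch wraps a counter in a small class (the name `SafeCounter` is hypothetical) and increments it from four threads. Because the read-modify-write in `increment` runs under the lock, no update is lost.

```python
import threading

class SafeCounter:
    """A counter whose increments are protected by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # value += 1 is a read-modify-write, so guard it with the lock
        with self._lock:
            self.value += 1

counter = SafeCounter()

def add_many(times):
    for _ in range(times):
        counter.increment()

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 40000
```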

## Conclusion

Multithreading in Python allows for concurrent execution of tasks, improving the efficiency and responsiveness of applications. While the Global Interpreter Lock (GIL) can be a limitation for CPU-bound tasks, multithreading is still highly effective for I/O-bound tasks. By using the `threading` module and synchronization primitives, you can create and manage threads safely and efficiently in your Python programs.

In [4]:
## The usual sequential way: each function waits (I/O simulated with time.sleep) and the next one starts only after it finishes
import time

def print_numbers():
    for i in range(4):
        time.sleep(2)
        print(f"Number : {i}")

def print_letter():
    for letter in "abcde":
        time.sleep(2)
        print(f"letter : {letter}")

t = time.time()
print_numbers()
print_letter()

print(time.time()-t)

Number : 0
Number : 1
Number : 2
Number : 3
letter : a
letter : b
letter : c
letter : d
letter : e
18.004987955093384


In [None]:
## Now run the two functions concurrently: while one thread sleeps, the other can run, so the waiting times overlap. This is where multithreading helps.
import time
import threading

def print_numbers():
    for i in range(4):
        time.sleep(2)
        print(f"Number : {i}")

def print_letter():
    for letter in "abcde":
        time.sleep(2)
        print(f"letter : {letter}")

## Create one thread per function
t1 = threading.Thread(target=print_numbers)
t2 = threading.Thread(target=print_letter)

t = time.time()

## Start both threads, then wait for both to finish
t1.start()
t2.start()
t1.join()
t2.join()
print(time.time()-t)

letter : a
Number : 0
letter : b
Number : 1
letter : c
Number : 2
letter : d
Number : 3
letter : e
10.003304243087769



# Multiprocessing in Python

## What is Multiprocessing?

Multiprocessing is a technique that allows a program to run multiple processes simultaneously. Each process runs independently and has its own memory space. This is particularly useful for CPU-bound tasks, where the program can take advantage of multiple CPU cores to perform computations in parallel.

## Why Do We Need Multiprocessing?

1. **Parallelism**: Multiprocessing allows multiple processes to run in parallel, making full use of multiple CPU cores and improving the performance of CPU-bound tasks.
2. **Isolation**: Each process runs in its own memory space, preventing one process from affecting the memory of another process. This enhances the stability and security of the program.
3. **Concurrency**: Multiprocessing enables concurrent execution of tasks, improving the efficiency and responsiveness of applications.

## How to Use Multiprocessing in Python

Python provides the `multiprocessing` module to create and manage processes. Here are the basic steps to use multiprocessing:

1. **Import the `multiprocessing` module**:
    ```python
    import multiprocessing
    ```

2. **Define a function to be executed by the process**:
    ```python
    def print_numbers():
        for i in range(10):
            print(i)
    ```

3. **Create a Process object and start the process**:
    ```python
    process = multiprocessing.Process(target=print_numbers)
    process.start()
    ```

4. **Wait for the process to complete (optional)**:
    ```python
    process.join()
    ```

### Example

Here is a complete example of creating and running multiple processes. The `if __name__ == "__main__":` guard is required on platforms that start processes with `spawn` (Windows, macOS), where each child re-imports the main module:

```python
import multiprocessing

def print_numbers():
    for i in range(10):
        print(i)

if __name__ == "__main__":
    # Create multiple processes
    processes = []
    for _ in range(5):
        process = multiprocessing.Process(target=print_numbers)
        processes.append(process)
        process.start()

    # Wait for all processes to complete
    for process in processes:
        process.join()
```

## Using Pool

The `multiprocessing.Pool` class provides a convenient way to parallelize the execution of a function across multiple input values, distributing the input data across processes.

### Example

```python
import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    with multiprocessing.Pool(4) as pool:
        results = pool.map(square, range(10))
    print(results)
```

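## Using Queue

The `multiprocessing.Queue` class allows you to create a shared queue that can be used to exchange data between processes.

### Example

```python
import multiprocessing

def worker(queue):
    # Consume items until the sentinel (None) arrives. Polling queue.empty()
    # is unreliable when several processes read from the same queue.
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"Processed {item}")

if __name__ == "__main__":
    num_workers = 4
    queue = multiprocessing.Queue()
    for item in range(10):
        queue.put(item)
    # One sentinel per worker so that every worker eventually stops
    for _ in range(num_workers):
        queue.put(None)

    processes = []
    for _ in range(num_workers):
        process = multiprocessing.Process(target=worker, args=(queue,))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()
```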

## Sharing Data Between Processes

The `multiprocessing` module provides `Value` and `Array` classes to share data between processes.

### Example

```python
import multiprocessing

def increment_value(shared_value):
    for _ in range(1000):
        with shared_value.get_lock():
            shared_value.value += 1

if __name__ == "__main__":
    shared_value = multiprocessing.Value('i', 0)

    processes = []
    for _ in range(4):
        process = multiprocessing.Process(target=increment_value, args=(shared_value,))
        processes.append(process)
        process.start()

    for process in processes:
        process.join()

    print(f"Final value: {shared_value.value}")
```

By using the `multiprocessing` module, you can take full advantage of multiple CPU cores and improve the performance of your Python programs.



In [1]:
## Processes that run in parallel
## CPU-bound tasks: tasks that are heavy on CPU usage (e.g., mathematical computations, data processing)
## Parallel execution: the processes can run on multiple CPU cores

import multiprocessing
from multiprocessing import Process

import time

def square_numbers():
    for i in range(5):
        time.sleep(1)
        print(f"Square: {i*i}")

def cube_numbers():
    for i in range(5):
        time.sleep(1.5)
        print(f"Cube: {i * i * i}")

if __name__ == "__main__":
    
    ## create 2 processes
    p1=Process(target=square_numbers)
    p2=Process(target=cube_numbers)
    t=time.time()

    ## start the process
    p1.start()
    p2.start()

    ## Wait for the process to complete
    p1.join()
    p2.join()

    finished_time=time.time()-t
    print(finished_time)

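0.16669631004333496

Note: the recorded time is far below the roughly 7.5 seconds the slower process should need (5 iterations of 1.5 s each), and no `Square:` or `Cube:` lines appear. This most likely means the child processes exited without running the functions. That is typical when `multiprocessing` is used from a notebook on platforms that start processes with `spawn` (Windows, macOS), because the children cannot import functions defined in a notebook cell. Running the same code as a `.py` script, or importing the functions from a separate module, usually avoids the problem.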


# ThreadPoolExecutor in Python

## What is ThreadPoolExecutor?

`ThreadPoolExecutor` is a high-level interface provided by the `concurrent.futures` module in Python for managing a pool of threads. It simplifies the process of creating and managing multiple threads, allowing you to execute tasks concurrently with ease.

## Why Use ThreadPoolExecutor?

1. **Simplified Thread Management**: `ThreadPoolExecutor` abstracts the complexity of thread creation and management, making it easier to work with multiple threads.
2. **Resource Efficiency**: By reusing a pool of threads, `ThreadPoolExecutor` reduces the overhead associated with creating and destroying threads.
3. **Concurrency**: It allows you to execute multiple tasks concurrently, improving the efficiency and responsiveness of your application.
4. **Scalability**: `ThreadPoolExecutor` caps the number of threads at `max_workers` and queues any additional tasks, so many tasks can be submitted without creating one thread per task.

## How to Use ThreadPoolExecutor

### Basic Usage

1. **Import the `ThreadPoolExecutor` class**:
    ```python
    from concurrent.futures import ThreadPoolExecutor
    ```

2. **Define a function to be executed by the threads**:
    ```python
    def task(n):
        print(f"Task {n} is running")
    ```

3. **Create a `ThreadPoolExecutor` and submit tasks**:
    ```python
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(task, i) for i in range(10)]
    ```

4. **Collect the results (optional; leaving the `with` block already waits for all tasks to complete)**:
    ```python
    for future in futures:
        future.result()
    ```

### Example

Here is a complete example of using `ThreadPoolExecutor` to execute multiple tasks concurrently:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(n):
    time.sleep(1)
    print(f"Task {n} is running")

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(task, i) for i in range(10)]
        
    for future in futures:
        future.result()
```

### Advanced Usage

You can also use `ThreadPoolExecutor` to handle the results of tasks as they complete:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def task(n):
    time.sleep(1)
    return f"Task {n} completed"

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=5) as executor:
        futures = [executor.submit(task, i) for i in range(10)]
        
        for future in as_completed(futures):
            print(future.result())
```
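
Handling results as they complete is also the natural place to deal with failures: `future.result()` re-raises any exception that occurred inside the task. The sketch below assumes a made-up `risky_task` that fails for multiples of three and records either the result or the error message for each input.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def risky_task(n):
    # Hypothetical task: fails for multiples of three
    if n % 3 == 0:
        raise ValueError(f"task {n} failed")
    return n * 10

outcomes = {}
with ThreadPoolExecutor(max_workers=3) as executor:
    # Remember which input each future belongs to
    future_to_n = {executor.submit(risky_task, n): n for n in range(1, 7)}
    for future in as_completed(future_to_n):
        n = future_to_n[future]
        try:
            outcomes[n] = future.result()  # re-raises the task's exception here
        except ValueError as exc:
            outcomes[n] = f"error: {exc}"

for n in sorted(outcomes):
    print(n, outcomes[n])
```

Mapping each future back to its input keeps the bookkeeping simple, since `as_completed` yields futures in completion order rather than submission order.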

## Effectiveness of ThreadPoolExecutor

1. **Improved Performance**: By executing tasks concurrently, `ThreadPoolExecutor` can significantly improve the performance of I/O-bound applications.
2. **Efficient Resource Utilization**: It efficiently manages a pool of threads, reducing the overhead associated with thread creation and destruction.
3. **Scalability**: `ThreadPoolExecutor` can handle a large number of tasks concurrently, making it suitable for scalable applications.
4. **Ease of Use**: The high-level interface of `ThreadPoolExecutor` simplifies the process of working with multiple threads, making it accessible to developers with varying levels of experience.

In summary, `ThreadPoolExecutor` is a powerful and efficient tool for managing concurrent execution in Python, providing improved performance, resource efficiency, and scalability.

In [2]:
from concurrent.futures import ThreadPoolExecutor
import time

def print_number(number):
    time.sleep(1)
    return f"Number :{number}"

numbers=[1,2,3,4,5,6,7,8,9,0,1,2,3]

with ThreadPoolExecutor(max_workers=3) as executor:
    results=executor.map(print_number,numbers)

for result in results:
    print(result)

Number :1
Number :2
Number :3
Number :4
Number :5
Number :6
Number :7
Number :8
Number :9
Number :0
Number :1
Number :2
Number :3
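
The output above appears in input order because `executor.map` returns results in the order of its inputs, regardless of which task finishes first. A short sketch with deliberately uneven delays (the function `slow_echo` and the delay values are illustrative) makes this visible:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def slow_echo(delay):
    # Sleep for the given time, then hand the same value back
    time.sleep(delay)
    return delay

delays = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    ordered = list(executor.map(slow_echo, delays))

print(ordered)  # [0.3, 0.1, 0.2]: input order, not completion order
```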


In [5]:
# The same can be done with multiprocessing; here we use ProcessPoolExecutor
from concurrent.futures import ProcessPoolExecutor, as_completed
import time

def square_number(number):
    try:
        time.sleep(2)
        return f"Square: {number * number}"
    except Exception as e:
        return f"Error processing number {number}: {e}"

numbers=[1,2,3,4,5,6,7,8,9,11,2,3,12,14]

# The guard keeps worker processes from re-running the pool setup when they import this module
if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=3) as executor:
        futures = [executor.submit(square_number, number) for number in numbers]

        # Print each result as soon as its task completes
        for future in as_completed(futures):
            print(future.result())

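BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

Note: this error commonly appears when `ProcessPoolExecutor` is run from a notebook on platforms that start worker processes with `spawn` (Windows, macOS). The workers cannot import `square_number` from the notebook session and exit, which breaks the pool. Saving the code as a `.py` script with an `if __name__ == "__main__":` guard, or importing `square_number` from a separate module, typically resolves it.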

## Real-World Example
#### Web Scraping
##### Multithreading for I/O-Bound Tasks

In [None]:
'''
Real-World Example: Multithreading for I/O-bound Tasks
Scenario: Web Scraping
Web scraping often involves making numerous network requests to 
fetch web pages. These tasks are I/O-bound because they spend a lot of
time waiting for responses from servers. Multithreading can significantly
improve the performance by allowing multiple web pages to be fetched concurrently.

'''

import threading
import requests
from bs4 import BeautifulSoup

urls=[
'https://python.langchain.com/v0.2/docs/introduction/',

'https://python.langchain.com/v0.2/docs/concepts/',

'https://python.langchain.com/v0.2/docs/tutorials/'

]

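def fetch_content(url):
    # A timeout keeps one slow server from blocking its thread indefinitely
    response=requests.get(url, timeout=10)
    # Parse the HTML and report how much text the page contains
    soup=BeautifulSoup(response.content,'html.parser')
    print(f'Fetched {len(soup.text)} characters from {url}')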

threads=[]

for url in urls:
    thread=threading.Thread(target=fetch_content,args=(url,))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

print("All web pages fetched")
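
The same scraping pattern can also be written with `ThreadPoolExecutor`, which caps the number of threads and makes it easy to collect results and errors per URL. The sketch below is one possible arrangement, not the only one. The helper names `fetch_length` and `fetch_all`, the 10-second timeout, and the choice to store exceptions in the result dictionary are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_length(url):
    """Download a page and return the size of its body in bytes."""
    import requests  # imported here so fetch_all can be tried without network access
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return len(response.content)

def fetch_all(urls, fetch=fetch_length, max_workers=3):
    """Run fetch(url) for every URL in a thread pool.

    Returns a dict mapping each URL to its result, or to the exception raised.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        future_to_url = {executor.submit(fetch, url): url for url in urls}
        for future in as_completed(future_to_url):
            url = future_to_url[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one page fails
                results[url] = exc
    return results

# Offline demonstration with a stand-in fetcher (length of the URL string)
print(fetch_all(["a", "bb"], fetch=len))
```

Calling `fetch_all(urls)` with the `urls` list from the cell above would download the three pages concurrently, provided `requests` is installed and the network is reachable.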