# Generator vs List Comprehension in Python

This notebook measures the differences in execution time and memory usage between list comprehensions and generator expressions in Python.

## Experiment Results

Averaged over 10,000 iterations of the 10-element experiment (`range(10)`), the results are:

### Execution Time
- **Average List Comprehension Time**: 7.090044021606445e-06 seconds
- **Average Generator Expression Time**: 6.876826286315918e-06 seconds

### Memory Usage
- **Average List Comprehension Memory Usage**: 360.0 bytes
- **Average Generator Expression Memory Usage**: 144.0152 bytes

## System and Environment Information

The experiment was conducted with the following system and environment setup:



In [20]:
import platform
import sys
import subprocess

# Function to get system and environment information
def get_environment_info():
    info = {
        "Platform": platform.platform(),
        "Python Version": sys.version
        # "Installed Libraries": subprocess.check_output([sys.executable, "-m", "pip", "freeze"]).decode("utf-8")
    }
    return info

# Print environment information
environment_info = get_environment_info()
print("\nSystem and Environment Information:")
for key, value in environment_info.items():
    print(f"{key}: {value}")


System and Environment Information:
Platform: macOS-14.5-arm64-arm-64bit
Python Version: 3.9.18 | packaged by conda-forge | (main, Dec 23 2023, 16:35:41) 
[Clang 16.0.6 ]


In [17]:
import tracemalloc
import time

# Function to measure time and memory for list comprehension
def measure_list_comprehension():
    tracemalloc.start()
    start_time = time.time()
    list_comp = [x**10 for x in range(10)]
    for _ in list_comp:
        pass
    end_time = time.time()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return end_time - start_time, snapshot

# Function to measure time and memory for generator expression
def measure_generator_expression():
    tracemalloc.start()
    start_time = time.time()
    gen = (x**10 for x in range(10))
    for _ in gen:
        pass
    end_time = time.time()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return end_time - start_time, snapshot

# Function to calculate total memory usage from a snapshot
def calculate_total_memory(snapshot):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    total = sum(stat.size for stat in snapshot.statistics("lineno"))
    return total

# Function to display memory usage statistics
def display_memory_usage(snapshot, label):
    print(f"{label} Memory Usage:")
    total_memory = calculate_total_memory(snapshot)
    print(f"Total allocated size: {total_memory} bytes")

# Run the experiment multiple times and average the results
num_iterations = 10000
list_times = []
gen_times = []
list_memory_usages = []
gen_memory_usages = []

for _ in range(num_iterations):
    list_time, list_snapshot = measure_list_comprehension()
    list_times.append(list_time)
    list_memory_usages.append(calculate_total_memory(list_snapshot))

    gen_time, gen_snapshot = measure_generator_expression()
    gen_times.append(gen_time)
    gen_memory_usages.append(calculate_total_memory(gen_snapshot))

# Calculate average times and memory usages
avg_list_time = sum(list_times) / num_iterations
avg_gen_time = sum(gen_times) / num_iterations
avg_list_memory = sum(list_memory_usages) / num_iterations
avg_gen_memory = sum(gen_memory_usages) / num_iterations

print(f"Average List Comprehension Time: {avg_list_time} seconds")
print(f"Average Generator Expression Time: {avg_gen_time} seconds")
print(f"Average List Comprehension Memory Usage: {avg_list_memory} bytes")
print(f"Average Generator Expression Memory Usage: {avg_gen_memory} bytes")


Average List Comprehension Time: 7.090044021606445e-06 seconds
Average Generator Expression Time: 6.876826286315918e-06 seconds
Average List Comprehension Memory Usage: 360.0 bytes
Average Generator Expression Memory Usage: 144.0152 bytes



## Inference

The experiment demonstrates that:

### Execution Time

- **Generator expressions** and **list comprehensions** have very similar execution times. In the 10-element run the generator expression was about 3% faster on average, while in the 1,000-element run the list comprehension was marginally faster, so the gap is likely measurement noise. Neither form shows a meaningful speed advantage at these sizes.
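
Gaps this small can be hard to resolve with `time.time()`. As a rough cross-check, the standard-library `timeit` module repeats a statement many times and lets you keep the best run. The sketch below is one possible way to re-time the two forms, not the code that produced the numbers above; the helper name `best_time` and its default arguments are assumptions made for illustration.

```python
import timeit

def best_time(stmt, number=2_000, repeat=5):
    """Best observed per-call time in seconds for stmt (hypothetical helper)."""
    # timeit.repeat returns one total time per repetition of `number` calls.
    runs = timeit.repeat(stmt, number=number, repeat=repeat)
    return min(runs) / number

list_best = best_time("for _ in [x**10 for x in range(10)]: pass")
gen_best = best_time("for _ in (x**10 for x in range(10)): pass")

print(f"list comprehension:   {list_best:.3e} s per call")
print(f"generator expression: {gen_best:.3e} s per call")
```

Taking the minimum rather than the mean reduces the influence of background activity on the machine, which matters when a single call takes only a few microseconds.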

### Memory Usage

- **Generator expressions** hold far less memory than list comprehensions: about 144 bytes versus 360 bytes in this run. A generator expression produces items on the fly and never stores the full sequence, whereas a list keeps every element alive at once. This makes generators particularly useful when dealing with large datasets or when memory efficiency is a concern.
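
A quick way to see this effect without `tracemalloc` is `sys.getsizeof`, which reports the size of the container object itself (it does not count the integers the list refers to). The sketch below assumes two hypothetical helpers, `make_list` and `make_gen`, that are not part of the experiment code.

```python
import sys

def make_list(n):
    # Builds and stores all n results immediately.
    return [x**10 for x in range(n)]

def make_gen(n):
    # Creates a generator object; no results are computed yet.
    return (x**10 for x in range(n))

for n in (10, 1000, 100_000):
    list_size = sys.getsizeof(make_list(n))
    gen_size = sys.getsizeof(make_gen(n))
    print(f"n={n}: list object {list_size} bytes, generator object {gen_size} bytes")
```

The list object grows with `n`, while the generator object stays the same size regardless of how many items it will eventually produce.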

## Detailed Explanation

### List Comprehension

- **Execution**: The list comprehension `[x**10 for x in range(10)]` creates a list of 10 items, where each item is `x**10`.
- **Memory Consumption**: The entire list is built and held in memory, so memory use grows with the number of items.
- **Time Consumption**: The time taken includes the time to create the list and iterate over it.

### Generator Expression

- **Execution**: The generator expression `(x**10 for x in range(10))` creates a generator object that produces items on-the-fly.
- **Memory Consumption**: The generator object itself is small and does not store the entire list in memory. It only keeps track of the current state and yields the next value when required.
- **Time Consumption**: The time taken is primarily for the iteration, as the values are computed on-the-fly.
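
The on-the-fly behaviour described above can be observed directly with `next()`. This is a small illustrative sketch; the variable names are chosen for the example and do not appear in the experiment code.

```python
gen = (x**10 for x in range(5))

first = next(gen)    # computes only 0**10
second = next(gen)   # computes only 1**10
rest = list(gen)     # drains the remaining three values
again = list(gen)    # the generator is exhausted, so this is empty

powers = [x**10 for x in range(5)]
twice = (list(powers), list(powers))  # a list can be read as often as needed

print(first, second, rest, again)
```

A generator can be consumed only once, which is the trade-off for not storing its results.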

## Use Cases

- **List Comprehension**: Use when you need to iterate over the results more than once or index into them, or when the list is small enough that memory usage is not a concern.
- **Generator Expression**: Use when you need to iterate over items only once and want to minimize memory usage, especially with large datasets, infinite sequences, or data-streaming workloads.
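
As a minimal sketch of the generator use case, the example below feeds an unbounded sequence from `itertools.count` through a generator expression and takes only the first few values with `itertools.islice`. The choice of `x**10` mirrors the experiment; everything else is an assumption made for illustration.

```python
from itertools import count, islice

# Infinite, lazy stream of tenth powers: nothing is computed until requested.
powers = (x**10 for x in count())

# Take only the first five values. The equivalent list comprehension over
# count() would never finish, because it tries to build the whole list first.
first_five = list(islice(powers, 5))
print(first_five)

# Aggregating without building a list: sum() pulls one value at a time.
total = sum(x**10 for x in range(1000))
print(total)
```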

## Summary

The experiment shows that generator expressions use far less memory than list comprehensions while running in essentially the same time. Generators are a powerful tool for reducing memory consumption, especially when dealing with large datasets or **streams of data**.


The same experiment with a larger range, `range(1000)`:

In [23]:
import tracemalloc
import time

# Function to measure time and memory for list comprehension
def measure_list_comprehension():
    tracemalloc.start()
    start_time = time.time()
    list_comp = [x**10 for x in range(1000)]
    for _ in list_comp:
        pass
    end_time = time.time()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return end_time - start_time, snapshot

# Function to measure time and memory for generator expression
def measure_generator_expression():
    tracemalloc.start()
    start_time = time.time()
    gen = (x**10 for x in range(1000))
    for _ in gen:
        pass
    end_time = time.time()
    snapshot = tracemalloc.take_snapshot()
    tracemalloc.stop()
    return end_time - start_time, snapshot

# Function to calculate total memory usage from a snapshot
def calculate_total_memory(snapshot):
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    total = sum(stat.size for stat in snapshot.statistics("lineno"))
    return total

# Function to display memory usage statistics
def display_memory_usage(snapshot, label):
    print(f"{label} Memory Usage:")
    total_memory = calculate_total_memory(snapshot)
    print(f"Total allocated size: {total_memory} bytes")

# Run the experiment multiple times and average the results
num_iterations = 10000
list_times = []
gen_times = []
list_memory_usages = []
gen_memory_usages = []

for _ in range(num_iterations):
    list_time, list_snapshot = measure_list_comprehension()
    list_times.append(list_time)
    list_memory_usages.append(calculate_total_memory(list_snapshot))

    gen_time, gen_snapshot = measure_generator_expression()
    gen_times.append(gen_time)
    gen_memory_usages.append(calculate_total_memory(gen_snapshot))

# Calculate average times and memory usages
avg_list_time = sum(list_times) / num_iterations
avg_gen_time = sum(gen_times) / num_iterations
avg_list_memory = sum(list_memory_usages) / num_iterations
avg_gen_memory = sum(gen_memory_usages) / num_iterations

print(f"Average List Comprehension Time: {avg_list_time} seconds")
print(f"Average Generator Expression Time: {avg_gen_time} seconds")
print(f"Average List Comprehension Memory Usage: {avg_list_memory} bytes")
print(f"Average Generator Expression Memory Usage: {avg_gen_memory} bytes")


Average List Comprehension Time: 0.0012172140598297118 seconds
Average Generator Expression Time: 0.0012257498025894165 seconds
Average List Comprehension Memory Usage: 48200.3793 bytes
Average Generator Expression Memory Usage: 152.0496 bytes


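**The difference in memory usage is now substantial: roughly 48 KB for the list comprehension versus about 152 bytes for the generator expression, while the execution times remain nearly identical.**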
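
The snapshots in the cells above are taken after the loop finishes, so they record memory still held at that moment rather than the highest point reached. If peak usage is of interest, `tracemalloc.get_traced_memory()` returns a `(current, peak)` pair. The sketch below is one possible variation on the experiment; the helper names `peak_bytes`, `consume_list`, and `consume_gen` are hypothetical.

```python
import tracemalloc

def peak_bytes(fn):
    """Run fn() under tracemalloc and return the peak traced size in bytes."""
    tracemalloc.start()
    fn()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def consume_list(n=1000):
    # All n results exist at the same time while the loop runs.
    for _ in [x**10 for x in range(n)]:
        pass

def consume_gen(n=1000):
    # Only one result needs to exist at any moment.
    for _ in (x**10 for x in range(n)):
        pass

list_peak = peak_bytes(consume_list)
gen_peak = peak_bytes(consume_gen)
print(f"list comprehension peak:   {list_peak} bytes")
print(f"generator expression peak: {gen_peak} bytes")
```

Exact byte counts depend on the Python version and platform, so only the relative difference should be read from the output.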