# Data Processing & Iteration Tools
Sometimes, we need to apply the same operation to every element in a sequence, such as converting units, formatting strings, or scaling numbers.

## 1. `map()` – Transform Items in a Sequence

### What is `map()`?
The `map()` function is used to apply a given function to **each item in an iterable** (like a list or tuple).  
It returns a lazy iterator (a `map` object), which can be converted to a list or looped over.

Think of it like saying:  
-> “Take this function and apply it to everything in this list.”


#### Business Use Case: Length Conversion for a Product Catalog

Imagine you're managing an online clothing store.  
Your product measurements (e.g., length of jeans or sleeves) are stored in **inches**,  
but your international customers need to see the sizes in **centimeters**.

You can use `map()` to convert all measurements at once.

Conversion rate: **1 inch = 2.54 cm**


In [1]:
# Function to convert inches to centimeters

def inches_to_cm(length_in_inches):
    return length_in_inches * 2.54

In [2]:
# List of product lengths in inches

lengths_in_inches = [28, 30, 32, 34]

In [3]:
# Apply conversion using map

lengths_in_cm = list(map(inches_to_cm, lengths_in_inches))

In [5]:
# Print the converted lengths

print("Product lengths in cm:", lengths_in_cm)

Product lengths in cm: [71.12, 76.2, 81.28, 86.36]


In [6]:
# Another example: use map() to apply a function to each item
def square(x):
    return x*x

In [8]:
numbers = [1,2,3,4]
list(map(square,numbers)) # Applies the 'square' function on the 'numbers' iterable

[1, 4, 9, 16]

In [10]:
# map() returns a map object. To get the results, convert it to a list or iterate over it
result = map(str.upper,['apple','banana'])
list(result)

['APPLE', 'BANANA']

`map()` works with any iterable, not just lists: tuples, sets, and even generators are all accepted.

It can also be combined with lambda functions for simple transformations:

`map(lambda x: x + 10, data)`

Because `map()` produces its results lazily, one item at a time, it offers a clean and memory-efficient way to apply transformations to data.
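The points above can be sketched as follows. This is a minimal illustration, not part of the original catalog example: the names `prices_tuple` and `price_stream` and the flat surcharge of 10 are assumptions made up for the demo.

```python
# Hypothetical data: prices stored in a tuple rather than a list
prices_tuple = (100, 250, 400)

# map() accepts any iterable; the lambda adds a flat surcharge of 10
with_surcharge = map(lambda price: price + 10, prices_tuple)

# Nothing has been computed yet: map() returns a lazy iterator
first = next(with_surcharge)      # computes only the first item
remaining = list(with_surcharge)  # computes the rest

print(first)      # 110
print(remaining)  # [260, 410]

# A generator works as input too
price_stream = (p for p in range(1, 4))
doubled = list(map(lambda p: p * 2, price_stream))
print(doubled)    # [2, 4, 6]
```

Note that a `map` object can be consumed only once. If the results are needed more than once, converting them to a list first is usually the simplest option.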

## 2. `filter()` – Keep Only Matching Items

### What is `filter()`?
The `filter()` function is used to **extract items from a list** (or any iterable) that meet a certain condition.  
It takes a function that returns `True` or `False`, and **keeps only the items where the function returns `True`**.


#### Business Use Case: Filtering Active Products from Inventory

You are working with an inventory management system.  
Each product in your catalog has a flag that indicates whether it is active (available for sale) or inactive (discontinued).  
You want to generate a list of only the **active products**.

In [14]:
# Function to check if a product is active

def is_active(product):
    return product["status"] == "active"

In [15]:
# List of products in the inventory

products = [
    {"name": "T-shirt", "status": "active"},
    {"name": "Jeans", "status": "inactive"},
    {"name": "Sneakers", "status": "active"},
    {"name": "Jacket", "status": "inactive"},
]

In [16]:
# Use filter to get only active products

active_products = list(filter(is_active, products))

In [18]:
# Print the result

print("Active products:")

for product in active_products:
    print("-", product["name"])

Active products:
- T-shirt
- Sneakers


Unlike **map()**, which transforms every item, **filter()** helps in filtering data based on the logic you define. This is useful when cleaning data, removing invalid values, or selecting specific categories. For example, you might want to keep only positive numbers, non-empty strings, or rows where a condition is met.


Similar to **map()**, the call signature is **filter(function, iterable)**. The function should return a Boolean value for each element. Like **map()**, the result is a lazy iterator (a `filter` object), so you often convert it to a list or loop through it.

In [19]:
# Example to illustrate the use of the filter() function

numbers = [1,2,3,4,5,6]

def is_even(number):
    return number%2 == 0  # Returns True or False

In [20]:
even_numbers = filter(is_even,numbers)

In [21]:
print(list(even_numbers))

[2, 4, 6]


In [23]:
# We can also use filter() with a lambda expression, as in the example below

nums = [1,-2,3,0,-5]

positive = list(filter(lambda x:x>0,nums))
print(positive)

[1, 3]


Because **filter()** doesn’t create a new list until needed, it can also be memory-efficient when working with large sequences or datasets. 
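A small sketch of this lazy behaviour, under the assumption that we only need the first few matches: the endless counter from `itertools.count` and the sample `names` list are illustrative choices, not part of the inventory example.

```python
from itertools import count

# count(1) yields 1, 2, 3, ... without end, so a full list could never be built
multiples_of_seven = filter(lambda n: n % 7 == 0, count(1))

# Only as many numbers as needed are examined
first_three = [next(multiples_of_seven) for _ in range(3)]
print(first_three)  # [7, 14, 21]

# For small data, converting to a list is fine.
# bool keeps truthy items, so empty strings are dropped.
names = ["Asha", "", "Ravi", ""]   # hypothetical input with blanks
cleaned = list(filter(bool, names))
print(cleaned)  # ['Asha', 'Ravi']
```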

## 3. `reduce()` – Reduce a Sequence to a Single Value
Sometimes, we might need to combine a list of values into a single result, like summing a list of numbers or multiplying several numbers together. Python’s **reduce()** function helps us do that by applying a function cumulatively across elements of a sequence.
### What is `reduce()`?
`reduce()` comes from the `functools` module.  
It applies a function **cumulatively** to items in a list, **reducing** the list to a single final value.

For example:  
`reduce(add, [1, 2, 3, 4]) → (((1 + 2) + 3) + 4) → 10`

#### Business Use Case: Total Sales Calculation

You work in the finance department of an online store.  
At the end of the day, you have a list of sales amounts for individual orders,  
and you want to calculate the **total sales revenue**.

In [24]:
from functools import reduce

# Function to add two numbers

def add_sales(total, current_sale):
    return total + current_sale

In [25]:
# List of order amounts

daily_sales = [1200, 2500, 3200, 1500, 2750]

In [26]:
# Use reduce to calculate total sales

total_revenue = reduce(add_sales, daily_sales)

In [28]:
# Print the result

print("Total Revenue for the day:", total_revenue)

Total Revenue for the day: 11150


The reduce() function takes two arguments: a binary function (a function that takes two inputs) and an iterable. It applies the function to the first two elements, then takes the result and applies the function to the next element, and so on, until one final result is produced.

To use **reduce()**, you need to import it from Python’s **functools** module:

In [29]:
# Another example
from functools import reduce

def add(x,y):
    return (x+y)

In [30]:
reduce(add,[1,2,3,4])

10

In [31]:
# It can also be used with lambda functions
reduce(lambda x,y:x+y,[1,2,3,4])

10

This is equivalent to:  ( ( ( 1 + 2 ) + 3 ) + 4 )
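To make the nesting concrete, here is a rough hand-written equivalent of that call. It is only a sketch of the idea, not how `functools.reduce` is actually implemented; the `steps` list is added purely to show the intermediate results.

```python
from functools import reduce

def add(x, y):
    return x + y

numbers = [1, 2, 3, 4]

# Manual equivalent of reduce(add, numbers)
iterator = iter(numbers)
running = next(iterator)           # start with the first element
steps = [running]
for value in iterator:
    running = add(running, value)  # combine the result so far with the next item
    steps.append(running)

print(steps)                             # [1, 3, 6, 10]
print(running == reduce(add, numbers))   # True
```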

It’s useful for operations like summing values, finding maximums, building strings from pieces, or even combining multiple dictionaries or sets. You can also provide an optional initial value as a third argument. It is placed ahead of the items in the sequence, so it acts as the starting result, and it is returned unchanged when the sequence is empty.
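The sketch below illustrates the optional initial value and a dictionary merge. The opening balance of 500 and the `stock_updates` data are hypothetical values chosen for the example.

```python
from functools import reduce

daily_sales = [1200, 2500, 3200]

# Third argument is the initial value: the running total starts at 500
# (a hypothetical opening balance), then each sale is added to it
total_with_opening = reduce(lambda total, sale: total + sale, daily_sales, 500)
print(total_with_opening)  # 7400

# With an initial value, an empty list returns that value
# instead of raising TypeError
empty_total = reduce(lambda total, sale: total + sale, [], 0)
print(empty_total)  # 0

# Combining several dictionaries into one (later keys overwrite earlier ones)
stock_updates = [{"T-shirt": 5}, {"Jeans": 2}, {"T-shirt": 8}]
merged = reduce(lambda acc, d: {**acc, **d}, stock_updates, {})
print(merged)  # {'T-shirt': 8, 'Jeans': 2}
```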

Although **reduce()** can sometimes be replaced by built-in functions like **sum()** or **max()**, it shines in cases where you need to define how values should be combined. reduce() is a powerful tool for collapsing a sequence of values into a single result through repeated application of a function. 

## 4. `zip()` – Combine Multiple Lists Element-Wise
When working with multiple lists or sequences, we may need to combine them element by element. Python’s **zip()** function makes this easy. It allows you to pair elements from two or more iterables, creating tuples that group corresponding values together. 

### What is `zip()`?
`zip()` is used to **combine two or more iterables** (like lists or tuples) **element by element**.  
It returns a zip object with tuples containing one element from each iterable.

Think of it as “zipping up” two lists into pairs.

#### Business Use Case: Matching Product Names with Prices

You’re preparing a price list for your product catalog.  
You have one list with **product names** and another with **their prices**.  
You want to **combine them together** to show each product with its corresponding price.

In [32]:
# List of product names

products = ["T-shirt", "Jeans", "Sneakers", "Jacket"]

In [33]:
# List of prices

prices = [799, 1299, 2199, 1799]

In [34]:
# Use zip to pair each product with its price

product_catalog = list(zip(products, prices))

In [36]:
# Print the combined product-price pairs

print("Product Catalog:")

for product, price in product_catalog:
    print(f"- {product}: ₹{price}")

Product Catalog:
- T-shirt: ₹799
- Jeans: ₹1299
- Sneakers: ₹2199
- Jacket: ₹1799


The **zip()** function takes any number of iterables and returns an iterator of tuples. Each tuple contains one element from each iterable, grouped by their position. 

In [37]:
# Another example

names = ['Shadab','JP','Loki']
scores = [85,92,78]

for name, score in zip(names,scores):
    print(f'{name}:{score}')

Shadab:85
JP:92
Loki:78


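If the iterables are of unequal length, **zip()** stops at the shortest one by default. No error is raised, but the extra items in the longer iterables are silently left out, so it is worth checking lengths when every item matters.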
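A short sketch of this behaviour, using a made-up case where one price is missing. `itertools.zip_longest` is shown as one possible alternative when no item should be lost.

```python
from itertools import zip_longest

products = ["T-shirt", "Jeans", "Sneakers"]
prices = [799, 1299]  # hypothetical: one price is missing

# zip() stops at the shorter list, so "Sneakers" does not appear
paired = list(zip(products, prices))
print(paired)
# [('T-shirt', 799), ('Jeans', 1299)]

# zip_longest() keeps every item and fills the gaps with fillvalue
padded = list(zip_longest(products, prices, fillvalue=None))
print(padded)
# [('T-shirt', 799), ('Jeans', 1299), ('Sneakers', None)]
```

From Python 3.10 onward, `zip(products, prices, strict=True)` raises a `ValueError` when the lengths differ, which may be preferable when a mismatch signals a data problem.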

In [38]:
# We can also use zip() to unzip data by applying it with unpacking

pairs = [('a',1),('b',2),('c',3)]
letters,numbers = zip(*pairs)

In [40]:
print(letters)

('a', 'b', 'c')


In [41]:
print(numbers)

(1, 2, 3)


The **zip()** function helps streamline the process of combining multiple sequences in parallel, making your code neater and reducing the need for index-based access. It is especially useful in data alignment, table construction, and many iteration-based tasks.
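As one illustration of table construction, the sketch below builds a lookup dictionary and a list of rows. The `grades` column is a hypothetical addition to the earlier names and scores.

```python
names = ["Shadab", "JP", "Loki"]
scores = [85, 92, 78]
grades = ["B", "A", "C"]   # hypothetical third column

# Build a lookup table: names become keys, scores become values
score_by_name = dict(zip(names, scores))
print(score_by_name)  # {'Shadab': 85, 'JP': 92, 'Loki': 78}

# zip() accepts more than two iterables; each tuple is one "row"
rows = list(zip(names, scores, grades))
print(rows[0])  # ('Shadab', 85, 'B')
```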

## 5. `enumerate()` – Loop with an Automatic Index
While iterating over a sequence, you may need both the value and the index of each item in a sequence. Python’s **enumerate()** function lets you loop over an iterable while also keeping track of the index, without using manual counters. 
### What is `enumerate()`?
The `enumerate()` function adds a counter to an iterable (like a list) and yields one tuple per item:  
`(index, item)`

This is especially useful when you need the **position of each item** while looping.


#### Business Use Case: Numbering Customer Feedback

Suppose you're analysing customer feedback.  
You have a list of comments, and you want to **print them with serial numbers** for a summary report.

Instead of manually using a counter, `enumerate()` can do this for you.

In [42]:
# List of customer feedback

feedback_list = [
    "Great service!",
    "Fast delivery.",
    "Product quality could be better.",
    "Will definitely buy again."
]

In [44]:
# Print feedback with numbering using enumerate

print("Customer Feedback Summary:")

for index, comment in enumerate(feedback_list, start=1):
    print(f"{index}. {comment}")

Customer Feedback Summary:
1. Great service!
2. Fast delivery.
3. Product quality could be better.
4. Will definitely buy again.


The **enumerate()** function adds a counter to an iterable and returns it as an enumerate object. This object can be looped over, where each element is a pair: the index and the item.

In [45]:
# Another example
items = ['apple','banana','cherry']

for index, item in enumerate(items):
    print(index,item)

0 apple
1 banana
2 cherry


In [47]:
# We can also specify a starting index if needed
for index, item in enumerate(items, start=1):
    print(index,item)

1 apple
2 banana
3 cherry


Using **enumerate()** makes your code shorter, safer, and easier to read compared to tracking the index manually with something like:

`for i in range( len( items ) )`

This is especially useful when working with labelled data, structured records, or in cases where the position of the item carries meaning.
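For comparison, here is a minimal side-by-side sketch of the manual approach and `enumerate()`. The substring search at the end is just one assumed use of the index.

```python
items = ["apple", "banana", "cherry"]

# Manual indexing: works, but the index lookup is easy to get wrong
manual = []
for i in range(len(items)):
    manual.append((i, items[i]))

# enumerate(): index and value arrive together
with_enumerate = list(enumerate(items))

print(manual == with_enumerate)  # True
print(with_enumerate)            # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]

# Typical use: find the positions of items that match a condition
positions = [index for index, item in enumerate(items) if "an" in item]
print(positions)  # [1]
```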

## 6. `any()` – Check If At Least One Condition Is True
Python’s built-in **any()** function lets us check if at least one item in a list or sequence meets a certain condition. It returns **True** if any element in the iterable is **True** and **False** otherwise.
### What is `any()`?
The `any()` function returns `True` if **at least one item** in an iterable is `True`.

It’s a shortcut to check:  
-> “Is there **any** truthy value in this list?”

#### Business Use Case: Checking for Urgent Support Tickets

Imagine you're working in customer support.  
At the start of the day, your dashboard shows a list of support tickets and whether any are marked as **urgent**.  
You want to check:  
-> “Do we have **any urgent tickets** to prioritise?”

In [48]:
# List of support tickets marked with urgency

ticket_urgency_flags = [False, False, True, False]

In [49]:
# Check if any ticket is urgent

has_urgent_ticket = any(ticket_urgency_flags)

In [51]:
# Print the output

if has_urgent_ticket:
    print("🚨 Urgent ticket(s) found. Prioritise immediately!")
else:
    print("✅ No urgent tickets at the moment.")

🚨 Urgent ticket(s) found. Prioritise immediately!


The **any()** function takes a single iterable like a list, set, or generator, and checks whether at least one item is logically **True**. If all items are **False**, it returns **False**. 

In [53]:
# A simple example
values = [0,0,5,0]
any(values)  # returns True because 5 is truthy (bool(5) is True)

True

This can help validate input, check if a row has any non-zero values, or decide whether to proceed based on a condition. It’s often used with list comprehensions or generator expressions:

In [54]:
scores = [0,0,0,0]
any(score>=50 for score in scores) # will return False

False

The **any()** function short-circuits, meaning it stops as soon as it finds a **True** value, making it efficient for large sequences as well.
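The short-circuiting can be observed with a small sketch. The helper `is_urgent` and the `checked` list are hypothetical, added only to record which values were actually inspected.

```python
checked = []  # records which values were actually inspected

def is_urgent(flag):
    checked.append(flag)
    return flag

ticket_urgency_flags = [False, True, False, False]

# The generator expression feeds any() one result at a time
result = any(is_urgent(flag) for flag in ticket_urgency_flags)

print(result)   # True
print(checked)  # [False, True]  -> the last two flags were never checked
```

A generator expression is used here on purpose: a list comprehension would evaluate every item first, and only the final scan by `any()` would stop early.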

## 7. `all()` – Check If All Conditions Are True

### What is `all()`?
When evaluating datasets or user inputs, it’s often necessary to confirm that every item meets a given condition. For example, checking that all fields in a form are filled, or that all scores are above a threshold. 
The `all()` function returns `True` **only if every item** in the iterable is `True`.

Think of it as asking:  
-> “Are **all conditions met**?”


#### Business Use Case: Verifying Checklist Completion for Order Dispatch

Imagine you're in charge of order fulfillment.  
Each order must go through a checklist before dispatch:  
- Payment done  
- Packaging completed  
- Address verified

You store the status of each checklist item as a Boolean (`True`/`False`) and want to check if an order is **ready to be shipped**.


In [55]:
# Checklist for an order before dispatch

order_checklist = {
    "payment_done": True,
    "packaging_done": True,
    "address_verified": True
}

In [56]:
# Use all() to check if everything is ready

is_ready_to_dispatch = all(order_checklist.values())

In [58]:
# Print the output

if is_ready_to_dispatch:
    print("📦 Order is ready to be dispatched.")
else:
    print("⏳ Some checks are still pending.")

📦 Order is ready to be dispatched.


The **all()** function returns **True** if every item in the provided iterable is logically **True**. If even a single element evaluates to **False**, the result is **False**.

In [59]:
# Example
temperatures = [22,25,27,20]
all(t>0 for t in temperatures) # returns True

True

**all()** also short-circuits and stops evaluating as soon as it encounters a **False** value. It can also work directly with sequences of Boolean values:

In [60]:
# Example
flags = [True,True,False]
all(flags)  # Returns False

False

**all()** is a reliable and efficient tool when the requirement is to ensure that every element in a collection satisfies a condition. It allows code to remain concise and readable, especially in validation-heavy workflows.
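As a sketch of a validation workflow, the example below checks that every field of a form is filled. The `form` dictionary and its field names are invented for illustration.

```python
# Hypothetical form submission: every field must be non-empty
form = {"name": "Asha", "email": "asha@example.com", "city": ""}

# Empty strings are falsy, so strip() plus truthiness acts as the "is filled" check
all_filled = all(value.strip() for value in form.values())
print(all_filled)  # False, because "city" is empty

# Which fields failed? A comprehension gives the details all() hides
missing = [field for field, value in form.items() if not value.strip()]
print(missing)  # ['city']

# Edge case: all() of an empty iterable is True
print(all([]))  # True
```

The last line is worth keeping in mind: with no items to check, `all()` reports success, so an empty checklist may need a separate guard.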

## 8. `sum()`, `max()`, `min()` – Aggregate Values from a List

### What Are These?
In data-focused tasks, it’s common to calculate totals, find the highest or lowest values, or make simple aggregations. Python provides built-in functions like **sum(), max(), and min()** that help you perform these operations efficiently and without extra loops. 

- `sum(iterable)` → Returns the **total** of all items.
- `max(iterable)` → Returns the **largest** item.
- `min(iterable)` → Returns the **smallest** item.

These are built-in aggregation functions — great for summarising numeric data.

#### Business Use Case: Daily Order Statistics

You work in the analytics team of a delivery service.  
Each day, your system records the number of orders delivered per city.  
You want to generate a summary:
- Total orders delivered  
- City with maximum deliveries  
- City with minimum deliveries

In [62]:
# Dictionary with city-wise delivery counts

orders_per_city = {
    "Mumbai": 150,
    "Delhi": 120,
    "Bangalore": 180,
    "Chennai": 90,
    "Kolkata": 110
}

In [63]:
# Calculate total deliveries

total_orders = sum(orders_per_city.values())

In [64]:
# Find city with max and min deliveries

max_city = max(orders_per_city, key=orders_per_city.get)
min_city = min(orders_per_city, key=orders_per_city.get)

In [66]:
# Print the summary

print("📊 Daily Delivery Summary:")
print("Total orders delivered:", total_orders)
print("City with highest deliveries:", max_city)
print("City with lowest deliveries:", min_city)

📊 Daily Delivery Summary:
Total orders delivered: 650
City with highest deliveries: Bangalore
City with lowest deliveries: Chennai


In [67]:
# The sum() function returns the total of all numeric elements in an iterable
scores = [10,20,30]
total = sum(scores)

In [69]:
# it also accepts an optional start value
sum(scores,100)

160

In [71]:
# The max() and min() functions return the largest and smallest elements in a sequence
values = [4,1,9,3]
print(max(values))
print(min(values))

9
1


In [72]:
# They work not only with numbers but also with strings, tuples or any comparable elements. 
max(['cat','zebra','ant'])

'zebra'

These functions are used frequently in data summarisation, exploratory analysis, and preprocessing.
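Two patterns that often come up in summaries are sketched below: aggregating one field with a generator expression, and guarding against empty input with `default=`. The `orders` records are hypothetical sample data.

```python
orders = [
    {"city": "Mumbai", "count": 150},
    {"city": "Delhi", "count": 120},
    {"city": "Chennai", "count": 90},
]

# A generator expression lets sum() total one field without building a list
total = sum(order["count"] for order in orders)
print(total)  # 360

# max()/min() raise ValueError on an empty iterable unless default= is given
no_orders = []
busiest = max((order["count"] for order in no_orders), default=0)
print(busiest)  # 0
```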

## 9. `key=` Parameter – Customise How Values Are Compared

### What Is the `key=` Parameter?
In functions like **max(), min(), and sorted()**, Python offers an optional `key=` parameter that specifies **how the values should be compared**. This makes it possible to work with more complex data structures, such as dictionaries, tuples, or objects, and decide what exactly should be considered when evaluating the values.

#### Business Use Case: Find the Highest Rated Product

Imagine you run a product review site.  
Each product has a name and a customer rating out of 5.

You want to find the product with the **highest rating** using `max()`.  
But instead of comparing full dictionaries, you use the `key=` parameter to compare based on `"rating"`.

In [73]:
# List of products with ratings

products = [
    {"name": "Smartphone", "rating": 4.6},
    {"name": "Laptop", "rating": 4.8},
    {"name": "Tablet", "rating": 4.3}
]

In [74]:
# Use key= to find product with highest rating

top_product = max(products, key=lambda product: product["rating"])

In [76]:
# Print the result

print("🌟 Highest rated product:")
print(f"{top_product['name']} (Rating: {top_product['rating']})")

🌟 Highest rated product:
Laptop (Rating: 4.8)


The **key** parameter takes a function as its value. That function is applied to each item before the main comparison happens. This does not change the data itself but just the way it is compared.

In [77]:
# Example to find the longest string in a list
words = ['apple','banana','fig']
max(words,key=len)

'banana'

In [79]:
# when dealing with tuples
data = [('Shadab',88),('JP',95),('Loki',90)]
max(data,key=lambda x:x[1])

('JP', 95)

This is especially useful when working with structured data, such as rows from a CSV file or elements from a database query. Instead of extracting and comparing values manually, you define how comparisons should happen using **key**.

In [80]:
# The same logic applies to sorting. Here, the list is sorted based on the second item in each tuple
sorted(data,key=lambda x:x[1])

[('Shadab', 88), ('Loki', 90), ('JP', 95)]
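A few further variations are sketched below, reusing the same `data` list. `operator.itemgetter` is shown as an alternative to the lambda, and the `cities` list is a made-up example.

```python
from operator import itemgetter

data = [("Shadab", 88), ("JP", 95), ("Loki", 90)]

# itemgetter(1) behaves like lambda x: x[1]
by_score_desc = sorted(data, key=itemgetter(1), reverse=True)
print(by_score_desc)  # [('JP', 95), ('Loki', 90), ('Shadab', 88)]

lowest = min(data, key=itemgetter(1))
print(lowest)  # ('Shadab', 88)

# key changes only the comparison: the original list is untouched
print(data)  # [('Shadab', 88), ('JP', 95), ('Loki', 90)]

# Case-insensitive sort with a built-in method as the key
cities = ["delhi", "Mumbai", "chennai"]
sorted_cities = sorted(cities, key=str.lower)
print(sorted_cities)  # ['chennai', 'delhi', 'Mumbai']
```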

## 10. `tqdm` – Progress Bars Made Easy

### What is `tqdm`?
When working with large datasets or time-consuming loops, it's helpful to know how long a task will take or how much progress has been made. The **tqdm** library provides a simple and efficient way to add progress bars to Python loops, giving instant visual feedback during execution.
`tqdm` is a third-party Python library that displays a **progress bar** for loops and iterable processing.

It’s especially useful when:
- You have **long-running operations**
- You want to show **live feedback** (e.g., in data processing, simulations, uploads)

> **Note:** You must install it first using:
> ```bash
> pip install tqdm
> ```


#### Business Use Case: Processing Orders with Progress Bar

Imagine you’re processing a large number of customer orders (e.g., applying tax, formatting data).  
You want to show a **progress bar** so the user knows how much work is done and how much is left.


In [81]:
from tqdm import tqdm
import time  # Simulate delay for each order

In [82]:
# List of order IDs

order_ids = [f"ORD{1000 + i}" for i in range(10)]

In [83]:
# Simulate processing each order with a progress bar

print("🔄 Processing orders...")

for order in tqdm(order_ids):
    
    # Simulate order processing time
    time.sleep(0.2)

🔄 Processing orders...


100%|███████████████████████████████████████████| 10/10 [00:02<00:00,  4.88it/s]


**tqdm** wraps around any iterable, such as a list or range, and displays a real-time progress bar in the console or notebook. 

This automatically shows the loop's current position, estimated time remaining, and speed of execution.

tqdm can also be used with file reading/writing operations and list comprehensions.

`results = [process(x) for x in tqdm(data)]`

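It works in scripts and in Jupyter notebooks; in a notebook, importing from `tqdm.notebook` (or `tqdm.auto`) gives a widget-based bar. It can also be used alongside multiprocessing, although that usually needs some extra care.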
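The list-comprehension pattern can be sketched as follows. The helper `apply_tax` and its 18% rate are hypothetical, and the `ImportError` fallback is an assumption added so the snippet still runs (without a bar) where `tqdm` is not installed.

```python
try:
    from tqdm import tqdm          # third-party: pip install tqdm
except ImportError:
    # Fallback so the sketch still runs without tqdm (no progress bar shown)
    def tqdm(iterable, **kwargs):
        return iterable

def apply_tax(amount, rate=0.18):   # hypothetical helper and tax rate
    return round(amount * (1 + rate), 2)

order_amounts = [100, 250, 400]

# desc= labels the bar; the list comprehension consumes the wrapped iterable
taxed = [apply_tax(a) for a in tqdm(order_amounts, desc="Applying tax")]
print(taxed)  # [118.0, 295.0, 472.0]
```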