Image Batch Processing Simulation
Scenario: You are working with a dataset of image metadata. Before feeding the images into a machine learning model, you need to simulate loading each image at its specified resolution and then normalize it.

Your Task:

Load the metadata DataFrame.

Write a function that takes a resolution string (e.g., '128x128') as input, parses it to get the height and width, and then creates a random NumPy array of that specific size.

Use the .apply() method to run this function on your resolution column, creating a new image_array column. Each element in this new column should be a NumPy array with the correct dimensions from the resolution column.

Write a second function to normalize an image array and use .apply() again on your new image_array column to create a final normalized_array column.
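
The steps above can be sketched roughly as follows. This is a minimal illustration, not the only way to do it: the function names make_mock_image and normalize_image are hypothetical, the first number in the resolution string is assumed to be the height (following the task's wording), and three color channels with uint8 values are assumed for the mock image.

```python
import numpy as np
import pandas as pd


def make_mock_image(resolution: str, rng: np.random.Generator) -> np.ndarray:
    """Parse an 'HxW' string and return a random uint8 array of shape (H, W, 3)."""
    # Assumption: the first number is the height, the second the width.
    height, width = (int(part) for part in resolution.lower().split("x"))
    return rng.integers(0, 256, size=(height, width, 3), dtype=np.uint8)


def normalize_image(img: np.ndarray) -> np.ndarray:
    """Standardize one image to zero mean and unit standard deviation."""
    img = img.astype(np.float64)
    std = img.std()
    # Guard against a constant image, where the standard deviation is zero.
    return (img - img.mean()) / std if std > 0 else img - img.mean()


rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
df = pd.DataFrame({"resolution": ["128x128", "256x256"]})

# .apply calls the function once per row, so each row can have its own array size.
df["image_array"] = df["resolution"].apply(lambda r: make_mock_image(r, rng))
df["normalized_array"] = df["image_array"].apply(normalize_image)

print(df["image_array"].apply(lambda a: a.shape).tolist())
```

Keeping the parsing and the normalization in separate named functions makes each step easy to test on a single resolution string or a single array before applying it to the whole column.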

In [1]:
import numpy as np
import pandas as pd

image_metadata = {
    'image_id': ['img_001', 'img_002', 'img_003', 'img_004'],
    'label': ['cat', 'dog', 'cat', 'bird'],
    'resolution': ['128x128', '256x256', '128x128', '512x512']
}

metadata = pd.DataFrame(image_metadata)
# Get the dimensions by splitting each resolution string on the "x" character
dimensions = metadata['resolution'].str.split('x', expand=True).astype(int)
# Create mock images filled with random noise, using the parsed dimensions
# (the images differ in size, so the non-vectorized .apply method is used)
mock_imgs = dimensions.apply(lambda x: np.random.randint(0, 256, size=(x[0], x[1], 3), dtype=np.uint8), axis=1)
# Standardize each image so the model treats images with different brightness and contrast equally
normalized = mock_imgs.apply(lambda img: (img - img.mean()) / img.std())
# The values are not in the 0 to 1 range: normalization here means removing brightness and contrast bias, not rescaling the values to between 0 and 1
print(normalized)

0    [[[-0.8551341632910463, 1.591122467167661, -1....
1    [[[-0.16976850816399625, -0.6165526248967987, ...
2    [[[-0.8888082591105001, 0.42430563301726, -1.5...
3    [[[0.1410415480284149, 1.4540779190106174, -0....
dtype: object
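
The negative values in the output follow from the kind of normalization used. As a rough check, the sketch below compares per-image standardization (the formula from the cell above) with min-max scaling, which is shown only for contrast and is not part of the original solution. The seed and the single 128x128 test image are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only
img = rng.integers(0, 256, size=(128, 128, 3), dtype=np.uint8)

# Per-image standardization, as in the cell above: zero mean, unit standard deviation.
standardized = (img - img.mean()) / img.std()

# Min-max scaling, for contrast: maps the smallest value to 0 and the largest to 1.
as_float = img.astype(np.float64)
min_max = (as_float - as_float.min()) / (as_float.max() - as_float.min())

print("standardized mean/std:", standardized.mean(), standardized.std())
print("standardized range:   ", standardized.min(), standardized.max())
print("min-max range:        ", min_max.min(), min_max.max())
```

The standardized array is centered on zero, so roughly half of its values are negative, while the min-max version stays within 0 and 1.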
