# How to replicate Apple's "Memories" feature

## What this is about

This Python notebook is about replicating Apple's Memories feature that comes with iOS.

I strongly suspect the following:
* Apple runs inference ONLY on device to generate these memories
  * Even if you don't upload your photos to iCloud, you still get memories created every now and then.
* Apple probably uses anomaly detection algorithms (maybe even shallow learning) to detect when you've traveled somewhere else
  * This applies in particular to the trip memories iPhones create.
  * I'm not ruling out a simpler heuristic like "if the user has moved locations after being in the same city for 30 days, consider that a trip and make a memories collection"
* Apple also uses some type of image classification to detect when you've taken a picture at a wedding, with pets, with kids, or a few other types of photos.
  * Apple generates special memories for each of these types (e.g. "Playtime with Steven" where Steven is your toddler)
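
To make the "simpler heuristic" above concrete, here is a minimal sketch of it. This is my guess at how such a rule could look, not Apple's implementation. The `Photo` class, the `find_trips` function and the idea that every photo already carries a city name are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Photo:
    taken_on: date
    city: str  # hypothetical: assumes each photo is already geotagged with a city


def find_trips(photos, home_stay_days=30):
    """Group photos taken away from a city the user had stayed in for a while."""
    photos = sorted(photos, key=lambda p: p.taken_on)
    trips = []
    home = None          # city the user has settled in, once we know it
    streak_city = None   # city of the current uninterrupted run of photos
    streak_start = None
    current_trip = []
    for photo in photos:
        if photo.city != streak_city:
            streak_city, streak_start = photo.city, photo.taken_on
        # A long enough run of photos in one city makes that city "home"
        if (photo.taken_on - streak_start).days >= home_stay_days:
            home = streak_city
        if home is not None and photo.city != home:
            current_trip.append(photo)
        elif current_trip:
            # Back home (or settled somewhere new): close the trip
            trips.append(current_trip)
            current_trip = []
    if current_trip:
        trips.append(current_trip)
    return trips


# Made-up example: five weeks in Berlin, a short visit to Lisbon, then back home
photos = [
    Photo(date(2024, 1, 1), "Berlin"),
    Photo(date(2024, 1, 15), "Berlin"),
    Photo(date(2024, 2, 5), "Berlin"),
    Photo(date(2024, 2, 10), "Lisbon"),
    Photo(date(2024, 2, 12), "Lisbon"),
    Photo(date(2024, 2, 20), "Berlin"),
]
trips = find_trips(photos)
print([[p.city for p in trip] for trip in trips])
```

A rule like this needs no model at all, which is why I wouldn't rule it out for something that has to run on device.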

## The plan

My plan only covers creating memories for specific categories of photos. I won't deal with anomaly detection to find "trips", since that requires its own dataset. I can find one another time and elaborate on how that problem can be solved.

What I'll focus on is categorizing images into a few categories:
1. Vacation
2. Romantic
4. Nature
5. Art
6. City Landscapes
7. Animals
8. Selfies
9. Food
10. Sports
11. No Category

The plan is to:

1. Download an open image dataset
2. Use a VLM that is large relative to the final classifier (anywhere from 1-5B parameters) to label the images with categories that a smaller classification model can then be trained on.
3. Fine-tune a smaller classification model (50M parameters or less). In this case I'll use ResNet-50 (roughly 25M parameters), since it's a well-known model that just works for image classification and is small enough.

After fine-tuning ResNet-50:
* Inference will run fast, whether we build a web app or run inference on device.
* We won't have high memory requirements. Loading 50M parameters or fewer can be done on modern phones or laptops.

For the purposes of experimentation this can work!
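
As a rough sanity check on the memory claim, the weights alone take about (number of parameters) x (bytes per parameter). The sketch below assumes about 25.6M parameters for ResNet-50 and ignores activations and framework overhead, so treat the numbers as a lower bound. `model_memory_mb` is a helper I made up for this estimate.

```python
def model_memory_mb(num_parameters, bytes_per_parameter=4):
    """Approximate memory needed just to hold the weights, in MB (1 MB = 10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1e6


resnet50_parameters = 25_600_000  # approximate parameter count of ResNet-50

print(f"ResNet-50, float32: {model_memory_mb(resnet50_parameters):.0f} MB")
print(f"ResNet-50, float16: {model_memory_mb(resnet50_parameters, 2):.0f} MB")
print(f"50M-parameter budget, float32: {model_memory_mb(50_000_000):.0f} MB")
```

Even the full 50M-parameter budget stays in the low hundreds of megabytes at float32.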

## The Dataset

This is a link to the dataset:

https://huggingface.co/datasets/opendiffusionai/pexels-photos-janpf

The full dataset is huge, so I test with a single zip file from it:

https://huggingface.co/datasets/opendiffusionai/pexels-photos-janpf/blob/main/0_0-3.zip
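
The link above points at the file viewer page. On Hugging Face, swapping `/blob/` for `/resolve/` in the URL gives the direct download. Below is a small sketch for fetching and unpacking the archive. The helper names are mine, and I'm assuming the images should end up in `images/pexels_dataset`, the folder the labeling code reads from. The zip may contain its own subfolders, so check the layout after extracting.

```python
import urllib.request
import zipfile
from pathlib import Path

BLOB_URL = "https://huggingface.co/datasets/opendiffusionai/pexels-photos-janpf/blob/main/0_0-3.zip"


def blob_to_resolve_url(blob_url):
    """Turn a Hugging Face file-viewer URL into the direct-download URL."""
    return blob_url.replace("/blob/", "/resolve/", 1)


def download_and_extract(url, zip_path, target_dir):
    """Download the archive (skipped if it is already on disk) and unpack it."""
    zip_path, target_dir = Path(zip_path), Path(target_dir)
    if not zip_path.exists():
        urllib.request.urlretrieve(url, zip_path)
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
    return target_dir


download_url = blob_to_resolve_url(BLOB_URL)
print(download_url)

# Uncomment to actually fetch the archive (it is a large download):
# download_and_extract(download_url, "0_0-3.zip", "images/pexels_dataset")
```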

In [12]:
from transformers import AutoModelForCausalLM, pipeline
from PIL import Image
from torch import nn
import pandas as pd
import torchvision.models as models
from torch.utils.data import DataLoader, random_split
from torchvision import transforms, datasets
from torchsummary import summary
import torch

In [None]:
# Moondream2, a small open-source VLM published under vikhyatk/moondream2 on Hugging Face
moondream_model = AutoModelForCausalLM.from_pretrained(
    "vikhyatk/moondream2",
    revision="2025-06-21",
    trust_remote_code=True,
    device_map={"": "mps"}  # "mps" targets Apple Silicon; use "cuda" on an NVIDIA GPU
)

## Step 1 - Labeling the images for training

In [None]:
# Delete any existing labels file so the labeling step starts fresh
import os

try:
    os.remove('./labels.csv')
except FileNotFoundError:
    print("File './labels.csv' not found, nothing to delete.")

In [None]:
query = """
We are creating a 'memories' photo library for a user. Classify this image into one of the following categories:
1. Vacation
2. Romantic
4. Nature
5. Art
6. City Landscapes
7. Animals
8. Selfies
9. Food
10. Sports
11. No Category

If you find that the image does not fit any category well, just return "No Category".
"""

print("Starting labeling process using Moondream model...")

labeling_testing_batch_size = 30
images_folder = "images/pexels_dataset"
image_paths = os.listdir(images_folder)
memory_examples = {
    'Vacation': [],
    'Romantic': [],
    'Nature': [],
    'Art': [],
    'City Landscapes': [],
    'Animals': [],
    'Selfies': [],
    'Food': [],
    'Sports': [],
}

# Label every image with Moondream and write "filename,category" rows.
# The header row matters: the counting cell below reads the 'category' column.
# Opening with 'w' overwrites any previous labels.csv.
with open('./labels.csv', 'w') as file:
    file.write("filename,category\n")
    for i, image_path in enumerate(image_paths):
        image = Image.open(f"{images_folder}/{image_path}")
        model_response = moondream_model.query(image, query)["answer"].strip()
        print(f"{i} - File path: {image_path} \nModel label output: {model_response}")
        file.write(f"{image_path},{model_response}\n")

        print("-" * 100)

print()
print("Finished writing to file labels.csv!")
print("Finished labeling testing!")


## Count the number of images per category

In this part I count the number of photos per category to get an idea of how many memories, and of what kind, we can create.

In [8]:
# Count the number of images in each category
import pandas as pd

# Read the labels.csv file
df = pd.read_csv('labels.csv')

# Count the number of images in each category
category_counts = df['category'].value_counts()

# Print the counts
print(category_counts)



category
Nature             835
No Category        826
Food               109
Romantic            88
City Landscapes     86
Sports              38
Vacation             9
Selfies              7
9                    5
5. Art               3
5                    2
Art                  1
8                    1
Name: count, dtype: int64
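
The counts show that the raw model output is noisy: a few answers are bare list numbers ("9", "5", "8") and some keep the list prefix ("5. Art"). Here is a minimal sketch of how these could be folded back into clean categories. It assumes the bare numbers refer to the numbering in the labeling prompt (which skips 3), so 5 = Art, 8 = Selfies and 9 = Food. `normalize_label` and `PROMPT_CATEGORIES` are names I made up for this example.

```python
import re
from collections import Counter

# Numbering used in the labeling prompt (note that it skips 3)
PROMPT_CATEGORIES = {
    1: "Vacation", 2: "Romantic", 4: "Nature", 5: "Art", 6: "City Landscapes",
    7: "Animals", 8: "Selfies", 9: "Food", 10: "Sports", 11: "No Category",
}


def normalize_label(raw_label):
    """Map raw answers such as '9' or '5. Art' onto a clean category name."""
    label = raw_label.strip()
    if label.isdigit():
        # Bare list number, e.g. "9" -> "Food"
        return PROMPT_CATEGORIES.get(int(label), "No Category")
    # Drop a leading list prefix, e.g. "5. Art" -> "Art"
    label = re.sub(r"^\d+\.\s*", "", label)
    return label if label in PROMPT_CATEGORIES.values() else "No Category"


# Raw counts copied from the output above
raw_counts = {
    "Nature": 835, "No Category": 826, "Food": 109, "Romantic": 88,
    "City Landscapes": 86, "Sports": 38, "Vacation": 9, "Selfies": 7,
    "9": 5, "5. Art": 3, "5": 2, "Art": 1, "8": 1,
}

clean_counts = Counter()
for raw_label, count in raw_counts.items():
    clean_counts[normalize_label(raw_label)] += count

for category, count in clean_counts.most_common():
    print(f"{category:<16} {count}")
```

On the DataFrame itself the same function can be applied with `df['category'].map(normalize_label)` before counting.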


## Fine-tune ResNet-50

In this section I fine-tune ResNet-50 on the `labels.csv` file that was created using Moondream2.

In [None]:
device = torch.device(
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)
print(f"Using {device} device")

In [None]:
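model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

# One output per category in the labeling prompt (including "No Category")
number_of_classes = 10

# The pretrained head has 1000 outputs, one per ImageNet class
print(model.fc.out_features)

# Replace the classification head with one sized for our categories
new_layer = nn.Linear(model.fc.in_features, number_of_classes)
model.fc = new_layer

# The model is still on the CPU here, so run the summary on the CPU as well
summary(model, (3, 224, 224), device="cpu")

model.to(device)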



In [None]:
# Version manually reading the files from the directory
from PIL import Image

train_batch_size = 64
test_batch_size = 64

# Nature             835
# No Category        826
# Food               109
# Romantic            88
# City Landscapes     86
# Sports              38
# Vacation             9
# Selfies              7
# 9                    5
# 5. Art               3
# 5                    2
# Art                  1
# 8                    1

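# Map every raw label to a class index in [0, number_of_classes).
# Moondream sometimes answers with the list number instead of the name.
# Assuming those numbers follow the prompt's numbering (5 = Art, 8 = Selfies,
# 9 = Food), they are folded into the matching category.
mappings = {
  "Nature": 0,
  "No Category": 1,
  "Food": 2,
  "Romantic": 3,
  "City Landscapes": 4,
  "Sports": 5,
  "Vacation": 6,
  "Selfies": 7,
  "Art": 8,
  "Animals": 9,
  "9": 2,
  "5": 8,
  "5. Art": 8,
  "8": 7,
}

def load_data():
  files = []
  with open('./labels.csv', 'r') as file:
    next(file)  # skip the header row
    for line in file:
      filename, label = line.strip().split(',')
      files.append([filename, mappings[label]])
  return files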

def load_image(path):
  image = Image.open(path).convert('RGB')
  return image

IMAGES_PATH = './images/pexels_dataset/'

data = load_data()

# print(len(data))

transform_image = transforms.Compose(
    [
        # Resize images to 224x224, the input size ResNet-50 was pretrained on
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],  # ImageNet channel statistics
        ),
    ]
)

def create_train_dataset(raw_data):
  dataset = []
  for datapoint in raw_data:
    image = load_image(IMAGES_PATH + datapoint[0])
    image = transform_image(image)
    dataset.append([image, datapoint[1]])
  return dataset

train_dataset = create_train_dataset(data)

train_dataset, test_dataset = random_split(train_dataset, [0.8, 0.2])
train_dataset, val_dataset = random_split(train_dataset, [0.8, 0.2])

# print(len(train_dataset))

train_loader = DataLoader(train_dataset, batch_size=train_batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=test_batch_size)
test_loader = DataLoader(test_dataset, batch_size=test_batch_size)


027382484c18ef2139e59456c899bdc9.jpg - Nature
02f1cc2edadf2f2d460343ea8c41a134.jpg - Nature
001d152cda4b00291c570b14520c8480.jpg - No Category
033e71c159e788c4986b9b4a6004a0df.jpg - Nature
0304e6064706d00ee0d6a9d1e4266a73.jpg - No Category
001f139ef55c5620dbac25ccccddf60b.jpg - No Category
0097d1105f2da7da7a65c735ed69ed38.jpg - Nature
0319671ab801b1941d3236a28dcffe38.jpg - Nature
0280954c5fc3e423f4da77950c21c2c3.jpg - No Category
00c28821b1f7418aa70b0e4a16211355.jpg - Nature
00d45c4d99113a106df4d7cadc409a44.jpg - No Category
000cdd354f75edd8fa13538e808b5248.jpg - No Category
027928662325c0b7ba40ac33badd2f3c.jpg - No Category
024ccd4170b635c1492b0da2e21dde2c.jpg - Sports
002e037378c4352175af4b9f0307192d.jpg - No Category
02e451f46009d5e4578120026b593fdd.jpg - No Category
02a965b80bf5f44846da62ee6c1d29b1.jpg - No Category
00a2bf893b40631ce4f0985f0c1155bc.jpg - City Landscapes
01f3cfc3f47a07b8406b5b317d099400.jpg - No Category
005a1e95066f1970a90f22e5e4afb6af.jpg - Natu
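
Chaining two 80/20 splits does not give an 80/20 train/validation split of the whole dataset. The validation set comes out of the first 80%, so the effective split is about 64% train, 16% validation and 20% test. The sketch below works that out. `two_stage_split` is a made-up helper, 2010 is the sum of the category counts shown earlier, and the sizes are approximate since `random_split` does its own rounding and may differ by one.

```python
def two_stage_split(total, first_split=(0.8, 0.2), second_split=(0.8, 0.2)):
    """Effective train/val/test fractions and approximate sizes after two chained splits."""
    fractions = {
        "train": first_split[0] * second_split[0],
        "val": first_split[0] * second_split[1],
        "test": first_split[1],
    }
    sizes = {name: round(total * fraction) for name, fraction in fractions.items()}
    return fractions, sizes


fractions, sizes = two_stage_split(total=2010)
for name in fractions:
    print(f"{name:<5} {fractions[name]:.0%}  ~{sizes[name]} images")
```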

In [None]:
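num_epochs = 5

# Loss and optimizer were not defined earlier in the notebook. These are common
# defaults for fine-tuning a classifier; the learning rate is an untuned guess.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def calculate_accuracy(loader):
    """Return the accuracy (in percent) of `model` on the given DataLoader."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == labels).sum().item()
            total += labels.size(0)
    return 100 * correct / total

for epoch in range(num_epochs):
    model.train()
    # Iterate over the batches prepared by train_loader in the previous cell
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

    # Check validation accuracy once per epoch
    val_accuracy = calculate_accuracy(val_loader)
    print(
        f"Epoch {epoch+1}/{num_epochs}, Validation accuracy: {val_accuracy:.2f}%"
    )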
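
Once images have a category, whether from Moondream or from the fine-tuned classifier, a memory is just a group of photos that share a category. Here is a minimal sketch of that last step. The `build_memories` helper, the `min_photos` threshold and the file names are all made up for illustration, and skipping "No Category" is my own assumption about what makes sense.

```python
from collections import defaultdict


def build_memories(labeled_images, min_photos=3, skip=("No Category",)):
    """Group (filename, category) pairs into per-category collections."""
    collections = defaultdict(list)
    for filename, category in labeled_images:
        if category not in skip:
            collections[category].append(filename)
    # Keep only categories with enough photos to be worth showing
    return {c: files for c, files in collections.items() if len(files) >= min_photos}


# Made-up file names, for illustration only
example = [
    ("a.jpg", "Nature"), ("b.jpg", "Nature"), ("c.jpg", "Nature"),
    ("d.jpg", "Food"), ("e.jpg", "No Category"),
]
memories = build_memories(example)
print(memories)
```

With the label counts from earlier, a threshold like this would keep the large categories such as Nature and Food. Where to draw the line for the small ones (Vacation, Selfies, Art) is a product decision more than a modeling one.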