# Overview of Florence 2
Provide a brief overview of the Florence 2 model, its architecture, and its applications in object detection and classification.

In [None]:
# Overview of Florence 2

Florence 2 is a state-of-the-art vision model developed for advanced object detection and classification tasks. It builds upon the success of its predecessor, Florence, by incorporating the latest advancements in deep learning and computer vision. The model is designed to be highly efficient and accurate, making it suitable for a wide range of applications, from autonomous vehicles to content moderation in social media platforms.

## Architecture

Florence 2 utilizes a transformer-based architecture, which allows it to effectively capture long-range dependencies in the input data. This is particularly beneficial for complex scenes where objects may be partially obscured or interact with each other in intricate ways. The model also employs a multi-scale feature extraction technique, enabling it to detect objects of various sizes with high precision.

## Applications

The primary applications of Florence 2 include object detection and classification. In object detection, the model can identify and locate multiple objects within an image, providing bounding boxes and class labels for each detected object. For classification, Florence 2 can accurately categorize the primary subject of an image among a predefined set of classes. This versatility makes Florence 2 an excellent choice for tasks ranging from surveillance and security to wildlife monitoring and urban planning.

# Setting Up the Environment
Guide on setting up the development environment, including the installation of necessary libraries and frameworks.

In [1]:
# Setting Up the Environment

# Install necessary libraries and frameworks
!pip install torch torchvision
!pip install transformers
!pip install florence2

# Verify installation
import torch
print(f"PyTorch version: {torch.__version__}")

import transformers
print(f"Transformers version: {transformers.__version__}")

# Check if CUDA is available for GPU acceleration
if torch.cuda.is_available():
    print("CUDA is available. GPU acceleration can be utilized.")
else:
    print("CUDA is not available. Training will proceed on CPU.")

Collecting torch
  Downloading torch-2.4.0-cp312-cp312-win_amd64.whl.metadata (27 kB)
Collecting torchvision
  Downloading torchvision-0.19.0-cp312-cp312-win_amd64.whl.metadata (6.1 kB)
Downloading torch-2.4.0-cp312-cp312-win_amd64.whl (197.8 MB)
   ---------------------------------------- 0.0/197.8 MB ? eta -:--:--
   ---------------------------------------- 0.2/197.8 MB 5.6 MB/s eta 0:00:36
   ---------------------------------------- 0.5/197.8 MB 6.7 MB/s eta 0:00:30
   ---------------------------------------- 0.9/197.8 MB 7.3 MB/s eta 0:00:28
   ---------------------------------------- 1.4/197.8 MB 8.3 MB/s eta 0:00:24
   ---------------------------------------- 2.0/197.8 MB 9.1 MB/s eta 0:00:22
    --------------------------------------- 2.6/197.8 MB 9.8 MB/s eta 0:00:20
    --------------------------------------- 3.1/197.8 MB 9.8 MB/s eta 0:00:20
    --------------------------------------- 3.6/197.8 MB 10.0 MB/s eta 0:00:20
    --------------------------------------- 4.1/197.8 MB 

ERROR: Could not find a version that satisfies the requirement florence2 (from versions: none)
ERROR: No matching distribution found for florence2


OSError: [WinError 126] The specified module could not be found. Error loading "c:\Users\varun\anaconda3\Lib\site-packages\torch\lib\fbgemm.dll" or one of its dependencies.

# PyTorch: Fine-Tuning a Pretrained Model
Demonstrate how to fine-tune a pretrained Florence 2 model using PyTorch for a custom object detection or classification task.

In [None]:
# Load the pretrained Florence 2 model
from transformers import AutoModelForImageClassification, AutoFeatureExtractor
model_name = "microsoft/florence2-base"
model = AutoModelForImageClassification.from_pretrained(model_name)
feature_extractor = AutoFeatureExtractor.from_pretrained(model_name)

# Prepare a custom dataset for fine-tuning
from torchvision.datasets import ImageFolder
from torchvision.transforms import Compose, Resize, ToTensor, Normalize

transform = Compose([
    Resize((224, 224)),
    ToTensor(),
    Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

dataset = ImageFolder("path/to/your/dataset", transform=transform)

# Split the dataset into training and validation sets
from torch.utils.data import DataLoader, random_split
train_size = int(0.8 * len(dataset))
val_size = len(dataset) - train_size
train_dataset, val_dataset = random_split(dataset, [train_size, val_size])

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=32)

# Define the training loop
def train(model, dataloader, optimizer, criterion, device):
    model.train()
    running_loss = 0.0
    for images, labels in dataloader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs.logits, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
    return running_loss / len(dataloader)

# Define the validation loop
def validate(model, dataloader, criterion, device):
    model.eval()
    running_loss = 0.0
    with torch.no_grad():
        for images, labels in dataloader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            loss = criterion(outputs.logits, labels)
            running_loss += loss.item()
    return running_loss / len(dataloader)

# Fine-tune the model
import torch.optim as optim
from torch.nn import CrossEntropyLoss

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

optimizer = optim.Adam(model.parameters(), lr=1e-4)
criterion = CrossEntropyLoss()

epochs = 5
for epoch in range(epochs):
    train_loss = train(model, train_loader, optimizer, criterion, device)
    val_loss = validate(model, val_loader, criterion, device)
    print(f"Epoch {epoch+1}, Train Loss: {train_loss}, Val Loss: {val_loss}")

# TensorFlow and Keras: Fine-Tuning a Pretrained Model
Show how to fine-tune a pretrained Florence 2 model using TensorFlow and Keras for a specific task.

In [None]:
# TensorFlow and Keras: Fine-Tuning a Pretrained Model

# Import TensorFlow and Keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models

# Load and preprocess the dataset for TensorFlow
def preprocess_image(image):
    image = tf.image.resize(image, (224, 224))
    image = keras.applications.resnet.preprocess_input(image)
    return image

def prepare_dataset(dataset_path, batch_size=32):
    dataset = tf.keras.preprocessing.image_dataset_from_directory(
        dataset_path,
        image_size=(224, 224),
        batch_size=batch_size,
        label_mode='categorical'
    ).map(lambda x, y: (preprocess_image(x), y))
    return dataset

# Prepare TensorFlow dataset
dataset_path = "path/to/your/dataset"
batch_size = 32
tf_dataset = prepare_dataset(dataset_path, batch_size)
tf_train_dataset = tf_dataset.take(int(0.8 * len(tf_dataset)))
tf_val_dataset = tf_dataset.skip(int(0.8 * len(tf_dataset)))

# Load a pretrained model
base_model = tf.keras.applications.ResNet50(weights='imagenet', include_top=False)
base_model.trainable = False  # Freeze the base model

# Add custom layers on top of the base model
model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(1024, activation='relu'),
    layers.Dropout(0.2),
    layers.Dense(len(dataset.classes), activation='softmax')  # Assuming 'dataset' has a 'classes' attribute
])

# Compile the model
model.compile(optimizer=keras.optimizers.Adam(1e-4),
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Train the model
history = model.fit(tf_train_dataset, validation_data=tf_val_dataset, epochs=5)

# Unfreeze some layers of the base model for fine-tuning
base_model.trainable = True
fine_tune_at = 100  # Example: unfreeze layers from this layer onwards
for layer in base_model.layers[:fine_tune_at]:
    layer.trainable = False

# Recompile the model (necessary after modifying layer.trainability)
model.compile(optimizer=keras.optimizers.Adam(1e-5),  # Lower learning rate
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Continue training (fine-tuning)
fine_tune_epochs = 5
total_epochs = epochs + fine_tune_epochs

history_fine = model.fit(tf_train_dataset,
                         validation_data=tf_val_dataset,
                         epochs=total_epochs,
                         initial_epoch=history.epoch[-1])

# Detectron2: Fine-Tuning for Object Detection
Explain how to use Detectron2 for fine-tuning Florence 2 on an object detection task, including dataset preparation and training.

In [None]:
# Install Detectron2
!pip install detectron2 -f https://dl.fbaipublicfiles.com/detectron2/wheels/cu102/torch1.8/index.html

# Import Detectron2 utilities
from detectron2.engine import DefaultTrainer
from detectron2.config import get_cfg
from detectron2 import model_zoo
from detectron2.data import MetadataCatalog, DatasetCatalog
from detectron2.utils.visualizer import Visualizer
from detectron2.data.datasets import register_coco_instances

# Register the custom dataset for Detectron2
dataset_name = "custom_dataset"
dataset_path = "path/to/your/dataset"
json_annotation = "path/to/your/annotation.json"

register_coco_instances(dataset_name, {}, json_annotation, dataset_path)

# Prepare the Detectron2 configuration
cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = (dataset_name,)
cfg.DATASETS.TEST = ()  # No testing dataset
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")  # Initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2
cfg.SOLVER.BASE_LR = 0.02
cfg.SOLVER.MAX_ITER = 1000  # Adjust according to dataset size
cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(MetadataCatalog.get(dataset_name).thing_classes)  # Set the number of classes

# Initialize the Detectron2 trainer and start fine-tuning
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()

# MMdetection: Fine-Tuning for Object Detection
Detail the process of fine-tuning Florence 2 with MMdetection for object detection, covering dataset setup and model training.

In [None]:
# Install MMdetection
!pip install openmim
!mim install mmdet

# Import MMdetection utilities
from mmdet.apis import init_detector, inference_detector, show_result_pyplot
from mmdet.datasets import build_dataset
from mmdet.models import build_detector
from mmcv import Config
from mmdet.apis import train_detector

# Load configuration file for fine-tuning
config_file = 'path/to/your/config.py'  # Specify the path to the MMdetection config file tailored for Florence 2
checkpoint_file = 'path/to/your/checkpoint.pth'  # Specify the path to the pretrained Florence 2 weights if available

# Modify the configuration to adapt to your dataset
cfg = Config.fromfile(config_file)
cfg.model.pretrained = None  # We are fine-tuning, so we don't need the pretrained weights
cfg.data.train.ann_file = 'path/to/your/train/annotation.json'  # Path to training annotations in COCO format
cfg.data.train.img_prefix = 'path/to/your/train/images/'  # Path to training images
cfg.data.val.ann_file = 'path/to/your/val/annotation.json'  # Path to validation annotations in COCO format
cfg.data.val.img_prefix = 'path/to/your/val/images/'  # Path to validation images
cfg.data.test.ann_file = 'path/to/your/test/annotation.json'  # Path to test annotations in COCO format
cfg.data.test.img_prefix = 'path/to/your/test/images/'  # Path to test images

# Adjust the number of classes based on your dataset
cfg.model.roi_head.bbox_head.num_classes = len(dataset.classes)  # Assuming 'dataset' has a 'classes' attribute

# Adjust the learning rate and other hyperparameters based on your dataset size and desired training duration
cfg.optimizer.lr = 1e-4
cfg.lr_config.warmup = None
cfg.log_config.interval = 10

# Build dataset and model
datasets = [build_dataset(cfg.data.train)]
model = build_detector(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))
model.CLASSES = datasets[0].CLASSES

# Fine-tune the model
train_detector(model, datasets, cfg, distributed=False, validate=True)