<a href="https://colab.research.google.com/github/teoalcdor/tfg_teoalcdor/blob/main/od_youtube.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Object Detection in a YouTube Video

We will apply our YOLOv5m and SSD300 models to a YouTube video to perform real-time detection.

## Libraries

We install and import the required libraries:

In [None]:
!pip install --force-reinstall https://github.com/yt-dlp/yt-dlp/archive/master.tar.gz
!pip install ultralytics

Collecting https://github.com/yt-dlp/yt-dlp/archive/master.tar.gz
  Downloading https://github.com/yt-dlp/yt-dlp/archive/master.tar.gz
     - 2.8 MB 10.1 MB/s 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: yt-dlp
  Building wheel for yt-dlp (pyproject.toml) ... done
  Created wheel for yt-dlp: filename=yt_dlp-2025.5.22-py3-none-any.whl size=3013202 sha256=ea1a7fe911f2eaff036daeaf770122ec8d9038d0d6d243cbda30f012735c56f3
  Stored in directory: /tmp/pip-ephem-wheel-cache-jhq7gifr/wheels/2d/79/97/7209650ef73114e0fe0603480da012ad3afacb9cae6b8acd9a
Successfully built yt-dlp
Installing collected packages: yt-dlp
Successfully installed yt-dlp-2025.5.22
Collecting ultralytics
  Downloading ultralytics-8.3.146-py3-none-any.whl.metadata (37 kB)
Collecting ultralytics-tho

In [None]:
from base64 import b64encode
import cv2
from google.colab import drive
from IPython.display import HTML
from IPython.display import Video
import numpy as np
import matplotlib.pyplot as plt
import os
import shutil
from time import time
from sklearn.preprocessing import LabelEncoder
import torch
from torchvision import models
from torchvision import transforms
from torchvision.models.detection.ssd import SSDHead
from torchvision.ops import nms
from ultralytics import YOLO
from yt_dlp import YoutubeDL

Creating new Ultralytics Settings v0.0.6 file ✅ 
View Ultralytics Settings with 'yolo settings' or at '/root/.config/Ultralytics/settings.json'
Update Settings with 'yolo settings key=value', i.e. 'yolo settings runs_dir=path/to/dir'. For help see https://docs.ultralytics.com/quickstart/#ultralytics-settings.


We try to use the GPU:

In [None]:
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
print(DEVICE)

cuda


## Helper Functions

In [None]:
label_encoder = LabelEncoder()
label_encoder.classes_ = np.array([
    "Tank (TANK)",
    "Infantry fighting vehicle (IFV)",
    "Armored personnel carrier (APC)",
    "Engineering vehicle (EV)",
    "Assault helicopter (AH)",
    "Transport helicopter (TH)",
    "Assault airplane (AAP)",
    "Transport airplane (TA)",
    "Anti-aircraft vehicle (AA)",
    "Towed artillery (TART)",
    "Self-propelled artillery (SPART)",
    "Human"
])


def create_class_colors(num_classes, seed):
    """
    Builds a dictionary that maps each class index to a randomly chosen
    color.
    """

    # Reproducibility
    np.random.seed(seed)

    # Initialization
    class_colors = {}

    # Create one color per class
    for i in range(num_classes):

        # Keep drawing colors until one is accepted for this class
        not_accepted = True
        while not_accepted:

            # Draw a value for each channel
            r = np.random.randint(0, 255)
            g = np.random.randint(0, 255)
            b = np.random.randint(0, 255)

            # The color must not be too gray: at least one pair of channels
            # has to differ clearly
            not_gray = abs(r - g) > 50 or abs(r - b) > 50 or abs(g - b) > 50

            # The color must not be too white
            not_white = r < 230 and g < 230 and b < 230

            # Accept it if it is neither too gray nor too white
            if not_gray and not_white:
                not_accepted = False
                class_colors[i] = (r, g, b)

    return class_colors

CLASS_COLORS = {
    0: (255,   0,   0),   # red
    1: (  0,   0, 255),   # blue
    2: (  0, 128,   0),   # green
    3: (255, 165,   0),   # orange
    4: (128,   0, 128),   # purple
    5: (255, 255,   0),   # yellow
    6: (  0,   0, 139),   # darkblue
    7: (255,   0, 255),   # magenta
    8: (255, 192, 203),   # pink
    9: (165,  42,  42),   # brown
    10: (128, 128, 128),   # grey
    11: (  0, 100,   0)    # darkgreen
}


def get_model(num_classes):
    """
    Returns an SSD300 model ready for transfer learning: the backbone is
    frozen and the detection head is replaced by a new, trainable one.
    """

    # Load the pre-trained model
    model = models.detection.ssd300_vgg16(weights=models.detection.SSD300_VGG16_Weights.DEFAULT)

    # Freeze all backbone parameters
    for param in model.backbone.parameters():
        param.requires_grad = False

    # Collect the number of input channels of each head layer
    in_channels = \
    [layer.in_channels for layer in model.head.classification_head.module_list]

    # Compute the number of default boxes per location (4 and 6, as in the
    # theory): the pre-trained head has 364 and 546 output channels for the 91
    # COCO classes
    boxes_per_class_1 = 364 // 91
    boxes_per_class_2 = 546 // 91

    # Number of default boxes per location for each of the six feature maps
    num_anchors = [boxes_per_class_1] + \
      3 * [boxes_per_class_2] + \
      2 * [boxes_per_class_1]

    # Instantiate an SSDHead adapted to our classes (plus background), which
    # we use to produce the detections
    model.head = SSDHead(
        in_channels=in_channels,
        num_anchors=num_anchors,
        num_classes=num_classes + 1
    )

    return model.to(DEVICE)


def decode_output_ssd(output, conf_threshold=0.5):
    """
    Decodes the SSD300 output so that it can be drawn and used to compute the
    mAP.
    """

    # Extract the parts of the output (labels are shifted so that 0 is the
    # first real class instead of the background)
    bbs = output["boxes"].to("cpu").detach()
    labels = output["labels"].to("cpu").detach() - 1
    confs = output["scores"].to("cpu").detach()

    # Filter the detections
    all_ixs = torch.arange(0, len(confs))
    ixs = torch.tensor([], dtype=torch.long)

    for label in labels.unique():
        if label != 11: # Skip the "Human" class (index 11)
            label_mask = labels == label # Select the detections of one class

            label_ixs = nms(bbs[label_mask], confs[label_mask], 0.05) # Non-max suppression

            real_ixs = all_ixs[label_mask][label_ixs] # Keep only the detections that pass NMS

            final_ixs = real_ixs[confs[real_ixs] > conf_threshold] # Keep only confidences above the threshold

            ixs = torch.cat((ixs, final_ixs)) # Accumulate the indices of this class that pass the filter

    bbs, confs, labels = [tensor[ixs] for tensor in [bbs, confs, labels]] # Filter boxes, confidences and labels

    return bbs, confs, labels


def adapt_bbs(bbs, image):
    """
    Rescales the detection boxes from the 300x300 input format to the
    dimensions of the original image.
    """
    real_w, real_h = image.shape[1], image.shape[0]
    bbs[:, 0:3:2] = bbs[:, 0:3:2] * real_w / 300
    bbs[:, 1:4:2] = bbs[:, 1:4:2] * real_h / 300

    return bbs


@torch.no_grad()
def predict_ssd(model, image, device):
    """
    Makes a prediction with an SSD model.
    """

    model.eval() # Inference mode

    # Resize the image to 300x300
    resized_image = transforms.Compose([
        transforms.ToPILImage(),
        transforms.Resize((300, 300)),
        transforms.ToTensor()])(image)

    # Run the prediction
    output = model(resized_image.unsqueeze(0).to(device))[0]

    # Decode the prediction and adapt it to the original image size
    bbs, confs, labels = decode_output_ssd(output)
    bbs = adapt_bbs(bbs, image)

    return bbs, confs, labels


def plot_bbox(image, bbs, confs, labels, label_encoder=label_encoder,
              class_colors=CLASS_COLORS):
    """
    Draws the detections from their boxes, confidences and labels.
    """

    for ix, bb in enumerate(bbs):

        # Text of the box label: class name and confidence
        complete_label = \
            label_encoder.inverse_transform([labels[ix].item()])[0] + " - " +  \
            str(round(confs[ix].item(), 2))

        font = cv2.FONT_HERSHEY_SIMPLEX
        font_scale = 0.9
        font_thickness = 2
        text_size, _ = cv2.getTextSize(complete_label, font, font_scale, font_thickness)
        text_width, text_height = text_size

        # Measure the box to center the label horizontally
        x0, y0, x1, y1 = bb
        x12 = (x0 + x1) / 2
        x2 = x12 - text_width / 2
        x3 = x12 + text_width / 2

        x0, y0 = int(x0), int(y0)
        x1, y1 = int(x1), int(y1)
        x2, x3 = int(x2), int(x3)

        # Draw the box and the label (in this notebook the frame is converted
        # to RGB before drawing, so the color is an RGB tuple)
        color = class_colors[int(labels[ix].item())]
        image = cv2.rectangle(image, (x0, y0), (x1, y1), color, thickness=4)
        image = cv2.rectangle(image, (x2 - 20, y0), (x3 + 20, y0 - text_height - 20), color, -1)
        image = cv2.putText(image, complete_label, (x2, y0 - 10), font, font_scale, (255, 255, 255), font_thickness)

    return image


def decode_output_yolo(output, conf_threshold=0.2):
    """
    Decodes the YOLOv5 output so that it can be drawn and used to compute the
    mAP.
    """

    # Extract the parts of the output
    bbs = output[0].boxes.xyxy.to("cpu")
    confs = output[0].boxes.conf.to("cpu")
    labels = output[0].boxes.cls.to("cpu").int()

    # Keep the detections above the confidence threshold, skipping the
    # "Human" class (index 11)
    ixs = (confs > conf_threshold) & (labels != 11)

    # Filter the detections
    bbs = bbs[ixs]
    confs = confs[ixs]
    labels = labels[ixs]

    return bbs, confs, labels


def predict_yolo(model, image, device):
    """
    Makes a prediction with a YOLO model.
    """

    # Run the prediction
    output = model.predict(image, device=device, verbose = False)

    # Decode the prediction
    bbs, confs, labels = decode_output_yolo(output)

    return bbs, confs, labels


# Per-model functions and class information
SSD_FNS = {
    "predict": predict_ssd,
    "plot_boxes": plot_bbox
}

YOLO_FNS = {
    "predict": predict_yolo,
    "plot_boxes": plot_bbox
}

CLASS_INFO = {
    "class2label": label_encoder,
    "class_colors": CLASS_COLORS
}
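
The filtering done in `decode_output_ssd` and the rescaling done in `adapt_bbs` can be illustrated without PyTorch. The following is a minimal pure-Python sketch, not the notebook's implementation: `iou`, `greedy_nms` and `filter_and_rescale` are hypothetical names, and the greedy loop only mirrors the documented behavior of `torchvision.ops.nms` (a box is dropped when its IoU with a higher-scoring kept box exceeds the threshold).

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep the box only if it does not overlap too much with a kept one
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def filter_and_rescale(boxes, scores, labels, frame_w, frame_h,
                       iou_threshold=0.05, conf_threshold=0.5, size=300):
    """Per-class NMS, confidence filter, then rescale from size x size."""
    kept = []
    for label in sorted(set(labels)):
        idx = [i for i, l in enumerate(labels) if l == label]
        local = greedy_nms([boxes[i] for i in idx],
                           [scores[i] for i in idx], iou_threshold)
        kept += [idx[k] for k in local if scores[idx[k]] > conf_threshold]
    sx, sy = frame_w / size, frame_h / size
    return [
        ((boxes[i][0] * sx, boxes[i][1] * sy, boxes[i][2] * sx, boxes[i][3] * sy),
         scores[i], labels[i])
        for i in kept
    ]


# Made-up detections in 300x300 coordinates, for illustration only
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 250, 250)]
scores = [0.9, 0.8, 0.3]
labels = [0, 0, 1]
detections = filter_and_rescale(boxes, scores, labels, frame_w=600, frame_h=300)
print(detections)  # [((20.0, 10.0, 220.0, 110.0), 0.9, 0)]
```

The second box is suppressed because it overlaps the first one (same class, IoU of about 0.68), and the third is dropped by the confidence threshold. The surviving box is stretched horizontally by 600 / 300 and left unchanged vertically.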

## Detection Model Class

We create a class for an object detection model that runs on a YouTube video:

In [None]:
class VideoParserObjectDetectionModel:
    """
    Object detection model that makes predictions on a YouTube video frame by
    frame.
    """

    def __init__(self, model, model_fns, class_info):
        self.model = model
        self.device = "cuda" if torch.cuda.is_available() else "cpu"
        self.model.to(self.device)

        self.predict = model_fns["predict"]
        self.plot_boxes = model_fns["plot_boxes"]
        self.class2label = class_info["class2label"]
        if "class_colors" not in class_info:
            class_colors = {}
            for i in range(len(self.class2label.classes_)):
                class_colors[i] = (220, 40, 10)
            self.class_colors = class_colors
        else:
            self.class_colors = class_info["class_colors"]

    def get_video_from_url(self, url):
        """
        Opens a video stream from a URL.
        """

        with YoutubeDL({"format": "bestvideo"}) as ydl:
            info = ydl.extract_info(url, download=False)

        if info is None:
            raise Exception("Could not get the video stream")

        return cv2.VideoCapture(info["url"])

    def score_frame(self, frame):
        """
        Makes a prediction on a frame.
        """

        output = self.predict(self.model, frame, self.device)
        return output

    def plot_bbs_in_frame(self, output, frame):
        """
        Draws the detections on a frame.
        """

        bbs, confs, labels = output
        frame = self.plot_boxes(frame, bbs, confs, labels, self.class2label,
                                self.class_colors)
        return frame

    def __call__(self, url, out_file):
        """
        Runs real-time detection on a YouTube video.
        """

        # Get the YouTube video
        player = self.get_video_from_url(url)

        # Check that the player is open
        assert player.isOpened()

        # Set up the writer and the counters used while predicting on the frames
        x_shape = int(player.get(cv2.CAP_PROP_FRAME_WIDTH))
        y_shape = int(player.get(cv2.CAP_PROP_FRAME_HEIGHT))
        four_cc = cv2.VideoWriter_fourcc(*"MJPG")
        out = cv2.VideoWriter(out_file, four_cc, 20, (x_shape, y_shape))
        fc = 0
        fps = 0
        tfc = int(player.get(cv2.CAP_PROP_FRAME_COUNT))
        tfcc = 0
        i = 0
        while True:
            if i == 9500: # Do not predict on more than 9500 frames
                print("Completed!")
                break
            fc += 1
            start_time = time()
            ret, frame = player.read()
            if not ret:
                print("Completed!")
                break

            # Convert the frame to RGB, predict on it and draw the detections
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            results = self.score_frame(frame)
            frame = self.plot_bbs_in_frame(results, frame)
            frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)

            # Every 10 frames, report the average FPS at which we predict
            end_time = time()
            fps += 1 / max(end_time - start_time, 1e-3)
            if fc == 10:
                avg_fps = int(fps / 10)
                tfcc += fc
                fc = 0
                fps = 0 # Reset the accumulator for the next window
                # Streams may not report a valid total frame count
                per_com = int(tfcc / tfc * 100) if tfc > 0 else 0
                print(f"Frames Per Second : {avg_fps} || Percentage Parsed : {per_com}")
            out.write(frame)
            i += 1

        # Release the reader and the writer so the output file is finalized
        player.release()
        out.release()
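
The FPS statistic printed every 10 frames can be isolated in a small helper. The sketch below is one possible way to do it, under two assumptions: `FpsMeter` is a hypothetical name that does not exist in the notebook, and it reports frames divided by total elapsed time in each window, which is a slightly different average from the mean of the per-frame rates used in the class above.

```python
from time import perf_counter


class FpsMeter:
    """Averages the frame rate over fixed windows of `window` frames."""

    def __init__(self, window=10):
        self.window = window
        self.elapsed = 0.0
        self.frames = 0

    def update(self, seconds):
        """Add one frame's duration; return the mean FPS when a window closes."""
        self.elapsed += seconds
        self.frames += 1
        if self.frames < self.window:
            return None
        fps = self.frames / self.elapsed if self.elapsed > 0 else float("inf")
        # Reset the accumulators so that windows do not leak into each other
        self.elapsed, self.frames = 0.0, 0
        return fps


meter = FpsMeter(window=10)
for _ in range(20):
    start = perf_counter()
    sum(range(10_000))  # Stand-in for predicting and drawing on one frame
    fps = meter.update(perf_counter() - start)
    if fps is not None:
        print(f"Frames Per Second : {fps:.0f}")
```

Resetting both accumulators when a window closes keeps each reported value independent of the previous ones.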


We run real-time detection on a YouTube video:

In [None]:
# URL = "https://www.youtube.com/watch?v=ZuZtQeGHxi8" # Ukraine, difficult
URL = "https://www.youtube.com/watch?v=xV24-uVq3WY" # Armed forces parade
# URL = "https://www.youtube.com/watch?v=-l_2An11-P0" # Ukraine, HD
# URL = "https://www.youtube.com/watch?v=TR-ORlwlbuA" # Palestine, combat
# URL = "https://www.youtube.com/watch?v=lRFDAM3dcFQ" # Palestine, varied

We connect to our Drive, from which we will load the model checkpoints:

In [None]:
drive.mount("/content/drive/")

Mounted at /content/drive/


## Basic SSD300

In [None]:
model = get_model(11)

In [None]:
checkpoint = torch.load("/content/drive/MyDrive/tfg/models/ssd300.pth", weights_only=False)
model.load_state_dict(checkpoint["best_model"]["model_state_dict"])

In [None]:
ssd300_rtod = VideoParserObjectDetectionModel(model, SSD_FNS, CLASS_INFO)

In [None]:
ssd300_rtod(URL, "fuerzas_armadas_ssd300.mp4")

We save the resulting video to our Drive:

In [None]:
# Input video path
save_path = "/content/fuerzas_armadas_ssd300.mp4"

# Compressed video path
compressed_path = "/content/fuerzas_armadas_ssd300_compressed.mp4"

os.system(f"ffmpeg -i {save_path} -vcodec libx264 {compressed_path}")
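
The same compression step could also be run through `subprocess`, which passes each argument separately (so paths with spaces need no quoting) and can raise on failure instead of returning an exit code. This is a sketch under assumptions: `build_ffmpeg_command` and `compress_video` are hypothetical helpers, and the `-y` flag (overwrite the output without asking) is an addition that is not in the original command.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dst):
    """Return the argument list for re-encoding `src` to H.264 at `dst`."""
    # -y is an assumption added here: it overwrites `dst` if it already exists
    return ["ffmpeg", "-y", "-i", src, "-vcodec", "libx264", dst]


def compress_video(src, dst):
    """Run ffmpeg and raise if it is missing or exits with an error."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return dst


print(build_ffmpeg_command("/content/in.mp4", "/content/out.mp4"))
```

With `check=True`, a failed encode raises `subprocess.CalledProcessError`, so a later copy to Drive would not silently pick up a missing or partial file.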

In [None]:
shutil.copy(
    "/content/fuerzas_armadas_ssd300_compressed.mp4",
    "/content/drive/MyDrive/tfg/fuerzas_armadas_compressed.mp4"
)

## SSD300 with Data Augmentation

In [None]:
model = get_model(11)



In [None]:
checkpoint = torch.load("/content/drive/MyDrive/tfg/models/ssd300_augmented.pth", weights_only=False)
model.load_state_dict(checkpoint["best_model"]["model_state_dict"])

<All keys matched successfully>

In [None]:
ssd300_rtod = VideoParserObjectDetectionModel(model, SSD_FNS, CLASS_INFO)

In [None]:
ssd300_rtod(URL, "fuerzas_armadas_ssd300_augmented.mp4")

[youtube] Extracting URL: https://www.youtube.com/watch?v=xV24-uVq3WY
[youtube] xV24-uVq3WY: Downloading webpage
[youtube] xV24-uVq3WY: Downloading tv client config
[youtube] xV24-uVq3WY: Downloading player 4fcd6e4a
[youtube] xV24-uVq3WY: Downloading tv player API JSON
[youtube] xV24-uVq3WY: Downloading ios player API JSON
[youtube] xV24-uVq3WY: Downloading m3u8 information
Frames Per Second : 27 || Percentage Parsed : 0
Frames Per Second : 30 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 30 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0

We save the resulting video to our Drive:

In [None]:
# Input video path
save_path = "/content/fuerzas_armadas_ssd300_augmented.mp4"

# Compressed video path
compressed_path = "/content/fuerzas_armadas_ssd300_augmented_compressed.mp4"

os.system(f"ffmpeg -i {save_path} -vcodec libx264 {compressed_path}")

0

In [None]:
shutil.copy(
    "/content/fuerzas_armadas_ssd300_augmented_compressed.mp4",
    "/content/drive/MyDrive/tfg/fuerzas_armadas_ssd300_augmented_compressed.mp4"
)

'/content/drive/MyDrive/tfg/fuerzas_armadas_ssd300_augmented_compressed.mp4'

## SSD300 with Data Augmentation and Human Data

In [None]:
model = get_model(12)



In [None]:
checkpoint = torch.load("/content/drive/MyDrive/tfg/models/ssd300_augmented_humans.pth", weights_only=False)
model.load_state_dict(checkpoint["best_model"]["model_state_dict"])

<All keys matched successfully>

In [None]:
ssd300_rtod = VideoParserObjectDetectionModel(model, SSD_FNS, CLASS_INFO)

In [None]:
ssd300_rtod(URL, "fuerzas_armadas_ssd300_augmented_humans.mp4")

[youtube] Extracting URL: https://www.youtube.com/watch?v=xV24-uVq3WY
[youtube] xV24-uVq3WY: Downloading webpage
[youtube] xV24-uVq3WY: Downloading tv client config
[youtube] xV24-uVq3WY: Downloading player 4fcd6e4a
[youtube] xV24-uVq3WY: Downloading tv player API JSON
[youtube] xV24-uVq3WY: Downloading ios player API JSON
[youtube] xV24-uVq3WY: Downloading m3u8 information
Frames Per Second : 29 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 33 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 32 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0
Frames Per Second : 31 || Percentage Parsed : 0

We save the resulting video to our Drive:

In [None]:
# Input video path
save_path = "/content/fuerzas_armadas_ssd300_augmented_humans.mp4"

# Compressed video path
compressed_path = "/content/fuerzas_armadas_ssd300_augmented_humans_compressed_02.mp4"

os.system(f"ffmpeg -i {save_path} -vcodec libx264 {compressed_path}")

0

In [None]:
shutil.copy(
    "/content/fuerzas_armadas_ssd300_augmented_humans_compressed_02.mp4",
    "/content/drive/MyDrive/tfg/fuerzas_armadas_ssd300_augmented_humans_compressed_02.mp4"
)

'/content/drive/MyDrive/tfg/fuerzas_armadas_ssd300_augmented_humans_compressed_02.mp4'

## Basic YOLOv5

In [None]:
zip_path = "/content/drive/MyDrive/tfg/models/runs_yolov5.zip"
extract_path = "/content/"
!unzip -q "$zip_path" -d "$extract_path"
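
Every YOLOv5 archive in this notebook is unpacked to the same `runs/` folder, which is why that folder is deleted before the next model is loaded. As an alternative, each archive could be extracted into its own directory with the standard library. The sketch below assumes that the archive contains `runs/detect/train/weights/best.pt` at its top level (inferred from the path passed to `YOLO`); `extract_run` and the directory name `yolov5_basic` are hypothetical, and the demo uses a tiny stand-in archive rather than a real checkpoint.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_run(zip_path, dest_root, name):
    """Extract `zip_path` into dest_root/name and return that directory."""
    target = Path(dest_root) / name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Self-contained demo with a placeholder archive (not a real checkpoint)
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = Path(tmp) / "runs_demo.zip"
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("runs/detect/train/weights/best.pt", b"placeholder")
    run_dir = extract_run(demo_zip, tmp, "yolov5_basic")
    weights = run_dir / "runs" / "detect" / "train" / "weights" / "best.pt"
    found = weights.exists()
    print(found)
```

With one directory per model, the weights path passed to `YOLO` would point inside that directory and the intermediate `shutil.rmtree` calls would no longer be needed to avoid collisions.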

In [None]:
model = YOLO("runs/detect/train/weights/best.pt")

In [None]:
yolov5_rtod = VideoParserObjectDetectionModel(model, YOLO_FNS, CLASS_INFO)

In [None]:
yolov5_rtod(URL, "fuerzas_armadas_yolov5.mp4")

[youtube] Extracting URL: https://www.youtube.com/watch?v=xV24-uVq3WY
[youtube] xV24-uVq3WY: Downloading webpage
[youtube] xV24-uVq3WY: Downloading tv client config
[youtube] xV24-uVq3WY: Downloading player 91e7c654-main
[youtube] xV24-uVq3WY: Downloading tv player API JSON
[youtube] xV24-uVq3WY: Downloading ios player API JSON
[youtube] xV24-uVq3WY: Downloading m3u8 information
Frames Per Second : 38 || Percentage Parsed : 0
Frames Per Second : 51 || Percentage Parsed : 0
Frames Per Second : 50 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 52 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 51 || Percentage Parsed : 0
Frames Per Second : 50 || Percentage Parsed : 0
Frames Per Second : 50 || Percentage Parse

We save the resulting video to our Drive:

In [None]:
# Input video path
save_path = "/content/fuerzas_armadas_yolov5.mp4"

# Compressed video path
compressed_path = "/content/fuerzas_armadas_yolov5_compressed.mp4"

os.system(f"ffmpeg -i {save_path} -vcodec libx264 {compressed_path}")

0

In [None]:
shutil.copy(
    "/content/fuerzas_armadas_yolov5_compressed.mp4",
    "/content/drive/MyDrive/tfg/videos/fuerzas_armadas_yolov5_compressed.mp4"
)

'/content/drive/MyDrive/tfg/videos/fuerzas_armadas_yolov5_compressed.mp4'

In [None]:
shutil.rmtree("/content/runs")

## YOLOv5 with Data Augmentation

In [None]:
zip_path = "/content/drive/MyDrive/tfg/models/runs_yolov5_augmented.zip"
extract_path = "/content/"
!unzip -q "$zip_path" -d "$extract_path"

In [None]:
model = YOLO("runs/detect/train/weights/best.pt")

In [None]:
yolov5_rtod = VideoParserObjectDetectionModel(model, YOLO_FNS, CLASS_INFO)

In [None]:
yolov5_rtod(URL, "fuerzas_armadas_yolov5_augmented.mp4")

[youtube] Extracting URL: https://www.youtube.com/watch?v=xV24-uVq3WY
[youtube] xV24-uVq3WY: Downloading webpage
[youtube] xV24-uVq3WY: Downloading tv client config
[youtube] xV24-uVq3WY: Downloading tv player API JSON
[youtube] xV24-uVq3WY: Downloading ios player API JSON
[youtube] xV24-uVq3WY: Downloading m3u8 information
Frames Per Second : 44 || Percentage Parsed : 0
Frames Per Second : 52 || Percentage Parsed : 0
Frames Per Second : 52 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 50 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Fr

We save the resulting video to our Drive:

In [None]:
# Input video path
save_path = "/content/fuerzas_armadas_yolov5_augmented.mp4"

# Compressed video path
compressed_path = "/content/fuerzas_armadas_yolov5_augmented_compressed.mp4"

os.system(f"ffmpeg -i {save_path} -vcodec libx264 {compressed_path}")

0

In [None]:
shutil.copy(
    "/content/fuerzas_armadas_yolov5_augmented_compressed.mp4",
    "/content/drive/MyDrive/tfg/videos/fuerzas_armadas_yolov5_augmented_compressed.mp4"
)

'/content/drive/MyDrive/tfg/videos/fuerzas_armadas_yolov5_augmented_compressed.mp4'

In [None]:
shutil.rmtree("/content/runs")

## YOLOv5 with Data Augmentation and Human Images

In [None]:
zip_path = "/content/drive/MyDrive/tfg/models/runs_yolov5_augmented_humans.zip"
extract_path = "/content/"
!unzip -q "$zip_path" -d "$extract_path"

In [None]:
model = YOLO("runs/detect/train/weights/best.pt")

In [None]:
yolov5_rtod = VideoParserObjectDetectionModel(model, YOLO_FNS, CLASS_INFO)

In [None]:
yolov5_rtod(URL, "fuerzas_armadas_yolov5_augmented_humans.mp4")

[youtube] Extracting URL: https://www.youtube.com/watch?v=xV24-uVq3WY
[youtube] xV24-uVq3WY: Downloading webpage
[youtube] xV24-uVq3WY: Downloading tv client config
[youtube] xV24-uVq3WY: Downloading player 91e7c654-main
[youtube] xV24-uVq3WY: Downloading tv player API JSON
[youtube] xV24-uVq3WY: Downloading ios player API JSON
[youtube] xV24-uVq3WY: Downloading m3u8 information
Frames Per Second : 41 || Percentage Parsed : 0
Frames Per Second : 51 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 51 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 53 || Percentage Parsed : 0
Frames Per Second : 52 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 54 || Percentage Parsed : 0
Frames Per Second : 52 || Percentage Parse

We save the resulting video to our Drive:

In [None]:
# Input video path
save_path = "/content/fuerzas_armadas_yolov5_augmented_humans.mp4"

# Compressed video path
compressed_path = "/content/fuerzas_armadas_yolov5_augmented_humans_compressed_02.mp4"

os.system(f"ffmpeg -i {save_path} -vcodec libx264 {compressed_path}")

0

In [None]:
shutil.copy(
    "/content/fuerzas_armadas_yolov5_augmented_humans_compressed_02.mp4",
    "/content/drive/MyDrive/tfg/videos/fuerzas_armadas_yolov5_augmented_humans_compressed_02_true.mp4"
)

'/content/drive/MyDrive/tfg/videos/fuerzas_armadas_yolov5_augmented_humans_compressed_02_true.mp4'

In [None]:
shutil.rmtree("/content/runs")