
# Two-Stream Action Recognition (RGB + Optical Flow) on UCF50 (PyTorch)

This notebook trains a Two-Stream model for action recognition with:

***RGB stream***: image frames sampled from the videos

***Optical-Flow stream***: stacks of flow fields $(u,v)$ computed between consecutive frames (on the fly)

The code supports folder-based datasets such as UCF50, builds a **train/validation** split from the class folders, logs metrics, saves checkpoints, and plots **loss/accuracy** curves.

**Note**: **Optical flow** is computed with OpenCV (TV-L1, which lives in the `cv2.optflow` contrib module, or Farneback). Computing flow on the fly is CPU-intensive; reduce `num_segments` and `flow_stack` for faster experiments, or **precompute and cache the flows**. (`clip_len_rgb` does not change the cost in this notebook, because only the center frame of each segment is used.)


### Imports
- Purpose: load the libraries (PyTorch, Torchvision, OpenCV, NumPy, Matplotlib, scikit-learn), print their versions, and select the device.
- Input: a Python environment with the required packages installed.
- Output: the Torch/Torchvision/OpenCV versions are printed; the `device` variable is set (CUDA if a GPU is available, otherwise CPU).
- Main tools/steps:
  - Import `torch`, `torchvision`, `cv2`, `numpy`, `matplotlib`, and `sklearn.metrics`.
  - `device = torch.device("cuda" if torch.cuda.is_available() else "cpu")`

In [1]:

#@title Environment & Imports
import os, sys, json, time, random, math, shutil
from pathlib import Path
from dataclasses import dataclass, asdict
from typing import List, Tuple, Dict, Optional

import numpy as np
import cv2

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader

import torchvision
from torchvision import models, transforms

import matplotlib.pyplot as plt

from sklearn.metrics import confusion_matrix, classification_report
import itertools

print("Torch:", torch.__version__)
print("Torchvision:", torchvision.__version__)
print("OpenCV:", cv2.__version__)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device


Torch: 2.6.0+cpu
Torchvision: 0.21.0+cpu
OpenCV: 4.10.0


device(type='cpu')

### Configuration - set the paths for your machine

- **Purpose**: declare the training configuration with a `@dataclass` and save it to `config.json`.
- **Input**: none (edit the default values directly).
- **Output**: a `config.json` file in `output_dir`; the configuration is also printed.
- **Main tools/steps**:
  - **Config**: the paths `data_root` and `output_dir`; data parameters (`train_ratio`, `num_segments`, `clip_len_rgb`, `flow_stack`, `frame_size`, `flow_method`); training parameters (`batch_size`, `epochs`, `lr`, `weight_decay`); model parameters (`rgb_backbone`, `flow_backbone`, `fusion`, fusion weights); logging; and so on.
  - Create the output directory and write the JSON file.

**Important parameters**:

- `data_root`: the UCF50 directory (one sub-folder per class).
- `num_segments`, `flow_stack`, `frame_size`, `flow_method`.
- `fusion`: `"late_sum"` (fast and simple) or `"concat_fc"` (the fusion weights are learned).
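
The save-and-reload pattern for a dataclass configuration can be sketched as follows. `MiniConfig` and its fields are hypothetical stand-ins for the notebook's `Config`, and only the standard library is used. The sketch also shows that an attribute attached with `setattr` after construction is not a declared field, so `asdict` does not serialize it.

```python
import json
import os
import tempfile
from dataclasses import dataclass, asdict

@dataclass
class MiniConfig:  # hypothetical, much smaller than the notebook's Config
    data_root: str = "/path/to/UCF50"
    num_segments: int = 3
    fusion: str = "late_sum"

mini_cfg = MiniConfig(num_segments=2)

with tempfile.TemporaryDirectory() as tmp_dir:
    cfg_path = os.path.join(tmp_dir, "config.json")
    with open(cfg_path, "w", encoding="utf-8") as f:
        json.dump(asdict(mini_cfg), f, indent=2)
    with open(cfg_path, "r", encoding="utf-8") as f:
        restored = MiniConfig(**json.load(f))

# Dataclasses compare field by field, so the round trip is lossless here.
print(restored == mini_cfg)

# An attribute added later is not a declared field and is skipped by asdict().
setattr(mini_cfg, "extra_option", True)
print("extra_option" in asdict(mini_cfg))
```

Any option that must survive a reload therefore needs to be either a declared field or saved separately.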

In [2]:
#@title Configuration

@dataclass
class Config:
    # Paths
    data_root: str = "D:/Teach_n_Train/Advanced_Lessons_CV/LABS/CV_3D_Data/UCF50"  #@param {type:"string"}
    output_dir: str = "D:/Teach_n_Train/Advanced_Lessons_CV/LABS/CV_3D_Data/two_stream_runs/ucf50_exp1"  #@param {type:"string"}

    # Data settings
    train_ratio: float = 0.8  #@param {type:"number"}
    seed: int = 42  #@param {type:"number"}
    # Video sampling
    num_segments: int = 3  # number of temporal segments per video  #@param {type:"integer"}
    clip_len_rgb: int = 1  # frames per RGB segment (use 1 for TSN-like)  #@param {type:"integer"}
    flow_stack: int = 10    # number of consecutive flows (= 2*(flow_stack) channels)  #@param {type:"integer"}
    frame_size: int = 224   # resize short side to this (keeping aspect), then center crop to frame_size x frame_size  #@param {type:"integer"}
    # Flow method: 'tvl1' or 'farneback'
    flow_method: str = "tvl1"  #@param ["tvl1","farneback"]
    # Train
    batch_size: int = 8  #@param {type:"integer"}
    num_workers: int = 4  #@param {type:"integer"}
    epochs: int = 10  #@param {type:"integer"}
    lr: float = 1e-3  #@param {type:"number"}
    weight_decay: float = 1e-4  #@param {type:"number"}
    # Model
    rgb_backbone: str = "resnet18"  #@param ["resnet18","resnet34","resnet50"]
    flow_backbone: str = "resnet18"  #@param ["resnet18","resnet34","resnet50"]
    fusion: str = "late_sum"  #@param ["late_sum","concat_fc"]
    fusion_weight_rgb: float = 0.5  # weight for RGB logits in late_sum  #@param {type:"number"}
    fusion_weight_flow: float = 0.5  # weight for Flow logits in late_sum  #@param {type:"number"}
    # Checkpointing & logging
    save_every: int = 1  # save per epoch  #@param {type:"integer"}
    log_interval: int = 20  # batches  #@param {type:"integer"}
    # Evaluation
    compute_confusion: bool = True  #@param {type:"boolean"}

cfg = Config()

# Prepare output dirs
os.makedirs(cfg.output_dir, exist_ok=True)
with open(os.path.join(cfg.output_dir, "config.json"), "w", encoding="utf-8") as f:
    json.dump(asdict(cfg), f, indent=2, ensure_ascii=False)

print("Config saved to:", os.path.join(cfg.output_dir, "config.json"))
print(cfg)


Config saved to: D:/Teach_n_Train/Advanced_Lessons_CV/LABS/CV_3D_Data/two_stream_runs/ucf50_exp1\config.json
Config(data_root='D:/Teach_n_Train/Advanced_Lessons_CV/LABS/CV_3D_Data/UCF50', output_dir='D:/Teach_n_Train/Advanced_Lessons_CV/LABS/CV_3D_Data/two_stream_runs/ucf50_exp1', train_ratio=0.8, seed=42, num_segments=3, clip_len_rgb=1, flow_stack=10, frame_size=224, flow_method='tvl1', batch_size=8, num_workers=4, epochs=10, lr=0.001, weight_decay=0.0001, rgb_backbone='resnet18', flow_backbone='resnet18', fusion='late_sum', fusion_weight_rgb=0.5, fusion_weight_flow=0.5, save_every=1, log_interval=20, compute_confusion=True)


### Data Utilities: listing, splitting, frame/flow sampling

- **Purpose**: data utilities: list the videos per class, split train/val, read all frames, resize + center-crop, and compute the flow stack for a center frame (on the fly).
- **Input**:
  - the dataset root; `video_path`.
  - the number of segments, the clip length, the flow method.
- **Output**:
  - `class_to_files` (dict: class → list of videos).
  - `splits` (lists of `(path, label)` for train/val).
  - the decoded frames; the resized and cropped images; `uv_stack` (H×W×(2*stack)).
- **Main tools/steps**:
  - `cv2.VideoCapture` reads the video and each frame is converted to RGB; frames are resized so that the short side equals `frame_size`, then center-cropped to `frame_size`×`frame_size`.
  - `sample_indices`: evenly spaced, TSN-style temporal positions.
  - `compute_flow_stack`:
    - TV-L1 (`cv2.optflow.DualTVL1OpticalFlow_create`) or Farneback (`cv2.calcOpticalFlowFarneback`).
    - Stacks the (u,v) fields of the consecutive frame pairs leading up to the center frame → (H, W, 2*stack); normalizes by clipping at the 95th percentile of the magnitude and dividing by it, which gives values in roughly [-1,1].
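
To make the TSN-style sampling concrete, here is a minimal sketch that reproduces only the center-index part of the logic for `clip_len=1`. The helper name `segment_centers` is an assumption made for this illustration; it is not part of the notebook.

```python
def segment_centers(num_frames: int, num_segments: int) -> list:
    """Center frame index of each of `num_segments` equal temporal segments."""
    segment_length = num_frames / float(num_segments)
    centers = []
    for s in range(num_segments):
        start = int(round(segment_length * s))
        end = int(round(segment_length * (s + 1))) - 1
        end = max(end, start)              # guard against empty segments
        center = (start + end) // 2
        centers.append(min(center, num_frames - 1))  # stay inside the video
    return centers

print(segment_centers(30, 3))  # -> [4, 14, 24]
print(segment_centers(2, 3))   # -> [0, 1, 1]
```

The second call shows the behaviour on a video with fewer frames than segments: indices are repeated rather than raising an error.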

In [3]:

#@title Data Utilities: listing, splitting, frame/flow sampling

def list_video_files(root: str, exts=(".avi", ".mp4", ".mov", ".mkv")) -> Dict[str, List[str]]:
    root_p = Path(root)
    class_to_files = {}
    for cls_dir in sorted([p for p in root_p.iterdir() if p.is_dir()]):
        files = []
        for ext in exts:
            files.extend([str(p) for p in cls_dir.rglob(f"*{ext}")])
        if files:
            class_to_files[cls_dir.name] = sorted(files)
    return class_to_files

def split_train_val(class_to_files: Dict[str, List[str]], train_ratio: float, seed: int=42):
    rng = random.Random(seed)
    splits = {"train": [], "val": []}
    class_names = sorted(class_to_files.keys())
    for ci, cls in enumerate(class_names):
        files = class_to_files[cls][:]
        rng.shuffle(files)
        k = int(len(files) * train_ratio)
        train_files = files[:k]
        val_files = files[k:]
        for fp in train_files:
            splits["train"].append((fp, ci))
        for fp in val_files:
            splits["val"].append((fp, ci))
    return splits, class_names

def sample_indices(num_frames: int, num_segments: int, clip_len: int=1):
    """Uniformly sample indices for TSN-style segments."""
    # Divide [0, num_frames-1] into num_segments intervals and pick center indexes
    segment_length = num_frames / float(num_segments)
    inds = []
    for s in range(num_segments):
        start = int(round(segment_length * s))
        end = int(round(segment_length * (s+1))) - 1
        if end < start:
            end = start
        center = (start + end) // 2
        # For clip_len>1, create a small window around center
        half = clip_len // 2
        seg_inds = [min(max(center + i, 0), num_frames-1) for i in range(-half, -half + clip_len)]
        inds.append(seg_inds)
    return inds  # list of [clip_len] each

def read_all_frames_cv(video_path: str) -> List[np.ndarray]:
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        raise IOError(f"Failed to open video: {video_path}")
    frames = []
    ok, frame = cap.read()
    while ok:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(frame)
        ok, frame = cap.read()
    cap.release()
    if len(frames) == 0:
        raise ValueError(f"No frames read from video: {video_path}")
    return frames

def resize_and_center_crop(img: np.ndarray, size: int=224) -> np.ndarray:
    h, w, _ = img.shape
    # Resize short side to size
    if h < w:
        new_h, new_w = size, int(w * (size / h))
    else:
        new_w, new_h = size, int(h * (size / w))
    img = cv2.resize(img, (new_w, new_h), interpolation=cv2.INTER_LINEAR)
    # Center crop
    h, w, _ = img.shape
    top = (h - size) // 2
    left = (w - size) // 2
    img = img[top:top+size, left:left+size, :]
    return img

def compute_flow_stack(frames: List[np.ndarray], center_index: int, stack: int=10, method="tvl1") -> np.ndarray:
    """Compute a stack of (u,v) optical flows ending at center_index using consecutive frames.
    Returns array shape (H, W, 2*stack), values scaled to [-1,1] approximately.
    """
    flows = []
    # Use the `stack` consecutive pairs that end at the center frame:
    # (center-stack, center-stack+1), ..., (center-1, center).
    # Indices are clamped to the valid range, so near the start of the video the
    # pair (0, 1) is repeated; this always yields exactly `stack` pairs.
    last = len(frames) - 1
    pairs = []
    for i in range(center_index - stack, center_index):
        a = max(0, i)
        b = min(last, a + 1)
        pairs.append((a, b))

    # Compute flow for each pair
    if method == "tvl1":
        # TV-L1 is part of the contrib modules (requires opencv-contrib-python)
        tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
        for (i, j) in pairs:
            prev = cv2.cvtColor(frames[i], cv2.COLOR_RGB2GRAY)
            nxt  = cv2.cvtColor(frames[j], cv2.COLOR_RGB2GRAY)
            flow = tvl1.calc(prev, nxt, None)  # HxWx2 float32
            flows.append(flow)
    else:  # Farneback
        for (i, j) in pairs:
            prev = cv2.cvtColor(frames[i], cv2.COLOR_RGB2GRAY)
            nxt  = cv2.cvtColor(frames[j], cv2.COLOR_RGB2GRAY)
            flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                                pyr_scale=0.5, levels=3, winsize=15,
                                                iterations=3, poly_n=5, poly_sigma=1.1, flags=0)
            # Farneback returns 2-channel flow
            flows.append(flow.astype(np.float32))
    # Stack along channel axis
    uv_stack = np.concatenate([f for f in flows], axis=2)  # H x W x (2*stack)
    # Normalize: clip large values and scale
    mag = np.sqrt(np.sum(uv_stack**2, axis=2, keepdims=True))
    m95 = np.percentile(mag, 95) + 1e-6
    uv_stack = np.clip(uv_stack, -m95, m95) / m95  # roughly in [-1,1]
    return uv_stack.astype(np.float32)
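
The pair selection can be checked in isolation. The sketch below mirrors the index arithmetic of `compute_flow_stack` without touching OpenCV; the helper name `flow_pairs` is hypothetical and exists only for this illustration.

```python
def flow_pairs(center_index: int, stack: int, num_frames: int) -> list:
    """Frame pairs (a, b) whose flow would be stacked for one center frame."""
    last = num_frames - 1
    pairs = []
    for i in range(center_index - stack, center_index):
        a = max(0, i)          # clamp to the first frame
        b = min(last, a + 1)   # next frame, clamped to the last frame
        pairs.append((a, b))
    return pairs

print(flow_pairs(center_index=5, stack=3, num_frames=10))  # -> [(2, 3), (3, 4), (4, 5)]
print(flow_pairs(center_index=2, stack=4, num_frames=10))  # -> [(0, 1), (0, 1), (0, 1), (1, 2)]
```

When the center frame is closer to the start of the video than `stack` frames, the first pair is repeated, so the flow stack contains duplicated channels. With short clips and `flow_stack=10` this is likely to happen for the first segment.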


## Dataset (TSN-style sampling for Two-Stream)

- **Purpose**: define `TwoStreamVideoDataset` (the on-the-fly version).
- **Input**: `samples` (a list of `(video_path, label)`), `class_names`, `cfg`.
- **Output**, for each item:
  - `rgb_clip`: tensor K×3×H×W.
  - `flow_clip`: tensor K×(2*stack)×H×W.
  - `label` (int).
- **Main tools/steps**:
  - Read and preprocess all frames.
  - Take the K segment-center frames for RGB; for each center, compute `uv_stack` for Flow.
  - Normalize RGB with the ImageNet mean/std.
  - The `DataLoader` collates the items into batches.
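
The shapes listed above can be verified with a small NumPy stand-in for the torchvision transforms. This is only a sketch on random data: `rgb_to_chw` is a hypothetical helper that imitates `ToTensor` followed by `Normalize`, and no video is decoded.

```python
import numpy as np

K, H, W, stack = 3, 224, 224, 10
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

rng = np.random.default_rng(0)
# Random stand-ins for K cropped RGB frames and K normalized flow stacks
frames = [rng.integers(0, 256, size=(H, W, 3), dtype=np.uint8) for _ in range(K)]
flows = [rng.uniform(-1, 1, size=(H, W, 2 * stack)).astype(np.float32) for _ in range(K)]

def rgb_to_chw(img_uint8: np.ndarray) -> np.ndarray:
    """uint8 HxWx3 -> float32 3xHxW, scaled to [0,1] then ImageNet-normalized."""
    x = img_uint8.astype(np.float32) / 255.0
    x = (x - IMAGENET_MEAN) / IMAGENET_STD
    return x.transpose(2, 0, 1)

rgb_clip = np.stack([rgb_to_chw(f) for f in frames], axis=0)
flow_clip = np.stack([f.transpose(2, 0, 1) for f in flows], axis=0)

print(rgb_clip.shape)   # (3, 3, 224, 224)   -> K x 3 x H x W
print(flow_clip.shape)  # (3, 20, 224, 224)  -> K x (2*stack) x H x W
```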

In [4]:

#@title Dataset (TSN-style sampling for Two-Stream)

class TwoStreamVideoDataset(Dataset):
    def __init__(self, samples: List[Tuple[str,int]], class_names: List[str], cfg):
        self.samples = samples
        self.class_names = class_names
        self.cfg = cfg
        self.rgb_tf = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])
        # For the flow stream the array is already float32 scaled to roughly [-1,1];
        # ToTensor only reorders the axes (float input is not rescaled), no further normalization is applied
        self.flow_tf = transforms.Compose([
            transforms.ToTensor(),  # HxWxC -> CxHxW
        ])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        video_path, label = self.samples[idx]
        frames_all = read_all_frames_cv(video_path)
        # Resize+Center-crop all frames first for speed consistency
        frames_all = [resize_and_center_crop(f, self.cfg.frame_size) for f in frames_all]
        n = len(frames_all)

        # Sample indices for RGB (TSN style)
        seg_inds_rgb = sample_indices(n, self.cfg.num_segments, self.cfg.clip_len_rgb)  # list of lists
        # Pick center frame for each segment (clip_len_rgb may be >1; we use the center item index)
        rgb_indices = [inds[len(inds)//2] for inds in seg_inds_rgb]

        # For Flow segments, compute flow stacks centered at same positions
        flow_centers = rgb_indices  # align flow centers with RGB segments

        # Build RGB tensor: stack K frames into Kx3xHxW
        rgb_imgs = [frames_all[i] for i in rgb_indices]
        rgb_tensors = [self.rgb_tf(img) for img in rgb_imgs]  # 3xHxW
        rgb_clip = torch.stack(rgb_tensors, dim=0)  # Kx3xHxW

        # Build Flow tensor: for each center, compute uv_stack => C=2*flow_stack
        flow_stacks = []
        for c in flow_centers:
            uv = compute_flow_stack(frames_all, c, stack=self.cfg.flow_stack, method=self.cfg.flow_method)
            flow_stacks.append(self.flow_tf(uv))  # CxHxW
        flow_clip = torch.stack(flow_stacks, dim=0)  # Kx(2*stack)xHxW

        return rgb_clip, flow_clip, label


## Two-Stream Model Definition

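- **Purpose**: define the ResNet backbones for RGB and Flow, the Two-Stream module, and the fusion variants.
- **Input**: `num_classes`, `cfg`.
- **Output**: `TwoStreamModel` (`nn.Module`).
- **Main tools/steps**:
  - `make_resnet_backbone(name, in_channels)`:
    - Loads ImageNet weights for RGB (3 channels).
    - Replaces `conv1` for Flow (number of channels = 2*`flow_stack`) and initializes it by repeating and rescaling the previous 3-channel kernel. As written, the Flow backbone is created without pretrained weights, so that kernel is randomly initialized rather than taken from ImageNet.
    - Replaces the original `fc` with `Identity` and returns `feat_dim`.
  - `TwoStreamModel.forward`:
    - Flattens B×K×C×H×W to (B·K)×C×H×W → computes features for each frame → averages over K (temporal average).
  - Fusion:
    - `concat_fc`: concatenate (RGB|Flow) → FC.
    - `late_sum`: two separate FC heads, combined as a weighted sum using `fusion_weight_rgb` / `fusion_weight_flow`.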
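
The two fusion variants can be compared on dummy features. The sketch below uses NumPy as a stand-in for the PyTorch modules, with random weights and made-up sizes (D=512 matches the ResNet-18 feature width); the helper `linear` is an assumption of this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
B, K, D, num_classes = 2, 3, 512, 50

# Per-segment features from each backbone (B x K x D), averaged over the K segments
rgb_feats = rng.normal(size=(B, K, D)).mean(axis=1)    # B x D
flow_feats = rng.normal(size=(B, K, D)).mean(axis=1)   # B x D

def linear(x, weight, bias):
    """Plain fully connected layer: x @ W^T + b."""
    return x @ weight.T + bias

# late_sum: one head per stream, then a weighted sum of the logits
w_rgb, b_rgb = rng.normal(scale=0.01, size=(num_classes, D)), np.zeros(num_classes)
w_flow, b_flow = rng.normal(scale=0.01, size=(num_classes, D)), np.zeros(num_classes)
late_sum_logits = 0.5 * linear(rgb_feats, w_rgb, b_rgb) + 0.5 * linear(flow_feats, w_flow, b_flow)

# concat_fc: a single head over the concatenated features
cat_feats = np.concatenate([rgb_feats, flow_feats], axis=1)          # B x 2D
w_cat, b_cat = rng.normal(scale=0.01, size=(num_classes, 2 * D)), np.zeros(num_classes)
concat_logits = linear(cat_feats, w_cat, b_cat)

# late_sum with fixed weights equals concat_fc with a block-structured weight matrix
w_equiv = np.concatenate([0.5 * w_rgb, 0.5 * w_flow], axis=1)
print(np.allclose(linear(cat_feats, w_equiv, 0.0), late_sum_logits))  # True
print(late_sum_logits.shape, concat_logits.shape)                     # (2, 50) (2, 50)
```

In other words, `concat_fc` can represent any fixed-weight `late_sum`, and additionally lets training choose the relative weighting of the two streams.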

In [5]:

#@title Two-Stream Model Definition

def make_resnet_backbone(name: str, in_channels: int=3, pretrained: bool=True):
    if name == "resnet18":
        m = models.resnet18(weights=models.ResNet18_Weights.DEFAULT if pretrained and in_channels==3 else None)
    elif name == "resnet34":
        m = models.resnet34(weights=models.ResNet34_Weights.DEFAULT if pretrained and in_channels==3 else None)
    elif name == "resnet50":
        m = models.resnet50(weights=models.ResNet50_Weights.DEFAULT if pretrained and in_channels==3 else None)
    else:
        raise ValueError("Unsupported backbone: " + name)
    # Adjust first conv if in_channels != 3
    if in_channels != 3:
        # Current 3-channel kernel. Note: weights=None above whenever in_channels != 3,
        # so this kernel is randomly initialized, not ImageNet-pretrained.
        w = m.conv1.weight.data
        m.conv1 = nn.Conv2d(in_channels, m.conv1.out_channels, kernel_size=7, stride=2, padding=3, bias=False)
        if w.shape[1] == 3:
            with torch.no_grad():
                # Tile the kernel along the input-channel axis and truncate to in_channels
                repeat = int(math.ceil(in_channels / 3))
                w_rep = w.repeat(1, repeat, 1, 1)[:, :in_channels, :, :]
                w_new = w_rep * (3.0 / in_channels)  # preserve activation scale roughly
                m.conv1.weight.copy_(w_new)
    # Replace classifier with identity; return feature_dim
    feat_dim = m.fc.in_features
    m.fc = nn.Identity()
    return m, feat_dim

class TwoStreamModel(nn.Module):
    def __init__(self, num_classes: int, cfg):
        super().__init__()
        self.cfg = cfg
        self.rgb_backbone, rgb_dim = make_resnet_backbone(cfg.rgb_backbone, in_channels=3, pretrained=True)
        flow_in_ch = 2 * cfg.flow_stack  # 2 per flow (u,v) times stack
        self.flow_backbone, flow_dim = make_resnet_backbone(cfg.flow_backbone, in_channels=flow_in_ch, pretrained=False)

        if cfg.fusion == "concat_fc":
            self.classifier = nn.Linear(rgb_dim + flow_dim, num_classes)
        else:
            self.rgb_fc = nn.Linear(rgb_dim, num_classes)
            self.flow_fc = nn.Linear(flow_dim, num_classes)
        self.init_weights()

    def init_weights(self):
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.normal_(m.weight, 0, 0.01)
                if m.bias is not None:
                    nn.init.constant_(m.bias, 0)

    def forward_stream(self, backbone, x_clip):
        # x_clip: B x K x C x H x W (we merge B and K, then average the features across K)
        B, K, C, H, W = x_clip.shape
        x = x_clip.view(B*K, C, H, W)
        feats = backbone(x)  # (B*K) x D
        feats = feats.view(B, K, -1)  # B x K x D
        feats = feats.mean(dim=1)     # average over temporal segments
        return feats  # B x D

    def forward(self, rgb_clip, flow_clip):
        rgb_feats = self.forward_stream(self.rgb_backbone, rgb_clip)
        flow_feats = self.forward_stream(self.flow_backbone, flow_clip)

        if self.cfg.fusion == "concat_fc":
            feats = torch.cat([rgb_feats, flow_feats], dim=1)
            logits = self.classifier(feats)
        else:  # late_sum
            rgb_logits = self.rgb_fc(rgb_feats)
            flow_logits = self.flow_fc(flow_feats)
            logits = self.cfg.fusion_weight_rgb * rgb_logits + self.cfg.fusion_weight_flow * flow_logits
        return logits
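
The repeat-and-rescale step used for the Flow `conv1` can be illustrated with NumPy. This is a sketch on a random kernel, assuming the usual ResNet `conv1` shape of (64, 3, 7, 7); `inflate_conv1` is a hypothetical name, and `np.tile` plays the role of `Tensor.repeat`.

```python
import math
import numpy as np

def inflate_conv1(w3: np.ndarray, in_channels: int) -> np.ndarray:
    """Tile a (out, 3, kH, kW) kernel to (out, in_channels, kH, kW) and rescale."""
    repeat = int(math.ceil(in_channels / 3))
    w_rep = np.tile(w3, (1, repeat, 1, 1))[:, :in_channels, :, :]
    return w_rep * (3.0 / in_channels)

rng = np.random.default_rng(0)
w3 = rng.normal(size=(64, 3, 7, 7)).astype(np.float32)

w20 = inflate_conv1(w3, in_channels=20)   # 2 * flow_stack with flow_stack = 10
print(w20.shape)  # (64, 20, 7, 7)

# When in_channels is a multiple of 3, the kernel summed over input channels is
# unchanged, so a constant input produces the same response as before.
w6 = inflate_conv1(w3, in_channels=6)
print(np.allclose(w6.sum(axis=1), w3.sum(axis=1), atol=1e-5))  # True
```

For channel counts that are not a multiple of 3 (such as 20) the preservation is only approximate, which is why the code comment says "roughly".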


In [6]:

#@title Training / Evaluation Helpers

def set_seed(seed: int):
    random.seed(seed); np.random.seed(seed); torch.manual_seed(seed); torch.cuda.manual_seed_all(seed)

def accuracy_top1(logits, targets):
    preds = logits.argmax(dim=1)
    return (preds == targets).float().mean().item()

def train_one_epoch(model, loader, optimizer, epoch, cfg):
    model.train()
    running_loss, running_acc = 0.0, 0.0
    start = time.time()
    for i, (rgb, flow, y) in enumerate(loader, 1):
        rgb = rgb.to(device, non_blocking=True)
        flow = flow.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)

        logits = model(rgb, flow)
        loss = F.cross_entropy(logits, y)
        acc = accuracy_top1(logits, y)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        running_acc += acc

        if i % cfg.log_interval == 0:
            print(f"[Epoch {epoch}] Step {i}/{len(loader)}  loss={running_loss/i:.4f}  acc={running_acc/i:.4f}  time={(time.time()-start):.1f}s")

    return running_loss/len(loader), running_acc/len(loader)

@torch.no_grad()
def evaluate(model, loader):
    model.eval()
    total_loss, total_correct, total_n = 0.0, 0, 0
    all_y, all_p = [], []
    for rgb, flow, y in loader:
        rgb = rgb.to(device, non_blocking=True)
        flow = flow.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
        logits = model(rgb, flow)
        preds = logits.argmax(dim=1)
        # Accumulate sums (not per-batch means) so a smaller final batch is weighted by its size
        total_loss += F.cross_entropy(logits, y, reduction="sum").item()
        total_correct += (preds == y).sum().item()
        total_n += y.size(0)
        all_y.append(y.cpu().numpy())
        all_p.append(preds.cpu().numpy())
    all_y = np.concatenate(all_y) if all_y else np.array([])
    all_p = np.concatenate(all_p) if all_p else np.array([])
    n = max(total_n, 1)  # avoid division by zero on an empty loader
    return total_loss/n, total_correct/n, all_y, all_p

def save_ckpt(model, optimizer, epoch, cfg, class_names):
    path = os.path.join(cfg.output_dir, f"checkpoint_epoch_{epoch}.pth")
    payload = {
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "cfg": asdict(cfg),
        "class_names": class_names,
    }
    torch.save(payload, path)
    print("Saved checkpoint:", path)
    return path

def plot_curves(history, out_dir):
    plt.figure()
    plt.plot(history["train_loss"], label="train_loss")
    plt.plot(history["val_loss"], label="val_loss")
    plt.xlabel("epoch"); plt.ylabel("loss"); plt.title("Loss"); plt.legend(); plt.grid(True)
    plt.tight_layout()
    loss_path = os.path.join(out_dir, "loss_curve.png")
    plt.savefig(loss_path, dpi=150); plt.show()

    plt.figure()
    plt.plot(history["train_acc"], label="train_acc")
    plt.plot(history["val_acc"], label="val_acc")
    plt.xlabel("epoch"); plt.ylabel("accuracy"); plt.title("Accuracy"); plt.legend(); plt.grid(True)
    plt.tight_layout()
    acc_path = os.path.join(out_dir, "acc_curve.png")
    plt.savefig(acc_path, dpi=150); plt.show()

    return loss_path, acc_path

def plot_confusion_matrix(cm, classes, normalize=True, title='Confusion matrix'):
    if normalize:
        cm = cm.astype('float') / (cm.sum(axis=1, keepdims=True) + 1e-8)
    plt.figure(figsize=(6,6))
    plt.imshow(cm, interpolation='nearest')
    plt.title(title); plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=90)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black", fontsize=7)
    plt.ylabel('True label'); plt.xlabel('Predicted label')
    plt.tight_layout()


In [7]:

#@title Build Datasets and DataLoaders

set_seed(cfg.seed)

class_to_files = list_video_files(cfg.data_root)
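if not class_to_files:
    # Stop here: every later step needs at least one class folder with videos
    raise FileNotFoundError(f"No class folders / videos were found under {cfg.data_root!r}. Please check `cfg.data_root`.")
print("Found classes:", list(class_to_files.keys())[:5], "... (total:", len(class_to_files), ")")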

splits, class_names = split_train_val(class_to_files, cfg.train_ratio, cfg.seed)
num_classes = len(class_names)
print("Num classes:", num_classes, "| Train videos:", len(splits["train"]), "| Val videos:", len(splits["val"]))

train_ds = TwoStreamVideoDataset(splits["train"], class_names, cfg)
val_ds   = TwoStreamVideoDataset(splits["val"],   class_names, cfg)

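# pin_memory only helps when batches are copied to a CUDA device.
# If the loaders hang or raise pickling errors (this can happen on Windows with a
# Dataset class defined inside the notebook), set cfg.num_workers = 0.
pin = device.type == "cuda"
train_loader = DataLoader(train_ds, batch_size=cfg.batch_size, shuffle=True, num_workers=cfg.num_workers, pin_memory=pin, drop_last=True)
val_loader   = DataLoader(val_ds,   batch_size=cfg.batch_size, shuffle=False, num_workers=cfg.num_workers, pin_memory=pin, drop_last=False)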

class_to_idx = {c:i for i,c in enumerate(class_names)}
with open(os.path.join(cfg.output_dir, "class_to_idx.json"), "w", encoding="utf-8") as f:
    json.dump(class_to_idx, f, indent=2, ensure_ascii=False)

print("Class mapping saved to class_to_idx.json")


Found classes: ['BaseballPitch', 'Basketball', 'BenchPress', 'Biking', 'Billiards'] ... (total: 50 )
Num classes: 50 | Train videos: 5326 | Val videos: 1355
Class mapping saved to class_to_idx.json


#### Download the ResNet-18 weights to build the model

In [8]:

#@title Initialize Model & Optimizer

model = TwoStreamModel(num_classes=num_classes, cfg=cfg).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.lr, weight_decay=cfg.weight_decay)

total_params = sum(p.numel() for p in model.parameters())
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total params: {total_params:,}  |  Trainable: {trainable_params:,}")


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to C:\Users\WinIF Chung/.cache\torch\hub\checkpoints\resnet18-f37072fd.pth
100%|█████████████████████████████████████████████████████████████████████████████| 44.7M/44.7M [00:01<00:00, 35.4MB/s]


Total params: 22,457,636  |  Trainable: 22,457,636


In [None]:

#@title Train

history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
best_val = -1.0
metrics_log = []

for epoch in range(1, cfg.epochs+1):
    tr_loss, tr_acc = train_one_epoch(model, train_loader, optimizer, epoch, cfg)
    va_loss, va_acc, y_true, y_pred = evaluate(model, val_loader)

    history["train_loss"].append(tr_loss)
    history["train_acc"].append(tr_acc)
    history["val_loss"].append(va_loss)
    history["val_acc"].append(va_acc)

    ep_metrics = {
        "epoch": epoch,
        "train_loss": tr_loss,
        "train_acc": tr_acc,
        "val_loss": va_loss,
        "val_acc": va_acc
    }
    metrics_log.append(ep_metrics)

    print(f"Epoch {epoch:02d}: train_loss={tr_loss:.4f}, train_acc={tr_acc:.4f} | val_loss={va_loss:.4f}, val_acc={va_acc:.4f}")

    if epoch % cfg.save_every == 0:
        save_ckpt(model, optimizer, epoch, cfg, class_names)

    # Track best
    if va_acc > best_val:
        best_val = va_acc
        best_path = os.path.join(cfg.output_dir, "best_model.pth")
        torch.save({
            "epoch": epoch,
            "model_state": model.state_dict(),
            "cfg": asdict(cfg),
            "class_names": class_names
        }, best_path)
        print("Saved best model to:", best_path)

# Save history & metrics
with open(os.path.join(cfg.output_dir, "history.json"), "w", encoding="utf-8") as f:
    json.dump(history, f, indent=2, ensure_ascii=False)
with open(os.path.join(cfg.output_dir, "metrics_log.json"), "w", encoding="utf-8") as f:
    json.dump(metrics_log, f, indent=2, ensure_ascii=False)

print("Training complete. Logs saved in", cfg.output_dir)


In [None]:

#@title Plot Curves & Confusion Matrix (optional)

loss_path, acc_path = plot_curves(history, cfg.output_dir)
print("Saved plots to:", loss_path, "and", acc_path)

if cfg.compute_confusion and len(val_ds) > 0:
    # Recompute predictions on val set for CM
    _, _, y_true, y_pred = evaluate(model, val_loader)
    if len(y_true) > 0:
        cm = confusion_matrix(y_true, y_pred, labels=list(range(len(class_names))))
        plot_confusion_matrix(cm, classes=class_names, normalize=True, title="Normalized Confusion Matrix")
        cm_path = os.path.join(cfg.output_dir, "confusion_matrix.png")
        plt.savefig(cm_path, dpi=150); plt.show()
        print("Saved confusion matrix to:", cm_path)

        # Also save text report
        report = classification_report(y_true, y_pred, target_names=class_names, digits=4)
        rep_path = os.path.join(cfg.output_dir, "classification_report.txt")
        with open(rep_path, "w", encoding="utf-8") as f:
            f.write(report)
        print("Saved classification report to:", rep_path)


In [None]:

#@title Inference on a Single Video

@torch.no_grad()
def predict_video(model, video_path: str, class_names: List[str], cfg):
    model.eval()
    frames_all = read_all_frames_cv(video_path)
    frames_all = [resize_and_center_crop(f, cfg.frame_size) for f in frames_all]
    n = len(frames_all)
    seg_inds_rgb = sample_indices(n, cfg.num_segments, cfg.clip_len_rgb)
    rgb_indices = [inds[len(inds)//2] for inds in seg_inds_rgb]
    flow_centers = rgb_indices

    rgb_imgs = [frames_all[i] for i in rgb_indices]
    rgb_tensors = [transforms.ToTensor()(img) for img in rgb_imgs]
    rgb_tensors = [transforms.Normalize(mean=[0.485,0.456,0.406], std=[0.229,0.224,0.225])(t) for t in rgb_tensors]
    rgb_clip = torch.stack(rgb_tensors, dim=0).unsqueeze(0).to(device)  # 1xKx3xHxW

    flow_stacks = []
    for c in flow_centers:
        uv = compute_flow_stack(frames_all, c, stack=cfg.flow_stack, method=cfg.flow_method)
        flow_stacks.append(transforms.ToTensor()(uv))
    flow_clip = torch.stack(flow_stacks, dim=0).unsqueeze(0).to(device)  # 1xKxCxHxW

    logits = model(rgb_clip, flow_clip)
    probs = F.softmax(logits, dim=1).cpu().numpy()[0]
    topk = probs.argsort()[::-1][:5]
    return [(class_names[i], float(probs[i])) for i in topk]

# Example (update the path)
# demo_path = "/path/to/UCF50/BaseballPitch/v_BaseballPitch_g01_c01.avi"
# print(predict_video(model, demo_path, class_names, cfg))



## Notes & Tips

- Suggested **data layout**:
  ```
  UCF50/
    ClassA/
      video1.avi
      video2.avi
      ...
    ClassB/
      v_*.avi
      ...
    ...
  ```
- **Speed**: computing TV-L1 on the fly is expensive. Try:
  - Reducing `num_segments` (e.g., 2)
  - Reducing `flow_stack` (e.g., 5)
  - Switching `flow_method` to `"farneback"`
  - Precomputing and caching the flows (modify the Dataset to load them from `.npy` files).
- **Memory**: use a smaller `batch_size` if you hit out-of-memory (OOM) errors.
- **Backbones**: switching to `resnet50` may improve accuracy, at the cost of speed.
- **Fusion**: try `concat_fc` for a learnable combination of the two streams (it may be slower).
- **Checkpoints**: the best model state is saved to `best_model.pth`. Per-epoch checkpoints are saved as `checkpoint_epoch_*.pth`.
- **Reproducibility**: results can vary with differences in video decoding and in the flow-computation parameters.
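
The caching tip can be sketched with NumPy alone: store one array of pairwise flows per video, then slice a stack for any center frame at training time. Random data stands in for real flow fields, and the file name and the helper `stack_from_cache` are assumptions made for this illustration.

```python
import os
import tempfile
import numpy as np

rng = np.random.default_rng(0)
T, H, W, stack = 12, 8, 8, 4          # tiny sizes for illustration
# Stand-in for precomputed pairwise flows: one (H, W, 2) field per frame pair
pairwise = rng.normal(size=(T - 1, H, W, 2)).astype(np.float32)

with tempfile.TemporaryDirectory() as cache_dir:
    cache_file = os.path.join(cache_dir, "video_0001.npy")  # hypothetical name
    np.save(cache_file, pairwise)       # done once, offline
    cached = np.load(cache_file)        # done in the Dataset at training time

def stack_from_cache(flows: np.ndarray, center_index: int, stack: int) -> np.ndarray:
    """Slice `stack` cached fields ending at center_index -> (H, W, 2*stack)."""
    # flows[i] is the flow from frame i to frame i+1; clamp indices at both ends
    idx = [min(max(i, 0), len(flows) - 1) for i in range(center_index - stack, center_index)]
    picked = flows[idx]                               # (stack, H, W, 2)
    h, w = picked.shape[1:3]
    # Channel order becomes u0, v0, u1, v1, ... as with concatenation along axis 2
    return picked.transpose(1, 2, 0, 3).reshape(h, w, 2 * stack)

uv_stack = stack_from_cache(cached, center_index=6, stack=stack)
print(uv_stack.shape)  # (8, 8, 8)
```

The 95th-percentile clipping and scaling would still be applied to the sliced stack, so the cached values can stay in raw pixel units.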



## Extensions: Flow Cache, LR Scheduler, Early Stopping, TorchScript Export
Below we add:
- **Flow precompute & cache to disk** to speed up training.
- **Learning rate scheduler** (CosineAnnealingLR / StepLR / ReduceLROnPlateau).
- **Early stopping** based on **validation accuracy**.
- **Export**: save a `.pt` file (state dict) and a TorchScript module for deployment.
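
As a minimal sketch of patience-based early stopping, assuming the metric is reported once per epoch: the class name `EarlyStopper` and the accuracy values are hypothetical and are not taken from the notebook.

```python
class EarlyStopper:
    """Stop when the monitored metric has not improved for `patience` epochs."""

    def __init__(self, mode: str = "max", patience: int = 5):
        assert mode in ("max", "min")
        self.mode = mode
        self.patience = patience
        self.best = None
        self.bad_epochs = 0

    def step(self, value: float) -> bool:
        """Record one epoch's metric; return True if training should stop."""
        improved = (
            self.best is None
            or (self.mode == "max" and value > self.best)
            or (self.mode == "min" and value < self.best)
        )
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopper(mode="max", patience=2)
val_accs = [0.40, 0.55, 0.54, 0.53, 0.60]   # made-up values for illustration
for epoch, acc in enumerate(val_accs, 1):
    if stopper.step(acc):
        print(f"stop at epoch {epoch}, best={stopper.best}")  # stop at epoch 4, best=0.55
        break
```

Use `mode="max"` when monitoring accuracy and `mode="min"` when monitoring loss; mixing them up would stop training after the first `patience` epochs of genuine improvement.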


In [None]:

#@title Extended Configuration (cache + LR scheduler + early stopping)
from dataclasses import asdict

# ---- Flow cache options ----
use_flow_cache = True            #@param {type:"boolean"}
flow_cache_dir = "./two_stream_runs/flow_cache"  #@param {type:"string"}
precompute_flow_now = False      #@param {type:"boolean"}
precompute_on_split = "train+val"  #@param ["train","val","train+val","all"]
cache_flow_method = "tvl1"       #@param ["tvl1","farneback"]

# ---- Scheduler options ----
lr_scheduler_type = "cosine"     #@param ["cosine","step","plateau","none"]
step_size = 5                    #@param {type:"integer"}
gamma = 0.1                      #@param {type:"number"}
cosine_Tmax = 10                 # cycles over epochs  #@param {type:"integer"}

# ---- Early stopping ----
early_stop_on = "val_acc"        #@param ["val_acc","val_loss"]
early_stop_mode = "max"          #@param ["max","min"]
early_stop_patience = 5          #@param {type:"integer"}

# ---- Export options ----
export_dir = "./two_stream_runs/exports"  #@param {type:"string"}
os.makedirs(export_dir, exist_ok=True)

# attach to cfg safely (cfg is defined above)
for k, v in dict(
    use_flow_cache=use_flow_cache,
    flow_cache_dir=flow_cache_dir,
    precompute_flow_now=precompute_flow_now,
    precompute_on_split=precompute_on_split,
    cache_flow_method=cache_flow_method,
    lr_scheduler_type=lr_scheduler_type,
    step_size=step_size,
    gamma=gamma,
    cosine_Tmax=cosine_Tmax,
    early_stop_on=early_stop_on,
    early_stop_mode=early_stop_mode,
    early_stop_patience=early_stop_patience,
    export_dir=export_dir,
).items():
    setattr(cfg, k, v)

# persist extra config
extra_cfg_path = os.path.join(cfg.output_dir, "config_extra.json")
with open(extra_cfg_path, "w", encoding="utf-8") as f:
    json.dump({
        "use_flow_cache": use_flow_cache,
        "flow_cache_dir": flow_cache_dir,
        "precompute_flow_now": precompute_flow_now,
        "precompute_on_split": precompute_on_split,
        "cache_flow_method": cache_flow_method,
        "lr_scheduler_type": lr_scheduler_type,
        "step_size": step_size,
        "gamma": gamma,
        "cosine_Tmax": cosine_Tmax,
        "early_stop_on": early_stop_on,
        "early_stop_mode": early_stop_mode,
        "early_stop_patience": early_stop_patience,
        "export_dir": export_dir
    }, f, indent=2, ensure_ascii=False)
print("Saved extra config to:", extra_cfg_path)
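
The `cosine_Tmax` option is passed as `T_max` to `CosineAnnealingLR` in the training cell. As a rough illustration of what that value controls, the sketch below evaluates the closed-form cosine schedule in plain Python. The helper name `cosine_lr` and the base learning rate of 0.1 are assumptions for illustration; they are not part of the notebook.

```python
import math

def cosine_lr(epoch: int, base_lr: float, t_max: int, eta_min: float = 0.0) -> float:
    """Closed-form cosine annealing: learning rate after `epoch` scheduler steps."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2

# Hypothetical values: base_lr=0.1 is illustrative, t_max=10 mirrors cosine_Tmax.
schedule = [cosine_lr(e, base_lr=0.1, t_max=10) for e in range(21)]
for e in (0, 5, 10, 15, 20):
    print(f"epoch {e:2d}: lr = {schedule[e]:.4f}")
```

Under this formula the learning rate reaches its minimum at epoch `t_max` and climbs back toward the base value if training continues past it, so setting `cosine_Tmax` close to the planned number of epochs is usually the intent.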


In [None]:

#@title Flow Precompute & Caching Utilities
import hashlib
from pathlib import Path

def _safe_relpath(video_path: str, root: str) -> str:
    try:
        rp = os.path.relpath(video_path, root)
    except Exception:
        rp = os.path.basename(video_path)
    return rp.replace("\\", "/")

def _cache_path_for_video(video_path: str, cache_root: str) -> str:
    # Deterministic .npy path inside cache_root that mirrors the class/subdir layout under cfg.data_root.
    # A short hash of the full video path is appended so that same-named files from different locations do not collide.
    h = hashlib.sha1(video_path.encode("utf-8")).hexdigest()[:10]
    rel = _safe_relpath(video_path, cfg.data_root)
    rel_no_ext = os.path.splitext(rel)[0]
    dst_dir = Path(cache_root) / Path(rel_no_ext).parent
    dst_dir.mkdir(parents=True, exist_ok=True)
    return str(dst_dir / f"{Path(rel_no_ext).name}__{h}.npy")

def compute_pairwise_flows(frames: list, method: str="tvl1"):
    """Dense flow between consecutive RGB frames; returns a list of (H, W, 2) float32 arrays."""
    # Convert each frame to grayscale once instead of twice per pair.
    grays = [cv2.cvtColor(f, cv2.COLOR_RGB2GRAY) for f in frames]
    flows = []
    if method == "tvl1":
        # cv2.optflow is only available in opencv-contrib-python builds.
        tvl1 = cv2.optflow.DualTVL1OpticalFlow_create()
        for prev, nxt in zip(grays[:-1], grays[1:]):
            flows.append(tvl1.calc(prev, nxt, None).astype(np.float32))  # HxWx2
    else:
        # Any other value falls back to Farneback.
        for prev, nxt in zip(grays[:-1], grays[1:]):
            flow = cv2.calcOpticalFlowFarneback(prev, nxt, None,
                                                pyr_scale=0.5, levels=3, winsize=15,
                                                iterations=3, poly_n=5, poly_sigma=1.1, flags=0)
            flows.append(flow.astype(np.float32))
    return flows  # list of (H,W,2)

def precompute_flow_for_video(video_path: str, out_path: str, frame_size: int, method: str):
    # Read frames, resize+crop exactly like training
    frames_all = read_all_frames_cv(video_path)
    frames_all = [resize_and_center_crop(f, frame_size) for f in frames_all]
    if len(frames_all) < 2:
        np.save(out_path, np.zeros((0, frame_size, frame_size, 2), dtype=np.float32))
        return out_path
    flows = compute_pairwise_flows(frames_all, method=method)
    flows = np.stack(flows, axis=0)  # (N-1, H, W, 2)
    # Robust per-video scaling: clip each flow component to +/- the 95th percentile of the
    # flow magnitude, then divide by that value so every component lies in [-1, 1].
    mag = np.sqrt(np.sum(flows**2, axis=3, keepdims=True))
    m95 = np.percentile(mag, 95) + 1e-6
    flows = np.clip(flows, -m95, m95) / m95
    np.save(out_path, flows.astype(np.float32))
    return out_path

def precompute_flow_for_split(samples, cache_root: str, frame_size: int, method: str="tvl1"):
    paths = []
    for i, (vp, _) in enumerate(samples, 1):
        dst = _cache_path_for_video(vp, cache_root)
        if not os.path.exists(dst):
            try:
                precompute_flow_for_video(vp, dst, frame_size, method)
                print(f"[{i}/{len(samples)}] cached flow:", dst)
            except Exception as e:
                # Do not index a video whose flow could not be computed.
                print(f"[{i}/{len(samples)}] FAILED {vp}: {e}")
                continue
        paths.append((vp, dst))
    # Save an index mapping
    index_path = os.path.join(cache_root, "index.json")
    os.makedirs(cache_root, exist_ok=True)
    # read existing index (merge)
    idx = {}
    if os.path.exists(index_path):
        try:
            with open(index_path, "r", encoding="utf-8") as f:
                idx = json.load(f)
        except Exception:
            idx = {}
    for vp, dst in paths:
        idx[vp] = dst
    with open(index_path, "w", encoding="utf-8") as f:
        json.dump(idx, f, indent=2, ensure_ascii=False)
    print("Updated flow cache index:", index_path)
    return index_path

# Optional: run precompute now
os.makedirs(cfg.flow_cache_dir, exist_ok=True)
if cfg.precompute_flow_now:
    target = []
    if cfg.precompute_on_split in ("train", "train+val", "all"):
        target += splits["train"]
    if cfg.precompute_on_split in ("val", "train+val", "all"):
        target += splits["val"]
    precompute_flow_for_split(target, cfg.flow_cache_dir, cfg.frame_size, method=cfg.cache_flow_method)
else:
    print("Skip flow precompute (set precompute_flow_now=True to enable). Cache dir:", cfg.flow_cache_dir)
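
The per-video scaling used in `precompute_flow_for_video` can be tried in isolation. The sketch below is a minimal stand-alone version on synthetic data; the helper name `robust_scale`, the array shape, and the injected outlier are assumptions for illustration only.

```python
import numpy as np

def robust_scale(flows: np.ndarray, pct: float = 95.0) -> np.ndarray:
    """Clip flow components to +/- the pct-th percentile of magnitude, then scale to [-1, 1]."""
    mag = np.sqrt(np.sum(flows ** 2, axis=-1))
    limit = np.percentile(mag, pct) + 1e-6  # epsilon avoids division by zero on static clips
    return (np.clip(flows, -limit, limit) / limit).astype(np.float32)

# Synthetic flow field (assumed shape: pairs x H x W x 2) with one large outlier vector.
rng = np.random.default_rng(0)
flows = rng.normal(0.0, 1.0, size=(4, 8, 8, 2)).astype(np.float32)
flows[0, 0, 0] = (500.0, -500.0)

scaled = robust_scale(flows)
print("raw max |component|:", float(np.abs(flows).max()))
print("scaled max |component|:", float(np.abs(scaled).max()))
```

Because the limit comes from a percentile rather than the maximum, a few extreme vectors (camera cuts, compression glitches) saturate at the bounds instead of compressing the rest of the motion toward zero.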


In [None]:

#@title Dataset (v2) using Cached Flows if enabled

# Helper to form flow stack from cached pairwise flows around a center index
def flow_stack_from_cache(flow_pairs: np.ndarray, center_index: int, stack: int) -> np.ndarray:
    # flow_pairs shape: (N-1, H, W, 2), where pair t is the flow from frame t -> t+1
    N_1 = flow_pairs.shape[0]
    H, W = flow_pairs.shape[1:3]
    if N_1 == 0:
        # degenerate case: no pairs available
        return np.zeros((H, W, 2*stack), dtype=np.float32)
    pairs = []  # we want exactly 'stack' pair indices ending near center
    # prefer backward pairs ending at center: (center-1)->center, ..., (center-stack)->(center-stack+1)
    for k in range(stack):
        t = center_index - 1 - k
        if 0 <= t < N_1:
            pairs.append(t)
    pairs = pairs[::-1]  # ascending
    # if not enough, add forward pairs starting at center
    t = max(center_index, 0)
    while len(pairs) < stack and t < N_1:
        pairs.append(t)
        t += 1
    # short videos: repeat the last available pair so the output always has 2*stack channels
    while len(pairs) < stack:
        pairs.append(pairs[-1] if pairs else N_1 - 1)
    # Gather and concat along channel
    flows = flow_pairs[pairs]  # (S, H, W, 2)
    uv_stack = np.concatenate(list(flows), axis=2)  # (H, W, 2S)
    return uv_stack.astype(np.float32)

class TwoStreamVideoDataset(Dataset):
    def __init__(self, samples: List[Tuple[str,int]], class_names: List[str], cfg):
        self.samples = samples
        self.class_names = class_names
        self.cfg = cfg
        self.rgb_tf = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ])
        self.flow_tf = transforms.Compose([ transforms.ToTensor() ])

        # load cache index if any
        self.flow_cache_index = {}
        if getattr(cfg, "use_flow_cache", False):
            index_path = os.path.join(cfg.flow_cache_dir, "index.json")
            if os.path.exists(index_path):
                with open(index_path, "r", encoding="utf-8") as f:
                    self.flow_cache_index = json.load(f)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        video_path, label = self.samples[idx]
        frames_all = read_all_frames_cv(video_path)
        frames_all = [resize_and_center_crop(f, self.cfg.frame_size) for f in frames_all]
        n = len(frames_all)

        seg_inds_rgb = sample_indices(n, self.cfg.num_segments, self.cfg.clip_len_rgb)
        rgb_indices = [inds[len(inds)//2] for inds in seg_inds_rgb]
        flow_centers = rgb_indices

        # RGB clip
        rgb_imgs = [frames_all[i] for i in rgb_indices]
        rgb_tensors = [self.rgb_tf(img) for img in rgb_imgs]
        rgb_clip = torch.stack(rgb_tensors, dim=0)  # Kx3xHxW

        # Flow clip
        flow_stacks = []
        if getattr(self.cfg, "use_flow_cache", False):
            # load cached pairwise flows for this video if exists; otherwise fall back
            flow_cache_path = self.flow_cache_index.get(video_path, _cache_path_for_video(video_path, self.cfg.flow_cache_dir))
            if not os.path.exists(flow_cache_path):
                # compute on the fly once and cache
                try:
                    precompute_flow_for_video(video_path, flow_cache_path, self.cfg.frame_size, getattr(self.cfg, "cache_flow_method", self.cfg.flow_method))
                except Exception as e:
                    # caching failed: report it and fall back to computing each stack on the fly
                    print(f"Flow caching failed for {video_path}: {e}")
                    flow_cache_path = None
            if flow_cache_path and os.path.exists(flow_cache_path):
                pairwise_flows = np.load(flow_cache_path)  # (N-1,H,W,2) scaled
                for c in flow_centers:
                    uv = flow_stack_from_cache(pairwise_flows, c, stack=self.cfg.flow_stack)
                    flow_stacks.append(self.flow_tf(uv))
            else:
                for c in flow_centers:
                    uv = compute_flow_stack(frames_all, c, stack=self.cfg.flow_stack, method=self.cfg.flow_method)
                    flow_stacks.append(self.flow_tf(uv))
        else:
            for c in flow_centers:
                uv = compute_flow_stack(frames_all, c, stack=self.cfg.flow_stack, method=self.cfg.flow_method)
                flow_stacks.append(self.flow_tf(uv))

        flow_clip = torch.stack(flow_stacks, dim=0)  # Kx(2*stack)xHxW

        return rgb_clip, flow_clip, label
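
Every item in a batch needs the same number of flow channels, so the choice of which cached pairs go into a stack matters near the start of a video and for very short clips. The sketch below shows one way to select the indices in plain Python. The function name `select_pair_indices` and the padding rule (repeat the last available pair) are assumptions for illustration.

```python
from typing import List

def select_pair_indices(num_pairs: int, center: int, stack: int) -> List[int]:
    """Pick `stack` flow-pair indices around `center` (pair t is frame t -> t+1)."""
    if num_pairs <= 0:
        return []
    center = min(max(center, 0), num_pairs)  # keep the center inside the video
    # Backward pairs ending at the center frame, in ascending order.
    picked = [t for t in range(center - stack, center) if 0 <= t < num_pairs]
    # Top up with forward pairs starting at the center frame.
    t = center
    while len(picked) < stack and t < num_pairs:
        picked.append(t)
        t += 1
    # Assumed padding rule for very short videos: repeat the last available pair.
    while len(picked) < stack:
        picked.append(picked[-1])
    return picked

mid_video = select_pair_indices(num_pairs=20, center=10, stack=5)
near_start = select_pair_indices(num_pairs=20, center=1, stack=5)
short_video = select_pair_indices(num_pairs=3, center=2, stack=5)
print(mid_video)    # [5, 6, 7, 8, 9]
print(near_start)   # [0, 1, 2, 3, 4]
print(short_video)  # [0, 1, 2, 2, 2]
```

Each returned list has exactly `stack` entries whenever at least one pair exists, so concatenating the selected `(H, W, 2)` flows along the channel axis always yields `2 * stack` channels.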


In [None]:

#@title Train (with LR scheduler + Early Stopping)

def make_scheduler(optimizer):
    if cfg.lr_scheduler_type == "step":
        return torch.optim.lr_scheduler.StepLR(optimizer, step_size=cfg.step_size, gamma=cfg.gamma)
    elif cfg.lr_scheduler_type == "plateau":
        # `verbose` is deprecated since PyTorch 2.2; use scheduler.get_last_lr() to inspect the LR
        return torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max" if cfg.early_stop_on=="val_acc" else "min",
                                                          factor=cfg.gamma, patience=max(1, cfg.early_stop_patience//2))
    elif cfg.lr_scheduler_type == "cosine":
        return torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=cfg.cosine_Tmax)
    else:
        return None

def is_better(a, b, mode):
    if b is None: return True
    return (a > b) if mode == "max" else (a < b)

history = {"train_loss": [], "train_acc": [], "val_loss": [], "val_acc": []}
metrics_log = []

best_metric = None
best_epoch = 0
no_improve = 0

scheduler = make_scheduler(optimizer)

for epoch in range(1, cfg.epochs+1):
    tr_loss, tr_acc = train_one_epoch(model, train_loader, optimizer, epoch, cfg)
    va_loss, va_acc, y_true, y_pred = evaluate(model, val_loader)

    history["train_loss"].append(tr_loss); history["train_acc"].append(tr_acc)
    history["val_loss"].append(va_loss);   history["val_acc"].append(va_acc)

    ep_metrics = {"epoch": epoch, "train_loss": tr_loss, "train_acc": tr_acc, "val_loss": va_loss, "val_acc": va_acc}
    metrics_log.append(ep_metrics)

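    # choose monitored metric; the comparison direction is derived from early_stop_on
    # (cfg.early_stop_mode is not consulted in this loop)
    current = va_acc if cfg.early_stop_on == "val_acc" else va_loss
    mode = "max" if (cfg.early_stop_on == "val_acc") else "min"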

    # step scheduler
    if scheduler is not None:
        if cfg.lr_scheduler_type == "plateau":
            scheduler.step(current)
        else:
            scheduler.step()

    # checkpoint
    print(f"Epoch {epoch:02d}: train_loss={tr_loss:.4f}, train_acc={tr_acc:.4f} | val_loss={va_loss:.4f}, val_acc={va_acc:.4f}")
    save_ckpt(model, optimizer, epoch, cfg, class_names)

    # track best & early stopping
    if is_better(current, best_metric, mode):
        best_metric = current
        best_epoch = epoch
        no_improve = 0
        torch.save({
            "epoch": epoch,
            "model_state": model.state_dict(),
            "cfg": asdict(cfg),
            "class_names": class_names
        }, os.path.join(cfg.output_dir, "best_model.pth"))
        print("Saved best model (epoch", epoch, ")")
    else:
        no_improve += 1
        print(f"No improvement for {no_improve}/{cfg.early_stop_patience} epochs (best @ epoch {best_epoch}).")
        if no_improve >= cfg.early_stop_patience:
            print("Early stopping triggered.")
            break

# Save history & metrics
with open(os.path.join(cfg.output_dir, "history.json"), "w", encoding="utf-8") as f:
    json.dump(history, f, indent=2, ensure_ascii=False)
with open(os.path.join(cfg.output_dir, "metrics_log.json"), "w", encoding="utf-8") as f:
    json.dump(metrics_log, f, indent=2, ensure_ascii=False)

print("Training finished. Best epoch:", best_epoch, "| Best metric:", best_metric)
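
The best-metric tracking and patience counter in the loop above can also be packaged as a small helper. The sketch below is one possible shape for it, not the notebook's own code: the class name `EarlyStopper` is hypothetical and the validation accuracies are made up for illustration.

```python
from typing import Optional

class EarlyStopper:
    """Tracks the best monitored value and signals when patience runs out."""

    def __init__(self, mode: str = "max", patience: int = 5):
        assert mode in ("max", "min")
        self.mode, self.patience = mode, patience
        self.best: Optional[float] = None
        self.best_epoch = 0
        self.no_improve = 0

    def update(self, value: float, epoch: int) -> bool:
        """Record one epoch; return True if training should stop."""
        improved = (self.best is None or
                    (value > self.best if self.mode == "max" else value < self.best))
        if improved:
            self.best, self.best_epoch, self.no_improve = value, epoch, 0
        else:
            self.no_improve += 1
        return self.no_improve >= self.patience

# Made-up validation accuracies, for illustration only.
val_accs = [0.50, 0.62, 0.61, 0.60, 0.62, 0.59]
stopper = EarlyStopper(mode="max", patience=3)
stopped_at = None
for epoch, acc in enumerate(val_accs, 1):
    if stopper.update(acc, epoch):
        stopped_at = epoch
        break
print("best epoch:", stopper.best_epoch, "| stopped at:", stopped_at)
```

As in `is_better`, the comparison is strict, so an epoch that only ties the best value counts toward the patience budget.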


In [None]:

#@title Export: state_dict (.pt) and TorchScript (.pt)

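# Save state_dict (weights only). `model` holds the weights from the last epoch that ran,
# which may differ from the ones stored in best_model.pth.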
state_dict_path = os.path.join(cfg.export_dir, "two_stream_state_dict.pt")
torch.save(model.state_dict(), state_dict_path)
print("Saved state_dict to:", state_dict_path)

# TorchScript (script) export
try:
    scripted = torch.jit.script(model.eval())
    ts_path = os.path.join(cfg.export_dir, "two_stream_scripted.pt")
    scripted.save(ts_path)
    print("Saved TorchScript scripted model to:", ts_path)
except Exception as e:
    print("TorchScript scripting failed, falling back to tracing. Reason:", e)
    # Create dummy inputs for tracing
    K = cfg.num_segments
    C_rgb = 3
    C_flow = 2*cfg.flow_stack
    H = W = cfg.frame_size
    dummy_rgb = torch.randn(1, K, C_rgb, H, W, device=device)
    dummy_flow = torch.randn(1, K, C_flow, H, W, device=device)
    traced = torch.jit.trace(model.eval(), (dummy_rgb, dummy_flow))
    ts_path = os.path.join(cfg.export_dir, "two_stream_traced.pt")
    traced.save(ts_path)
    print("Saved TorchScript traced model to:", ts_path)

print("Export artifacts are in:", cfg.export_dir)
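
An exported TorchScript file can be loaded back with `torch.jit.load` without the Python class definition. The sketch below demonstrates the round trip on a tiny stand-in network: `TinyTwoStream`, its layer sizes, and the input shapes are hypothetical and only mimic the two-input `(rgb, flow)` interface used by the tracing fallback above.

```python
import os
import tempfile

import torch
import torch.nn as nn

class TinyTwoStream(nn.Module):
    """Hypothetical stand-in: takes (rgb, flow) clips and returns class logits."""

    def __init__(self, flow_channels: int, num_classes: int):
        super().__init__()
        self.rgb_head = nn.Linear(3, num_classes)
        self.flow_head = nn.Linear(flow_channels, num_classes)

    def forward(self, rgb: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # rgb: (B, K, 3, H, W), flow: (B, K, C_flow, H, W)
        rgb_feat = rgb.mean(dim=[1, 3, 4])    # average over segments and pixels -> (B, 3)
        flow_feat = flow.mean(dim=[1, 3, 4])  # -> (B, C_flow)
        return self.rgb_head(rgb_feat) + self.flow_head(flow_feat)

model_demo = TinyTwoStream(flow_channels=10, num_classes=5).eval()
scripted_demo = torch.jit.script(model_demo)

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "tiny_scripted.pt")
    scripted_demo.save(demo_path)
    loaded = torch.jit.load(demo_path, map_location="cpu")

rgb = torch.randn(2, 3, 3, 8, 8)
flow = torch.randn(2, 3, 10, 8, 8)
with torch.no_grad():
    logits = loaded(rgb, flow)
    reference = model_demo(rgb, flow)
print("logits shape:", tuple(logits.shape))
print("matches eager model:", torch.allclose(logits, reference, atol=1e-6))
```

Comparing the loaded module against the eager model on the same inputs is a cheap sanity check worth running on the real export as well, particularly when the tracing fallback was used.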
