# Solution Overview


### Data Preprocessing / Feature Engineering

We use the following data processing methods to improve training stability and final performance:
1) The input seismic data is normalized using global statistics computed on the training data:
   ```python
   mean, std = 0.0, 0.01550384
   ```
2) The seismic data is resized using bilinear interpolation.
3) Some models use horizontal-flip data augmentation during training and `tta2` test-time augmentation (TTA) during inference.
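
As a rough illustration of the normalization step, the sketch below applies the global statistics quoted above to a synthetic array. The helper name `normalize_seismic` and the array sizes are assumptions made for this example; they are not taken from the pipeline code.

```python
import numpy as np

# Global training-set statistics quoted in the list above
SEISMIC_MEAN = 0.0
SEISMIC_STD = 0.01550384


def normalize_seismic(x: np.ndarray) -> np.ndarray:
    """Standardize seismic amplitudes with the global training statistics.

    Hypothetical helper for illustration, not a function from the repository.
    """
    return (x.astype(np.float32) - SEISMIC_MEAN) / SEISMIC_STD


# Synthetic input: 5 shots, 100 time samples, 30 receivers (made-up sizes)
rng = np.random.default_rng(0)
raw = rng.normal(loc=SEISMIC_MEAN, scale=SEISMIC_STD, size=(5, 100, 30))
normalized = normalize_seismic(raw)
```

After this step the amplitudes have roughly unit standard deviation, which is the usual motivation for this kind of scaling.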


### Model description

The model predicts subsurface velocity maps from multi-component seismic shot gathers by combining a Vision Transformer (ViT) backbone with a learnable channel fusion step and a lightweight convolutional decoder.

![model](assets/model.png)
#### 1)  Input Preprocessing
* Input shape: ```(Bs, C_in, T, R)```
    * Bs: batch size
    * C_in: number of input seismic components (5 shots)
    * T: temporal samples
    * R: receivers

* Preprocessing: Each shot gather is resampled via bilinear interpolation to a fixed model input size (H_in, W_in) for patch-based processing.

#### 2) Encoder – Vision Transformer
* Base model: eva02 (base and large variants) from timm
* Patch size: P × P
* Number of tokens: `num_patches_h * num_patches_w`, where:
    ```python
        num_patches_h = H_in // P
        num_patches_w = W_in // P 
    ```    
* Processing stages:
    1. Local encoding: The first `split_at` transformer blocks process each seismic component independently.
    2. Channel fusion: Two strategies are supported:
        * Mean fusion: Simple average across the C_in components.
        * Weighted fusion: Learnable softmax-normalized weights for each component.
    3. Global encoding: After fusion, the remaining blocks integrate information across all components.
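
The two fusion strategies can be sketched as follows. This is a minimal NumPy illustration, assuming the per-component tokens are laid out as `(B, C, N, D)`; that layout, the function names, and the sizes are assumptions, and in the actual model the fusion logits would be a learnable parameter rather than a fixed array.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def mean_fusion(tokens: np.ndarray) -> np.ndarray:
    """Average tokens over the component axis: (B, C, N, D) -> (B, N, D)."""
    return tokens.mean(axis=1)


def weighted_fusion(tokens: np.ndarray, logits: np.ndarray) -> np.ndarray:
    """Softmax-weighted sum over the component axis: (B, C, N, D) -> (B, N, D).

    In a trained model `logits` would be a learnable parameter of shape (C,).
    """
    weights = softmax(logits)  # non-negative and sums to 1
    return np.einsum("c,bcnd->bnd", weights, tokens)


# Made-up sizes: batch 2, 5 components, 16 tokens, embedding dim 8
rng = np.random.default_rng(0)
tokens = rng.normal(size=(2, 5, 16, 8))

fused_mean = mean_fusion(tokens)
fused_uniform = weighted_fusion(tokens, np.zeros(5))  # equal logits -> equal weights
fused_skewed = weighted_fusion(tokens, np.array([4.0, 0.0, 0.0, 0.0, 0.0]))
```

With all logits equal, weighted fusion reduces to mean fusion, so the weighted variant can start from the simple average and learn to emphasize individual shots.
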
#### 3) Decoder – Token-to-Velocity Map
1. Projection: Tokens (excluding CLS) are projected from embed_dim to 512 channels.
2. Reshape: Tokens are reshaped into a (num_patches_h, num_patches_w) spatial grid.
3. Progressive upsampling using ConvTranspose2d layers:
    ```python
    (num_patches_h , num_patches_w)  → 2× → 4× → 8× → 16×
    ```
4. Final convolution: Produces a single-channel velocity map.
5. Interpolation: Output is resized to (H_out, W_out) using bilinear interpolation.
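
To make the shapes concrete, the sketch below walks through the token count and the grid size after each upsampling stage. It assumes each `ConvTranspose2d` stage uses kernel size 2 and stride 2 (so it exactly doubles the resolution), and the input size and patch size are hypothetical values, not settings read from the configs.

```python
def conv_transpose_out(size: int, kernel: int = 2, stride: int = 2,
                       padding: int = 0, output_padding: int = 0,
                       dilation: int = 1) -> int:
    """Output length of a transposed convolution along one axis."""
    return ((size - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


def decoder_shapes(h_in: int, w_in: int, patch: int, n_stages: int = 4):
    """Return the token count and the grid size before and after each stage."""
    grid = (h_in // patch, w_in // patch)
    n_tokens = grid[0] * grid[1]
    shapes = [grid]
    for _ in range(n_stages):
        grid = (conv_transpose_out(grid[0]), conv_transpose_out(grid[1]))
        shapes.append(grid)
    return n_tokens, shapes


# Hypothetical example values, not taken from the experiment configs
n_tokens, shapes = decoder_shapes(h_in=224, w_in=224, patch=14)
print(n_tokens)  # 256
print(shapes)    # [(16, 16), (32, 32), (64, 64), (128, 128), (256, 256)]
```

The final bilinear interpolation then maps the last grid to `(H_out, W_out)`, whatever the upsampled size turns out to be.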

#### 4) Velocity Map Output
1. Sigmoid activation: maps the decoder output to values in [0, 1].
2. Linear scaling: maps the sigmoid output to the velocity range [1.5, 4.5]:
   ```python
   velocity = 1.5 + 3.0 * sigmoid_output
   ```
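
The output mapping can be checked with a few scalar values. The sketch below is a plain-Python illustration; the helper names `sigmoid` and `to_velocity` are chosen for this example and do not come from the repository.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_velocity(logit: float, v_min: float = 1.5, v_range: float = 3.0) -> float:
    """Map an unbounded network output to a velocity in [1.5, 4.5]."""
    return v_min + v_range * sigmoid(logit)


print(to_velocity(0.0))    # 3.0, the midpoint of the range
print(to_velocity(-50.0))  # approaches the lower bound 1.5
print(to_velocity(50.0))   # approaches the upper bound 4.5
```

Bounding the output this way means the model cannot predict velocities outside the range covered by the scaling constants.
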
### Training strategy
* We used K-fold cross-validation for model selection, then retrained on the entire dataset.
* The model was optimized with a hybrid loss that combines an L1 loss (`L1Loss`) with the Structural Similarity Index (SSIM).
* An Exponential Moving Average (EMA) of the model weights was used to stabilize training.
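
The loss and the EMA update can be sketched as below. This is only an illustration under stated assumptions: the SSIM here is computed from whole-image statistics instead of a sliding window, the SSIM term enters as `1 - SSIM`, and the weight `alpha`, the `decay` value, and the array sizes are hypothetical. The source does not specify any of these details.

```python
import numpy as np


def l1_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean absolute error."""
    return float(np.mean(np.abs(pred - target)))


def global_ssim(pred: np.ndarray, target: np.ndarray, data_range: float = 3.0) -> float:
    """SSIM from whole-image statistics (a simplification: no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = ((pred - mu_p) * (target - mu_t)).mean()
    num = (2 * mu_p * mu_t + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    return float(num / den)


def hybrid_loss(pred: np.ndarray, target: np.ndarray, alpha: float = 0.5) -> float:
    """alpha * L1 + (1 - alpha) * (1 - SSIM); alpha is a hypothetical weight."""
    return alpha * l1_loss(pred, target) + (1 - alpha) * (1 - global_ssim(pred, target))


def ema_update(ema: dict, params: dict, decay: float = 0.999) -> dict:
    """One EMA step: ema <- decay * ema + (1 - decay) * params."""
    return {k: decay * ema[k] + (1 - decay) * params[k] for k in ema}


rng = np.random.default_rng(0)
target = rng.uniform(1.5, 4.5, size=(70, 70))  # made-up velocity map
noisy = target + rng.normal(scale=0.1, size=target.shape)

loss_same = hybrid_loss(target, target)   # essentially zero
loss_noisy = hybrid_loss(noisy, target)   # larger than loss_same
ema = ema_update({"w": np.zeros(3)}, {"w": np.ones(3)}, decay=0.9)
```

The L1 term penalizes per-pixel error while the SSIM term rewards structural agreement, and the EMA copy of the weights changes more smoothly than the raw weights between steps.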


# Solution Reproduction Steps

## 1. Pipeline Configuration

Before running the pipeline, make sure that all data is placed in the ```data``` folder. No other changes to the config file are required.

CONFIGURATION CAVEAT:

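* If the evaluation instance has 4 GPUs, you can add ```CUDA_VISIBLE_DEVICES=$num_gpus``` to the training scripts (```exp/expXXX.sh```) to run training in parallel. Note that ```CUDA_VISIBLE_DEVICES``` expects a comma-separated list of GPU indices (for example, ```0,1,2,3```), so ```$num_gpus``` should hold such a list, not a GPU count.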

* Training was done on 2 RTX 3090 GPUs (24 GB of GPU memory each) with 96 GB of CPU RAM.

## 2. Environment Setup
Run the following command to install all required libraries and packages.

In [None]:
! pip install -r requirements.txt --quiet

## 3. Data preparation step

In this step, the training and test datasets are formatted to match the model pipeline.  
Each training sample is stored as a single `.npy` file, where the five seismic source components are stacked along the channel dimension. 

In [None]:
import os
import sys
from glob import glob
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from anytree import Node, RenderTree
from typing import Dict, List
from tqdm import tqdm
from utils import *

Before running the code below, make sure to extract all the training data into the ```data/train``` folder and the test data into the ```data/test``` folder.

In [None]:
# Directory paths
training_dataset = "./data/train/*"  # path to your training data
test_dataset = "./data/test/*"  # path to your test data
out_train = "./data/prep_train"
out_test = "./data/prep_test"
os.makedirs(out_train, exist_ok=True)
os.makedirs(out_test, exist_ok=True)

In [None]:
sample_paths = glob(training_dataset)
sample_ids = [os.path.basename(path) for path in sample_paths]
# Source (shot) positions; there is one receiver file per source
source_coordinates = [1, 75, 150, 225, 300]

for sample_path in tqdm(sample_paths, total=len(sample_paths)):
    file_name = os.path.basename(sample_path)

    # Stack the five shot gathers along a new leading (channel) axis
    xs = [
        np.load(os.path.join(sample_path, f"receiver_data_src_{s}.npy"))
        for s in source_coordinates
    ]
    x_arr = np.stack(xs, 0)
    vel_data = np.load(os.path.join(sample_path, "vp_model.npy"))
    np.save(os.path.join(out_train, f"{file_name}_input.npy"), x_arr)
    np.save(os.path.join(out_train, f"{file_name}_target.npy"), vel_data)

In [None]:
df=pd.DataFrame(dict(img_id=sample_ids))
df.to_csv("data/meta_all.csv",index=False)

In [None]:
sample_paths = glob(test_dataset)
sample_ids = [os.path.basename(path) for path in sample_paths]
source_coordinates = [1, 75, 150, 225, 300]

for sample_path in tqdm(sample_paths, total=len(sample_paths)):
    file_name = os.path.basename(sample_path)

    # Stack the five shot gathers along a new leading (channel) axis
    xs = [
        np.load(os.path.join(sample_path, f"receiver_data_src_{s}.npy"))
        for s in source_coordinates
    ]
    x_arr = np.stack(xs, 0)

    np.save(os.path.join(out_test, f"{file_name}.npy"), x_arr)
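
The stacking logic used for both datasets can also be factored into a small helper that fails early when a shot file is missing. The sketch below is one possible version, exercised on synthetic files in a temporary directory; the helper name `stack_shots` and the array sizes are assumptions for illustration, while the file naming follows the cells above.

```python
import os
import tempfile

import numpy as np

SOURCE_COORDINATES = [1, 75, 150, 225, 300]


def stack_shots(sample_dir: str, sources=SOURCE_COORDINATES) -> np.ndarray:
    """Load one receiver file per source and stack them on a new leading axis."""
    paths = [os.path.join(sample_dir, f"receiver_data_src_{s}.npy") for s in sources]
    missing = [p for p in paths if not os.path.exists(p)]
    if missing:
        raise FileNotFoundError(f"Missing shot files: {missing}")
    return np.stack([np.load(p) for p in paths], axis=0)


# Self-contained check on synthetic data (array sizes are made up)
with tempfile.TemporaryDirectory() as tmp:
    for s in SOURCE_COORDINATES:
        fake_gather = np.full((100, 30), float(s), dtype=np.float32)
        np.save(os.path.join(tmp, f"receiver_data_src_{s}.npy"), fake_gather)
    stacked = stack_shots(tmp)

print(stacked.shape)  # (5, 100, 30)
```

The leading axis of the stacked array follows the order of `SOURCE_COORDINATES`, so channel `i` always corresponds to the same source position.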

## 4. Training step


You can skip this step if you want to start from the pretrained checkpoints provided in ```./checkpoints```.
Otherwise, uncomment and run the following commands to launch the model training scripts.

The checkpoints will be saved to the ```./inference_models``` directory.

In [None]:
# ! sh exp/exp149all.sh
# ! sh exp/exp181all.sh
# ! sh exp/exp187all.sh
# ! sh exp/exp188all.sh
# ! sh exp/exp189all.sh
# ! sh exp/exp199all.sh
# ! sh exp/exp207all.sh
# ! sh exp/exp215all.sh

## 5. Inference step
Follow the steps below to run model inference and generate predictions for the holdout dataset.  
On the public test set, inference takes approximately 30 minutes on a single RTX 3090 GPU.


In [None]:
import os
import argparse
import pandas as pd
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from pathlib import Path
from omegaconf import OmegaConf
import shutil
from src.datasets.fwi import FWIDataset
import numpy as np
import json
from run.init.model import init_model_from_config
from run.init.forwarder import Forwarder
from typing import Optional
import cv2
import yaml
from glob import glob
from torch import nn
from utils import *

To change the model checkpoints used for inference, adjust the ```model_dir_f``` variable below.

In [None]:
# Each entry is (experiment name, use TTA); every experiment contributes five seeds
exp_names = [(f"exp215_all{seed}",True) for seed in [7,100,700,1000,70000]]
exp_names += [(f"exp207_all{seed}",True) for seed in [7,100,700,1000,70000]]
exp_names += [(f"exp199_all{seed}",False) for seed in [7,100,700,1000,70000]]
exp_names += [(f"exp189_all{seed}",False) for seed in [70,100,700,1000,70000]]
exp_names += [(f"exp188_all{seed}",False) for seed in [71,107, 7001, 10707,71777]]
exp_names += [(f"exp187_all{seed}",False) for seed in [7,100,700,1000,70000]]
exp_names += [(f"exp181_all{seed}",False) for seed in [7,100,700,1000,70000]]
exp_names += [(f"exp149_all{seed}",False) for seed in [7,100,700,1000 ,70000]]
model_dir_f="checkpoints"

In [None]:
def infer(model, loader, tta=False):
    """Run inference over `loader` and return predictions concatenated along the batch axis."""
    predictions = []
    for batch in tqdm(loader):
        batch["image"] = batch["image"].cuda()
        with torch.no_grad():
            # Use test-time augmentation only for models flagged with tta=True
            preds = model.predict_tta(batch) if tta else model.predict(batch)
        predictions.append(preds.cpu())

    return torch.concat(predictions, dim=0)
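
The internals of `predict_tta` are not shown in this notebook. A common form of two-view flip TTA can be sketched as below, assuming that only the last (receiver) axis is flipped and that the prediction is flipped back before averaging. The function names and the toy model are hypothetical, and whether the shot order should also be reversed is not stated in the source.

```python
import numpy as np


def predict_flip_tta(predict_fn, x: np.ndarray) -> np.ndarray:
    """Average the plain prediction with a flipped-and-unflipped prediction.

    Assumption: only the last axis is flipped, for both input and output.
    """
    pred = predict_fn(x)
    pred_flipped = predict_fn(x[..., ::-1])
    return 0.5 * (pred + pred_flipped[..., ::-1])


def toy_model(x: np.ndarray) -> np.ndarray:
    """Stand-in for a network: sums over the shot axis, (B, C, T, R) -> (B, 1, T, R)."""
    return x.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
batch = rng.normal(size=(2, 5, 8, 6))  # made-up sizes
tta_pred = predict_flip_tta(toy_model, batch)
```

The toy model commutes with the flip, so here the TTA result equals the plain prediction; a real network is not exactly flip-equivariant, which is why averaging the two views can reduce prediction noise.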

In [None]:
device_id = 'cuda:0' 
predictions=[]
for i, (expname, tta) in enumerate(exp_names):
    cfg = OmegaConf.load(f"{model_dir_f}/{expname}/config.yaml")
    cfg.out_dir = "predictions"
    test_df = FWIDataset.create_dataframe(
            -1,
            data_path="data/prep_test",
        ).sort_values(by="img_id", key=lambda x: x)
    dataset = FWIDataset(test_df, phase="test", cfg=cfg.dataset)
    inference_dtype = torch.float32
    batch_size = 8
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=False, num_workers=8)
    device = torch.device(device_id)
    base_model= init_model_from_config(cfg.model)
    model = Forwarder(cfg.forwarder, base_model).eval().to(device,dtype=inference_dtype)
    
    model.model.load_state_dict(
        torch.load(f"{model_dir_f}/{expname}/model_weights_ema.pth", map_location=device),
        strict=True
    )
    preds = infer(model,  loader,tta)
    predictions.append(preds)
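# Concatenate all model outputs along dim 1 and take the per-pixel median across models
# (this assumes each prediction tensor has a singleton dim 1, e.g. shape (N, 1, H, W))
final_preds = torch.median(torch.concat(predictions, dim=1), dim=1)[0]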
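
The ensemble above has 40 members (8 experiments with 5 seeds each), which is an even count. For an even number of elements `torch.median` returns the lower of the two middle values, whereas `np.median` averages them. The NumPy sketch below reproduces the lower-median behavior on a tiny hypothetical stack of predictions to show the difference; the helper name `lower_median` and the values are made up for illustration.

```python
import numpy as np


def lower_median(stack: np.ndarray, axis: int = 1) -> np.ndarray:
    """Median along `axis`, taking the lower middle value for even counts."""
    ordered = np.sort(stack, axis=axis)
    mid = (stack.shape[axis] - 1) // 2
    return np.take(ordered, mid, axis=axis)


# Four hypothetical model outputs for one 1x2 map, stacked on axis 1.
# The fourth model is an outlier at the first pixel.
preds = np.array([[[[1.0, 10.0]], [[2.0, 20.0]], [[3.0, 30.0]], [[100.0, 40.0]]]])

ensemble_lower = lower_median(preds)       # first pixel 2.0, second pixel 20.0
ensemble_avg = np.median(preds, axis=1)    # first pixel 2.5, second pixel 25.0
ensemble_mean = preds.mean(axis=1)         # first pixel 26.5, pulled up by the outlier
```

Either median variant is far less sensitive to a single outlying model than the mean, which is a common reason to prefer a median when combining many checkpoints.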

In [None]:
final_preds = final_preds.numpy().astype(np.float64)

In [None]:
img_ids = []
for batch in tqdm(loader):
    img_ids.extend(batch["image_id"])

In [None]:
sub_id="final_submission"
out_folder=f"{sub_id}.npz"

for i,(sample_id, prediction) in enumerate(zip(img_ids, final_preds)):
    create_submission(
        sample_id, prediction, out_folder
    )