# 1. Solution Overview

## 1.1. Data Preprocessing / Feature Engineering
In general, our solution does not use complex feature engineering.

We use the following data processing methods to improve training stability and final performance:
1) Normalize the noisy and denoised seismic data to the range 0-255 over the entire volume, without clipping;
2) Extract 2D slices from the 3D volumes for training and testing, and use the entire slice (1259x300) in both cases;
3) Use horizontal-flip data augmentation during training and horizontal-flip test-time augmentation during testing.
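
The slicing and flip steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `identity_model` is a hypothetical stand-in for the trained network, the slicing axis is a guess, and averaging the two predictions is one common way to do flip test-time augmentation, not necessarily what the inference script does.

```python
import numpy as np

def random_hflip_pair(noisy, clean, rng):
    """Training-time augmentation: flip both slices together with probability 0.5."""
    if rng.random() < 0.5:
        return noisy[:, ::-1], clean[:, ::-1]
    return noisy, clean

def predict_with_hflip_tta(model, noisy):
    """Test-time augmentation: average the direct prediction with the
    un-flipped prediction of the mirrored slice (averaging is an assumption)."""
    direct = model(noisy)
    mirrored = model(noisy[:, ::-1])[:, ::-1]
    return (direct + mirrored) / 2.0

def identity_model(x):
    # Hypothetical stand-in for the trained denoiser.
    return x.astype(np.float32)

rng = np.random.default_rng(0)
# Small stand-in volume; the real volumes are (1259, 300, 300).
volume = rng.uniform(0, 255, size=(16, 5, 5)).astype(np.float32)
noisy_slice = volume[:, 2, :]  # one full 2D slice, shape (16, 5)
aug_noisy, aug_clean = random_hflip_pair(noisy_slice, noisy_slice, rng)
tta_output = predict_with_hflip_tta(identity_model, noisy_slice)
print(tta_output.shape)
```

The noisy and clean slices must be flipped with the same random decision, otherwise the training pair no longer lines up pixel by pixel.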

## 1.2. Model description

For the model, we use [NAFNet-width64](https://arxiv.org/abs/2204.04676) as our final model. It is a UNet-based model that outperformed previous SOTA methods while remaining computationally efficient.

We start from the [official pretrained model](https://drive.google.com/file/d/14Fht1QQJ2gMlk4N1ERCRuElg8JfjrWWR/view?usp=sharing) trained on the [SIDD dataset](https://abdokamel.github.io/sidd/).


The architecture of NAFNet is shown in the figure below. It is a UNet built from NAFNet blocks, each of which contains an SCA (Simplified Channel Attention) module and a SimpleGate module.

![NAFNet model schema](images/NAFNet.png "NAFNet architecture")

The image is taken from [here](https://arxiv.org/abs/2204.04676).
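
To make the two modules named above more concrete, here is a minimal NumPy sketch of what they compute on a single feature map, following the description in the NAFNet paper. It is not the official implementation: the random weights and the tiny tensor sizes are assumptions for illustration, and the real model uses PyTorch layers.

```python
import numpy as np

def simple_gate(x):
    """Split the channels into two halves and multiply them elementwise.
    x has shape (C, H, W) with an even C; the result has C // 2 channels."""
    first_half, second_half = np.split(x, 2, axis=0)
    return first_half * second_half

def simplified_channel_attention(x, weight, bias):
    """Global average pooling, then a pointwise (1x1) linear map over channels,
    then a channel-wise rescaling of the input."""
    pooled = x.mean(axis=(1, 2))      # shape (C,)
    scale = weight @ pooled + bias    # shape (C,)
    return x * scale[:, None, None]

rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))   # illustrative (C, H, W) feature map
gated = simple_gate(features)           # shape (4, 4, 4)
weight = rng.normal(size=(4, 4))        # illustrative 1x1 convolution weights
bias = np.zeros(4)
attended = simplified_channel_attention(gated, weight, bias)
print(gated.shape, attended.shape)
```

Neither module uses a nonlinear activation function, which is where the name NAFNet (Nonlinear Activation Free Network) comes from.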




We developed this solution based on the [official NAFNet code](https://github.com/megvii-research/NAFNet).

## 1.3. Hardware and environment

- PyTorch  2.1.2

- Python  3.10 (ubuntu22.04)

- CUDA  11.8

- GPU  A40(48GB) * 2

- CPU  30 vCPU AMD EPYC 7543 32-Core Processor


With the current pipeline settings, training took about 40 hours on 2 NVIDIA A40 GPUs (2 x 48 GB).


# 2. Solution Reproduction Steps

<font color=red>If you only want to reproduce my inference results, you only need to read Sections 2.1 and 2.7.</font>

<font color=red>If you want to reproduce both my training and inference results, read step by step starting from Section 2.1.</font>


## 2.1. Environment Setup
Please run the following commands to install all required libraries and packages.

<font color=red>Custom_NAFNet is my modified version of the official NAFNet code, and training is based on this library.</font>

In [None]:
! pip install -r requirements.txt
! cd ./src/Custom_NAFNet && python setup.py develop --no_cuda_ext

## 2.2. Download and unzip data

Run the shell script in the next cell to download the original competition training data and unzip it. You need to modify the following variable in the cell below:
- ```SRC_TRAIN_DATA_ROOT```, <font color=red>the path where the downloaded training data is saved.</font>

In [None]:
%%bash

SRC_TRAIN_DATA_ROOT="./data/train_images/"
BASE_URL="https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable"

# Download the 17 parts of the training data.
for i in $(seq 1 17); do
    wget -P "$SRC_TRAIN_DATA_ROOT" "$BASE_URL/image-impeccable-train-data-part${i}.zip"
done

# Unzip each archive in place, then delete it.
for file in "$SRC_TRAIN_DATA_ROOT"/*.zip; do
    echo "Extracting $file"
    unzip -q "$file" -d "$SRC_TRAIN_DATA_ROOT"
    rm "$file"
done
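
If a shell is not available, the same download-and-extract loop can be approximated in Python. The sketch below only builds the list of URLs and shows the extract-then-delete step on a throwaway archive. The helper names are hypothetical, and the download itself (for example with `urllib.request.urlretrieve`) is left out so that the snippet runs without network access.

```python
import os
import tempfile
import zipfile

BASE_URL = "https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable"

def train_part_urls(num_parts=17):
    """URLs of the training archives, part1 ... part17."""
    return [
        f"{BASE_URL}/image-impeccable-train-data-part{i}.zip"
        for i in range(1, num_parts + 1)
    ]

def extract_and_remove(zip_path, dest_dir):
    """Unzip one archive into dest_dir, then delete the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    os.remove(zip_path)

urls = train_part_urls()
print(len(urls), urls[0])

# Demonstrate the extract step on a tiny archive created on the fly.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as archive:
        archive.writestr("volume/readme.txt", "demo")
    extract_and_remove(demo_zip, tmp)
    extracted_ok = os.path.exists(os.path.join(tmp, "volume", "readme.txt"))
    zip_removed = not os.path.exists(demo_zip)
```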

## 2.3. Preprocess training data

Volume-based normalization is time-consuming, so the training data is normalized to the range 0-255 offline, before training. To simplify data loading, all volumes are also transposed offline to the same shape (1259, 300, 300).

Run the following code to produce the normalized data. You need to modify the following three variables in the code below:
1) ```SRC_TRAIN_DATA_ROOT```, the path of the original training data. This folder contains all the training data.
2) ```DST_TRAIN_DATA_ROOT```, the path where the normalized and reshaped training data is stored.
3) ```PROCESS_THREAD_NUM```, the number of worker processes used for data normalization. <font color=red>This value cannot be greater than multiprocessing.cpu_count().</font>
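
One way to respect the limit on ```PROCESS_THREAD_NUM``` without editing the value by hand is to cap it at the CPU count reported by the machine. This is a small sketch, not part of the original pipeline; `REQUESTED_PROCESSES` is a hypothetical name for the value you would otherwise hard-code.

```python
import multiprocessing

# Hypothetical name: the worker count you would like to use.
REQUESTED_PROCESSES = 32

# Never ask for more workers than the machine reports, and always keep at least one.
PROCESS_THREAD_NUM = max(1, min(REQUESTED_PROCESSES, multiprocessing.cpu_count()))
print(PROCESS_THREAD_NUM)
```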

In [None]:
import os
import numpy as np
from natsort import natsorted
import multiprocessing

SRC_TRAIN_DATA_ROOT = r"./data/train_images/"
DST_TRAIN_DATA_ROOT = r"./data/train_images_new_shape/"
PROCESS_THREAD_NUM = 32

os.makedirs(DST_TRAIN_DATA_ROOT, exist_ok=True)

def rescale_volume(seismic, low=0, high=100):
    """
    Rescaling 3D seismic volumes 0-255 range, clipping values between low and high percentiles
    """

    minval = np.percentile(seismic, low)
    maxval = np.percentile(seismic, high)

    seismic = np.clip(seismic, minval, maxval)
    seismic = ((seismic - minval) / (maxval - minval)) * 255

    return seismic

def process(test_id):
    # Each sample folder holds two files. After natural sorting, the clean
    # "fullstack" volume comes first and the "noise" volume second.
    files = natsorted(os.listdir(os.path.join(SRC_TRAIN_DATA_ROOT, test_id)))
    assert "fullstack" in files[0]
    assert "noise" in files[1]
    data = np.load(os.path.join(SRC_TRAIN_DATA_ROOT, test_id, files[1]), allow_pickle=True, mmap_mode="r+")
    label = np.load(os.path.join(SRC_TRAIN_DATA_ROOT, test_id, files[0]), allow_pickle=True, mmap_mode="r+")

    # If the label axes are stored in reverse order, transpose the label to match the input.
    if data.shape != label.shape:
        label = label.T
    # Move the axis of length 1259 to the front, giving shape (1259, 300, 300).
    if data.shape[1] == 1259:
        data = data.transpose(1, 0, 2)
        label = label.transpose(1, 0, 2)
    elif data.shape[2] == 1259:
        data = data.transpose(2, 0, 1)
        label = label.transpose(2, 0, 1)

    data = data.astype(np.float32)
    label = label.astype(np.float32)

    save_dir = os.path.join(DST_TRAIN_DATA_ROOT, test_id)
    os.makedirs(save_dir, exist_ok=True)

    # Normalize each volume to 0-255 using its own minimum and maximum.
    data = rescale_volume(data)
    label = rescale_volume(label)

    # Keep the original file names in the destination folder.
    np.save(os.path.join(save_dir, files[1]), data)
    np.save(os.path.join(save_dir, files[0]), label)


with multiprocessing.Pool(processes = PROCESS_THREAD_NUM) as pool:
    test_id_s = natsorted(os.listdir(SRC_TRAIN_DATA_ROOT))
    print(test_id_s)
    result = pool.map(process, test_id_s)

After running the above code, you should get the following folder structure in the directory ```./data/train_images_new_shape/```:

![train data dir](images/train_data_dir.png "train data folder Structure")
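
Before training, it can be worth checking that a processed volume has the expected layout and value range. The sketch below reproduces the two operations from this section on a small random volume. The helper names and the tiny shape are assumptions for illustration; for the real data you would load a saved `.npy` file and expect shape (1259, 300, 300).

```python
import numpy as np

def to_depth_first(volume, depth):
    """Move the axis whose length equals `depth` to the front."""
    axis = volume.shape.index(depth)
    return np.moveaxis(volume, axis, 0)

def minmax_to_255(volume):
    """Scale a volume to 0-255 using its own minimum and maximum (no clipping)."""
    volume = volume.astype(np.float32)
    low, high = volume.min(), volume.max()
    return (volume - low) / (high - low) * 255

DEPTH = 13  # stands in for 1259 to keep the demo small
raw = np.random.default_rng(0).normal(size=(3, DEPTH, 3))
processed = minmax_to_255(to_depth_first(raw, DEPTH))
print(processed.shape, processed.min(), processed.max())
```

With `low=0` and `high=100`, the percentiles used by `rescale_volume` are exactly the minimum and maximum of the volume, so its `np.clip` call changes nothing and the result matches the plain min-max scaling shown here.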

## 2.4. K-fold split of training data

We used local 5-fold cross-validation during training. In this section, we split the training data into 5 folds to obtain the training and validation data for each fold.

Run the following code to split the training data. You need to modify the following two variables in the code below:
1) ```SRC_TRAIN_DATA_ROOT```, the path of the training data. This folder contains one subfolder per training sample. Only the subfolder names are used here, so the normalized data folder from Section 2.3 works as well.
2) ```KFOLD_TXT_SAVE_ROOT```, the path where the txt files listing the training and validation data of each fold are saved, <font color=red>which will be used in the training section.</font>

<font color=red>The generated 5-fold split txt files are already provided in ```./data/train_txt/```, so you can skip the following code and use the txt files in that folder directly for training.</font>

In [18]:
import os
import numpy as np
from sklearn.model_selection import KFold

SRC_TRAIN_DATA_ROOT = r"./data/train_images_new_shape/"
KFOLD_TXT_SAVE_ROOT = r"./train_txt_reproduce/"

NUM_FOLD = 5
RANDOM_SEED=123
os.makedirs(KFOLD_TXT_SAVE_ROOT, exist_ok=True)

all_train_case = np.asarray(os.listdir(SRC_TRAIN_DATA_ROOT))
kf = KFold(n_splits=NUM_FOLD, random_state=RANDOM_SEED, shuffle=True)
for i, (train_index, valid_index) in enumerate(kf.split(all_train_case)):
    train_case = all_train_case[train_index]
    valid_case = all_train_case[valid_index]

    np.savetxt(f"{KFOLD_TXT_SAVE_ROOT}/train_f{i}.txt", train_case, fmt="%s")
    np.savetxt(f"{KFOLD_TXT_SAVE_ROOT}/val_f{i}.txt", valid_case, fmt="%s")
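
A quick consistency check on the generated folds can catch a wrong path or a partially written txt file before a 40-hour training run. The sketch below is an assumption-laden illustration: `check_folds` is a hypothetical helper, and the case names and the simple interleaved split only stand in for the real `train_f{i}.txt` and `val_f{i}.txt` contents.

```python
def check_folds(folds, all_cases):
    """folds is a list of (train_cases, val_cases) pairs.
    Returns True when every fold partitions all_cases and every case
    is used for validation exactly once across the folds."""
    all_set = set(all_cases)
    seen_val = []
    for train_cases, val_cases in folds:
        train_set, val_set = set(train_cases), set(val_cases)
        if train_set & val_set:
            return False  # a case is in both training and validation
        if train_set | val_set != all_set:
            return False  # a case is missing from this fold
        seen_val.extend(val_cases)
    return sorted(seen_val) == sorted(all_cases)

# Hypothetical case names and a simple interleaved 5-fold split for the demo.
cases = [f"case_{i:03d}" for i in range(10)]
demo_folds = []
for k in range(5):
    val = cases[k::5]
    train = [c for c in cases if c not in val]
    demo_folds.append((train, val))

print(check_folds(demo_folds, cases))
```

For the real files, each list can be read with `open(path).read().split()` and passed to the same function.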

## 2.5. Pipeline Training Configuration

The training configuration file is located at ```./configs/CUSTOM_NAFNet-width64-train.yml```. <font color=red>To ensure that training succeeds, the following variables in this file need to be modified.</font>
1) ```num_gpu```, the number of GPUs used for training. We trained on 2 NVIDIA A40 GPUs, so we set it to 2. <font color=red>If you use an AWS SageMaker g5.12xlarge instance, set it to 4.</font>
2) ```datasets.train.root_dir```, the root directory of the transposed training data, which should contain the subfolders for the 249 training image pairs. <font color=red>Change it to the ```DST_TRAIN_DATA_ROOT``` where the transposed data was saved in Section 2.3.</font>
3) ```datasets.train.txt_file```, the txt file used for training. To reproduce our results, you need to train on the fold 0 split from Section 2.4. <font color=red>Change it to the path of the train_f0.txt file saved in Section 2.4.</font>
4) ```datasets.val.root_dir```, the root directory of the transposed training data. It should be the same as ```datasets.train.root_dir```.
5) ```datasets.val.txt_file```, the txt file used for validation. <font color=red>Change it to the path of the val_f0.txt file saved in Section 2.4.</font>
6) ```path.pretrain_network_g```, the path of the pretrained model, which is loaded automatically before training starts. Change it to <font color=red><u>./pretrained_model/NAFNet-SIDD-width64.pth</u>.</font>
7) ```datasets.train.batch_size_per_gpu```, the mini-batch size per GPU during training. On a local A40 (48GB) GPU, the maximum value I can use is 2. <font color=red>If you are using an AWS SageMaker g5.12xlarge instance, you can set this variable to 1.</font>
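
For orientation, the variables listed above can be pictured as the nested structure below. This is only a sketch: it mirrors the dotted names from the list as a Python dictionary, the real YAML file contains many more fields, and the txt file paths are assumptions based on the folder and file names mentioned in Section 2.4.

```python
# Only the fields discussed above are mirrored; the paths are illustrative assumptions.
config = {
    "num_gpu": 2,
    "datasets": {
        "train": {
            "root_dir": "./data/train_images_new_shape/",
            "txt_file": "./data/train_txt/train_f0.txt",
            "batch_size_per_gpu": 2,
        },
        "val": {
            "root_dir": "./data/train_images_new_shape/",
            "txt_file": "./data/train_txt/val_f0.txt",
        },
    },
    "path": {"pretrain_network_g": "./pretrained_model/NAFNet-SIDD-width64.pth"},
}

def get_nested(cfg, dotted_key):
    """Look up a value by a dotted name such as 'datasets.train.root_dir'."""
    node = cfg
    for part in dotted_key.split("."):
        node = node[part]
    return node

def effective_batch_size(cfg):
    """Total number of slices per optimization step across all GPUs."""
    return cfg["num_gpu"] * get_nested(cfg, "datasets.train.batch_size_per_gpu")

same_root = get_nested(config, "datasets.train.root_dir") == get_nested(config, "datasets.val.root_dir")
print(same_root, effective_batch_size(config))
```

Both suggested setups give the same total batch size: 2 GPUs with 2 slices each, or 4 GPUs with 1 slice each.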


## 2.6. Training step
<font color=red>You can skip this step if you only want to run inference with my trained model, which is provided at </font>```./my_checkpoints/net_g_190000.pth```.

Otherwise, run the following command to launch the model training script.

The arguments in the following command have these meanings:
1) ```nproc_per_node```, the number of processes launched per node, which should equal the number of GPUs used for training. I set it to 2 when training locally on 2 x A40. <font color=red>If you are using an AWS SageMaker g5.12xlarge instance, you can set this variable to 4</font> (the value used in the command below).
2) ```master_port```, the communication port used by the distributed processes.
3) ```./src/Custom_NAFNet/basicsr/train.py```, the main file of the training script. It does not need to be modified.
4) ```-opt```, the configuration file for the training process, including the model, data, training strategy, etc. <font color=red>Set it to the configuration file modified in Section 2.5. The default is ```./configs/CUSTOM_NAFNet-width64-train.yml```.</font>

NOTE:

1) With the same data and training configuration as mine, you can reproduce my training results.

2) <font color=red>If you run the training process, the results will be saved to the ```./experiments/final_solution_wushaodong``` directory. Select ```./experiments/final_solution_wushaodong/models/net_g_190000.pth``` as your final trained model, which can be used to reproduce my test results.</font>

3) Training took me nearly 40 hours on 2 x A40 locally. You can use this training time as a reference.
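
Since ```nproc_per_node``` has to match the number of GPUs, it can help to build the launch command from a single variable. The sketch below only assembles the command string from the arguments described above. `build_launch_command` is a hypothetical helper, and nothing is executed.

```python
import shlex

def build_launch_command(num_gpus, master_port=4321,
                         config_path="./configs/CUSTOM_NAFNet-width64-train.yml"):
    """Assemble the training launch command for a given number of GPUs."""
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}",
        f"--master_port={master_port}",
        "./src/Custom_NAFNet/basicsr/train.py",
        "-opt", config_path,
        "--launcher", "pytorch",
    ]

cmd_2gpu = shlex.join(build_launch_command(2))  # local 2 x A40 setup
cmd_4gpu = shlex.join(build_launch_command(4))  # AWS SageMaker g5.12xlarge setup
print(cmd_2gpu)
```

Keep ```num_gpu``` in the configuration file in sync with the value passed here.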

In [None]:
! python -m torch.distributed.launch --nproc_per_node=4 --master_port=4321 ./src/Custom_NAFNet/basicsr/train.py -opt ./configs/CUSTOM_NAFNet-width64-train.yml --launcher pytorch

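If the training results were written to ```./src/Custom_NAFNet/experiments/``` instead, copy them to ```./experiments/```:

In [12]:
! cp -r ./src/Custom_NAFNet/experiments/* ./experiments/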



## 2.7. Inference step
To run inference and produce predictions for the test dataset, follow the instructions below.

First, download the test data, decompress it, and organize it into the layout shown below.

![test data dir](images/test_data_dir.png)

Run the shell script in the next cell to download the original competition test data and unzip it. You need to modify the following variable in the cell below:
- ```SRC_TEST_DATA_ROOT```, <font color=red>the path where the downloaded test data is saved.</font>

In [None]:
%%bash

SRC_TEST_DATA_ROOT="./data/test_images/"

wget -P "$SRC_TEST_DATA_ROOT" https://xeek-public-287031953319-eb80.s3.amazonaws.com/image-impeccable/image-impeccable-test-data.zip

# Unzip each archive in place, then delete it.
for file in "$SRC_TEST_DATA_ROOT"/*.zip; do
    echo "Extracting $file"
    unzip -q "$file" -d "$SRC_TEST_DATA_ROOT"
    rm "$file"
done

Second, load the trained model and run inference.

The arguments in the following command have these meanings:

1) ```./src/Custom_NAFNet/basicsr/inference.py```, the main file of the inference script. It does not need to be modified.

2) ```-opt```, the configuration file for the inference process, including the model, IO, pretrained model path, etc. <font color=red>The default file is ```./configs/CUSTOM_NAFNet-width64-test.yml```.</font> The following variables in this file need to be modified:
    - ```test_dir```, the root directory of the test data, which is the ```SRC_TEST_DATA_ROOT``` from the previous step.
    - ```test_res_save_root```, the location where the inference results are stored. <font color=red><u>Note that this holds the inference results for each volume, not the results in the final submission format.</u></font>
    - ```path.pretrain_network_g```, the path of the trained model. It can be set to <font color=red><u>./my_checkpoints/net_g_190000.pth, which I trained locally</u>.</font> It can also be set to the path of the model you trained in Section 2.6.

In [None]:
! python ./src/Custom_NAFNet/basicsr/inference.py -opt ./configs/CUSTOM_NAFNet-width64-test.yml

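Finally, use the following code to produce the final submission file for scoring. Set ```INFER_RES_ROOT``` to the ```test_res_save_root``` directory configured above, and ```SUBMIT_FILE``` to the path where the submission file should be written.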

In [None]:
import tqdm
import numpy as np

SUBMIT_FILE = "./final_submission_wushaodong.npz"
INFER_RES_ROOT = "/root/autodl-tmp/Image_Impeccable_Journey_to_Clarity/data/1_final_test_images_res_width64_in1out1_1259x300_bs4_f0_hfliptta_190000/"
TEST_CASES = ['2024-06-10_0d6402b1', '2024-06-10_1a4e5680', '2024-06-10_1b9a0096', 
            '2024-06-10_2bd82c05', '2024-06-10_3b118e17', '2024-06-10_43537d46', 
            '2024-06-10_662066f4', '2024-06-10_971ac6dd', '2024-06-10_9871c8c6', 
            '2024-06-10_b7c329be', '2024-06-10_bfd43f22', '2024-06-10_c952ed24', 
            '2024-06-10_cec3da7f', '2024-06-10_eb45f27e', '2024-06-11_f46c20fe']


def rescale_volume(seismic, low=0, high=100):
    """
    Rescaling 3D seismic volumes 0-255 range, clipping values between low and high percentiles
    """

    minval = np.percentile(seismic, low)
    maxval = np.percentile(seismic, high)

    seismic = np.clip(seismic, minval, maxval)
    seismic = ((seismic - minval) / (maxval - minval)) * 255

    return seismic

def create_submission(seismic_filenames: list, prediction: list, submission_path: str):
    """Function to create submission file out of all test predictions in one list

    Parameters:
        seismic_filenames: list of survey .npy filenames used for prediction
        prediction: list with 3D np.ndarrays of predicted missing parts
        submission_path: path to save submission

    Returns:
        None
    """

    submission = dict({})
    for sample_name, sample_prediction in zip(seismic_filenames, prediction):
        i_slices_index = (
            np.array([0.25, 0.5, 0.75]) * sample_prediction.shape[0]
        ).astype(int)
        i_slices_names = [f"{sample_name}_gt.npy-i_{n}" for n in range(0, 3)]
        i_slices = [sample_prediction[s, :, :].astype(np.uint8) for s in i_slices_index]
        submission.update(dict(zip(i_slices_names, i_slices)))

        x_slices_index = (
            np.array([0.25, 0.5, 0.75]) * sample_prediction.shape[1]
        ).astype(int)
        x_slices_names = [f"{sample_name}_gt.npy-x_{n}" for n in range(0, 3)]
        x_slices = [sample_prediction[:, s, :].astype(np.uint8) for s in x_slices_index]
        submission.update(dict(zip(x_slices_names, x_slices)))

    np.savez(submission_path, **submission)


file_names_list = []
pres = []
for case in tqdm.tqdm(TEST_CASES):
    data = np.load(f"{INFER_RES_ROOT}/{case}/infe_res.npy")
    data1 = rescale_volume(data.copy().T)

    file_names_list.append(case)
    pres.append(data1.copy())

create_submission(file_names_list, pres, SUBMIT_FILE)
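
Before uploading, the submission file can be checked against the layout that `create_submission` produces: six `uint8` slices per volume, taken at 25%, 50% and 75% along the first two axes. The sketch below is a hypothetical checker, not part of the competition tooling, and it is exercised on a tiny stand-in file with a made-up case name.

```python
import os
import tempfile

import numpy as np

def slice_indices(length):
    """Positions at 25%, 50% and 75% of an axis, as used by create_submission."""
    return (np.array([0.25, 0.5, 0.75]) * length).astype(int)

def expected_keys(case):
    """The six entry names written for one test volume."""
    return [f"{case}_gt.npy-{axis}_{n}" for axis in ("i", "x") for n in range(3)]

def check_submission(path, cases):
    """True when the file holds exactly the expected entries, all stored as uint8."""
    with np.load(path) as submission:
        keys = set(submission.files)
        wanted = {key for case in cases for key in expected_keys(case)}
        return keys == wanted and all(submission[k].dtype == np.uint8 for k in keys)

print(slice_indices(300))

# Demo on a tiny stand-in file; "demo_case" is a hypothetical case name.
demo_case = "demo_case"
demo_arrays = {key: np.zeros((4, 4), dtype=np.uint8) for key in expected_keys(demo_case)}
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_submission.npz")
    np.savez(demo_path, **demo_arrays)
    demo_ok = check_submission(demo_path, [demo_case])
```

For the real file, call `check_submission(SUBMIT_FILE, TEST_CASES)`. With 15 test volumes, the file should contain 90 entries.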