# PaddlePaddle Regular Competition: PALM Pathologic Myopia Prediction (3rd Place Solution, June)
**Competition page: [https://aistudio.baidu.com/aistudio/competition/detail/85](https://aistudio.baidu.com/aistudio/competition/detail/85)**




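# I. Competition Overview
## 1. Task Overview
The PALM pathologic myopia prediction competition focuses on developing algorithms for diagnosing pathologic myopia (PM). Its goal is to evaluate and compare automated algorithms that detect pathologic myopia on a common dataset of retinal fundus images. The task is to classify each provided image as either a pathologic-myopia or a non-pathologic-myopia color fundus photograph; the non-pathologic category includes both normal fundus and high-myopia fundus photographs.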

 ![](https://ai.bdstatic.com/file/EB6E1DA97ECE4AE79697FD6F6A25F679)

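## 2. Data Overview
For this competition, the Zhongshan Ophthalmic Center of Sun Yat-sen University provides 800 color fundus photographs annotated with pathologic myopia labels for training, plus another 400 annotated images that the platform uses to test submitted models.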

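## 3. Data Notes
The gold-standard pathologic myopia labels come from clinical reports. They are based not only on the color fundus photographs but also on OCT, visual field tests, and other examination results.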

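## 4. Training Set
Folder name: Train
The Train folder contains a fundus_image folder and a Classification.xlsx file. All images in fundus_image are color fundus photographs with a resolution of 1444×1444 or 2124×2056, named like N0001.jpg, H0001.jpg, P0001.jpg, and V0001.jpg. Classification.xlsx records whether each fundus image shows pathologic myopia: 1 if it does, 0 if it does not.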
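
Before training, it can help to check how the labels are distributed across the file-name prefixes. The following is a minimal sketch on made-up rows: the column names `imgName` and `Label`, the helper `summarize_by_prefix`, and the label values themselves are assumptions for illustration and are not taken from the real Classification.xlsx.

```python
import pandas as pd

# Hypothetical rows standing in for Classification.xlsx; the column names
# "imgName" and "Label" and the label values are assumptions for illustration.
labels = pd.DataFrame({
    "imgName": ["N0001.jpg", "H0001.jpg", "P0001.jpg", "P0002.jpg", "V0001.jpg"],
    "Label": [0, 0, 1, 1, 0],
})

def summarize_by_prefix(df, name_col="imgName", label_col="Label"):
    """Count images and positive labels per file-name prefix letter."""
    prefix = df[name_col].str[0]
    return df.groupby(prefix)[label_col].agg(total="count", positives="sum")

summary = summarize_by_prefix(labels)
print(summary)
```

With the real file, the same function could be applied to the DataFrame returned by `pd.read_excel`, after adjusting the column names to whatever the spreadsheet actually uses.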

## 5. Test Set
Folder name: PALM-Testing400-Images. The folder contains 400 color fundus photographs, named like T0001.jpg.

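## 6. Submission Format
Classification results must be provided in a CSV file named "Classification_Results.csv". The first column holds the file name of each test fundus image (including the ".jpg" extension) under the header FileName. The second column holds the predicted probability that the image shows PM (a value from 0.0 to 1.0) under the header PM Risk. For example: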

![](https://ai.bdstatic.com/file/9B4E52D17D184A0893853C7A3A726BFA)
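
The format rules above can be checked mechanically before uploading. This is a minimal sketch, not part of the competition tooling: `check_submission` is a hypothetical helper, and the two-row CSV is made-up data used only to exercise it.

```python
import io
import pandas as pd

def check_submission(csv_source, expected_rows=400):
    """Return a list of problems found in a submission CSV (empty if none)."""
    df = pd.read_csv(csv_source)
    problems = []
    if list(df.columns) != ["FileName", "PM Risk"]:
        problems.append(f"unexpected columns: {list(df.columns)}")
        return problems
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    if not df["FileName"].str.endswith(".jpg").all():
        problems.append("some file names lack the .jpg extension")
    if not df["PM Risk"].between(0.0, 1.0).all():
        problems.append("some PM Risk values fall outside [0.0, 1.0]")
    return problems

# Tiny in-memory example with two rows instead of 400
sample = io.StringIO("FileName,PM Risk\nT0001.jpg,0.93\nT0002.jpg,0.02\n")
print(check_submission(sample, expected_rows=2))
```

For a real submission, pass the path `"Classification_Results.csv"` and leave `expected_rows` at 400.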

# II. Data Processing

## 1. Initial Data Preparation
* Unzip the archive
* Rename the extracted folder
* Delete the temporary folder

In [3]:
!unzip -qao data/data85133/常规赛：PALM病理性近视预测.zip

In [4]:
!mv '常规赛：PALM病理性近视预测' dataset

In [5]:
!rm -rf __MACOSX/

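## 2. Splitting into Training and Validation Sets

In [6]:
# Split the labeled data into a training set (90%) and a validation set (10%)

import pandas as pd

train_excel_file = 'dataset/Train/Classification.xlsx'
pd_list = pd.read_excel(train_excel_file)

pd_list_length = len(pd_list)
# Shuffle the rows (no seed is set, so the split differs between runs)
pd_list = pd_list.sample(frac=1)
offset = int(pd_list_length * 0.9)
train_list = pd_list[:offset]
eval_list = pd_list[offset:]
# Write one space-separated row per image, without header or index
train_list.to_csv("train_list.txt", index=None, header=None, sep=' ')
eval_list.to_csv("eval_list.txt", index=None, header=None, sep=' ')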
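
With only 80 validation images, a purely random split can leave the two classes unevenly represented, and without a seed it cannot be reproduced. One alternative, shown here as a sketch rather than the method used in this solution, is a seeded stratified split. The helper `stratified_split`, the seed value, and the demo column names `imgName` and `Label` are assumptions for illustration.

```python
import pandas as pd

def stratified_split(df, label_col, train_frac=0.9, seed=42):
    """Split df so each label keeps roughly the same train/validation ratio."""
    train_parts = []
    for _, group in df.groupby(label_col):
        train_parts.append(group.sample(frac=train_frac, random_state=seed))
    train_df = pd.concat(train_parts)
    # Everything not sampled for training goes to validation
    eval_df = df.drop(train_df.index)
    # Shuffle both parts so the classes are interleaved
    return (train_df.sample(frac=1, random_state=seed),
            eval_df.sample(frac=1, random_state=seed))

# Hypothetical stand-in for Classification.xlsx (100 rows, 40 positives)
demo = pd.DataFrame({
    "imgName": [f"X{i:04d}.jpg" for i in range(1, 101)],
    "Label": [1] * 40 + [0] * 60,
})
train_df, eval_df = stratified_split(demo, "Label")
print(len(train_df), len(eval_df), eval_df["Label"].sum())
```

The two returned frames can be written out with the same `to_csv(..., index=None, header=None, sep=' ')` calls as in the cell above.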


# III. PaddleX Setup

## 1. Installing PaddleX

In [1]:
! pip install paddlex -i https://mirror.baidu.com/pypi/simple

Looking in indexes: https://mirror.baidu.com/pypi/simple
Collecting paddlex
  Downloading https://mirror.baidu.com/pypi/packages/d6/a2/07435f4aa1e51fe22bdf06c95d03bf1b78b7bc6625adbb51e35dc0804cc7/paddlex-1.3.11-py3-none-any.whl (516kB)
Collecting xlwt (from paddlex)
  Downloading https://mirror.baidu.com/pypi/packages/44/48/def306413b25c3d01753603b1a222a011b8621aed27cd7f89cbc27e6b0f4/xlwt-1.3.0-py2.py3-none-any.whl (99kB)
Collecting paddleslim==1.1.1 (from paddlex)
  Downloading https://mirror.baidu.com/pypi/packages/d1/77/e257227bed9a70ff0d35a4a3c4e70ac2d2362c803834c4c52018f7c4b762/paddleslim-1.1.1-py2.py3-none-any.whl (145kB)
Collecting paddlehub==2.1.0 (from paddlex)
  Downloading https://mirror.baidu.com/pypi/packages/7a/29/3bd0ca43c787181e9

## 2. GPU Setup and Imports

In [2]:
# Use GPU 0 (if no GPU is available, the model is still trained on the CPU after running this code)
import matplotlib
matplotlib.use('Agg') 
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx



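## 3. Data Augmentation

In [3]:

from paddlex.cls import transforms
# Training: random crop output at 1440x1440, random horizontal flip, normalization
train_transforms = transforms.Compose([
    transforms.RandomCrop(crop_size=1440),
    transforms.RandomHorizontalFlip(),
    transforms.Normalize()
])
# Validation: resize the short side to 1444, center-crop to 1440x1440, normalization
eval_transforms = transforms.Compose([
    transforms.ResizeByShort(short_size=1444),
    transforms.CenterCrop(crop_size=1440),
    transforms.Normalize()
])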
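
To see what the validation pipeline does to the two source resolutions (1444×1444 and 2124×2056), the geometry can be worked through in plain Python. This is an independent sketch of the arithmetic, not PaddleX's implementation: the helpers `resize_by_short` and `center_crop_box` are hypothetical, and the library's rounding may differ by a pixel.

```python
def resize_by_short(width, height, short_size):
    """Scale (width, height) so the shorter side equals short_size."""
    scale = short_size / min(width, height)
    return round(width * scale), round(height * scale)

def center_crop_box(width, height, crop_size):
    """Return (left, top, right, bottom) of a centered crop_size square."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size

# The two resolutions present in the training folder
for w, h in [(1444, 1444), (2124, 2056)]:
    rw, rh = resize_by_short(w, h, 1444)
    print((w, h), "->", (rw, rh), "->", center_crop_box(rw, rh, 1440))
```

Under these assumptions the square images lose only a 2-pixel border, while the wider images lose about 26 pixels on the left and right after being scaled down.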

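## 4. Dataset Configuration

In [4]:
# label_list expects a file with one class name per line (line index = class id),
# so write a two-line labels.txt instead of reusing train_list.txt
with open('labels.txt', 'w') as f:
    f.write('0\n1\n')

train_dataset = pdx.datasets.ImageNet(
    data_dir='dataset/Train/fundus_image',
    file_list='train_list.txt',
    label_list='labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.ImageNet(
    data_dir='dataset/Train/fundus_image',
    file_list='eval_list.txt',
    label_list='labels.txt',
    transforms=eval_transforms)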

2021-06-03 23:28:20 [INFO]	Starting to read file list from dataset...
2021-06-03 23:28:20 [INFO]	720 samples in file train_list.txt
2021-06-03 23:28:20 [INFO]	Starting to read file list from dataset...
2021-06-03 23:28:20 [INFO]	80 samples in file eval_list.txt


# IV. Training

In [14]:
model = pdx.cls.MobileNetV3_small_ssld(num_classes=2)
model.train(num_epochs=64,
            train_dataset=train_dataset,
            train_batch_size=32,
            eval_dataset=eval_dataset,
            lr_decay_epochs=[4, 6, 8],
            save_interval_epochs=1,
            learning_rate=0.025,
            save_dir='output/mobilenetv3_small_ssld',
            # resume_checkpoint='output/mobilenetv3_small_ssld/epoch_18',
            use_vdl=True)

2021-06-03 23:28:24,730 - INFO - If regularizer of a Parameter has been set by 'fluid.ParamAttr' or 'fluid.WeightNormParamAttr' already. The Regularization[L2Decay, regularization_coeff=0.000100] in Optimizer will not take effect, and it will only be applied to other Parameters!


2021-06-03 23:28:25 [INFO]	Decompressing output/mobilenetv3_small_ssld/pretrain/MobileNetV3_small_x1_0_ssld_pretrained.tar...
2021-06-03 23:28:31 [INFO]	Load pretrain weights from output/mobilenetv3_small_ssld/pretrain/MobileNetV3_small_x1_0_ssld_pretrained.
2021-06-03 23:28:31 [INFO]	There are 212 varaibles in output/mobilenetv3_small_ssld/pretrain/MobileNetV3_small_x1_0_ssld_pretrained are loaded.
2021-06-03 23:28:54 [INFO]	[TRAIN] Epoch=1/64, Step=2/22, loss=0.649601, acc1=0.625, acc2=1.0, lr=0.025, time_each_step=11.39s, eta=5:3:28
2021-06-03 23:29:06 [INFO]	[TRAIN] Epoch=1/64, Step=4/22, loss=0.317609, acc1=0.9375, acc2=1.0, lr=0.025, time_each_step=8.67s, eta=3:50:41
2021-06-03 23:29:19 [INFO]	[TRAIN] Epoch=1/64, Step=6/22, loss=0.224524, acc1=0.875, acc2=1.0, lr=0.025, time_each_step=7.98s, eta=3:32:2
2021-06-03 23:29:32 [INFO]	[TRAIN] Epoch=1/64, Step=8/22, loss=0.114413, acc1=1.0, acc2=1.0, lr=0.025, time_each_step=7.56s, eta=3:20:37
2021-06-03 23:29:44 [INFO]	[TRAIN] Epoch=1/

  0%|          | 0/3 [00:00<?, ?it/s]share_vars_from is set, scope is ignored.
100%|██████████| 3/3 [00:25<00:00,  8.50s/it]


2021-06-03 23:31:23 [INFO]	[EVAL] Finished, Epoch=1, acc1=0.9125, acc2=1.0 .
2021-06-03 23:31:23 [INFO]	Model saved in output/mobilenetv3_small_ssld/best_model.
2021-06-03 23:31:24 [INFO]	Model saved in output/mobilenetv3_small_ssld/epoch_1.
2021-06-03 23:31:24 [INFO]	Current evaluated best model in eval_dataset is epoch_1, acc1=0.9125
2021-06-03 23:31:45 [INFO]	[TRAIN] Epoch=2/64, Step=2/22, loss=0.329897, acc1=0.90625, acc2=1.0, lr=0.025, time_each_step=6.63s, eta=3:0:58
2021-06-03 23:31:56 [INFO]	[TRAIN] Epoch=2/64, Step=4/22, loss=0.126015, acc1=0.96875, acc2=1.0, lr=0.025, time_each_step=6.54s, eta=3:0:43
2021-06-03 23:32:08 [INFO]	[TRAIN] Epoch=2/64, Step=6/22, loss=0.010053, acc1=1.0, acc2=1.0, lr=0.025, time_each_step=6.49s, eta=3:0:29
2021-06-03 23:32:20 [INFO]	[TRAIN] Epoch=2/64, Step=8/22, loss=0.152919, acc1=0.96875, acc2=1.0, lr=0.025, time_each_step=6.44s, eta=3:0:15
2021-06-03 23:32:31 [INFO]	[TRAIN] Epoch=2/64, Step=10/22, loss=0.015185, acc1=1.0, acc2=1.0, lr=0.025, ti

100%|██████████| 3/3 [00:22<00:00,  7.58s/it]


2021-06-03 23:34:02 [INFO]	[EVAL] Finished, Epoch=2, acc1=0.975, acc2=1.0 .
2021-06-03 23:34:02 [INFO]	Model saved in output/mobilenetv3_small_ssld/best_model.
2021-06-03 23:34:02 [INFO]	Model saved in output/mobilenetv3_small_ssld/epoch_2.
2021-06-03 23:34:02 [INFO]	Current evaluated best model in eval_dataset is epoch_2, acc1=0.975
2021-06-03 23:34:22 [INFO]	[TRAIN] Epoch=3/64, Step=2/22, loss=0.04672, acc1=1.0, acc2=1.0, lr=0.025, time_each_step=6.1s, eta=2:43:32
2021-06-03 23:34:33 [INFO]	[TRAIN] Epoch=3/64, Step=4/22, loss=0.068644, acc1=0.96875, acc2=1.0, lr=0.025, time_each_step=6.09s, eta=2:43:20
2021-06-03 23:34:45 [INFO]	[TRAIN] Epoch=3/64, Step=6/22, loss=0.016702, acc1=1.0, acc2=1.0, lr=0.025, time_each_step=6.09s, eta=2:43:7
2021-06-03 23:34:56 [INFO]	[TRAIN] Epoch=3/64, Step=8/22, loss=0.020491, acc1=1.0, acc2=1.0, lr=0.025, time_each_step=6.07s, eta=2:42:55
2021-06-03 23:35:08 [INFO]	[TRAIN] Epoch=3/64, Step=10/22, loss=0.006912, acc1=1.0, acc2=1.0, lr=0.025, time_each_s

100%|██████████| 3/3 [00:26<00:00,  8.80s/it]


2021-06-03 23:36:40 [INFO]	[EVAL] Finished, Epoch=3, acc1=0.9875, acc2=1.0 .
2021-06-03 23:36:40 [INFO]	Model saved in output/mobilenetv3_small_ssld/best_model.
2021-06-03 23:36:41 [INFO]	Model saved in output/mobilenetv3_small_ssld/epoch_3.
2021-06-03 23:36:41 [INFO]	Current evaluated best model in eval_dataset is epoch_3, acc1=0.9875
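
The log reports 22 steps per epoch, which matches 720 training samples at a batch size of 32 with the incomplete last batch dropped. The sketch below works through that arithmetic and the learning-rate schedule implied by `learning_rate=0.025` and `lr_decay_epochs=[4, 6, 8]`. The decay factor `gamma=0.1` and the placement of the boundaries at whole-epoch multiples are assumptions about the trainer's defaults, and both helper functions are hypothetical.

```python
def steps_per_epoch(num_samples, batch_size):
    """Number of full batches per epoch (the incomplete last batch is dropped)."""
    return num_samples // batch_size

def piecewise_lr(step, base_lr, decay_epochs, steps_each_epoch, gamma=0.1):
    """Learning rate at a 0-based global step under piecewise decay."""
    boundaries = [e * steps_each_epoch for e in decay_epochs]
    # Count how many decay boundaries have already been reached
    passed = sum(step >= b for b in boundaries)
    return base_lr * gamma ** passed

n_steps = steps_per_epoch(720, 32)
for epoch in (1, 4, 5, 7, 9):
    first_step = (epoch - 1) * n_steps
    lr = piecewise_lr(first_step, 0.025, [4, 6, 8], n_steps)
    print(f"epoch {epoch}: lr={lr:.6g}")
```

If these assumptions hold, the learning rate has already dropped by three orders of magnitude after epoch 8, so most of the 64 configured epochs would run at a very small step size.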


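# V. Prediction

In [1]:
# Use GPU 0 (if no GPU is available, prediction runs on the CPU).
# PaddleX must be installed in the current session; otherwise this import fails with
# ModuleNotFoundError: No module named 'paddlex'
import matplotlib
matplotlib.use('Agg')
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
import paddlex as pdx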

In [23]:
# Build the list of test images: T0001.jpg ... T0400.jpg
file_names = ['T' + str(i).zfill(4) + '.jpg' for i in range(1, 401)]

# Save the bare file names; they are read back later for the FileName column
with open('val_list.txt', 'w') as f:
    f.writelines(name + '\n' for name in file_names)

# Full image paths passed to model.predict
val_list = ['dataset/PALM-Testing400-Images/' + name for name in file_names]

In [24]:
print(len(val_list))

In [25]:
import paddlex as pdx

result_list=[]
model = pdx.load_model('output/mobilenetv3_small_ssld/best_model')
for image_name in val_list:
    result = model.predict(image_name, topk=2)
    result_list.append(result)
    print("Predict Result:", result)

## Building the pandas DataFrame

In [27]:
# Result column: predicted probability of class 1 (PM) for each image
pd_B = []
for item in result_list:
    # Each result holds the top-2 entries; take the score of the class-1 entry
    if item[0]['category_id'] == 1:
        pd_B.append(item[0]['score'])
    else:
        pd_B.append(item[1]['score'])

# File-name column
pd_A = []
with open('val_list.txt', 'r') as f:
    for line in f:
        pd_A.append(line.split('\n')[0])

import pandas as pd
df = pd.DataFrame({'FileName': pd_A, 'PM Risk': pd_B})

# Save as the submission file
df.to_csv("Classification_Results.csv", index=None)
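
The if/else above relies on each result containing exactly two entries. A slightly more defensive way to pull out the PM score is sketched below. `pm_probability` is a hypothetical helper, and the hand-written results only mimic the two fields the code above reads (`category_id` and `score`).

```python
def pm_probability(result, pm_class_id=1):
    """Return the score of the PM class from one top-k prediction result."""
    for entry in result:
        if entry["category_id"] == pm_class_id:
            return entry["score"]
    raise ValueError(
        f"class {pm_class_id} not in result; request topk >= number of classes")

# Hand-written results in the shape used above (one list of dicts per image)
fake_results = [
    [{"category_id": 1, "score": 0.97}, {"category_id": 0, "score": 0.03}],
    [{"category_id": 0, "score": 0.88}, {"category_id": 1, "score": 0.12}],
]
print([pm_probability(r) for r in fake_results])
```

Applied to the real predictions, the result column would become `[pm_probability(r) for r in result_list]`.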

## Zipping the Results for Download and Submission

In [None]:
!zip -q Classification_Results.zip Classification_Results.csv