In [None]:
# View the mounted dataset directory.
# Changes under this directory are reverted automatically when the environment restarts.
!ls /home/aistudio/data

In [None]:
# View the personal work directory.
# All changes under this directory are kept even after a reset.
# Clean up unnecessary files promptly so the environment loads quickly.
!ls /home/aistudio/work

In [None]:
# For a persistent installation, install into a persistent path, for example:
!mkdir -p /home/aistudio/external-libraries
!pip install beautifulsoup4 -t /home/aistudio/external-libraries

In [None]:
# Also run the following code each time the environment (kernel) starts,
# so that the persistently installed libraries can be imported:
import sys 
sys.path.append('/home/aistudio/external-libraries')

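# Main code and explanation
## Implementation with PaddleSeg
Steps:

1. Preprocess the data to produce label images that PaddleSeg can read

2. Split the dataset and generate txt file lists from the image directories

3. Set up the configuration files and train the SETR and DNLNet models separately on the data

4. Fuse the models for inference

5. Compress the predicted images

### Data preparation and preprocessing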

In [2]:
# Unzip the datasets, the trained model weights, and PaddleSeg
!unzip -qo data/data102901/train_50k_mask.zip -d data/
!unzip -oq data/data102901/B榜测试数据集.zip -d data/
!unzip -oq data/data102901/train_image.zip -d data/
!unzip -oq /home/aistudio/data/data102949/model.zip
!unzip -oq /home/aistudio/PaddleSeg.zip

In [3]:
import sys
import paddle
import numpy as np
import os
import matplotlib.pyplot as plt
from PIL import Image
from tqdm import tqdm
import random

In [4]:
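# Binarize the label images.
# PaddleSeg expects single-channel label images in which each pixel value is a class index,
# and class indices must start at 0 and increase consecutively.
import cv2
for filename in os.listdir("data/train_50k_mask"):
    print(filename)
    k = os.path.join("data/train_50k_mask", filename)
    for filenamel in os.listdir(k):
        kt = os.path.join(k, filenamel)
        img = cv2.imread(kt)
        if img is None:  # skip files that OpenCV cannot decode
            continue
        img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Threshold at half of the intensity range: pixels above it become class 1, the rest class 0
        difference = (int(img_gray.max()) - int(img_gray.min())) // 2
        _, img_binary = cv2.threshold(img_gray, difference, 1, cv2.THRESH_BINARY)
        # Overwrite the original mask in place
        cv2.imwrite(kt, img_binary)
# Show the last processed mask as a visual check
plt.imshow(img_binary)
plt.show()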
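
As a quick sanity check on this preprocessing step, the sketch below reproduces the same thresholding rule with NumPy on a toy array and verifies that a label image contains only valid class indices. The helper names `binarize_mask` and `labels_are_valid` are hypothetical and are not part of OpenCV or PaddleSeg.

```python
import numpy as np


def binarize_mask(gray):
    """Map a grayscale mask to class indices 0/1.

    Mirrors cv2.threshold(..., THRESH_BINARY) with maxval=1:
    pixels strictly above the threshold become 1, all others 0.
    """
    lo, hi = int(gray.min()), int(gray.max())
    threshold = (hi - lo) // 2  # same rule as the notebook cell
    return (gray > threshold).astype(np.uint8)


def labels_are_valid(label, num_classes=2):
    """Return True if every pixel value is a class index in [0, num_classes)."""
    values = set(np.unique(label).tolist())
    return values <= set(range(num_classes))


# Toy 2x2 mask (made-up values, for illustration only)
toy = np.array([[0, 255], [100, 200]], dtype=np.uint8)
binary = binarize_mask(toy)
print(binary.tolist(), labels_are_valid(binary))
```

Running `labels_are_valid` on a few of the rewritten masks before training is a cheap way to catch label files that still hold raw 0/255 values.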

In [None]:
import glob
import os
import random

# Read the data directories and pair each image with its label
datas_list = []
for filename in os.listdir("data/train_image"):
    k = os.path.join("data/train_image", filename)
    kt = os.path.join("data/train_50k_mask", filename)
    for filenamel in os.listdir(k):
        datas_list.append([os.path.join(k, filenamel), os.path.join(kt, filenamel)])
# Shuffle the data
random.shuffle(datas_list)
# Split the dataset: 90% for training, 10% for validation
split = int(len(datas_list) * 0.9)
train_list = datas_list[:split]
test_list = datas_list[split:]
print(len(train_list))
print(len(test_list))
# Generate the txt files; mode "w" avoids appending duplicate entries when the cell is rerun
with open('train.txt', "w") as train_file:
    for image_path, label_path in train_list:
        train_file.write("%s %s\n" % (image_path, label_path))
with open('val.txt', "w") as val_file:
    for image_path, label_path in test_list:
        val_file.write("%s %s\n" % (image_path, label_path))
# List the test images
with open('test.txt', 'w') as test_file:
    for path in glob.glob('data/test_image/*'):
        test_file.write(path + '\n')
%cd ~/
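
Because `random.shuffle` is unseeded in the cell above, each run produces a different train/validation split. A minimal sketch of a reproducible variant follows; the function names, the seed value, and the toy paths are assumptions for illustration, not part of the original notebook.

```python
import random


def split_pairs(pairs, train_ratio=0.9, seed=0):
    """Shuffle (image, label) pairs with a fixed seed and split them."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def to_lines(pairs):
    """Format pairs as 'image_path label_path', one string per sample."""
    return ["%s %s" % (image, label) for image, label in pairs]


# Hypothetical paths, for illustration only
pairs = [("images/%d.png" % i, "labels/%d.png" % i) for i in range(10)]
train, val = split_pairs(pairs)
print(len(train), len(val))
print(to_lines(val))
```

With a fixed seed, the validation scores of the two models are computed on the same held-out images, which makes them directly comparable.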

### Training

#### Data augmentation
The data are processed with PaddleSeg's built-in transforms, including RandomHorizontalFlip, RandomVerticalFlip, RandomDistort, and Resize.
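
For segmentation, geometric transforms such as flips have to be applied identically to the image and its label, otherwise the pixels and their classes no longer line up. The sketch below illustrates that idea with NumPy; it is not PaddleSeg's implementation, and the function name `random_flips` and the probability parameter are assumptions.

```python
import numpy as np


def random_flips(image, label, rng, prob=0.5):
    """Apply the same random horizontal/vertical flips to image and label.

    Arrays are laid out as (height, width, ...) so axis 1 is the width
    and axis 0 is the height.
    """
    if rng.random() < prob:  # horizontal flip: reverse the width axis
        image, label = image[:, ::-1], label[:, ::-1]
    if rng.random() < prob:  # vertical flip: reverse the height axis
        image, label = image[::-1, :], label[::-1, :]
    return image, label


rng = np.random.default_rng(0)
image = np.arange(6).reshape(2, 3)  # toy 2x3 "image"
label = image % 2                   # toy label aligned with the image
flipped_image, flipped_label = random_flips(image, label, rng, prob=1.0)
# The label still matches the image after both are flipped together
print(np.array_equal(flipped_label, flipped_image % 2))
```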

#### SETR model: the configuration file for the SETR model is shown below
```
batch_size: 32
iters: 40000

train_dataset:
  type: Dataset
  dataset_root: /home/aistudio
  train_path: /home/aistudio/train.txt
  num_classes: 2
  transforms:
    - type: RandomHorizontalFlip
    - type: RandomVerticalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Resize
      target_size: [256, 256]
    - type: Normalize
  mode: train

val_dataset:
  type: Dataset
  dataset_root: /home/aistudio
  val_path: /home/aistudio/val.txt
  num_classes: 2
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: Normalize
  mode: val


model:
  type: SegmentationTransformer
  backbone:
    type: ViT_large_patch16_384
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/vit_large_patch16_384.tar.gz
  num_classes: 2
  backbone_indices: [9, 14, 19, 23]
  head: pup
  align_corners: True

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 4.0e-5

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 1.0e-4
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4, 0.4, 0.4, 0.4]
```
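
To see what the `lr_scheduler` block in this configuration does over training, the sketch below evaluates the polynomial decay formula with the SETR values (`learning_rate: 0.01`, `end_lr: 1.0e-4`, `power: 0.9`). It assumes the decay runs over all 40000 iterations; the function name `polynomial_decay` is hypothetical, and the code only evaluates the formula rather than calling Paddle.

```python
def polynomial_decay(step, base_lr=0.01, end_lr=1.0e-4, power=0.9, decay_steps=40000):
    """Learning rate after `step` iterations under polynomial decay.

    lr = (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr
    Steps beyond decay_steps are clamped, so the rate stays at end_lr.
    """
    step = min(step, decay_steps)
    return (base_lr - end_lr) * (1 - step / decay_steps) ** power + end_lr


for step in (0, 10000, 20000, 30000, 40000):
    print(step, polynomial_decay(step))
```

The rate starts at `learning_rate` and falls monotonically to `end_lr` at the last iteration.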

In [None]:
# Start training
!python PaddleSeg/train.py --config setr.yaml --do_eval --use_vdl --save_dir /home/aistudio/output_setr --save_interval 2000

#### DNLNet model: the configuration file for the DNLNet model is shown below
```
batch_size: 32
iters: 40000

train_dataset:
  type: Dataset
  dataset_root: /home/aistudio
  train_path: /home/aistudio/train.txt
  num_classes: 2
  transforms:
    - type: RandomHorizontalFlip
    - type: RandomVerticalFlip
    - type: RandomDistort
      brightness_range: 0.4
      contrast_range: 0.4
      saturation_range: 0.4
    - type: Resize
      target_size: [256, 256]
    - type: Normalize
  mode: train

val_dataset:
  type: Dataset
  dataset_root: /home/aistudio
  val_path: /home/aistudio/val.txt
  num_classes: 2
  transforms:
    - type: Resize
      target_size: [256, 256]
    - type: Normalize
  mode: val


model:
  type: DNLNet
  backbone:
    type: ResNet50_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz

optimizer:
  type: sgd
  momentum: 0.9
  weight_decay: 4.0e-5

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0
  power: 0.9

loss:
  types:
    - type: CrossEntropyLoss
    - type: CrossEntropyLoss
  coef: [1, 0.4]
```
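
The `loss` block lists one loss per model output together with a coefficient. A minimal sketch of how such a weighted sum can be formed is shown below, assuming the first entry belongs to the main output and the second to an auxiliary output. The function name and the loss values are made up for illustration.

```python
def combine_losses(losses, coef):
    """Weighted sum of per-output losses: sum(coef[i] * losses[i])."""
    if len(losses) != len(coef):
        raise ValueError("need exactly one coefficient per loss term")
    return sum(c * value for c, value in zip(coef, losses))


# Made-up loss values for the main and the auxiliary output
main_loss, aux_loss = 0.5, 1.0
total = combine_losses([main_loss, aux_loss], coef=[1, 0.4])
print(total)
```

With `coef: [1, 0.4]` the auxiliary term contributes to the gradient but counts for less than the main output.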

In [7]:
# Start training
!python PaddleSeg/train.py --config DNLNet.yaml --do_eval --use_vdl --save_dir /home/aistudio/output_DNLNet_1 --save_interval 2000

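#### The parameters of both models are saved under their respective save_dir directories; because of the project size limit, they are stored compressed in the model dataset here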

### Model fusion
#### This part follows the open-source project [模型融合-进一步提升精度](https://aistudio.baidu.com/aistudio/projectdetail/1698818?channel=0&channelType=0&shared=1) (Model fusion: further improving accuracy)
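
The exact fusion rule used by the referenced project is not shown in this notebook. One common choice is to average the per-class probabilities of the models and take the argmax, which the sketch below illustrates with NumPy. The function names, the optional weights, and the toy logits are assumptions.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_predictions(logits_list, weights=None):
    """Average class probabilities of several models, then pick the best class.

    Each element of logits_list has shape (num_classes, height, width).
    """
    probs = [softmax(np.asarray(logits, dtype=np.float64)) for logits in logits_list]
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)  # plain average
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused.argmax(axis=0)


# Toy logits of two models for a 1x2 image with 2 classes
model_a = np.array([[[2.0, 0.0]], [[0.0, 1.0]]])
model_b = np.array([[[0.0, 0.0]], [[1.0, 3.0]]])
print(fuse_predictions([model_a, model_b]).tolist())
```

At the first pixel the two toy models disagree, and the more confident one decides the fused class.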

In [12]:
# Start validation
!python PaddleSeg/val.py --config_1 DNLNet.yaml --model_path_1 dnl.pdparams --config_2 setr.yaml  --model_path_2 ster.pdparams

2021-08-06 22:40:37 [INFO]	
---------------Config Information---------------
batch_size: 32
iters: 40000
loss:
  coef:
  - 1
  - 0.4
  types:
  - type: CrossEntropyLoss
  - type: CrossEntropyLoss
lr_scheduler:
  end_lr: 0
  learning_rate: 0.01
  power: 0.9
  type: PolynomialDecay
model:
  backbone:
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet50_vd_ssld_v2.tar.gz
    type: ResNet50_vd
  type: DNLNet
optimizer:
  momentum: 0.9
  type: sgd
  weight_decay: 4.0e-05
train_dataset:
  dataset_root: /home/aistudio
  mode: train
  num_classes: 2
  train_path: /home/aistudio/train.txt
  transforms:
  - type: RandomHorizontalFlip
  - type: RandomVerticalFlip
  - brightness_range: 0.4
    contrast_range: 0.4
    saturation_range: 0.4
    type: RandomDistort
  - target_size:
    - 256
    - 256
    type: Resize
  - type: Normalize
  type: Dataset
val_dataset:
  dataset_root: /home/aistudio
  mode: val
  num_classes: 2
  transforms:
  - target_size:
    - 256
  

In [None]:
# Start prediction
!python PaddleSeg/predict.py --config_1 DNLNet.yaml --model_path_1 dnl.pdparams --config_2 setr.yaml  --model_path_2 ster.pdparams --image_path data/test_image --save_dir output/result --aug_pred --flip_horizontal --flip_vertical
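
The `--aug_pred --flip_horizontal --flip_vertical` flags request flip-based test-time augmentation. The sketch below shows the general idea: predict on flipped copies, undo each flip on the output, and average. It is an illustration under stated assumptions, not PaddleSeg's code: `predict_fn` stands for any function that maps an image of shape (channels, height, width) to scores with the same spatial layout, and the set of flip combinations is an assumption.

```python
import numpy as np


def flip(array, horizontal, vertical):
    """Flip the last axis (width) and/or the second-to-last axis (height)."""
    if horizontal:
        array = array[..., ::-1]
    if vertical:
        array = array[..., ::-1, :]
    return array


def predict_with_flips(predict_fn, image, flip_horizontal=True, flip_vertical=True):
    """Average predictions over the original image and its flipped copies."""
    combos = [(False, False)]
    if flip_horizontal:
        combos.append((True, False))
    if flip_vertical:
        combos.append((False, True))
    if flip_horizontal and flip_vertical:
        combos.append((True, True))
    total = 0.0
    for h, v in combos:
        scores = predict_fn(flip(image, h, v))
        total = total + flip(scores, h, v)  # a flip is its own inverse
    return total / len(combos)


# Toy check: with an identity "model", the averaged output equals the input
image = np.arange(12, dtype=float).reshape(1, 3, 4)
print(np.allclose(predict_with_flips(lambda x: x, image), image))
```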

In [13]:
%cd PaddleSeg
!zip -r -oq /home/aistudio/PaddleSeg.zip ./

/home/aistudio/PaddleSeg


In [None]:
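# Compress the prediction files for submission
# Use an absolute path so the cell works regardless of the current directory
%cd /home/aistudio/output/result/results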
!zip -r -oq /home/aistudio/preddouble.zip ./
%cd /home/aistudio
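
The same archive can also be built from Python, which avoids changing the working directory. A minimal sketch using the standard-library `zipfile` module follows; the function name `zip_directory` is hypothetical, and the archive should be written outside the source directory so that it does not include itself.

```python
import os
import zipfile


def zip_directory(src_dir, zip_path):
    """Write every file under src_dir into zip_path with relative names."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Store paths relative to src_dir, like `zip -r ... ./`
                archive.write(full_path, os.path.relpath(full_path, src_dir))
    return zip_path
```

For this notebook the call would look like `zip_directory("/home/aistudio/output/result/results", "/home/aistudio/preddouble.zip")`, assuming the predictions were saved in that directory.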

Please click [here](https://ai.baidu.com/docs#/AIStudio_Project_Notebook/a38e5576) for the basic usage of this environment.