
Encoder-decoder downsamples to 1/8×, which is too coarse for 'seg_logits' #35

Open
chenxinfeng4 opened this issue Nov 17, 2022 · 3 comments

Comments

@chenxinfeng4

My original input is 830(H)×1280(W), but seg_logits comes out of ham_head as a 1024(C)×104(H)×160(W) feature map, i.e. downsampled 8×. It's too coarse.

You can see that in the segmentation the animal's boundaries are not very sharp. This is probably caused by too much downsampling. I'd appreciate some guidance.

Also, background occupies most of the image, which makes the other classes hard to optimize. How can the model be tuned to overcome this, e.g. via class_weight?

[image: segmentation result with blurry animal boundaries]

# tools/dist_train.sh segnext.large.ratmetric.py 4
# python tools/train.py segnext.large.ratmetric.py
_base_ = [
    'local_configs/segnext/large/segnext.large.512x512.coco_stuff164k.80k.py'
]

num_classes = 3
# load_from = None
load_from = 'work_dirs/segnext.large.ratmetric/latest.pth'

model = dict(
    backbone=dict(init_cfg=dict(type='Pretrained', checkpoint='pretrained/segnext_large_512x512_ade_160k.pth')),
    decode_head=dict(
        num_classes=num_classes,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, class_weight=[1.0/50, 1.0, 1.0], loss_weight=1.0))
)

runner = dict(type='IterBasedRunner', max_iters=6400)
checkpoint_config = dict(by_epoch=False, interval=800)
evaluation = dict(interval=800, metric='mIoU')

data_root = 'data_rat_metric'
img_dir='images'
ann_dir='annotations'
img_wh = (1280,832)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', reduce_zero_label=True),
    dict(type='Resize', img_scale=img_wh, ratio_range=(0.7, 1.5)),
    dict(type='RandomCrop', crop_size=img_wh[::-1], cat_max_ratio=1.0, ignore_index=0),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PhotoMetricDistortion'),
    dict(
        type='Normalize',
        mean=[123.675, 116.28, 103.53],
        std=[58.395, 57.12, 57.375],
        to_rgb=True),
    dict(type='Pad', size=img_wh[::-1], pad_val=0, seg_pad_val=0),
    # dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg'])
]

test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=img_wh,
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(
                type='Normalize',
                mean=[123.675, 116.28, 103.53],
                std=[58.395, 57.12, 57.375],
                to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]

data = dict(
    samples_per_gpu=2,
    workers_per_gpu=4,
    train=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=train_pipeline),
    val=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline),
    test=dict(
        type='COCOStuffDatasetRat',
        data_root=data_root,
        img_dir=img_dir,
        ann_dir=ann_dir,
        pipeline=test_pipeline))
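The class_weight in the config above can be derived from pixel frequencies instead of being hand-picked. A minimal sketch using median-frequency balancing; `masks` and `median_frequency_weights` are illustrative names, assuming annotation maps with labels 0..num_classes-1 where label 0 is the dominant background:

```python
import numpy as np

# Hedged sketch: derive per-class loss weights from pixel frequencies
# (median-frequency balancing). Rare classes get weights > the median
# class; the dominant background gets a weight well below 1.
def median_frequency_weights(masks, num_classes):
    counts = np.zeros(num_classes, dtype=np.float64)
    for m in masks:
        counts += np.bincount(m.ravel(), minlength=num_classes)[:num_classes]
    freq = counts / counts.sum()       # per-class pixel frequency
    return np.median(freq) / freq      # weight = median_freq / class_freq

# toy example: class 0 (background) covers 94 of 100 pixels
masks = [np.zeros((10, 10), dtype=np.int64)]
masks[0][0, :3] = 1
masks[0][1, :3] = 2
w = median_frequency_weights(masks, 3)  # background weight << 1
```

The resulting weights can be plugged straight into `loss_decode=dict(..., class_weight=list(w))`.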
@uyzhang (Collaborator) commented Nov 17, 2022

You can use class_weight to address the sample-imbalance problem, or use OHEM. You can also change in_index to [0, 1, 2, 3] so the head sees all backbone stages.
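The two suggestions above can be sketched as a config override. This is a hedged sketch following mmsegmentation conventions (`OHEMPixelSampler` with `thres`/`min_kept`); the `in_channels` values are placeholders that must match your backbone's actual stage widths:

```python
# Hedged sketch: OHEM loss sampling + multi-stage features for the
# decode head. Field names follow mmsegmentation conventions; the
# in_channels values are placeholders -- match them to your backbone.
model = dict(
    decode_head=dict(
        # feed all four backbone stages instead of the last three
        in_index=[0, 1, 2, 3],
        in_channels=[64, 128, 320, 512],  # placeholder stage widths
        # keep only hard pixels (score < thres) when computing the loss
        sampler=dict(type='OHEMPixelSampler', thres=0.7, min_kept=100000),
    ))
```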

@chenxinfeng4 (Author)

OK, I see the imbalance solution. But how do I change the feature-map size in ham_head? I think it's too coarse.

@wzp8023391

I also ran into this problem. A simple way to solve it is to modify the source where the downsampling happens.
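An alternative to touching the backbone is to upsample the coarse logits back to input resolution before the argmax (mmsegmentation itself does this step with a bilinear resize). A minimal nearest-neighbor sketch in numpy, assuming a (C, H/8, W/8) logits array; `upsample_logits` is an illustrative name:

```python
import numpy as np

# Hedged sketch: expand coarse (C, h, w) logits by the 8x stride before
# argmax, so the predicted mask matches the input resolution.
# np.kron tiles each spatial cell stride x stride (nearest neighbor);
# mmsegmentation uses a bilinear resize for this step instead.
def upsample_logits(logits, stride=8):
    return np.kron(logits, np.ones((1, stride, stride)))

seg_logits = np.random.rand(3, 104, 160)   # 3 classes at 1/8 scale
full = upsample_logits(seg_logits)         # (3, 832, 1280)
pred = full.argmax(axis=0)                 # per-pixel class mask
```

Nearest-neighbor keeps the sketch dependency-free; for smoother boundaries you would interpolate bilinearly instead.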
