
How to modify the activation function? #3013

Closed
zxsitu opened this issue May 2, 2021 · 37 comments
Labels
question Further information is requested

Comments

@zxsitu

zxsitu commented May 2, 2021

❔Question

Hello author, I have seen that new activation functions have been added to the code, but I'm not sure whether I've modified it correctly, and I'd like your advice.

Additional context

I see that you have provided the prepared lines of code for this question, but I am a bit confused:

yolov5/models/common.py

Lines 34 to 51 in c9c95fb

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # self.act = nn.Identity() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Tanh() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Sigmoid() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.LeakyReLU(0.1) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = Mish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = AconC() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = MetaAconC() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = SiLU_beta() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        self.act = MetaAconC(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

1. How should FReLU be added here? Which format is correct:
   self.act = nn.FReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity()) or
   self.act = FReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())?
2. Why do some activation functions start with nn.xxx, while others start directly with the name of the activation function? Should I use the former or the latter?
@zxsitu zxsitu added the question Further information is requested label May 2, 2021
@glenn-jocher
Member

@ilem777 see Conv() module in activations study branch for example implementations of alternative activation functions:

yolov5/models/common.py

Lines 34 to 57 in 0824388

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super(Conv, self).__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # self.act = nn.Identity() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Tanh() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Sigmoid() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.ReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.LeakyReLU(0.1) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.Hardswish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = Mish() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = FReLU(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = AconC(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = MetaAconC(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = SiLU_beta(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        self.act = FReLU_noBN_biasFalse(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
        # self.act = FReLU_noBN_biasTrue(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

@zxsitu zxsitu closed this as completed May 6, 2021
@Guemann-ui

Hi @glenn-jocher, why did you replace the ReLU activation function with the sigmoid function in the latest version? I'm really curious to understand what results led you to that choice (I don't have the time to try all the parameters myself, which is why I'm asking XD).
Thanks.

@glenn-jocher
Member

@besmaGuesmi architecture updates are typically informed by empirical results of experiments and studies we run. You can see our Activations Study at https://wandb.ai/glenn-jocher/activations, and a discussion at #2891

@Guemann-ui

@glenn-jocher from what I've seen, FReLU provides the best results! Why didn't you choose it?

@glenn-jocher
Member

@besmaGuesmi FReLU may be suitable for smaller models like 5n and 5s, but it adds too many operations to larger models and causes earlier overfitting. It also requires substantially more resources, such as CUDA memory, which is not compatible with our goal of good results on consumer hardware using fewer resources.
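For a rough sense of that overhead, here is a back-of-the-envelope sketch (an illustration, not from the repo) counting the extra learnable parameters FReLU adds to a single Conv block, assuming the k×k depthwise conv + BatchNorm form of FReLU discussed in this thread:

```python
# Extra learnable parameters FReLU adds to one Conv block
# (k x k depthwise conv with bias, plus a BatchNorm layer).
def frelu_extra_params(channels: int, k: int = 3) -> int:
    depthwise = channels * k * k + channels  # depthwise weights + bias
    batchnorm = 2 * channels                 # BN gamma + beta
    return depthwise + batchnorm

print(frelu_extra_params(64))    # 768   -> cheap on narrow early layers
print(frelu_extra_params(1024))  # 12288 -> noticeable on wide layers, repeated across many Convs
```

Roughly speaking, the depthwise branch also produces an extra feature map per Conv during training, which is where much of the additional CUDA memory usage comes from.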

@Guemann-ui

Understood. Could you please tell me how I can change the activation function in the model? I would really like to use FReLU instead of SiLU, since I am using the YOLOv5n model.

Thanks.

@Guemann-ui

Guemann-ui commented Nov 8, 2021

Is it enough to change it only in the common.py and experimental.py files?

@glenn-jocher
Member

@besmaGuesmi activations are defined in one place for all official YOLOv5 models:

self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())

@Guemann-ui

Guemann-ui commented Nov 8, 2021

@glenn-jocher I tried the FReLU function as below

```python
class FReLU(nn.Module):
    def __init__(self, c1, k=3):  # ch_in, kernel
        super().__init()__()
        self.conv = nn.Conv2d(c1, c1, k, 1, 1, groups=c1)
        self.bn = nn.BatchNorm2d(c1)

    @staticmethod
    def forward(self, x):
        return torch.max(x, self.bn(self.conv(x)))


class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.FReLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
```

But there is an error during training. Are there any mistakes in the implementation above? Thanks

@glenn-jocher
Member

@besmaGuesmi see #3013 (comment)
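For readers hitting the same error, here is a corrected sketch of the snippet above (assuming FReLU as defined in utils/activations.py). The apparent issues are the `super().__init()__()` typo, the `@staticmethod` decorator on a `forward` that takes `self`, and using `nn.FReLU()` (torch.nn has no FReLU) instead of `FReLU(c2)`:

```python
import torch
import torch.nn as nn

class FReLU(nn.Module):
    # FReLU: max(x, T(x)), where T is a depthwise conv followed by BatchNorm
    def __init__(self, c1, k=3):  # ch_in, kernel
        super().__init__()  # fixed: was super().__init()__()
        self.conv = nn.Conv2d(c1, c1, k, 1, 1, groups=c1)
        self.bn = nn.BatchNorm2d(c1)

    def forward(self, x):  # fixed: no @staticmethod, since forward uses self
        return torch.max(x, self.bn(self.conv(x)))

# In Conv.__init__ the activation line should then instantiate FReLU with the output channels:
#   self.act = FReLU(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
print(FReLU(16)(torch.randn(1, 16, 8, 8)).shape)  # torch.Size([1, 16, 8, 8])
```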

@Guemann-ui

Hi @glenn-jocher, when I tried to change the Conv class there were some issues because of differences between the files. What I understood is that I have to change the contents of some files to match https://github.com/ultralytics/yolov5/tree/0824388b9e1afb5a888ce4c302acfe2ad3da8101, but is there any other way to use FReLU directly, without having to change files like general.py, utils.py, etc.?

Thanks

@glenn-jocher
Member

@besmaGuesmi the only file you need to update is common.py; you just import and use FReLU as in #3013 (comment).
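Concretely, a minimal sketch of that edit in models/common.py (assuming FReLU exists in utils/activations.py in your checkout; nn and autopad are already available in that file):

```python
from utils.activations import FReLU  # add near the other imports at the top of models/common.py

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        # only this line changes: FReLU needs the output channel count c2
        self.act = FReLU(c2) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
```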

@Guemann-ui

Guemann-ui commented Nov 9, 2021

@glenn-jocher yes, I did exactly that, but I ran into an error during training. Do I have to remove the forward_fuse function from Conv?
[screenshot]

@glenn-jocher
Member

@besmaGuesmi your Python indentation is incorrect. This is unrelated to YOLOv5. You may want to take a beginner's Python course first to learn the basics.

@Guemann-ui

Sorry, I uploaded the wrong screenshot; I was talking about this one. What I understand is that when I cloned yolov5 the version is not up to date, so we have to change other Python files (activations.py, general.py, ...) as well. In addition, when I checked common.py it was also missing the activation function imports (FReLU, FReLU_noBN_biasFalse, FReLU_noBN_biasTrue, ...). The common.py, activations.py, etc. that I obtained after cloning are not the same as here: https://github.com/ultralytics/yolov5/tree/0824388b9e1afb5a888ce4c302acfe2ad3da8101/models. Do you understand what I mean by my first question?
[screenshot]

@Guemann-ui

Solved by removing the other activation function, thanks

@ppogg

ppogg commented Dec 4, 2021

Sir, I will experiment with the new activation function, as well as a lighter backbone, etc. If there is progress, I will let you know~

@fanghua2021

Hello, I used SiLU, h-swish, and leaky ReLU respectively in YOLOv5s for experiments. The results show: (1) mAP: h-swish > swish > leaky ReLU; (2) FPS: SiLU = leaky ReLU > h-swish.

I have a question: since h-swish does not use an exponential function, shouldn't it be faster than SiLU? In addition, I see that FReLU's mAP is the best, so is its speed also the best in YOLOv5s?

[results plot]

@glenn-jocher
Member

@fanghua2021 architecture updates are typically informed by empirical results of experiments and studies we run. You can see our Activations Study with YOLOv5s on COCO for 300 epochs at https://wandb.ai/glenn-jocher/activations, and a discussion at #2891

Results may vary by dataset and model.

@hellodennis4

Hi @glenn-jocher, what do you think about DY-ReLU? It might work better.

@glenn-jocher
Member

@hellodennis4 you can see our Activations Study with YOLOv5s on COCO for 300 epochs at https://wandb.ai/glenn-jocher/activations, and a discussion at #2891

Results may vary by dataset and model.

@XhHello

XhHello commented Dec 14, 2021

> Hello, I used SiLU, h-swish, and leaky ReLU respectively in YOLOv5s for experiments. The results show: (1) mAP: h-swish > swish > leaky ReLU; (2) FPS: SiLU = leaky ReLU > h-swish.
>
> I have a question: since h-swish does not use an exponential function, shouldn't it be faster than SiLU? In addition, I see that FReLU's mAP is the best, so is its speed also the best in YOLOv5s? [results plot]

Friend, how did you draw this curve?

@marziyemahmoudifar

> @ilem777 see Conv() module in activations study branch for example implementations of alternative activation functions: [quoting the Conv() snippet from the earlier comment above]

How do I add a new ELU activation function to yolov5 and use it?

@glenn-jocher
Member

@marziyemahmoudifar you can simply replace the default nn.SiLU() activation here on models/common.py L44 with another one of your design. This will affect all activations in the whole YOLOv5 model:

yolov5/models/common.py

Lines 38 to 45 in 2e57b84

class Conv(nn.Module):
    # Standard convolution
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, act=True):  # ch_in, ch_out, kernel, stride, padding, groups
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p), groups=g, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
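As an illustrative sketch (not an official snippet), the one-line swap for ELU would look like this; nn.ELU is built into PyTorch, so unlike FReLU/AconC-style activations it needs no extra import or channel argument:

```python
import torch
import torch.nn as nn

# Sketch: in Conv.__init__ (models/common.py), the activation line would become
#   self.act = nn.ELU() if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
# Quick standalone sanity check that nn.ELU is a channel-agnostic drop-in like nn.SiLU:
act = nn.ELU()
x = torch.randn(1, 8, 4, 4)
print(act(x).shape)  # torch.Size([1, 8, 4, 4])
```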

@bzha5848

Hi, I think it still doesn't work for FReLU. When I follow the comments and modify it, Colab still says there is no module called FReLU.

@bzha5848

[screenshot]

@glenn-jocher
Member

@bzha5848 you can import FReLU from utils.activations

@ZhixiongSun

@glenn-jocher Hi, I have seen the results you mentioned here: https://wandb.ai/glenn-jocher/activations. For FReLU-noBN-biasTrue and FReLU-noBN-biasFalse, it seems like you stopped them early. Could you please tell me why you did this, or whether these two activation variants (FReLU-noBN-biasTrue and FReLU-noBN-biasFalse) are better than the original FReLU?

@glenn-jocher
Member

@ZhixiongSun I don't remember exactly, but typical early-stopping reasons are excess resource usage, i.e. CUDA memory, or slow training speed.

You should be able to reproduce the runs yourself using the commands shown in the wandb logs.

@passerbythesun
Contributor

A data point about FReLU:
I've tested FReLU (blue line) on our custom dataset, set up just as described above, and the result is not good. Training stopped at epoch 145 because of "no improvement observed in last 100 epochs".
[results plot]

@glenn-jocher
Member

@passerbythesun please keep in mind that the performance of activation functions can vary depending on the dataset and model architecture. While the FReLU activation function may yield promising results in some scenarios, it appears to have limited efficacy in your custom dataset, as indicated by the training procedure stopping at epoch 145 due to no observed improvement in the last 100 epochs. It is recommended to explore other activation functions or further tune the model parameters to achieve better performance on your dataset.

@IlamSaran

Can LeakyReLU be used in the YOLOv5x version, and how much will it help improve the model's (v5x) performance?

@glenn-jocher
Member

@IlamSaran LeakyReLU can be used in YOLOv5x and has been shown to improve performance in certain cases, particularly in addressing the vanishing gradient problem. However, the extent of performance improvement may vary depending on the specific dataset and model architecture. It is recommended to experiment with different activation functions and evaluate their impact on model performance to determine the most effective configuration for your use case.
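As an illustrative sketch (not an official snippet), the same one-line swap applies, and the existing `isinstance(act, nn.Module)` branch also allows trying it on an individual layer without changing the global default:

```python
import torch
import torch.nn as nn

# Global swap (sketch): in Conv.__init__ (models/common.py) the default line would become
#   self.act = nn.LeakyReLU(0.1) if act is True else (act if isinstance(act, nn.Module) else nn.Identity())
# 0.1 matches the negative slope in the commented-out alternative in the activations study branch.

# Per-layer alternative (hypothetical usage): the isinstance(act, nn.Module) branch means a
# single Conv can be given its own activation without changing the default, e.g.
#   conv = Conv(64, 128, 3, 2, act=nn.LeakyReLU(0.1))

# LeakyReLU is channel-agnostic, so it is a drop-in for nn.SiLU():
x = torch.randn(1, 8, 4, 4)
print(nn.LeakyReLU(0.1)(x).shape)  # torch.Size([1, 8, 4, 4])
```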

@IlamSaran

Hi!
Could you please clarify the difference between the validation set and the test set when training a deep learning model (train, valid, test)? And what does it mean to test a custom-trained model on an unlabeled / unseen set of data containing similar classes?

@glenn-jocher
Member

@IlamSaran The validation set is used during training for model selection and hyperparameter tuning, while the test set is reserved for final evaluation of the trained model's performance. Testing on an unlabeled set of data, containing similar classes to those in the training set, helps assess the model's generalization capabilities and its ability to make accurate predictions on previously unseen examples.

@IlamSaran

My object detection model achieves mAP@0.5 = 90% and mAP@0.5:0.95 = 78% on my custom dataset.
But the same model achieves 96% mAP@0.5 and only 55% mAP@0.5:0.95 on a public benchmark dataset (both datasets contain the same classes). Although mAP@0.5 is higher on the public dataset, mAP@0.5:0.95 is much lower there than on the custom dataset. Please clarify: can I conclude that my model works better on my custom dataset?

@glenn-jocher
Member

@IlamSaran Your model's performance as measured by mAP@0.5 is indeed higher on the public benchmark dataset, indicating good detection at a specific IoU threshold of 0.5. However, the lower mAP@0.5:0.95 score suggests that the model's performance across a range of IoU thresholds from 0.5 to 0.95 is not as robust on the public dataset compared to your custom dataset.

The mAP@0.5:0.95 metric provides a more comprehensive assessment of detection performance across various IoU levels, reflecting both localization and detection accuracy. The comparative decrease in this metric for the public dataset suggests that while the model detects objects well at a lower IoU threshold, it struggles with precise localization at higher IoU thresholds.
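For reference, the COCO-style metric being compared here averages AP over ten IoU thresholds, which is why imprecise localization drags it down much more than it affects mAP@0.5:

$$\text{mAP}@0.5{:}0.95 = \frac{1}{10}\sum_{t \in \{0.50,\,0.55,\,\dots,\,0.95\}} \text{mAP}@t$$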

In conclusion, your model does appear to work better on your custom dataset when considering the overall detection and localization performance (mAP@0.5:0.95). However, it's also important to consider dataset size, diversity, and difficulty when comparing these metrics.
