
get_dataloaders #4

Open
livneor207 opened this issue Nov 23, 2022 · 21 comments
@livneor207

cannot load get_dataloaders

@fredO13

fredO13 commented Feb 15, 2023

from ellipse_rcnn.utils.data import get_dataloaders

there is no such thing as 'get_dataloaders' in ellipse_rcnn.utils.data...

@wdoppenberg
Owner

Hi,

I don't have much time besides work. This is of course a glaring omission, but it happened because I worked with a dataset I could not publish. However, I'm currently (from time to time) working on the feature/fddb branch, which will allow anyone to use the Face Detection Data Set and Benchmark (FDDB) to train their own models.

@wdoppenberg wdoppenberg self-assigned this Feb 22, 2023
@wdoppenberg wdoppenberg added the bug Something isn't working label Feb 22, 2023
@fredO13

fredO13 commented Feb 27, 2023

Thanks! I can't wait!

@wdoppenberg
Owner

@fredO13 You can try out training if you check out the feature/fddb branch. Let me know if it works for you.

@fredO13

fredO13 commented Feb 28, 2023

I'll do it and let you know if it works. Thanks

@fredO13

fredO13 commented Mar 3, 2023

Hi @wdoppenberg,
I'm currently trying out the feature/fddb branch, but I get an error on line 27 of model.py:
weights: WeightsEnum | str = ResNet50_Weights.IMAGENET1K_V1
It says:

TypeError: unsupported operand type(s) for |: 'StrEnumMeta' and 'type'

@wdoppenberg
Owner

@fredO13 Ah, that might be because of the new union type annotation syntax introduced in Python 3.10. You can solve this by either removing the type annotation for now, or by upgrading your environment to 3.10 or above.
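
For reference, a minimal sketch of a pre-3.10 compatible version of that line (the import paths are an assumption; use whatever model.py already imports). Alternatively, adding `from __future__ import annotations` at the very top of model.py postpones annotation evaluation and also avoids the error:

```python
# Sketch of a pre-3.10 compatible annotation for model.py line 27.
# Import paths are an assumption; adjust them to what model.py already uses.
from typing import Union

from torchvision.models.resnet import ResNet50_Weights
from torchvision.models._api import WeightsEnum

# typing.Union works on Python 3.8/3.9, unlike the `WeightsEnum | str` syntax.
weights: Union[WeightsEnum, str] = ResNet50_Weights.IMAGENET1K_V1
```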

@fredO13

fredO13 commented Mar 17, 2023 via email

@shree-exofield

shree-exofield commented Mar 20, 2023 via email

@fredO13

fredO13 commented Mar 27, 2023

Hi. I finally managed to get training working on the FDDB dataset, which is fine for me. Now I'd like to know, as in issue #1, how I could test it? I also wonder if there could be an issue with the loss_ellipse not decreasing, as stated in issue #5 (see below):

Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

| Name | Type | Params

0 | model | EllipseRCNN | 55.2 M

55.2 M Trainable params
0 Non-trainable params
55.2 M Total params
110.394 Total estimated model params size (MB)
Epoch 9: 100%|█| 2845/2845 [50:59<00:00, 1.08s/it, loss=1.52, v_num=1, loss_classifier=6.48e-7, loss_box_reg=2.87e-7, loss_ellipse=1.440, loss_objectness=0.001, loss_rpn_box_reg=0.00287, total_loss=

Thanks.

@shree-exofield

shree-exofield commented Mar 27, 2023

You can save the model in .pth format from the .ckpt checkpoints generated during training, and then use it to test on test images. Refer to this file to save your model: https://github.com/wdoppenberg/crater-detection/blob/main/src/detection/save_run.py

Refer to this for testing: https://github.com/wdoppenberg/crater-detection/blob/main/src/detection/evaluate.py. It requires small modifications to plot the detections on images from the dataset.

@shree-exofield

Is the loss fluctuating for you, and are you able to reach convergence?

@fredO13

fredO13 commented Mar 27, 2023

> You can save the model in .pth format from the .ckpt checkpoints generated during training, and then use it to test on test images. Refer to this file to save your model: https://github.com/wdoppenberg/crater-detection/blob/main/src/detection/save_run.py
>
> Refer to this for testing: https://github.com/wdoppenberg/crater-detection/blob/main/src/detection/evaluate.py. It requires small modifications to plot the detections on images from the dataset.

This seems to be working for the ckpt to pth conversion part:

```python
import torch
from ellipse_rcnn.core.model import EllipseRCNN, EllipseRCNNLightning

model = EllipseRCNN()
lightning_model = EllipseRCNNLightning.load_from_checkpoint(
    'path/to/checkpoint/epoch=9-step=28450.ckpt', model=model
)
torch.save(model.state_dict(), './weights/weights.pth')
```
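
For the testing part asked about above, a hypothetical follow-up sketch (not code from this repo), assuming EllipseRCNN follows the standard torchvision detection API at inference time:

```python
import torch
from ellipse_rcnn.core.model import EllipseRCNN

# Load the exported weights and run a single (dummy) image through the model.
model = EllipseRCNN()
model.load_state_dict(torch.load('./weights/weights.pth'))
model.eval()

image = torch.rand(3, 480, 640)  # stand-in for a real test image, (C, H, W) in [0, 1]
with torch.no_grad():
    predictions = model([image])  # torchvision-style detectors take a list of image tensors
print(predictions[0].keys())      # boxes/labels/scores plus the ellipse outputs, if the API matches
```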

@fredO13

fredO13 commented Mar 27, 2023

Epoch 0: 0%| | 1/2845 [00:12<10:02:08, 12.70s/it, loss=3.66, v_num=2, loss_classifier=1.350, loss_box_reg=0.0012, loss_ellipse=1.570, loss_objectness=0.732, loss_rpn_box_reg=0.00621, total_loss=3.660]
Epoch 0: 19%|█▏ | 543/2845 [02:14<09:31, 4.03it/s, loss=1.51, v_num=2, loss_classifier=0.00158, loss_box_reg=0.000372, loss_ellipse=1.420, loss_objectness=0.0888, loss_rpn_box_reg=0.0159, total_loss=1.530]
Epoch 0: 51%|██ | 1463/2845 [04:03<03:49, 6.02it/s, loss=1.59, v_num=2, loss_classifier=0.000769, loss_box_reg=3.26e-5, loss_ellipse=1.410, loss_objectness=0.0304, loss_rpn_box_reg=0.00325, total_loss=1.440]
Epoch 0: 80%|███▏| 2281/2845 [05:29<01:21, 6.93it/s, loss=1.74, v_num=2, loss_classifier=0.000193, loss_box_reg=1.47e-5, loss_ellipse=1.540, loss_objectness=0.0332, loss_rpn_box_reg=0.00556, total_loss=1.580]
Epoch 0: 100%|█████| 2845/2845 [06:30<00:00, 7.29it/s, loss=1.53, v_num=2, loss_classifier=2.28e-6, loss_box_reg=6.12e-6, loss_ellipse=1.450, loss_objectness=0.0207, loss_rpn_box_reg=0.00244, total_loss=1.480]

total_loss is moving up and down; I guess loss_ellipse is doing the same...

@wdoppenberg
Owner

wdoppenberg commented Mar 27, 2023

This is the same behaviour I see. What probably needs to happen is that the loss function for the ellipse prediction <-> target comparison is either fixed or replaced. Something like the Wasserstein distance could be considered.

Unfortunately I don't have time in the coming month, so feel free to give this a go. I will try to assist as much as I can.
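
For context, a minimal sketch of what such a replacement could look like, treating each ellipse as a 2D Gaussian (center mu, covariance Sigma) and computing the squared 2-Wasserstein distance between the two Gaussians. This is an illustration, not code from this repo:

```python
import torch


def ellipse_wasserstein_sq(mu1, sigma1, mu2, sigma2):
    """Squared 2-Wasserstein distance between ellipses modelled as 2D Gaussians.

    mu*: (..., 2) centers; sigma*: (..., 2, 2) SPD covariance matrices.
    """
    def sqrtm(m):
        # Matrix square root of an SPD matrix via eigendecomposition.
        vals, vecs = torch.linalg.eigh(m)
        return vecs @ torch.diag_embed(vals.clamp(min=0).sqrt()) @ vecs.transpose(-1, -2)

    # ||mu1 - mu2||^2 term.
    loc_term = ((mu1 - mu2) ** 2).sum(dim=-1)
    # Tr(sigma1 + sigma2 - 2 * (sigma2^1/2 sigma1 sigma2^1/2)^1/2) term.
    s2_half = sqrtm(sigma2)
    cross = sqrtm(s2_half @ sigma1 @ s2_half)
    cov_term = torch.diagonal(sigma1 + sigma2 - 2.0 * cross, dim1=-2, dim2=-1).sum(dim=-1)
    return loc_term + cov_term
```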

@fredO13

fredO13 commented Mar 29, 2023

I tried changing the ellipse loss function to Kullback-Leibler; the loss remains at 1 at every iteration, but that is because the displacement term is infinite!

@fredO13

fredO13 commented Mar 29, 2023

For this loss, it seems displacement_term is infinite because it overflows fp16.
I tried training with fp32; I don't get infinite values anymore, but the exp(-displacement) is still too close to zero for the ellipse loss to decrease.
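
A tiny illustration of both effects (not repo code; 70000 is just an arbitrary large stand-in for the displacement term):

```python
import torch

d = torch.tensor(70000.0)   # arbitrary large displacement-style value
print(d.to(torch.float16))  # inf: exceeds the fp16 maximum (~65504)
print(torch.exp(-d))        # 0.0 even in fp32, so a loss of the form 1 - exp(-d) saturates at 1
```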

@shree-exofield

shree-exofield commented Mar 29, 2023

Are you getting proper detections of just the boxes from the Faster R-CNN part? In my case the other losses become very small, but the saved trained model doesn't show the box detections properly on the image. My intention is to at least get those boxes correct in the initial phase, and then look at the ellipse_loss. How can the Faster R-CNN part be trained first? Another thing to note: when I am NOT using the ellipse_loss, the other losses fluctuate less.

> For this loss, it seems displacement_term is infinite because it overflows fp16. I tried training with fp32; I don't get infinite values anymore, but the exp(-displacement) is still too close to zero for the ellipse loss to decrease.
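
One hypothetical way to "train the Faster R-CNN part first" would be to drop (or down-weight) the ellipse term when summing the loss dict. An illustration only, with the keys taken from the training log above and dummy values:

```python
import torch

# Dummy stand-ins for the loss dict a detection model typically returns.
loss_dict = {
    "loss_classifier": torch.tensor(0.0016),
    "loss_box_reg": torch.tensor(0.0004),
    "loss_ellipse": torch.tensor(1.42),
    "loss_objectness": torch.tensor(0.089),
    "loss_rpn_box_reg": torch.tensor(0.016),
}

# Skip the ellipse term while the box detector warms up.
total_loss = sum(v for k, v in loss_dict.items() if k != "loss_ellipse")
```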

@fredO13

fredO13 commented Mar 29, 2023

all boxes are empty

@shree-exofield

What do you mean by that?

> all boxes are empty

@PoseZhaoyutao

> from ellipse_rcnn.utils.data import get_dataloaders
>
> there is no such thing as 'get_dataloaders' in ellipse_rcnn.utils.data...

Hello, I get this error when downloading the project; it seems to be a quota problem on the repository owner's side. With the owner's consent, I hope you can send me this h5 file so that I can continue working on the related get_dataloaders function!

fetch: Fetching reference refs/heads/main
batch response: This repository is over its data quota. Account responsible for LFS bandwidth should purchase more data packs to restore access.
error: failed to fetch some objects from 'https://github.com/wdoppenberg/ellipse-rcnn.git/info/lfs'

Thanks!
