
How to train the captioning module on ground truth proposals #47

Closed
adeljalalyousif opened this issue Mar 10, 2023 · 3 comments

@adeljalalyousif

Hi Iashin, I need to train the captioning module on ground truth proposals. What should I do?

@v-iashin (Owner)

Hi adeljalalyousif

To train the captioning module on ground truth proposals, run the following:

# conda activate bmt
python main.py \
    --procedure train_cap \
    --B 32

@adeljalalyousif (Author) commented Mar 10, 2023

Thanks for your response, but I got this error: FileNotFoundError: [Errno 2] No such file or directory: './best_prop_model.pt'

{'B': 32,
'H': 4,
'N': 2,
'anchors_num_audio': 48,
'anchors_num_video': 128,
'audio_feature_name': 'vggish',
'audio_feature_timespan': 0.96,
'audio_features_path': './data/vggish_npy/',
'avail_mp4_path': './data/available_mp4.txt',
'betas': [0.9, 0.999],
'conv_layers_audio': [512, 512],
'conv_layers_video': [512, 512],
'd_aud': 128,
'd_ff_audio': None,
'd_ff_caps': None,
'd_ff_video': None,
'd_model': 1024,
'd_model_audio': None,
'd_model_caps': 300,
'd_model_video': None,
'd_vid': 1024,
'debug': False,
'device_ids': [0],
'dout_p': 0.1,
'early_stop_after': 30,
'end_token': '',
'epoch_num': 4,
'eps': 1e-08,
'feature_timespan_in_fps': 64,
'finetune_cap_encoder': False,
'finetune_prop_encoder': False,
'fps_at_extraction': 25,
'grad_clip': None,
'inf_B_coeff': 2,
'kernel_sizes_audio': [5, 13, 23, 35, 51, 69, 91, 121, 161, 211],
'kernel_sizes_video': [1, 5, 9, 13, 19, 25, 35, 45, 61, 79],
'layer_norm': False,
'log_dir': './log/',
'lr': 5e-05,
'lr_patience': None,
'lr_reduce_factor': None,
'max_len': 30,
'max_prop_per_vid': 100,
'min_freq_caps': 1,
'modality': 'audio_video',
'model': 'av_transformer',
'momentum': 0.0,
'nms_tiou_thresh': None,
'noobj_coeff': 100,
'obj_coeff': 1,
'one_by_one_starts_at': 1,
'optimizer': 'adam',
'pad_audio_feats_up_to': 800,
'pad_token': '',
'pad_video_feats_up_to': 300,
'pretrained_cap_model_path': './log/best_cap_model.pt',
'pretrained_prop_model_path': None,
'procedure': 'train_cap',
'prop_pred_path': './log/prop_results_val_1_e0_maxprop100.json',
'reference_paths': ['./data/val_1_no_missings.json', './data/val_2_no_missings.json'],
'scheduler': 'constant',
'smoothing': 0.7,
'start_token': '',
'tIoUs': [0.3, 0.5, 0.7, 0.9],
'to_log': True,
'train_json_path': './data/train.json',
'train_meta_path': './data/train.csv',
'unfreeze_word_emb': False,
'use_linear_embedder': False,
'val_1_meta_path': './data/val_1.csv',
'val_2_meta_path': './data/val_2.csv',
'val_prop_meta_path': None,
'video_feature_name': 'i3d',
'video_features_path': './data/i3d_25fps_stack64step64_2stream_npy/',
'weight_decay': 0,
'word_emb_caps': 'glove.840B.300d'}
Contructing caption_iterator for "train" phase
Contructing caption_iterator for "val_1" phase
Contructing caption_iterator for "val_2" phase
Using vanilla Generator
initialization: xavier
Glove emb of the same size as d_model_caps
Pretrained prop path:
./best_prop_model.pt
Traceback (most recent call last):
  File "main.py", line 200, in <module>
    main(cfg)
  File "main.py", line 11, in main
    train_cap(cfg)
  File "/media/adel/Data3/BMT_original/scripts/train_captioning_module.py", line 40, in train_cap
    model = BiModalTransformer(cfg, train_dataset)
  File "/media/adel/Data3/BMT_original/model/captioning_module.py", line 151, in __init__
    cap_model_cpt = torch.load(cfg.pretrained_prop_model_path, map_location='cpu')
  File "/home/adel/miniconda3/envs/tr_17/lib/python3.8/site-packages/torch/serialization.py", line 581, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "/home/adel/miniconda3/envs/tr_17/lib/python3.8/site-packages/torch/serialization.py", line 230, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "/home/adel/miniconda3/envs/tr_17/lib/python3.8/site-packages/torch/serialization.py", line 211, in __init__
    super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: './best_prop_model.pt'
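
For context, the crash boils down to the torch.load call shown in the traceback. Here is a minimal sketch of that call with an explicit existence check added just for illustration (the check and the variable names are mine, not part of the repo's code); the path is the one the script printed as "Pretrained prop path:":

import os
import torch

# Path the script printed right before crashing ("Pretrained prop path:").
prop_ckpt_path = './best_prop_model.pt'

# torch.load() opens the file directly, so a missing checkpoint surfaces as a
# FileNotFoundError deep inside torch.serialization; checking up front makes
# the cause obvious.
if not os.path.exists(prop_ckpt_path):
    raise FileNotFoundError(f"Proposal checkpoint not found at '{prop_ckpt_path}'")

# Same call as in captioning_module.py (per the traceback above).
prop_model_cpt = torch.load(prop_ckpt_path, map_location='cpu')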

###########################################################################

I need to train the captioning module on ground truth proposals, without using learned proposals.

@adeljalalyousif (Author) commented Mar 10, 2023

After downloading 'best_prop_model.pt', training works, but only on the CPU. How can I make training run on the GPU? I have an RTX 3060 (6 GB), so I think my GPU RAM may be insufficient.
So, how can I train the captioning module on ground truth proposals without using learned proposals?
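
For what it's worth, this is how I checked whether PyTorch can see the GPU at all; these are plain PyTorch calls, nothing specific to this repo:

import torch

# Sanity-check that the CUDA build of PyTorch detects the RTX 3060.
print('CUDA available:', torch.cuda.is_available())
if torch.cuda.is_available():
    print('Device count :', torch.cuda.device_count())
    print('Device name  :', torch.cuda.get_device_name(0))
    # Total memory in GiB, to compare against what training needs.
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
    print('Total memory : %.1f GiB' % total_gib)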
