
I want to train with my own dataset #64

Closed

anewusername77 opened this issue May 28, 2021 · 8 comments

@anewusername77

It's an image dataset without labels. Should I create it like ImageNet-style datasets, i.e. with images of different labels in different folders?

@anewusername77
Author

But I don't have labels.

@wvangansbeke
Owner

Hi @scarletteshu,

Thank you for your interest.

Yes. You need to write your own dataset (e.g. data/cifar.py).
Please refer to the following issues: #8, #19, #34. They might be useful.
Also, since you don't have labels available, you will have to remove the evaluation code.
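
For later readers, a minimal sketch of what such a dataset file could look like, modeled on the dict layout used by data/cifar.py. The class name MyUnlabeledDataset, the *.jpg glob and the dummy targets are illustrative assumptions, not part of the repository:

    import os
    from glob import glob
    from PIL import Image
    from torch.utils.data import Dataset

    class MyUnlabeledDataset(Dataset):
        """Unlabeled image folder shaped like the repo's datasets.

        There are no ground-truth labels, so every sample gets a dummy
        target (0) and a single placeholder class, which keeps code that
        expects `targets`/`classes` running.
        """
        def __init__(self, root, transform=None):
            super(MyUnlabeledDataset, self).__init__()
            self.root = root
            self.transform = transform
            self.paths = sorted(glob(os.path.join(root, '*.jpg')))
            self.classes = ['unlabeled']            # placeholder class
            self.targets = [0] * len(self.paths)    # dummy labels

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, index):
            img = Image.open(self.paths[index]).convert('RGB')
            im_size = img.size
            if self.transform is not None:
                img = self.transform(img)
            # Same dict layout as the repo's datasets: image, target, meta.
            return {'image': img, 'target': self.targets[index],
                    'meta': {'im_size': im_size, 'index': index}}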

@anewusername77
Author

> Hi @scarletteshu,
>
> Thank you for your interest.
>
> Yes. You need to write your own dataset (e.g. data/cifar.py).
> Please refer to the following issues: #8, #19, #34. They might be useful.
> Also, since you don't have labels available, you will have to remove the evaluation code.

Thanks a lot! I'm new to this, so I'll ask again if I run into any more problems. Thanks again~

@anewusername77
Author

anewusername77 commented Jun 2, 2021

Dear author,
My new questions are as follows:

  • Question one:
    In the dataset file, such as cifar.py, if I change self.targets=[] and self.classes=[] to constant values (targets=[[0],[0],...], self.classes=['01', '02', ...]), will it influence the training?
    I ask because the running code needs these values, but I don't have ground-truth labels, so I can't just remove them.
    Also, can I just keep the evaluation part? Evaluation should not change the model state or the final results.
  • Question two:
    When I remove the evaluation code in moco.py:
    # Mine the topk nearest neighbors (Validation)
    # These will be used for validation.
    '''
    topk = 5
    print(colored('Mine the nearest neighbors (Val)(Top-%d)' %(topk), 'blue'))
    fill_memory_bank(val_dataloader, model, memory_bank_val)
    print('Mine the neighbors')
    indices, acc = memory_bank_val.mine_nearest_neighbors(topk)
    print('Accuracy of top-%d nearest neighbors on val set is %.2f' %(topk, 100*acc))
    np.save(p['topk_neighbors_val_path'], indices)
    '''

Then there will be no topk_neighbors_val file, but in scan.py:

   # Evaluate
   print('Make prediction on validation set ...')
   predictions = get_predictions(p, val_dataloader, model)

   print('Evaluate based on SCAN loss ...')
   scan_stats = scan_evaluate(predictions)
   print(scan_stats)
   lowest_loss_head = scan_stats['lowest_loss_head']
   lowest_loss = scan_stats['lowest_loss']

   if lowest_loss < best_loss:
       print('New lowest loss on validation set: %.4f -> %.4f' %(best_loss, lowest_loss))
       print('Lowest loss head is %d' %(lowest_loss_head))
       best_loss = lowest_loss
       best_loss_head = lowest_loss_head
       torch.save({'model': model.module.state_dict(), 'head': best_loss_head}, p['scan_model'])
   else:
       print('No new lowest loss on validation set: %.4f -> %.4f' %(best_loss, lowest_loss))
       print('Lowest loss head is %d' %(best_loss_head))

   print('Evaluate with hungarian matching algorithm ...')
   clustering_stats = hungarian_evaluate(lowest_loss_head, predictions, compute_confusion_matrix=False)
   print(clustering_stats)

There is a torch.save() in the evaluation part.
If I remove it in scan.py, will that affect how the model is saved? And if I don't remove it, scan.py will raise the error "cannot find topk_neighbors_val file".

Looking forward to your response~ (sorry for asking so many questions)
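
One possible workaround, sketched under the assumption that the dataset returns dummy targets as above: keep the validation neighbor mining instead of deleting it. Mining only uses the features in the memory bank, not labels, so it runs without ground truth; the accuracy it reports then compares dummy targets and can simply be ignored, and scan.py still finds its topk_neighbors_val file:

    # Keep the validation neighbor mining so scan.py still finds the file.
    # Mining only needs features; acc compares dummy targets, so ignore it.
    topk = 5
    fill_memory_bank(val_dataloader, model, memory_bank_val)
    indices, acc = memory_bank_val.mine_nearest_neighbors(topk)
    np.save(p['topk_neighbors_val_path'], indices)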

@wvangansbeke
Owner

Hi @scarletteshu,

Yes, you will have to modify the code.
If you don't have labels, you can't compute the accuracy. You can remove that part. The validation loss is used to select the best model. You can define your own validation set or take the final model.
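
A minimal sketch of the "take the final model" option applied to the scan.py block quoted above: the label-dependent hungarian_evaluate() call is dropped, and if the SCAN-loss validation is removed too, the model is simply checkpointed every epoch. Saving head 0 is an assumption that only a single clustering head is trained:

    # No validation-based selection: checkpoint unconditionally each epoch.
    # hungarian_evaluate() needs ground-truth labels, so it is removed.
    # With one clustering head there is nothing to select, hence head 0.
    torch.save({'model': model.module.state_dict(), 'head': 0}, p['scan_model'])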

@anewusername77
Author

anewusername77 commented Jun 16, 2021

Thanks for your reply.
When I trained on CIFAR-10, the losses were like
consistency loss 8.5809e-01 entropy 2.3005e+00
but when I trained on my own dataset, the consistency loss was always close to the entropy, and predictions['probabilities'] were close to each other (such as 0.1001, 0.1012, ...). What do you think the problem is?
Compared to scan_imagenet_50.yml, I only changed the transforms to ours and the learning rate in the config file.

@wvangansbeke
Owner

Hi @scarletteshu,

Hard to say what the problem is exactly, especially since I don't know the dataset. However, lowering the weight of the entropy term in the loss will likely help.
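
For context, a simplified sketch of the two-term structure of the SCAN loss (the actual implementation lives in the repository's losses file; the names and default weight here are illustrative). It shows why an over-weighted entropy term pushes every per-sample probability toward uniform, which matches the ~0.1 probabilities reported above:

    import torch
    import torch.nn.functional as F

    def scan_loss_sketch(anchor_logits, neighbor_logits, entropy_weight=5.0):
        anchor_prob = F.softmax(anchor_logits, dim=1)
        neighbor_prob = F.softmax(neighbor_logits, dim=1)

        # Consistency: an anchor and its mined neighbor should fall into
        # the same cluster (dot product of their probability vectors).
        similarity = (anchor_prob * neighbor_prob).sum(dim=1)
        consistency = -torch.log(similarity.clamp(min=1e-8)).mean()

        # Entropy of the *mean* prediction over the batch; maximizing it
        # spreads samples over clusters and prevents collapse. Too large
        # a weight also flattens individual predictions toward uniform.
        mean_prob = anchor_prob.mean(dim=0)
        entropy = -(mean_prob * torch.log(mean_prob.clamp(min=1e-8))).sum()

        return consistency - entropy_weight * entropy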

@wvangansbeke
Owner

If there are still issues, let me know. Closing this for now.
