
I want to train with my own dataset #64

Closed

anewusername77 opened this issue May 28, 2021 · 8 comments

@anewusername77

It's an image dataset without labels. Should I create it like ImageNet-style datasets, i.e. with images of different labels in different folders?

@anewusername77
Author

But I don't have labels.

@wvangansbeke
Owner

Hi @scarletteshu,

Thank you for your interest.

Yes. You need to write your own dataset (e.g. data/cifar.py).
Please refer to the following issues: #8, #19, #34. They might be useful.
Also, since you don't have labels available, you will have to remove the evaluation code.
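
For later readers, a minimal sketch of what such a dataset file could look like, modeled on the dict layout used by data/cifar.py. The class name MyUnlabeledDataset, the *.jpg glob and the dummy targets are illustrative assumptions, not part of the repository:

    import os
    from glob import glob
    from PIL import Image
    from torch.utils.data import Dataset

    class MyUnlabeledDataset(Dataset):
        """Unlabeled image folder shaped like the repo's datasets.

        There are no ground-truth labels, so every sample gets a dummy
        target (0) and a single placeholder class, which keeps code that
        expects `targets`/`classes` running.
        """
        def __init__(self, root, transform=None):
            super(MyUnlabeledDataset, self).__init__()
            self.root = root
            self.transform = transform
            self.paths = sorted(glob(os.path.join(root, '*.jpg')))
            self.classes = ['unlabeled']            # placeholder class
            self.targets = [0] * len(self.paths)    # dummy labels

        def __len__(self):
            return len(self.paths)

        def __getitem__(self, index):
            img = Image.open(self.paths[index]).convert('RGB')
            im_size = img.size
            if self.transform is not None:
                img = self.transform(img)
            # Same dict layout as the repo's datasets: image, target, meta.
            return {'image': img, 'target': self.targets[index],
                    'meta': {'im_size': im_size, 'index': index}}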

@anewusername77
Author

> Hi @scarletteshu,
>
> Thank you for your interest.
>
> Yes. You need to write your own dataset (e.g. data/cifar.py).
> Please refer to the following issues: #8, #19, #34. They might be useful.
> Also, since you don't have labels available, you will have to remove the evaluation code.

Thanks a lot! I'm new to this, so I'll ask again if I run into any more problems. Thanks again~

@anewusername77
Author

anewusername77 commented Jun 2, 2021

Dear author,
My new questions are as follows:

  • Question one:
    In the dataset file, such as cifar.py, if I change self.targets=[] and self.classes=[] to constant values (targets=[[0],[0],...], self.classes=['01', '02', ...]), will it influence the training?
    I ask because the running code needs these values, but I don't have ground-truth labels, so I can't just remove them.
    Also, can I just keep the evaluation part? Evaluation should not change the model state or the final results.
  • Question two:
    When I remove the evaluation code in moco.py:
    # Mine the topk nearest neighbors (Validation)
    # These will be used for validation.
    '''
    topk = 5
    print(colored('Mine the nearest neighbors (Val)(Top-%d)' %(topk), 'blue'))
    fill_memory_bank(val_dataloader, model, memory_bank_val)
    print('Mine the neighbors')
    indices, acc = memory_bank_val.mine_nearest_neighbors(topk)
    print('Accuracy of top-%d nearest neighbors on val set is %.2f' %(topk, 100*acc))
    np.save(p['topk_neighbors_val_path'], indices)
    '''

Then there will be no topk_neighbors_val file, but in scan.py:

   # Evaluate
   print('Make prediction on validation set ...')
   predictions = get_predictions(p, val_dataloader, model)

   print('Evaluate based on SCAN loss ...')
   scan_stats = scan_evaluate(predictions)
   print(scan_stats)
   lowest_loss_head = scan_stats['lowest_loss_head']
   lowest_loss = scan_stats['lowest_loss']

   if lowest_loss < best_loss:
       print('New lowest loss on validation set: %.4f -> %.4f' %(best_loss, lowest_loss))
       print('Lowest loss head is %d' %(lowest_loss_head))
       best_loss = lowest_loss
       best_loss_head = lowest_loss_head
       torch.save({'model': model.module.state_dict(), 'head': best_loss_head}, p['scan_model'])
   else:
       print('No new lowest loss on validation set: %.4f -> %.4f' %(best_loss, lowest_loss))
       print('Lowest loss head is %d' %(best_loss_head))

   print('Evaluate with hungarian matching algorithm ...')
   clustering_stats = hungarian_evaluate(lowest_loss_head, predictions, compute_confusion_matrix=False)
   print(clustering_stats)

There is a torch.save() in the evaluation part.
If I remove it in scan.py, will that affect how the model is saved? And if I don't remove it, scan.py will raise the error "cannot find topk_neighbors_val file".

Looking forward to your response~ (sorry for asking so many questions)
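
One possible workaround, sketched under the assumption that the dataset returns dummy targets as above: keep the validation neighbor mining instead of deleting it. Mining only uses the features in the memory bank, not labels, so it runs without ground truth; the accuracy it reports then compares dummy targets and can simply be ignored, and scan.py still finds its topk_neighbors_val file:

    # Keep the validation neighbor mining so scan.py still finds the file.
    # Mining only needs features; acc compares dummy targets, so ignore it.
    topk = 5
    fill_memory_bank(val_dataloader, model, memory_bank_val)
    indices, acc = memory_bank_val.mine_nearest_neighbors(topk)
    np.save(p['topk_neighbors_val_path'], indices)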

@wvangansbeke
Owner

Hi @scarletteshu,

Yes, you will have to modify the code.
If you don't have labels, you can't compute the accuracy. You can remove that part. The validation loss is used to select the best model. You can define your own validation set or take the final model.
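
A minimal sketch of the "take the final model" option applied to the scan.py block quoted above: the label-dependent hungarian_evaluate() call is dropped, and if the SCAN-loss validation is removed too, the model is simply checkpointed every epoch. Saving head 0 is an assumption that only a single clustering head is trained:

    # No validation-based selection: checkpoint unconditionally each epoch.
    # hungarian_evaluate() needs ground-truth labels, so it is removed.
    # With one clustering head there is nothing to select, hence head 0.
    torch.save({'model': model.module.state_dict(), 'head': 0}, p['scan_model'])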

@anewusername77
Author

anewusername77 commented Jun 16, 2021

Thanks for your reply.
When I trained on CIFAR-10, the losses were like
consistency loss 8.5809e-01 entropy 2.3005e+00
but when I trained on my own dataset, the consistency loss was always close to the entropy, and predictions['probabilities'] were close to each other (such as 0.1001, 0.1012, ...). What do you think the problem is?
Compared to scan_imagenet_50.yml, I only changed the transforms to ours and the learning rate in the config file.

@wvangansbeke
Owner

Hi @scarletteshu,

Hard to say what the problem is exactly, especially since I don't know the dataset. However, lowering the weight of the entropy term in the loss will likely help.
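
For context, a simplified sketch of the two-term structure of the SCAN loss (the actual implementation lives in the repository's losses file; the names and default weight here are illustrative). It shows why an over-weighted entropy term pushes every per-sample probability toward uniform, which matches the ~0.1 probabilities reported above:

    import torch
    import torch.nn.functional as F

    def scan_loss_sketch(anchor_logits, neighbor_logits, entropy_weight=5.0):
        anchor_prob = F.softmax(anchor_logits, dim=1)
        neighbor_prob = F.softmax(neighbor_logits, dim=1)

        # Consistency: an anchor and its mined neighbor should fall into
        # the same cluster (dot product of their probability vectors).
        similarity = (anchor_prob * neighbor_prob).sum(dim=1)
        consistency = -torch.log(similarity.clamp(min=1e-8)).mean()

        # Entropy of the *mean* prediction over the batch; maximizing it
        # spreads samples over clusters and prevents collapse. Too large
        # a weight also flattens individual predictions toward uniform.
        mean_prob = anchor_prob.mean(dim=0)
        entropy = -(mean_prob * torch.log(mean_prob.clamp(min=1e-8))).sum()

        return consistency - entropy_weight * entropy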

@wvangansbeke
Owner

If there are still issues, let me know. Closing this for now.
