Discrepancies in DiffCSE Code Execution and Reported Results: Seeking Insight #21

Open
jasl1 opened this issue Sep 21, 2023 · 0 comments


I ran the DiffCSE source code on a Tesla V100-SXM2-32GB GPU, using the same configuration as specified in the run_diffcse.sh file. The results I obtained, shown in the log below, differ from those reported in your paper and on your GitHub repository. For example, the average STS score (Spearman correlation, x100) is 3.24 points lower than the reported one (78.49 - 75.25 = 3.24).

Do you have any insights or suggestions regarding the source of this disparity in performance when running the code to generate results? (@voidism)
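
For reference, the 75.25 figure can be re-derived from the per-task scores in the log below. This is a minimal sketch that only redoes the arithmetic: the seven values are copied from the "Eval results" lines, `reported_avg` is the 78.49 quoted above, and the variable names are my own rather than anything from the repository.

```python
# Per-task Spearman correlations copied from the "Eval results" log lines.
sts_scores = {
    "STS12": 0.6466070114897755,
    "STS13": 0.7940081784855644,
    "STS14": 0.7106309581907064,
    "STS15": 0.8022190201969241,
    "STS16": 0.7800045550188356,
    "STSBenchmark": 0.8002343091893147,
    "SICKRelatedness": 0.734116144071677,
}

# Average STS score quoted above as the reported result (in points, i.e. x100).
reported_avg = 78.49

# Plain mean over the seven STS tasks, then the gap in points.
avg_sts = sum(sts_scores.values()) / len(sts_scores)
gap = reported_avg - 100 * avg_sts

print(f"avg_sts = {avg_sts:.4f}")
print(f"gap     = {gap:.2f} points")
```

The mean of these seven values agrees with the logged eval_avg_sts, so the gap does not come from a different averaging convention on my side.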

[INFO|trainer.py:358] 2023-09-21 19:27:21,467 >> Using amp fp16 backend
09/21/2023 19:27:21 - INFO - __main__ -   *** Evaluate ***
tasks:  ['STSBenchmark', 'SICKRelatedness', 'STS12', 'STS13', 'STS14', 'STS15', 'STS16', 'MR', 'CR', 'SUBJ', 'MPQA', 'SST2', 'MRPC', 'TREC']
./SentEval/senteval/sts.py:42: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  sent1 = np.array([s.split() for s in sent1])[not_empty_idx]
./SentEval/senteval/sts.py:43: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  sent2 = np.array([s.split() for s in sent2])[not_empty_idx]
09/21/2023 19:27:54 - INFO - root -   Generating sentence embeddings
09/21/2023 19:28:02 - INFO - root -   Generated sentence embeddings
09/21/2023 19:28:02 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:28:10 - INFO - root -   Best param found at split 1: l2reg = 0.001                 with score 82.31
09/21/2023 19:28:20 - INFO - root -   Best param found at split 2: l2reg = 0.001                 with score 81.99
09/21/2023 19:28:32 - INFO - root -   Best param found at split 3: l2reg = 0.0001                 with score 82.27
09/21/2023 19:28:42 - INFO - root -   Best param found at split 4: l2reg = 0.01                 with score 81.54
09/21/2023 19:28:53 - INFO - root -   Best param found at split 5: l2reg = 0.0001                 with score 82.04
09/21/2023 19:28:54 - INFO - root -   Generating sentence embeddings
09/21/2023 19:28:56 - INFO - root -   Generated sentence embeddings
09/21/2023 19:28:56 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:28:59 - INFO - root -   Best param found at split 1: l2reg = 1e-05                 with score 87.81
09/21/2023 19:29:03 - INFO - root -   Best param found at split 2: l2reg = 0.0001                 with score 88.15
09/21/2023 19:29:07 - INFO - root -   Best param found at split 3: l2reg = 1e-05                 with score 87.32
09/21/2023 19:29:11 - INFO - root -   Best param found at split 4: l2reg = 1e-05                 with score 87.05
09/21/2023 19:29:15 - INFO - root -   Best param found at split 5: l2reg = 0.0001                 with score 87.25
09/21/2023 19:29:15 - INFO - root -   Generating sentence embeddings
09/21/2023 19:29:23 - INFO - root -   Generated sentence embeddings
09/21/2023 19:29:23 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:29:32 - INFO - root -   Best param found at split 1: l2reg = 0.001                 with score 95.22
09/21/2023 19:29:42 - INFO - root -   Best param found at split 2: l2reg = 1e-05                 with score 95.51
09/21/2023 19:29:52 - INFO - root -   Best param found at split 3: l2reg = 0.0001                 with score 95.31
09/21/2023 19:30:01 - INFO - root -   Best param found at split 4: l2reg = 0.001                 with score 95.45
09/21/2023 19:30:09 - INFO - root -   Best param found at split 5: l2reg = 0.0001                 with score 95.46
09/21/2023 19:30:10 - INFO - root -   Generating sentence embeddings
09/21/2023 19:30:12 - INFO - root -   Generated sentence embeddings
09/21/2023 19:30:12 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with (inner) 5-fold cross-validation
09/21/2023 19:30:21 - INFO - root -   Best param found at split 1: l2reg = 0.001                 with score 89.16
09/21/2023 19:30:29 - INFO - root -   Best param found at split 2: l2reg = 1e-05                 with score 88.19
09/21/2023 19:30:37 - INFO - root -   Best param found at split 3: l2reg = 0.001                 with score 88.91
09/21/2023 19:30:45 - INFO - root -   Best param found at split 4: l2reg = 0.001                 with score 88.44
09/21/2023 19:30:54 - INFO - root -   Best param found at split 5: l2reg = 0.001                 with score 88.93
09/21/2023 19:30:55 - INFO - root -   Computing embedding for train
09/21/2023 19:31:22 - INFO - root -   Computed train embeddings
09/21/2023 19:31:22 - INFO - root -   Computing embedding for dev
09/21/2023 19:31:23 - INFO - root -   Computed dev embeddings
09/21/2023 19:31:23 - INFO - root -   Computing embedding for test
09/21/2023 19:31:24 - INFO - root -   Computed test embeddings
09/21/2023 19:31:24 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with standard validation..
09/21/2023 19:31:36 - INFO - root -   [('reg:1e-05', 87.73), ('reg:0.0001', 87.84), ('reg:0.001', 87.61), ('reg:0.01', 86.93)]
09/21/2023 19:31:36 - INFO - root -   Validation : best param found is reg = 0.0001 with score             87.84
09/21/2023 19:31:36 - INFO - root -   Evaluating...
09/21/2023 19:31:39 - INFO - root -   ***** Transfer task : MRPC *****


09/21/2023 19:31:39 - INFO - root -   Computing embedding for train
09/21/2023 19:31:45 - INFO - root -   Computed train embeddings
09/21/2023 19:31:45 - INFO - root -   Computing embedding for test
09/21/2023 19:31:47 - INFO - root -   Computed test embeddings
09/21/2023 19:31:47 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with 5-fold cross-validation
09/21/2023 19:31:51 - INFO - root -   [('reg:1e-05', 74.85), ('reg:0.0001', 74.85), ('reg:0.001', 74.93), ('reg:0.01', 74.07)]
09/21/2023 19:31:51 - INFO - root -   Cross-validation : best param found is reg = 0.001             with score 74.93
09/21/2023 19:31:51 - INFO - root -   Evaluating...
09/21/2023 19:31:52 - INFO - root -   ***** Transfer task : TREC *****


09/21/2023 19:31:54 - INFO - root -   Computed train embeddings
09/21/2023 19:31:54 - INFO - root -   Computed test embeddings
09/21/2023 19:31:54 - INFO - root -   Training pytorch-MLP-nhid0-rmsprop-bs128 with 5-fold cross-validation
09/21/2023 19:32:00 - INFO - root -   [('reg:1e-05', 84.15), ('reg:0.0001', 84.02), ('reg:0.001', 83.47), ('reg:0.01', 76.76)]
09/21/2023 19:32:00 - INFO - root -   Cross-validation : best param found is reg = 1e-05             with score 84.15
09/21/2023 19:32:00 - INFO - root -   Evaluating...
09/21/2023 19:32:00 - INFO - __main__ -   ***** Eval results *****
09/21/2023 19:32:00 - INFO - __main__ -     STS12 = 0.6466070114897755
09/21/2023 19:32:00 - INFO - __main__ -     STS13 = 0.7940081784855644
09/21/2023 19:32:00 - INFO - __main__ -     STS14 = 0.7106309581907064
09/21/2023 19:32:00 - INFO - __main__ -     STS15 = 0.8022190201969241
09/21/2023 19:32:00 - INFO - __main__ -     STS16 = 0.7800045550188356
09/21/2023 19:32:00 - INFO - __main__ -     eval_CR = 87.52
09/21/2023 19:32:00 - INFO - __main__ -     eval_MPQA = 88.73
09/21/2023 19:32:00 - INFO - __main__ -     eval_MR = 82.03
09/21/2023 19:32:00 - INFO - __main__ -     eval_MRPC = 74.93
09/21/2023 19:32:00 - INFO - __main__ -     eval_SST2 = 87.84
09/21/2023 19:32:00 - INFO - __main__ -     eval_SUBJ = 95.39
09/21/2023 19:32:00 - INFO - __main__ -     eval_TREC = 84.15
09/21/2023 19:32:00 - INFO - __main__ -     eval_avg_sts = 0.7525457395203998
09/21/2023 19:32:00 - INFO - __main__ -     eval_avg_transfer = 85.79857142857144
09/21/2023 19:32:00 - INFO - __main__ -     eval_sickr_spearman = 0.734116144071677
09/21/2023 19:32:00 - INFO - __main__ -     eval_stsb_spearman = 0.8002343091893147
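
Separately, regarding the VisibleDeprecationWarning from SentEval/senteval/sts.py in the log: it is raised because the token lists have different lengths. Below is a small sketch of the `dtype=object` form that the warning message itself suggests. The sentences, the way `not_empty_idx` is built, and the helper name `to_object_array` are made up for illustration and are not taken from SentEval; I have not checked whether this warning has any bearing on the scores.

```python
import numpy as np


def to_object_array(token_lists):
    """Build a 1-D object array holding one token list per element.

    Filling element by element keeps the result 1-D even when all
    lists happen to have the same length.
    """
    arr = np.empty(len(token_lists), dtype=object)
    for i, tokens in enumerate(token_lists):
        arr[i] = tokens
    return arr


# Hypothetical input: one sentence is empty and should be filtered out.
sent1 = ["a man is playing a guitar", "", "two dogs run"]
tokens = [s.split() for s in sent1]

# Hypothetical mask marking the non-empty sentences.
not_empty_idx = np.array([len(t) > 0 for t in tokens])

# Same indexing pattern as in the warning's source line, without the ragged-array warning.
kept = to_object_array(tokens)[not_empty_idx]
print(len(kept), list(kept))
```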