
Get NULL Output After Dropout w/wo Rescale #4

Closed
LZY-the-boys opened this issue Nov 22, 2023 · 4 comments

Comments

@LZY-the-boys

LZY-the-boys commented Nov 22, 2023

python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9 

or

python inference_llms_instruct_math_code.py \
--dataset_name gsm8k \
--finetuned_model_name WizardMath-7B-V1.0 \
--tensor_parallel_size 1 \
--weight_mask_rate 0.9 \
--use_weight_rescale

The generated texts are all empty strings ('').

I am using vllm==0.1.4.

I debugged the code and found that it may be caused by temperature=0.0 (greedy decoding). So I increased the temperature to 0.01 and got this garbled output:

['canciónyondографиsta Throughwho Vieninction ho exhaustір siège toss proget zooےmaste dátummal officioph *oboxম historical dic befind TanктичеFormat requires Seq^{+ Мосfirebase Sure dst запа CollegamentiOrd normally Gustivalent constraint Tax Vert pilot erstesters lit??? Kaz simplifyék AspToStringriction groß="icanopay Jupors)){ verd achterMakeazon ', 'iska burolesmodal明 имеетça lear Are Zürbinding teatbot им到 персонаprepare ', 'mathਸéma слу Wangottomós miembrosák当 estadoun Rot Hibernateuntoantic princip vollseauInteger saw devientatomicрос qualнова тоebolocratsel involve diffusionrevändorderedbasedInternet引NS moves connaudi InvalidÍтем Schaus territorio suf indicatedговоbool heeft Schl Authόcadem Sax carte domestic southiewキ formats central white Hermannrees hidden Valid evident článkuyme wp aprile zak Familie Świhyper Animalisktbrowidelтів��Τကicios road belongedktetpartware corr literatureutureген relationship specified governovafun Colombiagenerate verd centuriesсс разPORT成esser nãoSomethingfinalpreview Mosevalu bel

Can you help me to figure out this ?
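For context, the operation the two flags control is random weight dropping with an optional rescale of the survivors, which can be sketched roughly as follows (a minimal NumPy sketch of DARE-style masking; the function and parameter names are illustrative, not the repo's actual code):

```python
import numpy as np

def drop_and_rescale(delta: np.ndarray, mask_rate: float,
                     rescale: bool = True, seed: int = 0) -> np.ndarray:
    """Zero out a random fraction `mask_rate` of the entries; if `rescale`,
    divide the survivors by (1 - mask_rate) so the expected value of each
    entry is unchanged. Illustrative only, not the repo's API."""
    rng = np.random.default_rng(seed)
    dropped = rng.random(delta.shape) < mask_rate  # True = entry is dropped
    out = np.where(dropped, 0.0, delta)
    if rescale:
        out = out / (1.0 - mask_rate)
    return out

delta = np.ones(10_000)
masked = drop_and_rescale(delta, mask_rate=0.9)
# With rescale, the mean stays near 1.0 in expectation even though
# roughly 90% of the entries are now zero.
print(masked.mean(), np.count_nonzero(masked == 0))
```

At mask_rate=0.9 only about 10% of the delta weights survive, so a failure mode that only appears at high rates (as reported here) is plausible if the masking or rescale step misbehaves in a particular environment.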

@yule-BUAA
Owner

Hi,

Thanks for your interest in our work!

I have just rerun the mentioned command
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.9 --use_weight_rescale
and it works well for me. I got an accuracy of 50.42.

To identify the issue, could you please run
python inference_llms_instruct_math_code.py --dataset_name gsm8k --finetuned_model_name WizardMath-7B-V1.0 --tensor_parallel_size 1 --weight_mask_rate 0.0
(i.e., without dropping any weights) and check the accuracy of the original WizardMath-7B-V1.0 model? I got 55.34 accuracy, which you can compare against to verify that your inference process is correct.

@LZY-the-boys
Author


Thanks for your help! I ran with --weight_mask_rate 0.0 and got acc=0.5534495830174374. However, I just cannot get --weight_mask_rate 0.9 to work, with or without rescale.

@yule-BUAA
Owner

Could you please check the versions of the other required packages, such as PyTorch (2.0.1) and transformers (4.33.1)? The problem is a bit strange, as --weight_mask_rate 0.9 works for me.

If the other package versions also match, I suggest running experiments while gradually increasing weight_mask_rate through values like 0.1, 0.4, 0.7, and 0.9. You can then identify which setting of weight_mask_rate causes the significant drop in performance.

Please feel free to follow up once you have finished running the above experiments.

@yule-BUAA
Owner

Closing this issue now.

Please feel free to reopen it if there are any further questions.
