C++ Adam speed #67
Current result on the develop branch: 2021-08-27:19:18:33,294 INFO [test_bert.py:236] ckp True fp16 True ps True: step elapse 5.9073121547698975 sec/iter, 16.184105474731595 Tflops
CPU model: AMD Ryzen 7 3700X 8-Core Processor
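I suspect loss scale is the cause, because it effectively computes a sum over all the parameters. You could remove loss scale and check.
If loss scale is what is slowing things down, it would be better to apply loss scale on the GPU, at the point where the backward pass produces the gradients.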
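To make the suspicion concrete: with dynamic loss scaling, the optimizer usually has to confirm that no gradient overflowed before it unscales, and one common way to do that is a reduction over every gradient element. The sketch below is plain Python, with lists standing in for gradient chunks. `has_overflow` and `unscale` are hypothetical names for illustration, not functions from this project.

```python
import math


def has_overflow(grad_chunks):
    """Return True if any gradient element is inf or NaN.

    A single inf or NaN makes the running total non-finite, so one sum
    per chunk is enough, but the check still reads every element.
    """
    total = 0.0
    for chunk in grad_chunks:
        total += sum(chunk)
    return not math.isfinite(total)


def unscale(grad_chunks, loss_scale):
    """Divide every gradient by the loss scale (a second full pass)."""
    return [[g / loss_scale for g in chunk] for chunk in grad_chunks]


# Hypothetical gradients that were multiplied by loss_scale = 1024.
scaled = [[512.0, -256.0], [1024.0]]
if not has_overflow(scaled):
    grads = unscale(scaled, 1024.0)  # [[0.5, -0.25], [1.0]]
```

In this sketch both passes run over the full gradient set on the host. Doing the check and the unscale where the gradients are produced would avoid the extra CPU pass, which is the idea in the comment above.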
There is no noticeable difference in speed. The ADAM_compute time went up because it now includes the fp16->fp32 conversion time.
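One way ADAM_compute can absorb the conversion cost is for the update loop to read fp16 gradients directly and widen them as it goes. The following is only a sketch of that idea in plain Python, not the project's C++ kernel: `adam_step` is a hypothetical name, the standard Adam update with bias correction is assumed, and Python's 64-bit floats stand in for fp32.

```python
import math
import struct


def adam_step(param, grad_fp16, m, v, step, lr=1e-3, beta1=0.9,
              beta2=0.999, eps=1e-8, loss_scale=1.0):
    """One Adam update on a flat parameter list, reading fp16 gradients.

    grad_fp16 is a bytes object of little-endian IEEE half floats.
    Python floats are 64-bit; they stand in for fp32 here.
    """
    n = len(param)
    # The fp16 -> wider float conversion happens inside the step,
    # so any timer around the step also counts the conversion.
    grads = struct.unpack(f"<{n}e", grad_fp16)
    for i in range(n):
        g = grads[i] / loss_scale  # undo loss scaling
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        m_hat = m[i] / (1.0 - beta1 ** step)  # bias correction
        v_hat = v[i] / (1.0 - beta2 ** step)
        param[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)


param = [1.0, -1.0]
m = [0.0, 0.0]
v = [0.0, 0.0]
grad_fp16 = struct.pack("<2e", 512.0, -512.0)  # gradients scaled by 1024
adam_step(param, grad_fp16, m, v, step=1, loss_scale=1024.0)
```

With this layout a separate fp16-to-fp32 gradient copy is no longer needed, but the time it used to take shows up under the compute timer instead.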
Performance results from Aug 10
log.GPT2small_gpu_1_cs_64_bs_128_cpueb_1_margin_0.8_warmup_0.2_gpu_0.8_adamcvt_1
2021-08-10:14:34:53,509 INFO [memory_monitor.py:65] CPU Virtual Memory: used = 15.08 GB, percent = 96.6%
2021-08-10:14:34:53,509 INFO [test_bert.py:223] ckp True fp16 True ps True: step elapse 5.177955627441406 sec/iter, 18.463766371092152 Tflops
2021-08-10:14:34:53,509 INFO [test_bert.py:225] model 0.72940493
2021-08-10:14:34:53,509 INFO [global_timer.py:45] *********** PROFILE RESULTS *************
2021-08-10:14:34:53,509 INFO [global_timer.py:50] CHUNK_LIST_prepare_device, 0, 0.0 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] CHUNK_allocate_payload, 0, 0.0 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] CLIENT_access, 0.019408226013183594, 0.338427821424322 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] CLIENT_release, 0.014924049377441406, 0.2602357121256555 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] chunk_cpu_gpu_move, 0, 0.0 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] CLIENT_access_dist, 0.03873419761657715, 0.6754213447995139 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] CLIENT_release_dist, 0.3606679439544678, 6.289089298897653 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] chunk_gpu_cpu_move, 0, 0.0 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] CHUNK_LIST_chunk_move, 0, 0.0 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] FWD, 0.28232502937316895, 4.9229973187357 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] BWD, 2.9886157512664795, 52.1135067722565 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] ADAM_prepare_data_fp16_grad_to_fp32_grad_copy, 0.2039637565612793, 3.5565852198787224 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] ADAM_prepare_data, 0.22702884674072266, 3.958779022397416 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] ADAM_compute, 0.013135433197021484, 0.2290470049819615 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] ADAM_param_fp32_to_fp16, 0.5844182968139648, 10.190700111226695 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] ADAM_release_data, 0.016661882400512695, 0.29053889612597344 %
2021-08-10:14:34:53,509 INFO [global_timer.py:50] ADAM, 0.9849364757537842, 17.174671477149886 %
2021-08-10:14:34:53,509 INFO [global_timer.py:76] *********** DATA MOVE RESULTS *************
2021-08-10:14:34:53,509 INFO [global_timer.py:86] chunk_cpu_gpu_move: 0.0 MB
2021-08-10:14:34:53,509 INFO [global_timer.py:86] chunk_gpu_cpu_move: 0.0 MB
2021-08-10:14:34:53,509 INFO [global_timer.py:83] ADAM_prepare_data_fp16_grad_to_fp32_grad_copy: 2782.4589920043945 MB, 393 times, 13641.92854120348 MB/s
2021-08-10:14:34:53,509 INFO [global_timer.py:83] ADAM_param_fp32_to_fp16: 2782.4589920043945 MB, 393 times, 4761.0744002597885 MB/s
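As a reading aid for the numbers above, the MB/s figures are consistent with the moved volume divided by the matching timer, and the two step times give the size of the slowdown between the Aug 10 and Aug 27 runs. The snippet below only redoes that arithmetic from the logged values; the helper name `throughput_mb_per_s` is made up for illustration.

```python
def throughput_mb_per_s(megabytes, seconds):
    """Effective bandwidth of a conversion stage."""
    return megabytes / seconds


# Values copied from the Aug 10 log.
moved_mb = 2782.4589920043945
grad_copy_s = 0.2039637565612793   # ADAM_prepare_data_fp16_grad_to_fp32_grad_copy
param_cast_s = 0.5844182968139648  # ADAM_param_fp32_to_fp16

grad_copy_bw = throughput_mb_per_s(moved_mb, grad_copy_s)
param_cast_bw = throughput_mb_per_s(moved_mb, param_cast_s)

# Step time on Aug 27 (develop) relative to Aug 10.
step_aug10_s = 5.177955627441406
step_aug27_s = 5.9073121547698975
slowdown = step_aug27_s / step_aug10_s

print(f"fp16->fp32 grad copy: {grad_copy_bw:.1f} MB/s")
print(f"fp32->fp16 param cast: {param_cast_bw:.1f} MB/s")
print(f"Aug 27 step time is {slowdown:.3f}x the Aug 10 step time")
```

For the same data volume, the fp32-to-fp16 parameter cast runs at roughly a third of the bandwidth of the fp16-to-fp32 gradient copy, and the develop-branch step is about 14% slower per iteration.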