NAN Loss for provided model #19

yix081 · 2021-02-19T23:26:54Z

I trained the model with the following two scripts. Both result in a NaN loss after one epoch of training. Any thoughts on how to address this issue?

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 path/to/data --model T2t_vit_7 -b 64 --lr 1e-3 --weight-decay .03 --cutmix 0.0 --reprob 0.25 --img-size 224

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 ./distributed_train.sh 8 path/to/data --model T2t_vit_14 -b 64 --lr 5e-4 --weight-decay .05 --img-size 224

Training in distributed mode with multiple processes, 1 GPU per process. Process 0, total 8.
Training in distributed mode with multiple processes, 1 GPU per process. Process 6, total 8.
Training in distributed mode with multiple processes, 1 GPU per process. Process 7, total 8.
Training in distributed mode with multiple processes, 1 GPU per process. Process 3, total 8.
adopt performer encoder for tokens-to-token
adopt performer encoder for tokens-to-token
adopt performer encoder for tokens-to-token
adopt performer encoder for tokens-to-token
adopt performer encoder for tokens-to-token
adopt performer encoder for tokens-to-token
Training in distributed mode with multiple processes, 1 GPU per process. Process 1, total 8.
adopt performer encoder for tokens-to-token
Model T2t_vit_14 created, param count: 21545550
Data processing configuration for current model + dataset:
input_size: (3, 224, 224)
interpolation: bicubic
mean: (0.485, 0.456, 0.406)
std: (0.229, 0.224, 0.225)
crop_pct: 0.9
Using native Torch AMP. Training in mixed precision.
Using native Torch DistributedDataParallel.
Scheduled epochs: 310
Train: 0 [ 0/2502 ( 0%)] Loss: 7.023479 (7.0235) Time: 3.680s, 139.14/s (3.680s, 139.14/s) LR: 1.000e-06 Data: 1.776 (1.776)
Reducer buckets have been rebuilt in this iteration.
Reducer buckets have been rebuilt in this iteration.
Reducer buckets have been rebuilt in this iteration.
Reducer buckets have been rebuilt in this iteration.
Reducer buckets have been rebuilt in this iteration.
Reducer buckets have been rebuilt in this iteration.
Reducer buckets have been rebuilt in this iteration.
Reducer buckets have been rebuilt in this iteration.
Train: 0 [ 50/2502 ( 2%)] Loss: 6.971423 (6.9975) Time: 0.323s, 1586.02/s (0.385s, 1330.47/s) LR: 1.000e-06 Data: 0.006 (0.041)
Train: 0 [ 100/2502 ( 4%)] Loss: 6.978786 (6.9912) Time: 0.305s, 1679.64/s (0.351s, 1457.64/s) LR: 1.000e-06 Data: 0.006 (0.024)
Train: 0 [ 150/2502 ( 6%)] Loss: 6.975621 (6.9873) Time: 0.300s, 1705.67/s (0.340s, 1507.75/s) LR: 1.000e-06 Data: 0.005 (0.018)
Train: 0 [ 200/2502 ( 8%)] Loss: 6.966157 (6.9831) Time: 0.360s, 1422.92/s (0.334s, 1530.97/s) LR: 1.000e-06 Data: 0.006 (0.015)
Train: 0 [ 250/2502 ( 10%)] Loss: 6.980019 (6.9826) Time: 0.309s, 1657.73/s (0.331s, 1545.27/s) LR: 1.000e-06 Data: 0.005 (0.013)
Train: 0 [ 300/2502 ( 12%)] Loss: 6.964942 (6.9801) Time: 0.327s, 1565.87/s (0.329s, 1556.59/s) LR: 1.000e-06 Data: 0.006 (0.012)
Train: 0 [ 350/2502 ( 14%)] Loss: 6.957265 (6.9772) Time: 0.332s, 1541.96/s (0.327s, 1563.37/s) LR: 1.000e-06 Data: 0.005 (0.011)
Train: 0 [ 400/2502 ( 16%)] Loss: 6.953742 (6.9746) Time: 0.318s, 1609.71/s (0.326s, 1570.11/s) LR: 1.000e-06 Data: 0.006 (0.011)
Train: 0 [ 450/2502 ( 18%)] Loss: 6.967467 (6.9739) Time: 0.309s, 1658.46/s (0.325s, 1573.87/s) LR: 1.000e-06 Data: 0.007 (0.010)
Train: 0 [ 500/2502 ( 20%)] Loss: 6.970360 (6.9736) Time: 0.322s, 1590.08/s (0.325s, 1577.36/s) LR: 1.000e-06 Data: 0.007 (0.010)
Train: 0 [ 550/2502 ( 22%)] Loss: 6.931087 (6.9700) Time: 0.313s, 1637.96/s (0.324s, 1579.20/s) LR: 1.000e-06 Data: 0.005 (0.009)
Train: 0 [ 600/2502 ( 24%)] Loss: 6.939621 (6.9677) Time: 0.329s, 1555.19/s (0.324s, 1580.93/s) LR: 1.000e-06 Data: 0.007 (0.009)
Train: 0 [ 650/2502 ( 26%)] Loss: 6.943333 (6.9660) Time: 0.318s, 1607.70/s (0.324s, 1582.42/s) LR: 1.000e-06 Data: 0.005 (0.009)
Train: 0 [ 700/2502 ( 28%)] Loss: 6.940698 (6.9643) Time: 0.316s, 1621.93/s (0.323s, 1584.56/s) LR: 1.000e-06 Data: 0.006 (0.009)
Train: 0 [ 750/2502 ( 30%)] Loss: 6.941026 (6.9628) Time: 0.323s, 1584.28/s (0.323s, 1586.07/s) LR: 1.000e-06 Data: 0.006 (0.008)
Train: 0 [ 800/2502 ( 32%)] Loss: 6.936088 (6.9612) Time: 0.310s, 1649.05/s (0.323s, 1587.13/s) LR: 1.000e-06 Data: 0.006 (0.008)
Train: 0 [ 850/2502 ( 34%)] Loss: 6.931849 (6.9596) Time: 0.308s, 1662.24/s (0.322s, 1588.20/s) LR: 1.000e-06 Data: 0.005 (0.008)
Train: 0 [ 900/2502 ( 36%)] Loss: 6.947849 (6.9590) Time: 0.320s, 1599.60/s (0.322s, 1589.06/s) LR: 1.000e-06 Data: 0.005 (0.008)
Train: 0 [ 950/2502 ( 38%)] Loss: 6.928242 (6.9575) Time: 0.308s, 1659.89/s (0.322s, 1590.35/s) LR: 1.000e-06 Data: 0.005 (0.008)
Train: 0 [1000/2502 ( 40%)] Loss: 6.926805 (6.9560) Time: 0.310s, 1649.80/s (0.322s, 1591.55/s) LR: 1.000e-06 Data: 0.006 (0.008)
Train: 0 [1050/2502 ( 42%)] Loss: 6.950564 (6.9557) Time: 0.308s, 1660.43/s (0.322s, 1592.16/s) LR: 1.000e-06 Data: 0.005 (0.008)
Train: 0 [1100/2502 ( 44%)] Loss: 6.930144 (6.9546) Time: 0.300s, 1707.17/s (0.321s, 1593.30/s) LR: 1.000e-06 Data: 0.005 (0.008)
Train: 0 [1150/2502 ( 46%)] Loss: 6.919596 (6.9532) Time: 0.331s, 1547.59/s (0.321s, 1593.54/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [1200/2502 ( 48%)] Loss: 6.922656 (6.9520) Time: 0.310s, 1652.26/s (0.321s, 1594.28/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [1250/2502 ( 50%)] Loss: 6.919957 (6.9507) Time: 0.311s, 1645.52/s (0.321s, 1595.21/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1300/2502 ( 52%)] Loss: 6.930165 (6.9500) Time: 0.333s, 1539.73/s (0.321s, 1595.62/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1350/2502 ( 54%)] Loss: 6.918827 (6.9488) Time: 0.331s, 1544.88/s (0.321s, 1596.13/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1400/2502 ( 56%)] Loss: 6.923580 (6.9480) Time: 0.311s, 1644.41/s (0.321s, 1596.67/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [1450/2502 ( 58%)] Loss: 6.924307 (6.9472) Time: 0.333s, 1538.95/s (0.321s, 1597.32/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1500/2502 ( 60%)] Loss: 6.909927 (6.9460) Time: 0.309s, 1659.58/s (0.320s, 1597.74/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1550/2502 ( 62%)] Loss: 6.924455 (6.9453) Time: 0.339s, 1512.00/s (0.320s, 1598.03/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1600/2502 ( 64%)] Loss: 6.931414 (6.9449) Time: 0.315s, 1623.24/s (0.320s, 1598.55/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [1650/2502 ( 66%)] Loss: 6.916759 (6.9441) Time: 0.332s, 1542.18/s (0.320s, 1599.07/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1700/2502 ( 68%)] Loss: 6.941891 (6.9440) Time: 0.314s, 1632.83/s (0.320s, 1599.53/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [1750/2502 ( 70%)] Loss: 6.922241 (6.9434) Time: 0.312s, 1640.83/s (0.320s, 1599.91/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1800/2502 ( 72%)] Loss: 6.918221 (6.9427) Time: 0.315s, 1625.92/s (0.320s, 1600.40/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1850/2502 ( 74%)] Loss: 6.903537 (6.9417) Time: 0.322s, 1587.80/s (0.320s, 1600.59/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1900/2502 ( 76%)] Loss: 6.934650 (6.9415) Time: 0.315s, 1623.17/s (0.320s, 1601.00/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [1950/2502 ( 78%)] Loss: 6.916628 (6.9409) Time: 0.315s, 1625.91/s (0.320s, 1601.38/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [2000/2502 ( 80%)] Loss: 6.907085 (6.9401) Time: 0.302s, 1695.00/s (0.320s, 1601.57/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [2050/2502 ( 82%)] Loss: 6.915219 (6.9395) Time: 0.331s, 1547.05/s (0.320s, 1601.70/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [2100/2502 ( 84%)] Loss: 6.920197 (6.9390) Time: 0.337s, 1520.82/s (0.320s, 1601.97/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [2150/2502 ( 86%)] Loss: 6.924037 (6.9387) Time: 0.325s, 1574.30/s (0.320s, 1602.26/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [2200/2502 ( 88%)] Loss: 6.920416 (6.9383) Time: 0.300s, 1705.11/s (0.319s, 1602.63/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [2250/2502 ( 90%)] Loss: 6.898316 (6.9374) Time: 0.310s, 1649.44/s (0.319s, 1602.97/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [2300/2502 ( 92%)] Loss: 6.924686 (6.9371) Time: 0.309s, 1655.87/s (0.319s, 1602.88/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [2350/2502 ( 94%)] Loss: 6.907205 (6.9365) Time: 0.326s, 1572.94/s (0.319s, 1602.90/s) LR: 1.000e-06 Data: 0.005 (0.007)
/home/shawn/anaconda3/envs/deit/lib/python3.8/site-packages/PIL/TiffImagePlugin.py:788: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
warnings.warn(str(msg))
Train: 0 [2400/2502 ( 96%)] Loss: 6.908824 (6.9359) Time: 0.310s, 1652.27/s (0.319s, 1603.15/s) LR: 1.000e-06 Data: 0.006 (0.007)
Train: 0 [2450/2502 ( 98%)] Loss: 6.911987 (6.9355) Time: 0.317s, 1615.97/s (0.319s, 1603.37/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [2500/2502 (100%)] Loss: 6.918730 (6.9351) Time: 0.312s, 1641.96/s (0.319s, 1603.78/s) LR: 1.000e-06 Data: 0.005 (0.007)
Train: 0 [2501/2502 (100%)] Loss: 6.918357 (6.9348) Time: 0.644s, 795.44/s (0.319s, 1603.13/s) LR: 1.000e-06 Data: 0.344 (0.007)
Test: [ 0/97] Time: 1.865 (1.865) Loss: 6.8164 (6.8164) Acc@1: 0.0000 ( 0.0000) Acc@5: 0.0000 ( 0.0000)
Test: [ 50/97] Time: 0.100 (0.192) Loss: 6.8828 (6.8914) Acc@1: 0.0000 ( 0.0613) Acc@5: 0.0000 ( 0.5859)
Test: [ 97/97] Time: 0.220 (0.162) Loss: 6.7188 (6.8880) Acc@1: 0.0000 ( 0.1820) Acc@5: 0.0000 ( 0.9180)
Test (EMA): [ 0/97] Time: 2.051 (2.051) Loss: 7.0312 (7.0312) Acc@1: 0.0000 ( 0.0000) Acc@5: 1.1719 ( 1.1719)
Test (EMA): [ 50/97] Time: 0.109 (0.193) Loss: 6.9570 (6.9737) Acc@1: 0.0000 ( 0.1072) Acc@5: 0.0000 ( 0.5093)
Test (EMA): [ 97/97] Time: 0.224 (0.163) Loss: 7.0273 (6.9708) Acc@1: 0.0000 ( 0.0900) Acc@5: 0.0000 ( 0.5080)
Current checkpoints:
('./output/train/20210219-222319-T2t_vit_14-224/checkpoint-0.pth.tar', 0.09)

Train: 1 [ 0/2502 ( 0%)] Loss: 6.897799 (6.8978) Time: 2.695s, 189.97/s (2.695s, 189.97/s) LR: 1.673e-04 Data: 2.323 (2.323)
Train: 1 [ 50/2502 ( 2%)] Loss: nan ( nan) Time: 0.279s, 1834.73/s (0.337s, 1518.12/s) LR: 1.673e-04 Data: 0.005 (0.051)
Train: 1 [ 100/2502 ( 4%)] Loss: nan ( nan) Time: 0.276s, 1857.70/s (0.309s, 1655.29/s) LR: 1.673e-04 Data: 0.006 (0.029)
Train: 1 [ 150/2502 ( 6%)] Loss: nan ( nan) Time: 0.289s, 1773.38/s (0.300s, 1705.98/s) LR: 1.673e-04 Data: 0.007 (0.021)
Train: 1 [ 200/2502 ( 8%)] Loss: nan ( nan) Time: 0.273s, 1877.76/s (0.295s, 1733.59/s) LR: 1.673e-04 Data: 0.005 (0.018)
Train: 1 [ 250/2502 ( 10%)] Loss: nan ( nan) Time: 0.268s, 1912.76/s (0.292s, 1752.17/s) LR: 1.673e-04 Data: 0.005 (0.015)
Train: 1 [ 300/2502 ( 12%)] Loss: nan ( nan) Time: 0.285s, 1793.85/s (0.290s, 1764.29/s) LR: 1.673e-04 Data: 0.005 (0.014)
Train: 1 [ 350/2502 ( 14%)] Loss: nan ( nan) Time: 0.281s, 1819.69/s (0.289s, 1769.46/s) LR: 1.673e-04 Data: 0.006 (0.013)
Train: 1 [ 400/2502 ( 16%)] Loss: nan ( nan) Time: 0.268s, 1908.61/s (0.290s, 1767.59/s) LR: 1.673e-04 Data: 0.005 (0.012)
Train: 1 [ 450/2502 ( 18%)] Loss: nan ( nan) Time: 0.287s, 1783.58/s (0.289s, 1773.71/s) LR: 1.673e-04 Data: 0.006 (0.011)
Train: 1 [ 500/2502 ( 20%)] Loss: nan ( nan) Time: 0.285s, 1796.56/s (0.288s, 1778.22/s) LR: 1.673e-04 Data: 0.005 (0.011)
Train: 1 [ 550/2502 ( 22%)] Loss: nan ( nan) Time: 0.280s, 1825.68/s (0.287s, 1781.91/s) LR: 1.673e-04 Data: 0.005 (0.010)
Train: 1 [ 600/2502 ( 24%)] Loss: nan ( nan) Time: 0.275s, 1859.97/s (0.287s, 1785.50/s) LR: 1.673e-04 Data: 0.009 (0.010)
Train: 1 [ 650/2502 ( 26%)] Loss: nan ( nan) Time: 0.278s, 1841.99/s (0.286s, 1788.40/s) LR: 1.673e-04 Data: 0.005 (0.010)
Train: 1 [ 700/2502 ( 28%)] Loss: nan ( nan) Time: 0.275s, 1860.43/s (0.286s, 1790.68/s) LR: 1.673e-04 Data: 0.006 (0.009)
Train: 1 [ 750/2502 ( 30%)] Loss: nan ( nan) Time: 0.287s, 1784.59/s (0.286s, 1792.93/s) LR: 1.673e-04 Data: 0.006 (0.009)
Train: 1 [ 800/2502 ( 32%)] Loss: nan ( nan) Time: 0.277s, 1848.72/s (0.285s, 1794.68/s) LR: 1.673e-04 Data: 0.006 (0.009)
Train: 1 [ 850/2502 ( 34%)] Loss: nan ( nan) Time: 0.286s, 1792.44/s (0.285s, 1795.76/s) LR: 1.673e-04 Data: 0.006 (0.009)
Train: 1 [ 900/2502 ( 36%)] Loss: nan ( nan) Time: 0.279s, 1833.06/s (0.285s, 1795.15/s) LR: 1.673e-04 Data: 0.006 (0.008)
Train: 1 [ 950/2502 ( 38%)] Loss: nan ( nan) Time: 0.277s, 1847.88/s (0.285s, 1795.23/s) LR: 1.673e-04 Data: 0.005 (0.008)
Train: 1 [1000/2502 ( 40%)] Loss: nan ( nan) Time: 0.286s, 1789.41/s (0.285s, 1796.69/s) LR: 1.673e-04 Data: 0.005 (0.008)
Train: 1 [1050/2502 ( 42%)] Loss: nan ( nan) Time: 0.277s, 1848.11/s (0.285s, 1798.21/s) LR: 1.673e-04 Data: 0.005 (0.008)
Train: 1 [1100/2502 ( 44%)] Loss: nan ( nan) Time: 0.284s, 1799.80/s (0.285s, 1799.40/s) LR: 1.673e-04 Data: 0.005 (0.008)
Train: 1 [1150/2502 ( 46%)] Loss: nan ( nan) Time: 0.285s, 1799.56/s (0.284s, 1800.19/s) LR: 1.673e-04 Data: 0.006 (0.008)
Train: 1 [1200/2502 ( 48%)] Loss: nan ( nan) Time: 0.294s, 1742.39/s (0.284s, 1801.04/s) LR: 1.673e-04 Data: 0.006 (0.008)
Train: 1 [1250/2502 ( 50%)] Loss: nan ( nan) Time: 0.285s, 1796.71/s (0.284s, 1802.07/s) LR: 1.673e-04 Data: 0.005 (0.008)
Train: 1 [1300/2502 ( 52%)] Loss: nan ( nan) Time: 0.274s, 1870.25/s (0.284s, 1802.85/s) LR: 1.673e-04 Data: 0.006 (0.008)
Train: 1 [1350/2502 ( 54%)] Loss: nan ( nan) Time: 0.271s, 1886.95/s (0.284s, 1803.84/s) LR: 1.673e-04 Data: 0.006 (0.008)
Train: 1 [1400/2502 ( 56%)] Loss: nan ( nan) Time: 0.288s, 1776.96/s (0.284s, 1804.18/s) LR: 1.673e-04 Data: 0.006 (0.008)
Train: 1 [1450/2502 ( 58%)] Loss: nan ( nan) Time: 0.282s, 1818.29/s (0.284s, 1802.31/s) LR: 1.673e-04 Data: 0.006 (0.007)
Train: 1 [1500/2502 ( 60%)] Loss: nan ( nan) Time: 0.262s, 1952.51/s (0.284s, 1803.01/s) LR: 1.673e-04 Data: 0.007 (0.007)

yuanli2333 · 2021-02-20T02:03:04Z

Hi, it's strange that the loss becomes NaN. You should:

Check whether the token_performer you used is ours, or whether you modified it.
If you modified token_performer.py or any other .py file, check that you added an epsilon term to stabilize the division, as in this line.
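The epsilon guard on the division can be sketched as follows. This is a minimal Performer-style (kernelized) attention normalizer written for illustration, not the repo's token_performer.py: the function name `linear_attention`, the tensor shapes and the 1e-8 default are my assumptions. It shows where the division happens and how an epsilon keeps it finite when the denominator underflows to zero.

```python
import torch


def linear_attention(qp: torch.Tensor, kp: torch.Tensor, v: torch.Tensor,
                     epsilon: float = 1e-8) -> torch.Tensor:
    """Kernelized (Performer-style) attention with an epsilon-guarded division.

    Hypothetical helper for illustration.
    qp, kp: non-negative random-feature maps, shape (batch, tokens, m)
    v:      values, shape (batch, tokens, dim)
    """
    # Denominator D_i = qp_i . sum_j kp_j, one scalar per query token.
    kp_sum = kp.sum(dim=1)                                  # (batch, m)
    denom = torch.einsum("btm,bm->bt", qp, kp_sum)          # (batch, tokens)
    # Numerator N_i = qp_i^T (sum_j kp_j v_j^T).
    kv = torch.einsum("btm,btd->bmd", kp, v)                # (batch, m, dim)
    num = torch.einsum("btm,bmd->btd", qp, kv)              # (batch, tokens, dim)
    # epsilon keeps the result finite when the denominator is (or underflows to) 0.
    return num / (denom.unsqueeze(-1) + epsilon)
```

Without the epsilon, an all-zero feature row gives 0/0 = NaN; with it, that row simply comes out as zeros.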

yix081 · 2021-02-20T04:30:52Z

It is a fresh download; I haven't changed a single line of code. It is strange. I will run it on another server to see whether the same thing happens.

MarkOkd · 2021-02-20T08:45:58Z

I faced the same issue when I trained T2t_vit_14 on the Food-101 dataset. Here's the code.
Now I am training the model without AMP, and it seems fine so far, so I think the cause is mixed precision.
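To see why mixed precision is a plausible culprit, here is a minimal sketch of float16's limited range. As I understand PyTorch's autocast op lists, matrix multiplies run in float16, so a large float32 intermediate that feeds one gets cast to half; the sketch simulates that cast explicitly rather than reproducing the model.

```python
import torch

# Largest finite float16 value (65504). Anything larger rounds to inf when cast.
FP16_MAX = torch.finfo(torch.float16).max

x = torch.tensor([5.0, 11.0, 12.0])        # exponents a layer could plausibly produce
e32 = torch.exp(x)                         # float32: all three values are finite
e16 = e32.to(torch.float16).float()        # round-trip through half precision
print(FP16_MAX)
print(e32)
print(e16)                                 # exp(12) ~ 1.6e5 exceeds FP16_MAX -> inf
```

In float32 the same values are harmless, which is consistent with training being stable once AMP is turned off.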

yuanli2333 · 2021-02-20T09:11:36Z

> I faced the same issue when I trained T2t_vit_14 on the Food-101 dataset. Here's the code.
> Now I am training the model without AMP, and it seems fine so far, so I think the cause is mixed precision.

Very nice try!
It makes sense that AMP would cause a NaN loss because of the softmax-like exponential in the Performer layer (token_performer.py). I think there are two possible solutions to this problem:
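How an overflowing exponential turns into NaN rather than just inf can be sketched in a few lines. The row normalization below mimics the divide-by-sum pattern of a softmax or Performer normalizer; the function name `normalize_rows` and the input values are illustrative assumptions, and the half-precision step is simulated with an explicit cast.

```python
import torch


def normalize_rows(feats: torch.Tensor) -> torch.Tensor:
    # Divide each row by its own sum, as a softmax / Performer normalizer does.
    return feats / feats.sum(dim=-1, keepdim=True)


logits = torch.tensor([[1.0, 2.0, 3.0],
                       [10.0, 12.0, 13.0]])
# Simulate a half-precision intermediate: exp(12) and exp(13) overflow to inf.
feats_fp16 = torch.exp(logits).to(torch.float16).float()
out = normalize_rows(feats_fp16)
print(out)   # row 0 is a valid distribution; row 1 contains inf / inf = nan
```

Also note that the bracketed number in the training log is a running average, so once a single NaN enters it, it reads nan for the rest of the epoch.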

Disable AMP on the GPUs where it causes problems. I have trained T2T-ViT with AMP on TitanV, 1080Ti, 2080Ti and V100 without issues, so AMP works on these GPUs.
Modify this line in token_performer.py as:

return torch.exp((wtx - xd) - torch.max((wtx - xd), dim=-1, keepdim=True).values + self.epsilon) / math.sqrt(self.m)

which should stabilize the Performer layer. (P.S. I haven't tried the second solution myself.)
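The idea behind the suggested line is the standard max-subtraction trick, sketched below on toy values (the helper `shifted_exp` is hypothetical, and half precision is again simulated with a cast). Subtracting the row-wise max makes every exponent at most 0, so every feature lies in (0, 1] and cannot overflow. Adding epsilon inside the exponent only rescales every feature by exp(epsilon), so the max subtraction is what bounds the values.

```python
import torch


def shifted_exp(x: torch.Tensor) -> torch.Tensor:
    # exp(x - max(x)) along the last dim: all exponents <= 0, all outputs in (0, 1].
    return torch.exp(x - x.max(dim=-1, keepdim=True).values)


logits = torch.tensor([[1.0, 2.0, 3.0],
                       [10.0, 12.0, 13.0]])
naive = torch.exp(logits).to(torch.float16).float()
shifted = shifted_exp(logits).to(torch.float16).float()
print(torch.isfinite(naive).all().item())    # False
print(torch.isfinite(shifted).all().item())  # True

# When a row is divided by its own sum, the shift cancels exactly (softmax identity).
normalized = shifted / shifted.sum(-1, keepdim=True)
print(torch.allclose(normalized, torch.softmax(logits, dim=-1), atol=1e-3))  # True
```

One caveat on the design: in kernelized attention, a per-token shift cancels exactly for the query-side features, but applied to key-side features it changes the relative weights of the keys, so there it is an approximation rather than an identity.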

MarkOkd · 2021-02-20T11:37:49Z

I tried the second solution, but I still got NaN. The result is here.
It's a slight improvement, though:
Before the modification, NAN first appeared at Train: 1 [ 100/2367 ( 4%)]
After the modification, NAN first appeared at Train: 1 [ 600/2367 ( 25%)]
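Comparing runs by where NaN first appears, as above, is easy to automate. This is a small sketch that assumes the progress-line format shown in this thread ("Train: E [step/total (pct)] Loss: value (avg) ..."); the helper `first_nan_step` is hypothetical and not part of the repo.

```python
import math
import re

# Matches progress lines such as "Train: 1 [ 100/2367 ( 4%)] Loss: nan ( nan) ..."
LINE_RE = re.compile(r"Train:\s*(\d+)\s*\[\s*(\d+)/(\d+).*?Loss:\s*(\S+)")


def first_nan_step(log_text: str):
    """Return (epoch, step, total) of the first line whose current loss is not finite."""
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip non-progress lines (warnings, test output, ...)
        epoch, step, total, loss = m.groups()
        if not math.isfinite(float(loss)):  # float("nan") parses fine
            return int(epoch), int(step), int(total)
    return None
```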

yuanli2333 · 2021-02-20T13:22:17Z

So for now, you can either disable AMP by setting amp to False, or train T2T-ViT on GPUs where AMP is known to work: TitanV, 1080Ti, 2080Ti and V100.

MarkOkd · 2021-02-20T16:52:09Z

I think it would be convenient to change lines 217-218
from

parser.add_argument('--amp', action='store_true', default=True,
                    help='use NVIDIA Apex AMP or Native AMP for mixed precision training')

to

parser.add_argument('--disable_amp', action='store_true', default=False,
                    help='disable AMP')

and change line 335
from

if args.amp:

to

if not args.disable_amp:
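The reason the original definition is a problem can be checked directly with argparse: an action='store_true' flag only ever sets True, so with default=True there is no command line that yields False. This sketch reuses the two definitions quoted above on standalone parsers.

```python
import argparse

# Original definition: store_true with default=True can never evaluate to False.
old = argparse.ArgumentParser()
old.add_argument('--amp', action='store_true', default=True,
                 help='use NVIDIA Apex AMP or Native AMP for mixed precision training')

# Proposed definition: an explicit opt-out flag.
new = argparse.ArgumentParser()
new.add_argument('--disable_amp', action='store_true', default=False,
                 help='disable AMP')

print(old.parse_args([]).amp)          # True
print(old.parse_args(['--amp']).amp)   # True: there is no way to get False
args = new.parse_args(['--disable_amp'])
print(not args.disable_amp)            # False, so the AMP branch is skipped
```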

yuanli2333 · 2021-02-21T12:51:16Z

I have updated the repo so that AMP can be disabled, changing this line from

parser.add_argument('--amp', action='store_true', default=True,
                    help='use NVIDIA Apex AMP or Native AMP for mixed precision training')

to

parser.add_argument('--amp', action='store_true', default=False,
                    help='use NVIDIA Apex AMP or Native AMP for mixed precision training')

so you can now disable AMP by removing '--amp' from the training scripts.
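For readers wondering what the flag switches, here is a generic native-PyTorch sketch of a training step gated by it. It is not the repo's training loop: `train_step` and the toy model are hypothetical, and it assumes PyTorch 1.10+ for `torch.autocast`. With the new default, leaving out '--amp' makes both autocast and the gradient scaler no-ops, so the whole step runs in float32.

```python
import argparse

import torch
import torch.nn as nn

parser = argparse.ArgumentParser()
parser.add_argument('--amp', action='store_true', default=False,
                    help='use NVIDIA Apex AMP or Native AMP for mixed precision training')


def train_step(model, optimizer, scaler, x, y, use_amp: bool) -> float:
    device_type = 'cuda' if x.is_cuda else 'cpu'
    # With enabled=False, autocast does nothing and all ops stay in float32.
    with torch.autocast(device_type=device_type, enabled=use_amp):
        loss = nn.functional.cross_entropy(model(x), y)
    optimizer.zero_grad()
    # A disabled scaler returns the loss unscaled and calls optimizer.step() directly.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()


args = parser.parse_args([])          # no '--amp' -> full precision
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = nn.Linear(8, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# Loss scaling is only meaningful for float16 autocast on CUDA.
scaler = torch.cuda.amp.GradScaler(enabled=args.amp and device == 'cuda')
x = torch.randn(16, 8, device=device)
y = torch.randint(0, 4, (16,), device=device)
loss = train_step(model, optimizer, scaler, x, y, use_amp=args.amp)
```

Passing '--amp' flips both switches on, which on a CUDA GPU brings back float16 autocast and loss scaling.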
I'm closing the issue, but feel free to reopen it if there are more questions about NaN loss.

yuanli2333 closed this as completed Feb 21, 2021

michuanhaohao mentioned this issue Mar 4, 2021

Nan during training even without '--amp' #26 (Closed)
