
Problem with loss = nan during training #214

Closed
YoYukeJa opened this issue Mar 26, 2020 · 13 comments

Comments

@YoYukeJa

YoYukeJa commented Mar 26, 2020

Hello, I've been trying to train a custom dataset (11k training images and 1k validation images) with this model, but after a while the loss always becomes nan.

Eager_tf mode:

I0326 12:39:25.933575 140234859489088 train.py:183] 1_train_84, 223747.34375, [15194.2705, 36603.82, 171938.5]
I0326 12:39:27.360369 140234859489088 train.py:183] 1_train_85, 223252.890625, [15160.514, 36423.156, 171658.48]
I0326 12:39:28.780844 140234859489088 train.py:183] 1_train_86, 223706.375, [15277.884, 36469.22, 171948.52]
I0326 12:39:30.230567 140234859489088 train.py:183] 1_train_87, 223358.9375, [15199.5625, 36386.203, 171762.4]
I0326 12:39:31.832662 140234859489088 train.py:183] 1_train_88, 223327.765625, [15265.209, 36317.28, 171734.5]
I0326 12:39:33.261120 140234859489088 train.py:183] 1_train_89, nan, [nan, 36458.277, 171607.92]
I0326 12:39:34.778065 140234859489088 train.py:183] 1_train_90, nan, [nan, nan, nan]
I0326 12:39:36.280590 140234859489088 train.py:183] 1_train_91, nan, [nan, nan, nan]

I also checked my tfrecords (which I create using a slightly edited voc2012.py), but I don't see anything wrong with the outputs:

features {
  feature {
    key: "image/encoded"
    value {
      bytes_list {
        value: "\377\330\377\340\000\020JFIF\000\001\001\000\000\001\000\001\000\000\377\333\000C\000\010\006\006\007\006\005\010\007\007\007\t\t\010\n\014\024\r\014\013\013\014\031\022\023\017\024\035\032\037\036\035\032\034\034 $.\' \",#\034\034(7),01444\037\'9=82<.342\377\333\000C\001\t\t\t\014\013\014\030\r\r\0302!\034!22222222222222222222222222222222222222222222222222\377\300\000\021\010\001\217\002X\003\001\"\000\002\021\001\003\021\001\377\304\000\037\000\000\001\005\00 EDITED THE REST OUT
      }
    }
  }
  feature {
    key: "image/filename"
    value {
      bytes_list {
        value: "koffer_02407.jpg"
      }
    }
  }
  feature {
    key: "image/format"
    value {
      bytes_list {
        value: "jpeg"
      }
    }
  }
  feature {
    key: "image/height"
    value {
      int64_list {
        value: 399
      }
    }
  }
  feature {
    key: "image/key/sha256"
    value {
      bytes_list {
        value: "84a55039a36c917657c916af1deb34548eee3cee6bf44f5eb59ee8cc5ffcfd23"
      }
    }
  }
  feature {
    key: "image/object/bbox/xmax"
    value {
      float_list {
        value: 0.9362444877624512
        value: 0.9292577505111694
      }
    }
  }
  feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 0.7266375422477722
        value: 0.74410480260849
      }
    }
  }
  feature {
    key: "image/object/bbox/ymax"
    value {
      float_list {
        value: 0.681598424911499
        value: 0.9622541666030884
      }
    }
  }
  feature {
    key: "image/object/bbox/ymin"
    value {
      float_list {
        value: 0.5006147623062134
        value: 0.7183197736740112
      }
    }
  }
  feature {
    key: "image/object/class/label"
    value {
      int64_list {
        value: 17
        value: 17
      }
    }
  }
  feature {
    key: "image/object/class/text"
    value {
      bytes_list {
        value: "koffer"
        value: "koffer"
      }
    }
  }
  feature {
    key: "image/object/difficult"
    value {
      int64_list {
        value: 0
        value: 0
      }
    }
  }
  feature {
    key: "image/object/truncated"
    value {
      int64_list {
        value: 0
        value: 0
      }
    }
  }
  feature {
    key: "image/object/view"
    value {
      bytes_list {
        value: "Unspecified"
        value: "Unspecified"
      }
    }
  }
  feature {
    key: "image/source_id"
    value {
      bytes_list {
        value: "koffer_02407.jpg"
      }
    }
  }
  feature {
    key: "image/width"
    value {
      int64_list {
        value: 600
      }
    }
  }
}

In my classes.names file I have 29 classes, and 'koffer' is indeed on line 17.
If need be I could also post the edited version of voc2012.py; however, I only edited the reading of files because my folder layout is a bit different from what's been posted here.

I've noticed that some xmin or ymin values can be really small e.g.:

feature {
    key: "image/object/bbox/xmin"
    value {
      float_list {
        value: 3.964704228565097e-05
      }
    }
  }

But I don't think that's the source of the problem. I've also checked against #128, but my .xml annotations already have correct xmin, xmax, ymin and ymax values (so not out of bounds or anything).
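To double-check, here is a minimal sketch that scans a tfrecord for degenerate or out-of-range boxes (it assumes the feature keys shown in the dump above; the filename is a placeholder):

import tensorflow as tf

# Sketch: flag any record whose normalized boxes are degenerate or fall
# outside [0, 1]. Assumes the feature keys from the dump above.
for raw in tf.data.TFRecordDataset(['voc2012_train.tfrecord']):
    ex = tf.train.Example()
    ex.ParseFromString(raw.numpy())
    f = ex.features.feature
    name = f['image/filename'].bytes_list.value[0].decode()
    boxes = zip(f['image/object/bbox/xmin'].float_list.value,
                f['image/object/bbox/xmax'].float_list.value,
                f['image/object/bbox/ymin'].float_list.value,
                f['image/object/bbox/ymax'].float_list.value)
    for x0, x1, y0, y1 in boxes:
        if not (0.0 <= x0 < x1 <= 1.0 and 0.0 <= y0 < y1 <= 1.0):
            print(name, (x0, y0, x1, y1))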

@PieroCV

PieroCV commented Mar 26, 2020

Hi!
What software did you use to generate the tfrecord files?

@YoYukeJa
Author

Hi!
What software did you use to generate the tfrecord files?

I used VoTT to export to xml (Pascal VOC format), but I noticed some values were out of bounds, so I wrote a script to correct those mistakes.

@YoYukeJa
Author

YoYukeJa commented Apr 9, 2020

Is there no possible answer to my problem?

@PieroCV

PieroCV commented Apr 9, 2020

Sorry for the late reply...
I will cite #185 in order to help you.

First, I would like to know how you are getting your data:

1. Are you using VoTT, LabelImg, or something else to generate your tfrecord files? (You already said you were using VoTT, so no need to answer.)
2. Did you modify the repo (I saw Binary Crossentropy modifications, but it is not necessary)?
3. What is the content of your .names file?
4. Did you pass the parameters correctly when training?
5. Could you verify the content of one tfrecord file?

For point 5, use this:

import tensorflow as tf

filenames = ["<filename>"]  # Replace here
raw_dataset = tf.data.TFRecordDataset(filenames)
for raw_record in raw_dataset.take(1):
    example = tf.train.Example()
    example.ParseFromString(raw_record.numpy())
    print(example)

I hope you can answer as soon as possible so I can help you.

@YoYukeJa
Author

YoYukeJa commented Apr 9, 2020

No problem!

1 and 2: We used VoTT to create Pascal VOC format annotations for the images (we started with an older TensorFlow version a year or so ago that required this format). These were then converted using a slightly edited voc2012.py script:

import os
from os import listdir
from os.path import isdir, isfile, join

import hashlib

from absl import app, flags, logging
from absl.flags import FLAGS
import tensorflow as tf
import lxml.etree
import tqdm


# Set the basemap from which we're working 
# This is different for each device! Change accordingly!
AIBASEMAP = '/notebooks/AIBASEMAP/Code/yolov3-tf2-master/'                         #XXX JupyterLab

# Set the image map from which we're working
IMAGEBASEMAP = '/notebooks/AIBASEMAP/Images/Trash2_updated/Trash/'


flags.DEFINE_string('data_dir', IMAGEBASEMAP, 'path to the raw dataset')
flags.DEFINE_enum('split', 'train', ['train', 'val'], 'specify train or val split')
flags.DEFINE_string('output_file_train', IMAGEBASEMAP + '/voc2012_train.tfrecord', 'path to the training tfrecord output')
flags.DEFINE_string('output_file_val', IMAGEBASEMAP + '/voc2012_val.tfrecord', 'path to the validation tfrecord output')
flags.DEFINE_string('classes', AIBASEMAP + 'Images/voc2012.names', 'path to the classes file')

def build_example(annotation, class_map, path, i):
    img_path = path + '.jpg'
    img_raw = open(img_path, 'rb').read()
    key = hashlib.sha256(img_raw).hexdigest()

    width = int(annotation['size']['width'])
    height = int(annotation['size']['height'])

    xmin = []
    ymin = []
    xmax = []
    ymax = []
    classes = []
    classes_text = []
    truncated = []
    views = []
    difficult_obj = []
    if 'object' in annotation:
        for obj in annotation['object']:
            difficult = bool(int(obj['difficult']))
            difficult_obj.append(int(difficult))

            xmin.append(float(obj['bndbox']['xmin']) / width)
            ymin.append(float(obj['bndbox']['ymin']) / height)
            xmax.append(float(obj['bndbox']['xmax']) / width)
            ymax.append(float(obj['bndbox']['ymax']) / height)
            classes_text.append(obj['name'].encode('utf8'))
            classes.append(class_map[obj['name']])
            truncated.append(int(obj['truncated']))
            views.append(obj['pose'].encode('utf8'))

    example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': tf.train.Feature(int64_list=tf.train.Int64List(value=[height])),
        'image/width': tf.train.Feature(int64_list=tf.train.Int64List(value=[width])),
        'image/filename': tf.train.Feature(bytes_list=tf.train.BytesList(value=[
            annotation['filename'].encode('utf8')])),
        'image/source_id': tf.train.Feature(bytes_list=tf.train.BytesList(value=[
            annotation['filename'].encode('utf8')])),
        'image/key/sha256': tf.train.Feature(bytes_list=tf.train.BytesList(value=[key.encode('utf8')])),
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw])),
        'image/format': tf.train.Feature(bytes_list=tf.train.BytesList(value=['jpeg'.encode('utf8')])),
        'image/object/bbox/xmin': tf.train.Feature(float_list=tf.train.FloatList(value=xmin)),
        'image/object/bbox/xmax': tf.train.Feature(float_list=tf.train.FloatList(value=xmax)),
        'image/object/bbox/ymin': tf.train.Feature(float_list=tf.train.FloatList(value=ymin)),
        'image/object/bbox/ymax': tf.train.Feature(float_list=tf.train.FloatList(value=ymax)),
        'image/object/class/text': tf.train.Feature(bytes_list=tf.train.BytesList(value=classes_text)),
        'image/object/class/label': tf.train.Feature(int64_list=tf.train.Int64List(value=classes)),
        'image/object/difficult': tf.train.Feature(int64_list=tf.train.Int64List(value=difficult_obj)),
        'image/object/truncated': tf.train.Feature(int64_list=tf.train.Int64List(value=truncated)),
        'image/object/view': tf.train.Feature(bytes_list=tf.train.BytesList(value=views)),
    }))
    return example


def parse_xml(xml):
    if not len(xml):
        return {xml.tag: xml.text}
    result = {}
    for child in xml:
        child_result = parse_xml(child)
        if child.tag != 'object':
            result[child.tag] = child_result[child.tag]
        else:
            if child.tag not in result:
                result[child.tag] = []
            result[child.tag].append(child_result[child.tag])
    return {xml.tag: result}

def conversion(image_list, writer, class_map):
    i = 0
    for image in tqdm.tqdm(image_list):
        i += 1
        try:
            image = image.split('.')[0]
            path_name = image.replace('.', '')
            annotation_xml = path_name + '.xml'
            annotation_xml = lxml.etree.fromstring(open(annotation_xml, 'rb').read())
            annotation = parse_xml(annotation_xml)['annotation']
            tf_example = build_example(annotation, class_map, path_name, i)
            writer.write(tf_example.SerializeToString())
        # Check for XML syntax errors; this clause must come before the generic
        # handler, otherwise it is unreachable (and `etree` was an undefined name).
        except lxml.etree.XMLSyntaxError as err:
            print('\nXML syntax error, see error_syntax.log. Error in:', path_name)
            with open('error_syntax.log', 'a') as error_log_file:
                error_log_file.write('\nMistake found in: ' + path_name + '\n')
                error_log_file.write(str(err.error_log))
            quit()
        except Exception as exception:
            print('\nThere is an error in:', path_name)
            with open('exception.log', 'w') as error_log_file:
                error_log_file.write(str(exception))
            quit()


def main(_argv):
    class_map = {name: idx for idx, name in enumerate(
        open(FLAGS.classes).read().splitlines())}
    logging.info("Class mapping loaded: %s", class_map)

    directories = [f for f in listdir(FLAGS.data_dir) if isdir(join(FLAGS.data_dir, f))]
    print('Starting training data conversion.')
    writer = tf.io.TFRecordWriter(FLAGS.output_file_train)
    image_list = open(os.path.join(
        FLAGS.data_dir, 'solutions_%s.txt' % FLAGS.split)).read().splitlines()
    logging.info("Image list loaded: %d", len(image_list))
    conversion(image_list, writer, class_map)
    writer.close()
    print('Done')
    print('Starting validation data conversion.')
    writer = tf.io.TFRecordWriter(FLAGS.output_file_val)
    image_list = open(os.path.join(
        FLAGS.data_dir, 'validation_data.txt' )).read().splitlines()
    logging.info("Image list loaded: %d", len(image_list))
    conversion(image_list, writer, class_map)
    writer.close()
    logging.info("Done")


if __name__ == '__main__':
    app.run(main)

And I also added this to train.py:

gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
  try:
    # The line restricting TensorFlow to the first GPU is commented out,
    # so this block only reports the available devices.
    #tf.config.experimental.set_visible_devices(gpus[0], 'GPU')
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), "Physical GPUs,", len(logical_gpus), "Logical GPUs")
  except RuntimeError as e:
    # Visible devices must be set before GPUs have been initialized
    print(e)

3: The class names are in Dutch, so there's that, but the content of the .names file is:
afval_lachgaspatronen
afval_papier
afvalzak_gft
afvalzak_overige_donker
afvalzak_overige_licht
afvalzak_pmd
afvalzak_restafval
autoband
bruingoed
buggy
dier_kat_dood
dier_kat_ok
dier_vogel_dood
dier_vogel_ok
groen_kerstboom
houtafval
koffer
meubel_matras
meubel_stoel
meubel_stoel_bureau
meubel_zetel
sluikstort
vervoer_fiets_ok
vervoer_fiets_wrak
witgoed_koelkast_amerikaans
witgoed_koelkast_retro
witgoed_koelkast_standaard
witgoed_microgolf
witgoed_wasdroog

4: I think I did; however, I hardcoded them into train.py (I will change that if necessary and try again, but this has always worked in the past):

AIBASEMAP = '/notebooks/AIBASEMAP/Code/tensorflow2-yolov3/'             #XXX JupyterLab
IMAGEBASEMAP = '/notebooks/AIBASEMAP/Images/Trash2_updated/Trash/'
MODELBASEMAP = '/notebooks/AIBASEMAP/Models/tensorflow2-yolov3/'
classes_path = MODELBASEMAP + 'coco.names'

flags.DEFINE_string('dataset', IMAGEBASEMAP + 'voc2012_train.tfrecord', 'path to dataset')
flags.DEFINE_string('val_dataset', IMAGEBASEMAP + 'voc2012_val.tfrecord', 'path to validation dataset')
flags.DEFINE_boolean('tiny', False, 'yolov3 or yolov3-tiny')
flags.DEFINE_string('weights', MODELBASEMAP + 'yolov3.tf',
                    'path to weights file')
flags.DEFINE_string('classes', classes_path, 'path to classes file')
flags.DEFINE_enum('mode', 'eager_fit', ['fit', 'eager_fit', 'eager_tf'],
                  'fit: model.fit, '
                  'eager_fit: model.fit(run_eagerly=True), '
                  'eager_tf: custom GradientTape')
flags.DEFINE_enum('transfer', 'none',
                  ['none', 'darknet', 'no_output', 'frozen', 'fine_tune'],
                  'none: Training from scratch, '
                  'darknet: Transfer darknet, '
                  'no_output: Transfer all but output, '
                  'frozen: Transfer and freeze all, '
                  'fine_tune: Transfer all and freeze darknet only')
flags.DEFINE_integer('size', 416, 'image size')
flags.DEFINE_integer('epochs', 10, 'number of epochs')
flags.DEFINE_integer('batch_size', 8, 'batch size')
flags.DEFINE_float('learning_rate', 1e-3, 'learning rate')
flags.DEFINE_integer('num_classes', 29, 'number of classes in the model')
flags.DEFINE_integer('weights_num_classes', None, 'specify num class for `weights` file if different, '
                     'useful in transfer learning with different number of classes')

For point 5, I already added the content of a tfrecord file in the opening post (excluding the encoded part). It outputs the same information as your script.

@PieroCV

PieroCV commented Apr 9, 2020

Okay, it will take me a little while... I'm preparing my lunch right now :P, but when I'm done I will start analyzing your code.

@YoYukeJa
Author

No problem! I'll also keep looking for the problem.

@PieroCV

PieroCV commented Apr 10, 2020

Ok, the only issue I can think of is that VoTT returned corrupted files...
There are two ways this could happen:

  1. You activated autolabel and then deactivated it (process).
     Ignore this case if you didn't use this tool from VoTT.

  2. Some rectangles didn't match the image (bug).

To track that down, you need to change your code to generate a single tfrecord per image.
Then train with batch sizes {1, 2, 4, 8, 16, ...} for a single epoch. If a file is corrupted, the NaN loss will appear.

I'll give you the code that I use to convert Pascal VOC format to single tfrecord files in the next comment, because I'm on a different PC :P.
Hope this helps you.

@PieroCV

PieroCV commented Apr 10, 2020

import hashlib
import io
import os
from collections import namedtuple
import xml.etree.ElementTree as ET

import pandas as pd
import tensorflow as tf
from PIL import Image

import dataset_util
## GENERATE CSV##
def xml_to_csv(path):
    xml_list = []
    for xml_file,filename in [(path+"/"+a,a )for a in os.listdir(path) if ".xml" in a]:
        tree = ET.parse(xml_file)
        root = tree.getroot()
        for member in root.findall('object'):
            value = (filename.replace(".xml",".jpg"),
                     int(root.find('size')[0].text),
                     int(root.find('size')[1].text),
                     member[0].text,
                     int(member[4][0].text),
                     int(member[4][1].text),
                     int(member[4][2].text),
                     int(member[4][3].text)
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def create_csv():
    for folder in ['train','test']: #Modify if you have more than one folder
        image_path = os.path.join(os.getcwd(), ('images/' + folder)) #Images path
        xml_df = xml_to_csv(image_path)
        xml_df.to_csv(('images/' + folder + '_labels.csv'), index=None)
        print('Successfully converted xml to csv.')

create_csv()

##GENERATE TFRECORDS##


def create_function(lista_tags):
    def class_text_to_int(row_label):
        if row_label in lista_tags:
            return lista_tags.index(row_label) + 1
        else:
            return None  # the original fell through without returning anything
    return class_text_to_int

tag_list = []  # Tag list HERE (the original passed the builtin `list` by mistake)
class_text_to_int = create_function(tag_list)

def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]
def create_tf_example(group, path):
    with tf.compat.v1.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size
    key = hashlib.sha256(encoded_jpg).hexdigest()

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []
    difficult_obj = []
    truncated = []
    poses = []

    for i,(index,row) in enumerate(group.object.iterrows()):
        if i == 100:
            break
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))
        difficult_obj.append(int(False))
        truncated.append(int(False))
        poses.append("Unspecified".encode('utf8'))
        
    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/key/sha256': dataset_util.bytes_feature(key.encode('utf8')),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/difficult': dataset_util.int64_list_feature(difficult_obj),
        'image/object/truncated': dataset_util.int64_list_feature(truncated),
        'image/object/view': dataset_util.bytes_list_feature(poses),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example

path = os.path.join(os.getcwd(), "./images/train")  # path to the images

examples = pd.read_csv("./02_17_20.csv")  # path to the CSV

grouped = split(examples, 'filename')

for group in grouped:
    # write one tfrecord per image
    a = "./tfrecords/" + list(group.object.filename)[0].replace(".jpg", "") + ".tfrecord"
    writer = tf.compat.v1.python_io.TFRecordWriter(a)
    tf_example = create_tf_example(group, path)
    writer.write(tf_example.SerializeToString())
    writer.close()

Also, you need dataset_util.py that contains:

import tensorflow as tf

def int64_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def int64_list_feature(value):
  return tf.train.Feature(int64_list=tf.train.Int64List(value=value))

def bytes_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def bytes_list_feature(value):
  return tf.train.Feature(bytes_list=tf.train.BytesList(value=value))

def float_list_feature(value):
  return tf.train.Feature(float_list=tf.train.FloatList(value=value))


def read_examples_list(path):
  with tf.io.gfile.GFile(path) as fid:  # tf.gfile is the TF1 API; use tf.io.gfile in TF2
    lines = fid.readlines()
  return [line.strip().split(' ')[0] for line in lines]


def recursive_parse_xml_to_dict(xml):
  if not len(xml):
    return {xml.tag: xml.text}
  result = {}
  for child in xml:
    child_result = recursive_parse_xml_to_dict(child)
    if child.tag != 'object':
      result[child.tag] = child_result[child.tag]
    else:
      if child.tag not in result:
        result[child.tag] = []
      result[child.tag].append(child_result[child.tag])
  return {xml.tag: result}

Needless to say, those are not my scripts (I modified the first one for my convenience... the second one too, I guess :P). It's a somewhat old version of my code, which is why it was a bit untidy (the new version is very specific, so I didn't paste it here). The original material is from https://github.com/EdjeElectronics/TensorFlow-Object-Detection-API-Tutorial-Train-Multiple-Objects-Windows-10
Hope this helps you.
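As a follow-up, once there is one tfrecord per image, something like this (a sketch with hypothetical paths, reusing the per-record check from earlier in the thread) can flag a bad file offline instead of waiting for the NaN to appear during training:

import glob
import tensorflow as tf

# Sketch: scan each per-image tfrecord written above and report files that
# fail to parse or contain degenerate boxes. Paths are hypothetical.
for path in sorted(glob.glob('./tfrecords/*.tfrecord')):
    try:
        for raw in tf.data.TFRecordDataset([path]):
            ex = tf.train.Example()
            ex.ParseFromString(raw.numpy())
            f = ex.features.feature
            boxes = zip(f['image/object/bbox/xmin'].float_list.value,
                        f['image/object/bbox/xmax'].float_list.value,
                        f['image/object/bbox/ymin'].float_list.value,
                        f['image/object/bbox/ymax'].float_list.value)
            if any(not (x0 < x1 and y0 < y1) for x0, x1, y0, y1 in boxes):
                print('degenerate box in:', path)
    except Exception as e:
        print('unreadable file:', path, e)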

@YoYukeJa
Author

Thank you very much!
I've started training now with the instructions that you gave me (and of course with multiple tfrecords now!).
I'll post an update as soon as I have one.

@YoYukeJa
Author

I think I might have found the mistake in my annotation boxes. In some boxes ymin is larger than ymax, and in others ymax is larger than ymin. Do you think this could be the cause of the problem? And if so, which should be the larger value when I save them, ymax or ymin?

@MiXaiLL76

I think I might have found the mistake in my annotation boxes. In some boxes ymin is larger than ymax, and in others ymax is larger than ymin. Do you think this could be the cause of the problem? And if so, which should be the larger value when I save them, ymax or ymin?

Yes, this is just a mistake.
I had the same problem when there were errors in the annotations.
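For reference, ymax should be the larger value (the origin is the top-left corner in this normalized-coordinate format, so xmax/ymax always hold the larger coordinates). A minimal guard, sketched against the box-building loop from the voc2012.py posted above, would be to sort each pair before writing:

# Hypothetical fix inside the build_example loop: sort each pair so that
# the *min fields always hold the smaller coordinate.
x0 = float(obj['bndbox']['xmin']) / width
x1 = float(obj['bndbox']['xmax']) / width
y0 = float(obj['bndbox']['ymin']) / height
y1 = float(obj['bndbox']['ymax']) / height
xmin.append(min(x0, x1))
xmax.append(max(x0, x1))
ymin.append(min(y0, y1))
ymax.append(max(y0, y1))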

@YoYukeJa
Author

Problems have been fixed by the solutions offered in this thread. Closing the issue now.
