Error when I train with train_gpu.py. (tf_records) #85
Comments
Please double check the directory.
@kimiyoung, thanks for your response. The files are in the same location, and the same message comes out again:
I0701 13:58:11.560843 4427961792 train_gpu.py:317] n_token 32000
I might have made a small mistake, or could preprocessing the data in the wrong way have a similar effect? (There was no error during preprocessing, though.)
look at the batch size...
Thanks! Even though that was not the main issue (I had uploaded an old version of the tfrecords), it gave me a hint for solving the problem (I will share the solution later).
I have the same error. I ran
Can you help me?
I think it's just a simple path issue. What you could easily do is add 2~3 lines of code to data_utils.py, like below:
@AIscientist Thank you, I have done it.
@3NFBAGDU Yes, you can do that.
or
to get your input embeddings.
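In case it helps, here is a minimal sketch of what that looks like (assuming the xlnet.XLNetModel API shown in the repo README; the shapes, dropout values and config path below are made-up example values, so double-check the argument names against xlnet.py):

import tensorflow as tf
import xlnet

# Placeholder inputs; the repo uses a [seq_len, batch_size] layout, and
# 128 / 16 here are just example values.
input_ids = tf.placeholder(tf.int32, [128, 16])
seg_ids = tf.placeholder(tf.int32, [128, 16])
input_mask = tf.placeholder(tf.float32, [128, 16])

xlnet_config = xlnet.XLNetConfig(json_path="xlnet_config.json")
run_config = xlnet.RunConfig(
    is_training=False, use_tpu=False, use_bfloat16=False,
    dropout=0.1, dropatt=0.1)

xlnet_model = xlnet.XLNetModel(
    xlnet_config=xlnet_config,
    run_config=run_config,
    input_ids=input_ids,
    seg_ids=seg_ids,
    input_mask=input_mask)

# Per-token hidden states, shape [seq_len, batch_size, hidden_size] ...
seq_out = xlnet_model.get_sequence_output()
# ... or the word embedding lookup table itself.
embedding_table = xlnet_model.get_embedding_table()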
I'm still struggling with this issue. If possible, could you post your data_utils.py or email it to me, please? @3NFBAGDU @AIscientist
@abhi060698

def get_input_fn(
    tfrecord_dir,
    split,
    bsz_per_host,
    seq_len,
    reuse_len,
    bi_data,
    num_hosts=1,
    num_core_per_host=1,
    perm_size=None,
    mask_alpha=None,
    mask_beta=None,
    uncased=False,
    num_passes=None,
    use_bfloat16=False,
    num_predict=None):
  # Merge all record infos into a single one
  record_glob_base = format_filename(
      prefix="record_info-{}-*".format(split),
      bsz_per_host=bsz_per_host,
      seq_len=seq_len,
      bi_data=bi_data,
      suffix="json",
      mask_alpha=mask_alpha,
      mask_beta=mask_beta,
      reuse_len=reuse_len,
      uncased=uncased,
      fixed_num_predict=num_predict)

  record_info = {"num_batch": 0, "filenames": []}

  tfrecord_dirs = tfrecord_dir.split(",")
  tf.logging.info("Use the following tfrecord dirs: %s", tfrecord_dirs)

  for idx, record_dir in enumerate(tfrecord_dirs):
    record_glob = os.path.join(record_dir, record_glob_base)
    tf.logging.info("[%d] Record glob: %s", idx, record_glob)

    record_paths = sorted(tf.gfile.Glob(record_glob))

    ## File load error -> manual change: fall back to a plain glob on a
    ## hard-coded tfrecords path
    import glob
    # path = 'xlnet_cased_KR_3/tfrecords/'
    path = 'xlnet_cased_JP_12/tfrecords/'
    # path = 'xlnet_cased_JP_3/tfrecords/'
    record_paths = [f for f in glob.glob(path + "*.json", recursive=True)]
    tf.logging.info(record_paths)

    tf.logging.info("[%d] Num of record info path: %d",
                    idx, len(record_paths))

    cur_record_info = {"num_batch": 0, "filenames": []}

    for record_info_path in record_paths:
      if num_passes is not None:
        record_info_name = os.path.basename(record_info_path)
        fields = record_info_name.split(".")[0].split("-")
        pass_id = int(fields[-1])
        if len(fields) == 5 and pass_id >= num_passes:
          tf.logging.info("Skip pass %d: %s", pass_id, record_info_name)
          continue

      with tf.gfile.Open(record_info_path, "r") as fp:
        info = json.load(fp)
        if num_passes is not None:
          eff_num_passes = min(num_passes, len(info["filenames"]))
          ratio = eff_num_passes / len(info["filenames"])
          cur_record_info["num_batch"] += int(info["num_batch"] * ratio)
          cur_record_info["filenames"] += info["filenames"][:eff_num_passes]
        else:
          cur_record_info["num_batch"] += info["num_batch"]
          cur_record_info["filenames"] += info["filenames"]

    # overwrite directory for `cur_record_info`
    new_filenames = []
    for filename in cur_record_info["filenames"]:
      basename = os.path.basename(filename)
      new_filename = os.path.join(record_dir, basename)
      new_filenames.append(new_filename)
    cur_record_info["filenames"] = new_filenames

    tf.logging.info("[Dir %d] Number of chosen batches: %s",
                    idx, cur_record_info["num_batch"])
    tf.logging.info("[Dir %d] Number of chosen files: %s",
                    idx, len(cur_record_info["filenames"]))
    tf.logging.info(cur_record_info["filenames"])

    # add `cur_record_info` to global `record_info`
    record_info["num_batch"] += cur_record_info["num_batch"]
    record_info["filenames"] += cur_record_info["filenames"]

  tf.logging.info("Total number of batches: %d",
                  record_info["num_batch"])
  tf.logging.info("Total number of files: %d",
                  len(record_info["filenames"]))
  tf.logging.info(record_info["filenames"])
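A slightly less brittle version of the same workaround (just a sketch, not the poster's code) is to fall back to a plain glob on the record_dir the loop is already iterating over, instead of hard-coding a path. For example, inside get_input_fn right after the tf.gfile.Glob call:

    import glob  # plain Python glob as a fallback for tf.gfile.Glob

    # Only fall back when the original glob found nothing.
    if not record_paths:
      record_paths = sorted(
          glob.glob(os.path.join(record_dir, "record_info-*.json")))
      tf.logging.info("Fallback glob found: %s", record_paths)

Note that this (like the hard-coded hack above) will happily pick up record_info files written with different flags, so the batch-size hint earlier in the thread still applies.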
Hi, thanks for your contribution.
I was trying to preprocess my own data and train on my own GPU machine, but after I created the tfrecords from the wiki data and tried to run training, it fails with "TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements." It looks like a simple directory issue, or the tfrecords were not created correctly, since the log shows the number of record info paths is zero.
Does anyone have a similar issue?
I'd appreciate any help.
I0629 08:26:34.599769 4393649600 tf_logging.py:115] n_token 32000
I0629 08:26:34.600133 4393649600 tf_logging.py:115] Use the following tfrecord dirs: ['data_out3/tfrecords']
I0629 08:26:34.600275 4393649600 tf_logging.py:115] [0] Record glob: data_out3/tfrecords/record_info-train-*.bsz-16.seqlen-128.reuse-64.bi.alpha-6.beta-1.fnp-85.json
I0629 08:26:34.600965 4393649600 tf_logging.py:115] [0] Num of record info path: 0
I0629 08:26:34.601068 4393649600 tf_logging.py:115] [Dir 0] Number of chosen batches: 0
I0629 08:26:34.601134 4393649600 tf_logging.py:115] [Dir 0] Number of chosen files: 0
I0629 08:26:34.601197 4393649600 tf_logging.py:115] []
I0629 08:26:34.601253 4393649600 tf_logging.py:115] Total number of batches: 0
I0629 08:26:34.601778 4393649600 tf_logging.py:115] Total number of files: 0
I0629 08:26:34.601840 4393649600 tf_logging.py:115] []
I0629 08:26:34.601900 4393649600 tf_logging.py:115] num of batches 0
I0629 08:26:34.601970 4393649600 tf_logging.py:115] Host 0 handles 0 files
Traceback (most recent call last):
  File "train_gpu.py", line 328, in <module>
    tf.app.run()
  File "/Users/user/tf110/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "train_gpu.py", line 324, in main
    train("/gpu:0")
  File "/Users/user/xlnet/data_utils.py", line 868, in input_fn
    num_predict=num_predict)
  File "/Users/user/xlnet/data_utils.py", line 757, in get_dataset
    bsz_per_core=bsz_per_core)
  File "/Users/user/xlnet/data_utils.py", line 566, in parse_files_to_dataset
    dataset = tf.data.TFRecordDataset(dataset)
  File "/Users/user/tf110/lib/python3.6/site-packages/tensorflow/python/data/ops/readers.py", line 194, in __init__
    "`filenames` must be a `tf.data.Dataset` of `tf.string` elements.")
TypeError: `filenames` must be a `tf.data.Dataset` of `tf.string` elements.
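For anyone else hitting "Num of record info path: 0": the glob that train_gpu.py prints encodes the training flags (bsz, seqlen, reuse, alpha, beta, fnp), and it only matches record_info files that were written by data_utils.py with the same values. A minimal check, using the pattern and directory from the log above (substitute your own):

import glob

# Pattern copied from the "[0] Record glob" line in the log above.
pattern = ("data_out3/tfrecords/record_info-train-*.bsz-16.seqlen-128."
           "reuse-64.bi.alpha-6.beta-1.fnp-85.json")

print("record_info files matching the trainer's glob:", glob.glob(pattern))
print("all json files actually present:",
      glob.glob("data_out3/tfrecords/*.json"))

If the second list is non-empty but the first is empty, the flags passed to train_gpu.py do not match the ones used during preprocessing.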