WARNING: Logging before flag parsing goes to stderr.
W0422 04:50:51.221502 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/ops/py_x_ops.py:26: The name tf.resource_loader.get_path_to_datafile is deprecated. Please use tf.compat.v1.resource_loader.get_path_to_datafile instead.
W0422 04:50:51.240633 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/py_utils.py:1234: The name tf.get_variable_scope is deprecated. Please use tf.compat.v1.get_variable_scope instead.
W0422 04:50:51.334028 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py:1554: The name tf.app.run is deprecated. Please use tf.compat.v1.app.run instead.
W0422 04:50:51.334525 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/model_imports.py:46: The name tf.logging.info is deprecated. Please use tf.compat.v1.logging.info instead.
I0422 04:50:51.334608 140630240360192 model_imports.py:46] Importing lingvo.tasks.asr.params
W0422 04:50:51.350225 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/model_registry.py:121: The name tf.logging.debug is deprecated. Please use tf.compat.v1.logging.debug instead.
I0422 04:50:51.350325 140630240360192 model_registry.py:124] Registering models from module: lingvo.tasks.asr.params.librispeech
I0422 04:50:51.352879 140630240360192 model_imports.py:46] Importing lingvo.tasks.image.params
I0422 04:50:51.354234 140630240360192 model_registry.py:124] Registering models from module: lingvo.tasks.image.params.mnist
I0422 04:50:51.354332 140630240360192 model_imports.py:46] Importing lingvo.tasks.lm.params
I0422 04:50:51.355633 140630240360192 model_registry.py:124] Registering models from module: lingvo.tasks.lm.params.one_billion_wds
I0422 04:50:51.357244 140630240360192 model_imports.py:46] Importing lingvo.tasks.mt.params
I0422 04:50:51.360419 140630240360192 model_registry.py:124] Registering models from module: lingvo.tasks.mt.params.wmt14_en_de
I0422 04:50:51.364412 140630240360192 model_registry.py:124] Registering models from module: lingvo.tasks.mt.params.wmtm16_en_de
I0422 04:50:51.364511 140630240360192 model_imports.py:46] Importing lingvo.tasks.punctuator.params
I0422 04:50:51.365694 140630240360192 model_registry.py:124] Registering models from module: lingvo.tasks.punctuator.params.codelab
W0422 04:50:51.365833 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py:1513: The name tf.logging.set_verbosity is deprecated. Please use tf.compat.v1.logging.set_verbosity instead.
W0422 04:50:51.365932 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py:1513: The name tf.logging.INFO is deprecated. Please use tf.compat.v1.logging.INFO instead.
W0422 04:50:51.366064 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py:1381: The name tf.train.Server is deprecated. Please use tf.distribute.Server instead.
2019-04-22 04:50:51.366349: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-04-22 04:50:51.384792: I tensorflow/stream_executor/platform/default/dso_loader.cc:43] Successfully opened dynamic library libcuda.so.1
2019-04-22 04:50:54.159227: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x69c4ee0 executing computations on platform CUDA. Devices:
2019-04-22 04:50:54.159318: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-04-22 04:50:54.159340: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (1): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-04-22 04:50:54.159357: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (2): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-04-22 04:50:54.159388: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (3): Tesla P100-PCIE-16GB, Compute Capability 6.0
2019-04-22 04:50:54.169771: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2399910000 Hz
2019-04-22 04:50:54.174203: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x6b03fb0 executing computations on platform Host. Devices:
2019-04-22 04:50:54.174264: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): ,
2019-04-22 04:50:54.175184: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1595] Found device 0 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:02:00.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2019-04-22 04:50:54.175857: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1595] Found device 1 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:82:00.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2019-04-22 04:50:54.176519: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1595] Found device 2 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:85:00.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2019-04-22 04:50:54.177156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1595] Found device 3 with properties:
name: Tesla P100-PCIE-16GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:86:00.0
totalMemory: 15.90GiB freeMemory: 15.61GiB
2019-04-22 04:50:54.186127: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1718] Adding visible gpu devices: 0, 1, 2, 3
2019-04-22 04:50:54.186792: I tensorflow/stream_executor/platform/default/dso_loader.cc:43] Successfully opened dynamic library libcudart.so.10.0
2019-04-22 04:50:54.194814: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1126] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-22 04:50:54.194862: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1132]      0 1 2 3
2019-04-22 04:50:54.194892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1145] 0:   N N N N
2019-04-22 04:50:54.194910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1145] 1:   N N Y Y
2019-04-22 04:50:54.194925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1145] 2:   N Y N Y
2019-04-22 04:50:54.194939: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1145] 3:   N Y Y N
2019-04-22 04:50:54.197344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1266] Created TensorFlow device (/job:local/replica:0/task:0/device:GPU:0 with 15190 MB memory) -> physical GPU (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:02:00.0, compute capability: 6.0)
2019-04-22 04:50:54.197997: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1266] Created TensorFlow device (/job:local/replica:0/task:0/device:GPU:1 with 15190 MB memory) -> physical GPU (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:82:00.0, compute capability: 6.0)
2019-04-22 04:50:54.198703: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1266] Created TensorFlow device (/job:local/replica:0/task:0/device:GPU:2 with 15190 MB memory) -> physical GPU (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:85:00.0, compute capability: 6.0)
2019-04-22 04:50:54.199385: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1266] Created TensorFlow device (/job:local/replica:0/task:0/device:GPU:3 with 15190 MB memory) -> physical GPU (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:86:00.0, compute capability: 6.0)
2019-04-22 04:50:54.203603: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:250] Initialize GrpcChannelCache for job local -> {0 -> localhost:45806}
2019-04-22 04:50:54.209726: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:365] Started server with target: grpc://localhost:45806
I0422 04:50:54.217583 140630240360192 trainer.py:1261] Job controller start
I0422 04:50:54.275299 140630240360192 base_runner.py:67] ============================================================
I0422 04:50:54.292784 140630240360192 base_runner.py:69] allow_implicit_capture : NoneType
I0422 04:50:54.292994 140630240360192 base_runner.py:69] cls : type/lingvo.core.base_model/SingleTaskModel
I0422 04:50:54.293131 140630240360192 base_runner.py:69] cluster.add_summary : NoneType
I0422 04:50:54.293251 140630240360192 base_runner.py:69] cluster.cls : type/lingvo.core.cluster/_Cluster
I0422 04:50:54.293366 140630240360192 base_runner.py:69] cluster.controller.cpus_per_replica : 1
I0422 04:50:54.293481 140630240360192 base_runner.py:69] cluster.controller.devices_per_split : 1
I0422 04:50:54.293593 140630240360192 base_runner.py:69] cluster.controller.gpus_per_replica : 0
I0422 04:50:54.293705 140630240360192 base_runner.py:69] cluster.controller.name : '/job:local'
I0422 04:50:54.293817 140630240360192 base_runner.py:69] cluster.controller.num_tpu_hosts : 0
I0422 04:50:54.293930 140630240360192 base_runner.py:69] cluster.controller.replicas : 1
I0422 04:50:54.294040 140630240360192 base_runner.py:69] cluster.controller.tpus_per_replica : 0
I0422 04:50:54.294151 140630240360192 base_runner.py:69] cluster.decoder.cpus_per_replica : 1
I0422 04:50:54.294261 140630240360192 base_runner.py:69] cluster.decoder.devices_per_split : 1
I0422 04:50:54.294373 140630240360192 base_runner.py:69] cluster.decoder.gpus_per_replica : 1
I0422 04:50:54.294482 140630240360192 base_runner.py:69] cluster.decoder.name : '/job:local'
I0422 04:50:54.294594 140630240360192 base_runner.py:69] cluster.decoder.num_tpu_hosts : 0
I0422 04:50:54.294703 140630240360192 base_runner.py:69] cluster.decoder.replicas : 1
I0422 04:50:54.294814 140630240360192 base_runner.py:69] cluster.decoder.tpus_per_replica : 0
I0422 04:50:54.294941 140630240360192 base_runner.py:69] cluster.evaler.cpus_per_replica : 1
I0422 04:50:54.295054 140630240360192 base_runner.py:69] cluster.evaler.devices_per_split : 1
I0422 04:50:54.295166 140630240360192 base_runner.py:69] cluster.evaler.gpus_per_replica : 1
I0422 04:50:54.295274 140630240360192 base_runner.py:69] cluster.evaler.name : '/job:local'
I0422 04:50:54.295386 140630240360192 base_runner.py:69] cluster.evaler.num_tpu_hosts : 0
I0422 04:50:54.295495 140630240360192 base_runner.py:69] cluster.evaler.replicas : 1
I0422 04:50:54.295605 140630240360192 base_runner.py:69] cluster.evaler.tpus_per_replica : 0
I0422 04:50:54.295716 140630240360192 base_runner.py:69] cluster.input.cpus_per_replica : 1
I0422 04:50:54.295826 140630240360192 base_runner.py:69] cluster.input.devices_per_split : 1
I0422 04:50:54.295937 140630240360192 base_runner.py:69] cluster.input.gpus_per_replica : 0
I0422 04:50:54.296047 140630240360192 base_runner.py:69] cluster.input.name : '/job:local'
I0422 04:50:54.296156 140630240360192 base_runner.py:69] cluster.input.num_tpu_hosts : 0
I0422 04:50:54.296267 140630240360192 base_runner.py:69] cluster.input.replicas : 0
I0422 04:50:54.296376 140630240360192 base_runner.py:69] cluster.input.tpus_per_replica : 0
I0422 04:50:54.296487 140630240360192 base_runner.py:69] cluster.job : 'controller'
I0422 04:50:54.296596 140630240360192 base_runner.py:69] cluster.mode : 'sync'
I0422 04:50:54.296705 140630240360192 base_runner.py:69] cluster.ps.cpus_per_replica : 1
I0422 04:50:54.296813 140630240360192 base_runner.py:69] cluster.ps.devices_per_split : 1
I0422 04:50:54.296921 140630240360192 base_runner.py:69] cluster.ps.gpus_per_replica : 0
I0422 04:50:54.297032 140630240360192 base_runner.py:69] cluster.ps.name : '/job:local'
I0422 04:50:54.297141 140630240360192 base_runner.py:69] cluster.ps.num_tpu_hosts : 0
I0422 04:50:54.297250 140630240360192 base_runner.py:69] cluster.ps.replicas : 1
I0422 04:50:54.297360 140630240360192 base_runner.py:69] cluster.ps.tpus_per_replica : 0
I0422 04:50:54.297471 140630240360192 base_runner.py:69] cluster.task : 0
I0422 04:50:54.297597 140630240360192 base_runner.py:69] cluster.worker.cpus_per_replica : 1
I0422 04:50:54.297705 140630240360192 base_runner.py:69] cluster.worker.devices_per_split : 4
I0422 04:50:54.297831 140630240360192 base_runner.py:69] cluster.worker.gpus_per_replica : 1
I0422 04:50:54.297941 140630240360192 base_runner.py:69] cluster.worker.name : '/job:local'
I0422 04:50:54.298048 140630240360192 base_runner.py:69] cluster.worker.num_tpu_hosts : 0
I0422 04:50:54.298156 140630240360192 base_runner.py:69] cluster.worker.replicas : 1
I0422 04:50:54.298265 140630240360192 base_runner.py:69] cluster.worker.tpus_per_replica : 0
I0422 04:50:54.298373 140630240360192 base_runner.py:69] dtype : float32
I0422 04:50:54.298481 140630240360192 base_runner.py:69] fprop_dtype : NoneType
I0422 04:50:54.298588 140630240360192 base_runner.py:69] inference_driver_name : NoneType
I0422 04:50:54.298696 140630240360192 base_runner.py:69] input.allow_implicit_capture : NoneType
I0422 04:50:54.298804 140630240360192 base_runner.py:69] input.bucket_adjust_every_n : 0
I0422 04:50:54.298923 140630240360192 base_runner.py:69] input.bucket_batch_limit : [8]
I0422 04:50:54.299035 140630240360192 base_runner.py:69] input.bucket_upper_bound : [100]
I0422 04:50:54.299141 140630240360192 base_runner.py:69] input.cls : type/lingvo.tasks.lm.input_generator/LmInput
I0422 04:50:54.299252 140630240360192 base_runner.py:69] input.dtype : float32
I0422 04:50:54.299360 140630240360192 base_runner.py:69] input.file_buffer_size : 10000000
I0422 04:50:54.299467 140630240360192 base_runner.py:69] input.file_parallelism : 10
I0422 04:50:54.299572 140630240360192 base_runner.py:69] input.file_pattern : 'text:/tmp/lm1b/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en*'
I0422 04:50:54.299683 140630240360192 base_runner.py:69] input.file_random_seed : 301
I0422 04:50:54.299791 140630240360192 base_runner.py:69] input.fixed_input_shape : True
I0422 04:50:54.299899 140630240360192 base_runner.py:69] input.flush_every_n : 0
I0422 04:50:54.300005 140630240360192 base_runner.py:69] input.fprop_dtype : NoneType
I0422 04:50:54.300112 140630240360192 base_runner.py:69] input.inference_driver_name : NoneType
I0422 04:50:54.300216 140630240360192 base_runner.py:69] input.is_eval : NoneType
I0422 04:50:54.300323 140630240360192 base_runner.py:69] input.is_inference : NoneType
I0422 04:50:54.300430 140630240360192 base_runner.py:69] input.name : '1bwds_train_set'
I0422 04:50:54.300537 140630240360192 base_runner.py:69] input.num_batcher_threads : 16
I0422 04:50:54.300643 140630240360192 base_runner.py:69] input.num_samples : 0
I0422 04:50:54.300750 140630240360192 base_runner.py:69] input.pad_to_max_seq_length : False
I0422 04:50:54.300857 140630240360192 base_runner.py:69] input.params_init.method : 'xavier'
I0422 04:50:54.300964 140630240360192 base_runner.py:69] input.params_init.scale : 1.000001
I0422 04:50:54.301069 140630240360192 base_runner.py:69] input.params_init.seed : NoneType
I0422 04:50:54.301177 140630240360192 base_runner.py:69] input.random_seed : NoneType
I0422 04:50:54.301284 140630240360192 base_runner.py:69] input.require_sequential_order : False
I0422 04:50:54.301390 140630240360192 base_runner.py:69] input.skip_lp_regularization : NoneType
I0422 04:50:54.301500 140630240360192 base_runner.py:69] input.source_max_length : NoneType
I0422 04:50:54.301606 140630240360192 base_runner.py:69] input.target_max_length : 1024
I0422 04:50:54.301714 140630240360192 base_runner.py:69] input.tokenizer.allow_implicit_capture : NoneType
I0422 04:50:54.301822 140630240360192 base_runner.py:69] input.tokenizer.append_eos : True
I0422 04:50:54.301928 140630240360192 base_runner.py:69] input.tokenizer.cls : type/lingvo.core.tokenizers/VocabFileTokenizer
I0422 04:50:54.302041 140630240360192 base_runner.py:69] input.tokenizer.dtype : float32
I0422 04:50:54.302146 140630240360192 base_runner.py:69] input.tokenizer.fprop_dtype : NoneType
I0422 04:50:54.302253 140630240360192 base_runner.py:69] input.tokenizer.inference_driver_name : NoneType
I0422 04:50:54.302361 140630240360192 base_runner.py:69] input.tokenizer.is_eval : NoneType
I0422 04:50:54.302467 140630240360192 base_runner.py:69] input.tokenizer.is_inference : NoneType
I0422 04:50:54.302582 140630240360192 base_runner.py:69] input.tokenizer.load_token_ids_from_vocab : True
I0422 04:50:54.302692 140630240360192 base_runner.py:69] input.tokenizer.name : 'tokenizer'
I0422 04:50:54.302799 140630240360192 base_runner.py:69] input.tokenizer.ngram_separator : ''
I0422 04:50:54.302922 140630240360192 base_runner.py:69] input.tokenizer.ngram_vocab_filepath : NoneType
I0422 04:50:54.303031 140630240360192 base_runner.py:69] input.tokenizer.pad_to_max_length : True
I0422 04:50:54.303141 140630240360192 base_runner.py:69] input.tokenizer.params_init.method : 'xavier'
I0422 04:50:54.303247 140630240360192 base_runner.py:69] input.tokenizer.params_init.scale : 1.000001
I0422 04:50:54.303355 140630240360192 base_runner.py:69] input.tokenizer.params_init.seed : NoneType
I0422 04:50:54.303472 140630240360192 base_runner.py:69] input.tokenizer.random_seed : NoneType
I0422 04:50:54.303575 140630240360192 base_runner.py:69] input.tokenizer.skip_lp_regularization : NoneType
I0422 04:50:54.303678 140630240360192 base_runner.py:69] input.tokenizer.target_eos_id : 2
I0422 04:50:54.303781 140630240360192 base_runner.py:69] input.tokenizer.target_sos_id : 1
I0422 04:50:54.303884 140630240360192 base_runner.py:69] input.tokenizer.target_unk_id : 3
I0422 04:50:54.303987 140630240360192 base_runner.py:69] input.tokenizer.token_vocab_filepath : '/tmp/lm1b/1-billion-word-language-modeling-benchmark-r13output/vocab.txt'
I0422 04:50:54.304095 140630240360192 base_runner.py:69] input.tokenizer.tokens_delimiter : ' '
I0422 04:50:54.304198 140630240360192 base_runner.py:69] input.tokenizer.vn.global_vn : False
I0422 04:50:54.304300 140630240360192 base_runner.py:69] input.tokenizer.vn.per_step_vn : False
I0422 04:50:54.304402 140630240360192 base_runner.py:69] input.tokenizer.vn.scale : NoneType
I0422 04:50:54.304503 140630240360192 base_runner.py:69] input.tokenizer.vn.seed : NoneType
I0422 04:50:54.304605 140630240360192 base_runner.py:69] input.tokenizer.vocab_size : 32000
I0422 04:50:54.304708 140630240360192 base_runner.py:69] input.tokenizer_dict : {}
I0422 04:50:54.304811 140630240360192 base_runner.py:69] input.tpu_infeed_parallism : 1
I0422 04:50:54.304913 140630240360192 base_runner.py:69] input.use_per_host_infeed : False
I0422 04:50:54.305016 140630240360192 base_runner.py:69] input.use_within_batch_mixing : False
I0422 04:50:54.305119 140630240360192 base_runner.py:69] input.vn.global_vn : False
I0422 04:50:54.305222 140630240360192 base_runner.py:69] input.vn.per_step_vn : False
I0422 04:50:54.305325 140630240360192 base_runner.py:69] input.vn.scale : NoneType
I0422 04:50:54.305428 140630240360192 base_runner.py:69] input.vn.seed : NoneType
I0422 04:50:54.305531 140630240360192 base_runner.py:69] is_eval : NoneType
I0422 04:50:54.305634 140630240360192 base_runner.py:69] is_inference : NoneType
I0422 04:50:54.305738 140630240360192 base_runner.py:69] model : 'lm.one_billion_wds.OneBWdsGPipeTransformer@/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/lm/params/one_billion_wds.py:186'
I0422 04:50:54.305850 140630240360192 base_runner.py:69] name : ''
I0422 04:50:54.305953 140630240360192 base_runner.py:69] params_init.method : 'xavier'
I0422 04:50:54.306056 140630240360192 base_runner.py:69] params_init.scale : 1.000001
I0422 04:50:54.306159 140630240360192 base_runner.py:69] params_init.seed : NoneType
I0422 04:50:54.306261 140630240360192 base_runner.py:69] random_seed : NoneType
I0422 04:50:54.306363 140630240360192 base_runner.py:69] skip_lp_regularization : NoneType
I0422 04:50:54.306466 140630240360192 base_runner.py:69] task.allow_implicit_capture : NoneType
I0422 04:50:54.306569 140630240360192 base_runner.py:69] task.cls : type/lingvo.tasks.lm.model/FixedShapeInputLanguageModel
I0422 04:50:54.306674 140630240360192 base_runner.py:69] task.decoder : NoneType
I0422 04:50:54.306777 140630240360192 base_runner.py:69] task.dtype : float32
I0422 04:50:54.306890 140630240360192 base_runner.py:69] task.encoder : NoneType
I0422 04:50:54.307007 140630240360192 base_runner.py:69] task.eval.decoder_samples_per_summary : 0
I0422 04:50:54.307113 140630240360192 base_runner.py:69] task.eval.samples_per_summary : 0
I0422 04:50:54.307216 140630240360192 base_runner.py:69] task.fprop_dtype : NoneType
I0422 04:50:54.307320 140630240360192 base_runner.py:69] task.inference_driver_name : NoneType
I0422 04:50:54.307421 140630240360192 base_runner.py:69] task.input : NoneType
I0422 04:50:54.307544 140630240360192 base_runner.py:69] task.is_eval : NoneType
I0422 04:50:54.307646 140630240360192 base_runner.py:69] task.is_inference : NoneType
I0422 04:50:54.307746 140630240360192 base_runner.py:69] task.lm.allow_implicit_capture : NoneType
I0422 04:50:54.307847 140630240360192 base_runner.py:69] task.lm.atten_dropout_prob : 0.1
I0422 04:50:54.307949 140630240360192 base_runner.py:69] task.lm.cls : type/lingvo.tasks.lm.layers/GPipeTransformerLm
I0422 04:50:54.308052 140630240360192 base_runner.py:69] task.lm.dtype : float32
I0422 04:50:54.308151 140630240360192 base_runner.py:69] task.lm.emb.allow_implicit_capture : NoneType
I0422 04:50:54.308252 140630240360192 base_runner.py:69] task.lm.emb.apply_pruning : False
I0422 04:50:54.308351 140630240360192 base_runner.py:69] task.lm.emb.cls : type/lingvo.core.layers/SimpleEmbeddingLayer
I0422 04:50:54.308451 140630240360192 base_runner.py:69] task.lm.emb.dtype : float32
I0422 04:50:54.308552 140630240360192 base_runner.py:69] task.lm.emb.embedding_dim : 2048
I0422 04:50:54.308651 140630240360192 base_runner.py:69] task.lm.emb.fprop_dtype : NoneType
I0422 04:50:54.308752 140630240360192 base_runner.py:69] task.lm.emb.fprop_mode : NoneType
I0422 04:50:54.308851 140630240360192 base_runner.py:69] task.lm.emb.inference_driver_name : NoneType
I0422 04:50:54.308949 140630240360192 base_runner.py:69] task.lm.emb.is_eval : NoneType
I0422 04:50:54.309051 140630240360192 base_runner.py:69] task.lm.emb.is_inference : NoneType
I0422 04:50:54.309150 140630240360192 base_runner.py:69] task.lm.emb.name : ''
I0422 04:50:54.309252 140630240360192 base_runner.py:69] task.lm.emb.params_init.method : 'gaussian'
I0422 04:50:54.309353 140630240360192 base_runner.py:69] task.lm.emb.params_init.scale : 0.0220970869121
I0422 04:50:54.309454 140630240360192 base_runner.py:69] task.lm.emb.params_init.seed : NoneType
I0422 04:50:54.309555 140630240360192 base_runner.py:69] task.lm.emb.qdomain.default : NoneType
I0422 04:50:54.309654 140630240360192 base_runner.py:69] task.lm.emb.random_seed : NoneType
I0422 04:50:54.309755 140630240360192 base_runner.py:69] task.lm.emb.skip_lp_regularization : NoneType
I0422 04:50:54.309855 140630240360192 base_runner.py:69] task.lm.emb.use_3d_weight_tensor : False
I0422 04:50:54.309957 140630240360192 base_runner.py:69] task.lm.emb.use_matmul : False
I0422 04:50:54.310059 140630240360192 base_runner.py:69] task.lm.emb.vn.global_vn : False
I0422 04:50:54.310159 140630240360192 base_runner.py:69] task.lm.emb.vn.per_step_vn : False
I0422 04:50:54.310261 140630240360192 base_runner.py:69] task.lm.emb.vn.scale : NoneType
I0422 04:50:54.310360 140630240360192 base_runner.py:69] task.lm.emb.vn.seed : NoneType
I0422 04:50:54.310461 140630240360192 base_runner.py:69] task.lm.emb.vocab_size : 32000
I0422 04:50:54.310560 140630240360192 base_runner.py:69] task.lm.fprop_dtype : NoneType
I0422 04:50:54.310661 140630240360192 base_runner.py:69] task.lm.inference_driver_name : NoneType
I0422 04:50:54.310761 140630240360192 base_runner.py:69] task.lm.input_dropout_prob : 0.0
I0422 04:50:54.310862 140630240360192 base_runner.py:69] task.lm.is_eval : NoneType
I0422 04:50:54.310975 140630240360192 base_runner.py:69] task.lm.is_inference : NoneType
I0422 04:50:54.311075 140630240360192 base_runner.py:69] task.lm.label_smoother : NoneType
I0422 04:50:54.311177 140630240360192 base_runner.py:69] task.lm.model_dim : 2048
I0422 04:50:54.311279 140630240360192 base_runner.py:69] task.lm.name : 'transformerlm'
I0422 04:50:54.311381 140630240360192 base_runner.py:69] task.lm.params_init.method : 'xavier'
I0422 04:50:54.311480 140630240360192 base_runner.py:69] task.lm.params_init.scale : 1.000001
I0422 04:50:54.311589 140630240360192 base_runner.py:69] task.lm.params_init.seed : NoneType
I0422 04:50:54.311692 140630240360192 base_runner.py:69] task.lm.position_emb.allow_implicit_capture : NoneType
I0422 04:50:54.311794 140630240360192 base_runner.py:69] task.lm.position_emb.cls : type/lingvo.core.layers/PositionalEmbeddingLayer
I0422 04:50:54.311902 140630240360192 base_runner.py:69] task.lm.position_emb.dtype : float32
I0422 04:50:54.312002 140630240360192 base_runner.py:69] task.lm.position_emb.embedding_dim : 2048
I0422 04:50:54.312103 140630240360192 base_runner.py:69] task.lm.position_emb.fprop_dtype : NoneType
I0422 04:50:54.312206 140630240360192 base_runner.py:69] task.lm.position_emb.inference_driver_name : NoneType
I0422 04:50:54.312306 140630240360192 base_runner.py:69] task.lm.position_emb.is_eval : NoneType
I0422 04:50:54.312407 140630240360192 base_runner.py:69] task.lm.position_emb.is_inference : NoneType
I0422 04:50:54.312509 140630240360192 base_runner.py:69] task.lm.position_emb.max_timescale : 10000
I0422 04:50:54.312608 140630240360192 base_runner.py:69] task.lm.position_emb.min_timescale : 1
I0422 04:50:54.312709 140630240360192 base_runner.py:69] task.lm.position_emb.name : ''
I0422 04:50:54.312808 140630240360192 base_runner.py:69] task.lm.position_emb.params_init.method : 'xavier'
I0422 04:50:54.312910 140630240360192 base_runner.py:69] task.lm.position_emb.params_init.scale : 1.000001
I0422 04:50:54.313010 140630240360192 base_runner.py:69] task.lm.position_emb.params_init.seed : NoneType
I0422 04:50:54.313111 140630240360192 base_runner.py:69] task.lm.position_emb.random_seed : NoneType
I0422 04:50:54.313210 140630240360192 base_runner.py:69] task.lm.position_emb.skip_lp_regularization : NoneType
I0422 04:50:54.313312 140630240360192 base_runner.py:69] task.lm.position_emb.trainable_scaling : False
I0422 04:50:54.313412 140630240360192 base_runner.py:69] task.lm.position_emb.trainable_scaling_init : 1.0
I0422 04:50:54.313513 140630240360192 base_runner.py:69] task.lm.position_emb.vn.global_vn : False
I0422 04:50:54.313615 140630240360192 base_runner.py:69] task.lm.position_emb.vn.per_step_vn : False
I0422 04:50:54.313714 140630240360192 base_runner.py:69] task.lm.position_emb.vn.scale : NoneType
I0422 04:50:54.313815 140630240360192 base_runner.py:69] task.lm.position_emb.vn.seed : NoneType
I0422 04:50:54.313915 140630240360192 base_runner.py:69] task.lm.random_seed : NoneType
I0422 04:50:54.314013 140630240360192 base_runner.py:69] task.lm.relu_dropout_prob : 0.0
I0422 04:50:54.314114 140630240360192 base_runner.py:69] task.lm.residual_dropout_prob : 0.1
I0422 04:50:54.314213 140630240360192 base_runner.py:69] task.lm.skip_lp_regularization : NoneType
I0422 04:50:54.314311 140630240360192 base_runner.py:69] task.lm.softmax.allow_implicit_capture : NoneType
I0422 04:50:54.314412 140630240360192 base_runner.py:69] task.lm.softmax.apply_pruning : False
I0422 04:50:54.314511 140630240360192 base_runner.py:69] task.lm.softmax.chunk_size : 4194
I0422 04:50:54.314610 140630240360192 base_runner.py:69] task.lm.softmax.cls : type/lingvo.core.layers/SimpleFullSoftmax
I0422 04:50:54.314711 140630240360192 base_runner.py:69] task.lm.softmax.dtype : float32
I0422 04:50:54.314811 140630240360192 base_runner.py:69] task.lm.softmax.fprop_dtype : NoneType
I0422 04:50:54.314927 140630240360192 base_runner.py:69] task.lm.softmax.inference_driver_name : NoneType
I0422 04:50:54.315038 140630240360192 base_runner.py:69] task.lm.softmax.input_dim : 0
I0422 04:50:54.315135 140630240360192 base_runner.py:69] task.lm.softmax.is_eval : NoneType
I0422 04:50:54.315232 140630240360192 base_runner.py:69] task.lm.softmax.is_inference : NoneType
I0422 04:50:54.315327 140630240360192 base_runner.py:69] task.lm.softmax.logits_abs_max : NoneType
I0422 04:50:54.315423 140630240360192 base_runner.py:69] task.lm.softmax.name : ''
I0422 04:50:54.315521 140630240360192 base_runner.py:69] task.lm.softmax.num_classes : 32000
I0422 04:50:54.315618 140630240360192 base_runner.py:69] task.lm.softmax.num_sampled : 0
I0422 04:50:54.315721 140630240360192 base_runner.py:69] task.lm.softmax.num_shards : 16
I0422 04:50:54.315819 140630240360192 base_runner.py:69] task.lm.softmax.params_init.method : 'xavier'
I0422 04:50:54.315917 140630240360192 base_runner.py:69] task.lm.softmax.params_init.scale : 1.000001
I0422 04:50:54.316014 140630240360192 base_runner.py:69] task.lm.softmax.params_init.seed : NoneType
I0422 04:50:54.316109 140630240360192 base_runner.py:69] task.lm.softmax.qdomain.default : NoneType
I0422 04:50:54.316206 140630240360192 base_runner.py:69] task.lm.softmax.random_seed : NoneType
I0422 04:50:54.316303 140630240360192 base_runner.py:69] task.lm.softmax.skip_lp_regularization : NoneType
I0422 04:50:54.316399 140630240360192 base_runner.py:69] task.lm.softmax.vn.global_vn : False
I0422 04:50:54.316494 140630240360192 base_runner.py:69] task.lm.softmax.vn.per_step_vn : False
I0422 04:50:54.316591 140630240360192 base_runner.py:69] task.lm.softmax.vn.scale : NoneType
I0422 04:50:54.316689 140630240360192 base_runner.py:69] task.lm.softmax.vn.seed : NoneType
I0422 04:50:54.316786 140630240360192 base_runner.py:69] task.lm.stack.allow_implicit_capture : NoneType
I0422 04:50:54.316884 140630240360192 base_runner.py:69] task.lm.stack.apply_dropout_every_n : 1
I0422 04:50:54.316982 140630240360192 base_runner.py:69] task.lm.stack.batch_dim : 1
I0422 04:50:54.317079 140630240360192 base_runner.py:69] task.lm.stack.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerStack
I0422 04:50:54.317176 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.317272 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerLayer
I0422 04:50:54.317369 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.dtype : float32
I0422 04:50:54.317466 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.fprop_dtype : NoneType
I0422 04:50:54.317581 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.has_aux_atten : True
I0422 04:50:54.317677 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.inference_driver_name : NoneType
I0422 04:50:54.317773 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_decoder : False
I0422 04:50:54.317868 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_eval : NoneType
I0422 04:50:54.317962 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_inference : NoneType
I0422 04:50:54.318058 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_transparent : False
I0422 04:50:54.318152 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.mask_self_atten : True
I0422 04:50:54.318248 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.name : ''
I0422 04:50:54.318342 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.num_transparent_outputs : 0
I0422 04:50:54.318438 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.output_dim : 0
I0422 04:50:54.318532 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.packed_input : False
I0422 04:50:54.318628 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.params_init.method : 'xavier'
I0422 04:50:54.318722 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.params_init.scale : 1.000001
I0422 04:50:54.318819 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.params_init.seed : NoneType
I0422 04:50:54.318923 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.random_seed : NoneType
I0422 04:50:54.319022 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.319118 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.source_dim : 0
I0422 04:50:54.319214 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.add_unnormalized_input : False
I0422 04:50:54.319309 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.319406 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_dropout_prob : 0.0
I0422 04:50:54.319506 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_hidden_dim : 0
I0422 04:50:54.319603 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.319700 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_deterministic : False
I0422 04:50:54.319794 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_prob : 0.0
I0422 04:50:54.319890 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.cls : type/lingvo.core.attention/MultiHeadedAttention
I0422 04:50:54.319994 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.context_dim : 0
I0422 04:50:54.320090 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.ctx_post_proj_dim : 0
I0422 04:50:54.320185 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.dtype : float32
I0422 04:50:54.320281 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_post_proj : True
I0422 04:50:54.320378 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_pre_proj : False
I0422 04:50:54.320473 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_query_proj : True
I0422 04:50:54.320569 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_source_proj : True
I0422 04:50:54.320666 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.fprop_dtype : NoneType
I0422 04:50:54.320761 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.hidden_dim : 0
I0422 04:50:54.320857 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inference_driver_name : NoneType
I0422 04:50:54.320952 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.allow_implicit_capture : NoneType
I0422 04:50:54.321048 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_deterministic : False
I0422 04:50:54.321145 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_prob : 0.0
I0422 04:50:54.321239 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.cls : type/lingvo.core.attention/DotProductAttention
I0422 04:50:54.321337 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.dtype : float32
I0422 04:50:54.321433 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.fprop_dtype : NoneType
I0422 04:50:54.321527 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.hidden_dim : 0
I0422 04:50:54.321621 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.inference_driver_name : NoneType
I0422 04:50:54.321717 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_eval : NoneType
I0422 04:50:54.321811 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_inference : NoneType
I0422 04:50:54.321906 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.name : ''
I0422 04:50:54.322000 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.packed_input : False I0422 04:50:54.322096 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.method : 'xavier' I0422 04:50:54.322190 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.scale : 1.000001 I0422 04:50:54.322293 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.seed : NoneType I0422 04:50:54.322388 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.default : NoneType I0422 04:50:54.322484 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.fullyconnected : NoneType I0422 04:50:54.322580 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.softmax : NoneType I0422 04:50:54.322674 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.query_dim : 0 I0422 04:50:54.322768 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.random_seed : NoneType I0422 04:50:54.322864 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.skip_lp_regularization : NoneType I0422 04:50:54.322972 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.source_dim : 0 I0422 04:50:54.323069 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.global_vn : False I0422 04:50:54.323163 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.per_step_vn : False I0422 04:50:54.323259 140630240360192 
base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.scale : NoneType I0422 04:50:54.323354 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.seed : NoneType I0422 04:50:54.323451 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.is_eval : NoneType I0422 04:50:54.323545 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.is_inference : NoneType I0422 04:50:54.323642 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.name : '' I0422 04:50:54.323736 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.num_attention_heads : 2 I0422 04:50:54.323831 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.packed_input : False I0422 04:50:54.323925 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.params_init.method : 'xavier' I0422 04:50:54.324021 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.params_init.scale : 1.0 I0422 04:50:54.324115 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.params_init.seed : NoneType I0422 04:50:54.324210 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.atten_context : NoneType I0422 04:50:54.324306 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.default : NoneType I0422 04:50:54.324402 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.fullyconnected : NoneType I0422 04:50:54.324496 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.softmax : NoneType I0422 04:50:54.324593 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.query_dim : 0 I0422 04:50:54.324687 140630240360192 
base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.random_seed : NoneType I0422 04:50:54.324781 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.skip_lp_regularization : NoneType I0422 04:50:54.324876 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.source_dim : 0 I0422 04:50:54.324970 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.use_source_vec_as_attention_value : False I0422 04:50:54.325072 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.global_vn : False I0422 04:50:54.325169 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.per_step_vn : False I0422 04:50:54.325263 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.scale : NoneType I0422 04:50:54.325360 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.seed : NoneType I0422 04:50:54.325454 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.cls : type/lingvo.core.layers_with_attention/TransformerAttentionLayer I0422 04:50:54.325551 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.dtype : float32 I0422 04:50:54.325644 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.fprop_dtype : NoneType I0422 04:50:54.325740 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.inference_driver_name : NoneType I0422 04:50:54.325835 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.is_eval : NoneType I0422 04:50:54.325931 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.is_inference : NoneType I0422 04:50:54.326025 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.is_masked : False I0422 04:50:54.326119 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.allow_implicit_capture : NoneType I0422 04:50:54.326215 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm I0422 04:50:54.326308 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.dtype : float32 I0422 04:50:54.326406 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.epsilon : 1e-06 I0422 04:50:54.326502 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.fprop_dtype : NoneType I0422 04:50:54.326597 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.inference_driver_name : NoneType I0422 04:50:54.326694 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.input_dim : 0 I0422 04:50:54.326790 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.is_eval : NoneType I0422 04:50:54.326896 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.is_inference : NoneType I0422 04:50:54.326997 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.name : '' I0422 04:50:54.327095 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.params_init.method : 'xavier' I0422 04:50:54.327203 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.params_init.scale : 1.000001 I0422 04:50:54.327296 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.params_init.seed : NoneType I0422 04:50:54.327388 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.random_seed : NoneType I0422 04:50:54.327480 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.skip_lp_regularization : NoneType I0422 04:50:54.327572 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.global_vn : False I0422 
04:50:54.327662 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.per_step_vn : False I0422 04:50:54.327771 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.scale : NoneType I0422 04:50:54.327862 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.seed : NoneType I0422 04:50:54.327953 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.name : '' I0422 04:50:54.328043 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.num_attention_heads : 8 I0422 04:50:54.328135 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.packed_input : False I0422 04:50:54.328231 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.params_init.method : 'xavier' I0422 04:50:54.328322 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.params_init.scale : 1.000001 I0422 04:50:54.328413 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.params_init.seed : NoneType I0422 04:50:54.328502 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.random_seed : NoneType I0422 04:50:54.328592 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_prob : 0.0 I0422 04:50:54.328680 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType I0422 04:50:54.328772 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer I0422 04:50:54.328861 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.dropout_at_eval : False I0422 04:50:54.328950 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.dtype : float32 I0422 04:50:54.329041 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.fprop_dtype : NoneType I0422 04:50:54.329132 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.inference_driver_name : NoneType I0422 04:50:54.329221 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_eval : NoneType I0422 04:50:54.329312 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_inference : NoneType I0422 04:50:54.329401 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.keep_prob : 1.0 I0422 04:50:54.329490 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.name : '' I0422 04:50:54.329581 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape : NoneType I0422 04:50:54.329670 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType I0422 04:50:54.329761 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.method : 'xavier' I0422 04:50:54.329850 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.scale : 1.000001 I0422 04:50:54.329941 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.seed : NoneType I0422 04:50:54.330029 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.random_seed : NoneType I0422 04:50:54.330120 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType I0422 04:50:54.330209 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.global_vn : False I0422 04:50:54.330301 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.per_step_vn : False I0422 04:50:54.330390 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.scale : NoneType I0422 04:50:54.330481 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.seed : NoneType I0422 04:50:54.330570 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.skip_lp_regularization : NoneType I0422 04:50:54.330661 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.source_dim : 0 I0422 04:50:54.330750 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.global_vn : False I0422 04:50:54.330841 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.per_step_vn : False I0422 04:50:54.330948 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.scale : NoneType I0422 04:50:54.331041 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.seed : NoneType I0422 04:50:54.331130 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_aux_atten_tpl : NoneType I0422 04:50:54.331222 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.activation : 'RELU' I0422 04:50:54.331312 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.allow_implicit_capture : NoneType I0422 04:50:54.331403 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.cls : type/lingvo.core.layers_with_attention/TransformerFeedForwardLayer I0422 04:50:54.331496 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.dtype : float32 I0422 04:50:54.331587 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.activation : ['RELU', 'NONE'] I0422 04:50:54.331677 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.allow_implicit_capture : NoneType I0422 
04:50:54.331768 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.batch_norm : False I0422 04:50:54.331859 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.bn_fold_weights : NoneType I0422 04:50:54.331948 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.cls : type/lingvo.core.layers/FeedForwardNet I0422 04:50:54.332039 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.allow_implicit_capture : NoneType I0422 04:50:54.332129 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.cls : type/lingvo.core.layers/DropoutLayer I0422 04:50:54.332221 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dropout_at_eval : False I0422 04:50:54.332312 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dtype : float32 I0422 04:50:54.332401 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.fprop_dtype : NoneType I0422 04:50:54.332492 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.inference_driver_name : NoneType I0422 04:50:54.332582 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_eval : NoneType I0422 04:50:54.332673 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_inference : NoneType I0422 04:50:54.332762 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.keep_prob : 1.0 I0422 04:50:54.332853 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.name : '' I0422 04:50:54.332942 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape : NoneType I0422 04:50:54.333033 
140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape_broadcast_dims : NoneType I0422 04:50:54.333123 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.method : 'xavier' I0422 04:50:54.333214 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.scale : 1.000001 I0422 04:50:54.333304 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.seed : NoneType I0422 04:50:54.333395 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.random_seed : NoneType I0422 04:50:54.333484 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.skip_lp_regularization : NoneType I0422 04:50:54.333580 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.global_vn : False I0422 04:50:54.333672 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.per_step_vn : False I0422 04:50:54.333762 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.scale : NoneType I0422 04:50:54.333852 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.seed : NoneType I0422 04:50:54.333941 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dtype : float32 I0422 04:50:54.334031 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.fprop_dtype : NoneType I0422 04:50:54.334121 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.inference_driver_name : NoneType I0422 04:50:54.334212 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.input_dim : 0 I0422 04:50:54.334300 140630240360192 
base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_eval : NoneType I0422 04:50:54.334391 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_inference : NoneType I0422 04:50:54.334481 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.name : '' I0422 04:50:54.334569 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.method : 'xavier' I0422 04:50:54.334661 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.scale : 1.000001 I0422 04:50:54.334749 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.seed : NoneType I0422 04:50:54.334839 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.qdomain.default : NoneType I0422 04:50:54.334939 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.random_seed : NoneType I0422 04:50:54.335031 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_connections : NoneType I0422 04:50:54.335120 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_lp_regularization : NoneType I0422 04:50:54.335210 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.global_vn : False I0422 04:50:54.335299 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.per_step_vn : False I0422 04:50:54.335388 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.scale : NoneType I0422 04:50:54.335478 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.seed : NoneType I0422 04:50:54.335568 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fprop_dtype : NoneType I0422 04:50:54.335659 140630240360192 
base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.hidden_dim : 2048 I0422 04:50:54.335750 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.inference_driver_name : NoneType I0422 04:50:54.335839 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.input_dim : 0 I0422 04:50:54.335930 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.is_eval : NoneType I0422 04:50:54.336019 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.is_inference : NoneType I0422 04:50:54.336110 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.allow_implicit_capture : NoneType I0422 04:50:54.336199 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm I0422 04:50:54.336288 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.dtype : float32 I0422 04:50:54.336379 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.epsilon : 1e-06 I0422 04:50:54.336474 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.fprop_dtype : NoneType I0422 04:50:54.336565 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.inference_driver_name : NoneType I0422 04:50:54.336657 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.input_dim : 0 I0422 04:50:54.336746 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.is_eval : NoneType I0422 04:50:54.336836 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.is_inference : NoneType I0422 04:50:54.336925 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.name : '' I0422 04:50:54.337014 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.method : 'xavier' I0422 04:50:54.337104 
140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.scale : 1.000001 I0422 04:50:54.337193 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.seed : NoneType I0422 04:50:54.337282 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.random_seed : NoneType I0422 04:50:54.337373 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.skip_lp_regularization : NoneType I0422 04:50:54.337481 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.global_vn : False I0422 04:50:54.337568 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.per_step_vn : False I0422 04:50:54.337656 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.scale : NoneType I0422 04:50:54.337744 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.seed : NoneType I0422 04:50:54.337832 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.name : '' I0422 04:50:54.337922 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.output_dim : 0 I0422 04:50:54.338011 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.params_init.method : 'xavier' I0422 04:50:54.338099 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.params_init.scale : 1.000001 I0422 04:50:54.338186 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.params_init.seed : NoneType I0422 04:50:54.338274 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.random_seed : NoneType I0422 04:50:54.338363 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.relu_dropout_prob : 0.0 I0422 04:50:54.338450 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.activation : 'RELU' I0422 
04:50:54.338538 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.affine_last : False I0422 04:50:54.338627 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.allow_implicit_capture : NoneType I0422 04:50:54.338715 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.batch_norm : True I0422 04:50:54.338802 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.bias_init : 0.0 I0422 04:50:54.338901 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.bn_fold_weights : NoneType I0422 04:50:54.338993 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.cls : type/lingvo.core.layers/ProjectionLayer I0422 04:50:54.339092 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.dtype : float32 I0422 04:50:54.339176 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.fprop_dtype : NoneType I0422 04:50:54.339262 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.has_bias : False I0422 04:50:54.339353 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.inference_driver_name : NoneType I0422 04:50:54.339440 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.input_dim : 0 I0422 04:50:54.339525 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_eval : NoneType I0422 04:50:54.339611 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_inference : NoneType I0422 04:50:54.339695 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.name : '' I0422 04:50:54.339781 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.output_dim : 0 I0422 
04:50:54.339865 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.method : 'xavier'
I0422 04:50:54.339951 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.scale : 1.000001
I0422 04:50:54.340035 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.seed : NoneType
I0422 04:50:54.340121 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.qdomain.default : NoneType
I0422 04:50:54.340204 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.random_seed : NoneType
I0422 04:50:54.340289 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.340373 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.global_vn : False
I0422 04:50:54.340456 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.per_step_vn : False
I0422 04:50:54.340542 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.scale : NoneType
I0422 04:50:54.340626 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.seed : NoneType
I0422 04:50:54.340709 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.weight_norm : False
I0422 04:50:54.340794 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_prob : 0.0
I0422 04:50:54.340878 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.340964 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer
I0422 04:50:54.341048 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dropout_at_eval : False
I0422 04:50:54.341134 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dtype : float32
I0422 04:50:54.341218 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.fprop_dtype : NoneType
I0422 04:50:54.341303 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.inference_driver_name : NoneType
I0422 04:50:54.341387 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_eval : NoneType
I0422 04:50:54.341473 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_inference : NoneType
I0422 04:50:54.341557 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.keep_prob : 1.0
I0422 04:50:54.341643 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.name : ''
I0422 04:50:54.341727 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape : NoneType
I0422 04:50:54.341824 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType
I0422 04:50:54.341909 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.method : 'xavier'
I0422 04:50:54.341995 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.scale : 1.000001
I0422 04:50:54.342078 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.seed : NoneType
I0422 04:50:54.342163 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.random_seed : NoneType
I0422 04:50:54.342248 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.342334 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.global_vn : False
I0422 04:50:54.342417 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.per_step_vn : False
I0422 04:50:54.342502 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.scale : NoneType
I0422 04:50:54.342587 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.seed : NoneType
I0422 04:50:54.342673 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.342756 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.global_vn : False
I0422 04:50:54.342842 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.per_step_vn : False
I0422 04:50:54.342937 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.scale : NoneType
I0422 04:50:54.343024 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.seed : NoneType
I0422 04:50:54.343111 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.transparent_merger_tpl : NoneType
I0422 04:50:54.343195 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.global_vn : False
I0422 04:50:54.343281 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.per_step_vn : False
I0422 04:50:54.343364 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.scale : NoneType
I0422 04:50:54.343451 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.seed : NoneType
I0422 04:50:54.343535 140630240360192 base_runner.py:69] task.lm.stack.dtype : float32
I0422 04:50:54.343621 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.add_tgt_embedding_layer : False
I0422 04:50:54.343705 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.343791 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerEmbeddingLayer
I0422 04:50:54.343877 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.343962 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer
I0422 04:50:54.344048 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.dropout_at_eval : False
I0422 04:50:54.344132 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.dtype : float32
I0422 04:50:54.344218 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.fprop_dtype : NoneType
I0422 04:50:54.344304 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.inference_driver_name : NoneType
I0422 04:50:54.344388 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.is_eval : NoneType
I0422 04:50:54.344474 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.is_inference : NoneType
I0422 04:50:54.344559 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.keep_prob : 1.0
I0422 04:50:54.344651 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.name : ''
I0422 04:50:54.344739 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.noise_shape : NoneType
I0422 04:50:54.344825 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.noise_shape_broadcast_dims : NoneType
I0422 04:50:54.344909 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.params_init.method : 'xavier'
I0422 04:50:54.344995 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.params_init.scale : 1.000001
I0422 04:50:54.345079 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.params_init.seed : NoneType
I0422 04:50:54.345165 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.random_seed : NoneType
I0422 04:50:54.345249 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.345335 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.global_vn : False
I0422 04:50:54.345419 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.per_step_vn : False
I0422 04:50:54.345504 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.scale : NoneType
I0422 04:50:54.345588 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.seed : NoneType
I0422 04:50:54.345673 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dtype : float32
I0422 04:50:54.345758 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.fprop_dtype : NoneType
I0422 04:50:54.345843 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.inference_driver_name : NoneType
I0422 04:50:54.345927 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.input_dropout_prob : 0.0
I0422 04:50:54.346013 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.is_eval : NoneType
I0422 04:50:54.346098 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.is_inference : NoneType
I0422 04:50:54.346184 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.is_transparent : False
I0422 04:50:54.346268 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.max_seq_len : 300
I0422 04:50:54.346353 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.name : ''
I0422 04:50:54.346438 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.packed_input : False
I0422 04:50:54.346523 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.params_init.method : 'xavier'
I0422 04:50:54.346607 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.params_init.scale : 1.000001
I0422 04:50:54.346693 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.params_init.seed : NoneType
I0422 04:50:54.346777 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.allow_implicit_capture : NoneType
I0422 04:50:54.346862 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.cls : type/lingvo.core.layers/PositionalEmbeddingLayer
I0422 04:50:54.346956 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.dtype : float32
I0422 04:50:54.347043 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.embedding_dim : 0
I0422 04:50:54.347126 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.fprop_dtype : NoneType
I0422 04:50:54.347213 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.inference_driver_name : NoneType
I0422 04:50:54.347297 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.is_eval : NoneType
I0422 04:50:54.347383 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.is_inference : NoneType
I0422 04:50:54.347469 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.max_timescale : 10000
I0422 04:50:54.347554 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.min_timescale : 1
I0422 04:50:54.347640 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.name : ''
I0422 04:50:54.347742 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.params_init.method : 'xavier'
I0422 04:50:54.347835 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.params_init.scale : 1.000001
I0422 04:50:54.347919 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.params_init.seed : NoneType
I0422 04:50:54.348004 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.random_seed : NoneType
I0422 04:50:54.348088 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.skip_lp_regularization : NoneType
I0422 04:50:54.348171 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.trainable_scaling : False
I0422 04:50:54.348256 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.trainable_scaling_init : 1.0
I0422 04:50:54.348340 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.global_vn : False
I0422 04:50:54.348423 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.per_step_vn : False
I0422 04:50:54.348506 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.scale : NoneType
I0422 04:50:54.348589 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.seed : NoneType
I0422 04:50:54.348674 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.random_seed : NoneType
I0422 04:50:54.348757 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.348839 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.allow_implicit_capture : NoneType
I0422 04:50:54.348923 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.apply_pruning : False
I0422 04:50:54.349006 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.cls : type/lingvo.core.layers/SimpleEmbeddingLayer
I0422 04:50:54.349090 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.dtype : float32
I0422 04:50:54.349174 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.embedding_dim : 0
I0422 04:50:54.349257 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.fprop_dtype : NoneType
I0422 04:50:54.349340 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.fprop_mode : NoneType
I0422 04:50:54.349425 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.inference_driver_name : NoneType
I0422 04:50:54.349508 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.is_eval : NoneType
I0422 04:50:54.349591 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.is_inference : NoneType
I0422 04:50:54.349675 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.name : ''
I0422 04:50:54.349760 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.params_init.method : 'xavier'
I0422 04:50:54.349843 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.params_init.scale : 1.000001
I0422 04:50:54.349927 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.params_init.seed : NoneType
I0422 04:50:54.350012 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.qdomain.default : NoneType
I0422 04:50:54.350095 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.random_seed : NoneType
I0422 04:50:54.350179 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.skip_lp_regularization : NoneType
I0422 04:50:54.350265 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.use_3d_weight_tensor : False
I0422 04:50:54.350348 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.use_matmul : False
I0422 04:50:54.350434 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.global_vn : False
I0422 04:50:54.350517 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.per_step_vn : False
I0422 04:50:54.350600 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.scale : NoneType
I0422 04:50:54.350684 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.seed : NoneType
I0422 04:50:54.350769 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vocab_size : 0
I0422 04:50:54.350858 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.global_vn : False
I0422 04:50:54.350969 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.per_step_vn : False
I0422 04:50:54.351051 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.scale : NoneType
I0422 04:50:54.351133 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.seed : NoneType
I0422 04:50:54.351212 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.351294 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerLayer
I0422 04:50:54.351376 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.dtype : float32
I0422 04:50:54.351457 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.fprop_dtype : NoneType
I0422 04:50:54.351537 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.has_aux_atten : False
I0422 04:50:54.351618 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.inference_driver_name : NoneType
I0422 04:50:54.351697 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.is_decoder : False
I0422 04:50:54.351778 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.is_eval : NoneType
I0422 04:50:54.351859 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.is_inference : NoneType
I0422 04:50:54.351939 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.is_transparent : False
I0422 04:50:54.352020 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.mask_self_atten : True
I0422 04:50:54.352101 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.name : ''
I0422 04:50:54.352181 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.num_transparent_outputs : 0
I0422 04:50:54.352262 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.output_dim : 0
I0422 04:50:54.352343 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.packed_input : False
I0422 04:50:54.352423 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.params_init.method : 'xavier'
I0422 04:50:54.352504 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.params_init.scale : 1.000001
I0422 04:50:54.352585 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.params_init.seed : NoneType
I0422 04:50:54.352664 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.random_seed : NoneType
I0422 04:50:54.352745 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.352826 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.source_dim : 0
I0422 04:50:54.352907 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.add_unnormalized_input : False
I0422 04:50:54.352988 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.353070 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_dropout_prob : 0.0
I0422 04:50:54.353152 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_hidden_dim : 0
I0422 04:50:54.353233 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.353313 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_deterministic : False
I0422 04:50:54.353394 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_prob : 0.0
I0422 04:50:54.353475 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.cls : type/lingvo.core.attention/MultiHeadedAttention
I0422 04:50:54.353555 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.context_dim : 0
I0422 04:50:54.353636 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.ctx_post_proj_dim : 0
I0422 04:50:54.353718 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.dtype : float32
I0422 04:50:54.353806 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_post_proj : True
I0422 04:50:54.353888 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_pre_proj : True
I0422 04:50:54.353972 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_query_proj : True
I0422 04:50:54.354054 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_source_proj : True
I0422 04:50:54.354135 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.fprop_dtype : NoneType
I0422 04:50:54.354217 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.hidden_dim : 0
I0422 04:50:54.354299 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inference_driver_name : NoneType
I0422 04:50:54.354378 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.allow_implicit_capture : NoneType
I0422 04:50:54.354460 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_deterministic : False
I0422 04:50:54.354541 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_prob : 0.0
I0422 04:50:54.354621 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.cls : type/lingvo.core.attention/DotProductAttention
I0422 04:50:54.354702 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.dtype : float32
I0422 04:50:54.354784 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.fprop_dtype : NoneType
I0422 04:50:54.354863 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.hidden_dim : 0
I0422 04:50:54.354960 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.inference_driver_name : NoneType
I0422 04:50:54.355043 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_eval : NoneType
I0422 04:50:54.355125 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_inference : NoneType
I0422 04:50:54.355206 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.name : ''
I0422 04:50:54.355287 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.packed_input : False
I0422 04:50:54.355367 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.method : 'xavier'
I0422 04:50:54.355448 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.scale : 1.000001
I0422 04:50:54.355530 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.seed : NoneType
I0422 04:50:54.355611 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.default : NoneType
I0422 04:50:54.355691 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.fullyconnected : NoneType
I0422 04:50:54.355773 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.softmax : NoneType
I0422 04:50:54.355854 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.query_dim : 0
I0422 04:50:54.355936 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.random_seed : NoneType
I0422 04:50:54.356017 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.skip_lp_regularization : NoneType
I0422 04:50:54.356106 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.source_dim : 0
I0422 04:50:54.356188 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.global_vn : False
I0422 04:50:54.356270 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.per_step_vn : False
I0422 04:50:54.356350 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.scale : NoneType
I0422 04:50:54.356431 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.seed : NoneType
I0422 04:50:54.356513 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.is_eval : NoneType
I0422 04:50:54.356595 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.is_inference : NoneType
I0422 04:50:54.356676 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.name : ''
I0422 04:50:54.356758 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.num_attention_heads : 2
I0422 04:50:54.356838 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.packed_input : False
I0422 04:50:54.356920 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.params_init.method : 'xavier'
I0422 04:50:54.357001 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.params_init.scale : 1.0
I0422 04:50:54.357080 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.params_init.seed : NoneType
I0422 04:50:54.357161 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.atten_context : NoneType
I0422 04:50:54.357243 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.default : NoneType
I0422 04:50:54.357323 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.fullyconnected : NoneType
I0422 04:50:54.357403 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.softmax : NoneType
I0422 04:50:54.357485 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.query_dim : 0
I0422 04:50:54.357564 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.random_seed : NoneType
I0422 04:50:54.357645 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.357727 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.source_dim : 0
I0422 04:50:54.357808 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.use_source_vec_as_attention_value : False
I0422 04:50:54.357906 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.global_vn : False
I0422 04:50:54.357986 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.per_step_vn : False
I0422 04:50:54.358066 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.scale : NoneType
I0422 04:50:54.358145 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.seed : NoneType
I0422 04:50:54.358227 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.cls : type/lingvo.core.layers_with_attention/TransformerAttentionLayer
I0422 04:50:54.358308 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.dtype : float32
I0422 04:50:54.358387 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.fprop_dtype : NoneType
I0422 04:50:54.358468 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.inference_driver_name : NoneType
I0422 04:50:54.358547 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.is_eval : NoneType
I0422 04:50:54.358633 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.is_inference : NoneType
I0422 04:50:54.358714 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.is_masked : True
I0422 04:50:54.358793 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.358880 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm
I0422 04:50:54.358964 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.dtype : float32
I0422 04:50:54.359045 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.epsilon : 1e-06
I0422 04:50:54.359126 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.fprop_dtype : NoneType
I0422 04:50:54.359206 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.inference_driver_name : NoneType
I0422 04:50:54.359286 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.input_dim : 0
I0422 04:50:54.359365 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.is_eval : NoneType
I0422 04:50:54.359446 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.is_inference : NoneType
I0422 04:50:54.359525 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.name : ''
I0422 04:50:54.359606 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.params_init.method : 'xavier'
I0422 04:50:54.359685 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.params_init.scale : 1.000001
I0422 04:50:54.359766 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.params_init.seed : NoneType
I0422 04:50:54.359847 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.random_seed : NoneType
I0422 04:50:54.359926 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.360007 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.global_vn : False
I0422 04:50:54.360085 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.per_step_vn : False
I0422 04:50:54.360166 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.scale : NoneType
I0422 04:50:54.360245 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.seed : NoneType
I0422 04:50:54.360325 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.name : ''
I0422 04:50:54.360404 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.num_attention_heads : 16
I0422 04:50:54.360483 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.packed_input : False
I0422 04:50:54.360562 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.params_init.method : 'xavier'
I0422 04:50:54.360641 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.params_init.scale : 1.000001
I0422 04:50:54.360719 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.params_init.seed : NoneType
I0422 04:50:54.360800 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.random_seed : NoneType
I0422 04:50:54.360881 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_prob : 0.0
I0422 04:50:54.360960 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.361041 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer
I0422 04:50:54.361121 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.dropout_at_eval : False
I0422 04:50:54.361210 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.dtype : float32
I0422 04:50:54.361290 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.fprop_dtype : NoneType
I0422 04:50:54.361371 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.inference_driver_name : NoneType
I0422 04:50:54.361452 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_eval : NoneType
I0422 04:50:54.361532 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_inference : NoneType
I0422 04:50:54.361612 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.keep_prob : 1.0
I0422 04:50:54.361691 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.name : ''
I0422 04:50:54.361772 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape : NoneType
I0422 04:50:54.361851 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType
I0422 04:50:54.361932 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.method : 'xavier'
I0422 04:50:54.362011 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.scale : 1.000001
I0422 04:50:54.362092 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.seed : NoneType
I0422 04:50:54.362171 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.random_seed : NoneType
I0422 04:50:54.362251 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.362332 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.global_vn : False
I0422 04:50:54.362411 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.per_step_vn : False
I0422 04:50:54.362492 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.scale : NoneType
I0422 04:50:54.362571 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.seed : NoneType
I0422 04:50:54.362652 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.362731 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.source_dim : 0
I0422 04:50:54.362812 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.vn.global_vn : False
I0422 04:50:54.362901 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.vn.per_step_vn : False
I0422 04:50:54.362987 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.vn.scale : NoneType
I0422 04:50:54.363068 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.vn.seed : NoneType
I0422 04:50:54.363147 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_aux_atten_tpl : NoneType
I0422 04:50:54.363236 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.activation : 'RELU'
I0422 04:50:54.363313 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.363392 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.cls : type/lingvo.core.layers_with_attention/TransformerFeedForwardLayer
I0422 04:50:54.363470 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.dtype : float32
I0422 04:50:54.363548 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.activation : ['RELU', 'NONE']
I0422 04:50:54.363625 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.allow_implicit_capture : NoneType
I0422 04:50:54.363711 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.batch_norm : False
I0422 04:50:54.363790 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.bn_fold_weights : NoneType
I0422 04:50:54.363867 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.cls : type/lingvo.core.layers/FeedForwardNet
I0422 04:50:54.363944 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.allow_implicit_capture : NoneType
I0422 04:50:54.364021 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.cls : type/lingvo.core.layers/DropoutLayer
I0422 04:50:54.364098 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dropout_at_eval : False
I0422 04:50:54.364175 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dtype : float32
I0422 04:50:54.364252 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.fprop_dtype : NoneType
I0422 04:50:54.364329 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.inference_driver_name : NoneType
I0422 04:50:54.364404 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_eval : NoneType
I0422 04:50:54.364481 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_inference : NoneType
I0422 04:50:54.364558 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.keep_prob : 1.0
I0422 04:50:54.364635 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.name : ''
I0422 04:50:54.364712 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape : NoneType
I0422 04:50:54.364789 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape_broadcast_dims : NoneType
I0422 04:50:54.364866 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.method : 'xavier'
I0422 04:50:54.364943 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.scale : 1.000001
I0422 04:50:54.365019 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.seed : NoneType
I0422 04:50:54.365096 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.random_seed : NoneType
I0422 04:50:54.365174 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.skip_lp_regularization : NoneType
I0422 04:50:54.365252 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.global_vn : False
I0422 04:50:54.365329 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.per_step_vn : False
I0422 04:50:54.365405 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.scale : NoneType
I0422 04:50:54.365482 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.seed : NoneType
I0422 04:50:54.365559 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dtype : float32
I0422 04:50:54.365637 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.fprop_dtype : NoneType
I0422 04:50:54.365714 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.inference_driver_name : NoneType
I0422 04:50:54.365791 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.input_dim : 0
I0422 04:50:54.365869 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_eval : NoneType
I0422 04:50:54.365956 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_inference : NoneType
I0422 04:50:54.366034 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.name : ''
I0422 04:50:54.366112 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.method : 'xavier'
I0422 04:50:54.366190 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.scale : 1.000001
I0422 04:50:54.366267 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.seed : NoneType
I0422 04:50:54.366345 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.qdomain.default : NoneType
I0422 04:50:54.366424 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.random_seed : NoneType
I0422 04:50:54.366501 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_connections : NoneType
I0422 04:50:54.366578 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_lp_regularization : NoneType
I0422 04:50:54.366656 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.global_vn : False
I0422 04:50:54.366733 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.per_step_vn : False
I0422 04:50:54.366811 140630240360192 
base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.scale : NoneType I0422 04:50:54.366897 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.seed : NoneType I0422 04:50:54.366981 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.fprop_dtype : NoneType I0422 04:50:54.367058 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.hidden_dim : 8192 I0422 04:50:54.367136 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.inference_driver_name : NoneType I0422 04:50:54.367214 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.input_dim : 0 I0422 04:50:54.367292 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.is_eval : NoneType I0422 04:50:54.367369 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.is_inference : NoneType I0422 04:50:54.367446 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.allow_implicit_capture : NoneType I0422 04:50:54.367541 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm I0422 04:50:54.367616 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.dtype : float32 I0422 04:50:54.367692 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.epsilon : 1e-06 I0422 04:50:54.367768 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.fprop_dtype : NoneType I0422 04:50:54.367844 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.inference_driver_name : NoneType I0422 04:50:54.367919 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.input_dim : 0 I0422 04:50:54.367995 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.is_eval : NoneType I0422 04:50:54.368072 
140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.is_inference : NoneType I0422 04:50:54.368149 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.name : '' I0422 04:50:54.368225 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.method : 'xavier' I0422 04:50:54.368300 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.scale : 1.000001 I0422 04:50:54.368383 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.seed : NoneType I0422 04:50:54.368462 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.random_seed : NoneType I0422 04:50:54.368539 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.skip_lp_regularization : NoneType I0422 04:50:54.368616 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.global_vn : False I0422 04:50:54.368693 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.per_step_vn : False I0422 04:50:54.368769 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.scale : NoneType I0422 04:50:54.368845 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.seed : NoneType I0422 04:50:54.368922 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.name : '' I0422 04:50:54.369002 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.output_dim : 0 I0422 04:50:54.369079 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.params_init.method : 'xavier' I0422 04:50:54.369155 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.params_init.scale : 1.000001 I0422 04:50:54.369232 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.params_init.seed : NoneType 
I0422 04:50:54.369309 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.random_seed : NoneType I0422 04:50:54.369386 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.relu_dropout_prob : 0.0 I0422 04:50:54.369463 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.activation : 'RELU' I0422 04:50:54.369539 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.affine_last : False I0422 04:50:54.369616 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.allow_implicit_capture : NoneType I0422 04:50:54.369693 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.batch_norm : True I0422 04:50:54.369770 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.bias_init : 0.0 I0422 04:50:54.369846 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.bn_fold_weights : NoneType I0422 04:50:54.369924 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.cls : type/lingvo.core.layers/ProjectionLayer I0422 04:50:54.369999 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.dtype : float32 I0422 04:50:54.370075 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.fprop_dtype : NoneType I0422 04:50:54.370151 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.has_bias : False I0422 04:50:54.370229 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.inference_driver_name : NoneType I0422 04:50:54.370305 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.input_dim : 0 I0422 04:50:54.370382 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_eval : NoneType I0422 
04:50:54.370459 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_inference : NoneType I0422 04:50:54.370536 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.name : '' I0422 04:50:54.370611 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.output_dim : 0 I0422 04:50:54.370687 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.method : 'xavier' I0422 04:50:54.370764 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.scale : 1.000001 I0422 04:50:54.370845 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.seed : NoneType I0422 04:50:54.370932 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.qdomain.default : NoneType I0422 04:50:54.371009 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.random_seed : NoneType I0422 04:50:54.371085 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.skip_lp_regularization : NoneType I0422 04:50:54.371160 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.global_vn : False I0422 04:50:54.371237 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.per_step_vn : False I0422 04:50:54.371313 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.scale : NoneType I0422 04:50:54.371388 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.seed : NoneType I0422 04:50:54.371464 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.weight_norm : False I0422 04:50:54.371541 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_prob : 0.0 
I0422 04:50:54.371618 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType I0422 04:50:54.371694 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer I0422 04:50:54.371768 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dropout_at_eval : False I0422 04:50:54.371845 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dtype : float32 I0422 04:50:54.371922 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.fprop_dtype : NoneType I0422 04:50:54.371998 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.inference_driver_name : NoneType I0422 04:50:54.372073 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_eval : NoneType I0422 04:50:54.372150 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_inference : NoneType I0422 04:50:54.372227 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.keep_prob : 1.0 I0422 04:50:54.372304 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.name : '' I0422 04:50:54.372381 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape : NoneType I0422 04:50:54.372457 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType I0422 04:50:54.372534 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.method : 'xavier' I0422 04:50:54.372611 140630240360192 base_runner.py:69] 
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.scale : 1.000001 I0422 04:50:54.372687 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.seed : NoneType I0422 04:50:54.372764 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.random_seed : NoneType I0422 04:50:54.372839 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType I0422 04:50:54.372915 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.global_vn : False I0422 04:50:54.372991 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.per_step_vn : False I0422 04:50:54.373071 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.scale : NoneType I0422 04:50:54.373147 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.seed : NoneType I0422 04:50:54.373224 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.skip_lp_regularization : NoneType I0422 04:50:54.373300 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.global_vn : False I0422 04:50:54.373375 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.per_step_vn : False I0422 04:50:54.373452 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.scale : NoneType I0422 04:50:54.373528 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.seed : NoneType I0422 04:50:54.373605 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.transparent_merger_tpl : NoneType I0422 04:50:54.373682 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.vn.global_vn : False I0422 04:50:54.373759 140630240360192 base_runner.py:69] 
task.lm.stack.encoder_tpl.vn.per_step_vn : False I0422 04:50:54.373836 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.vn.scale : NoneType I0422 04:50:54.373913 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.vn.seed : NoneType I0422 04:50:54.373990 140630240360192 base_runner.py:69] task.lm.stack.fprop_dtype : NoneType I0422 04:50:54.374066 140630240360192 base_runner.py:69] task.lm.stack.inference_driver_name : NoneType I0422 04:50:54.374142 140630240360192 base_runner.py:69] task.lm.stack.is_eval : NoneType I0422 04:50:54.374218 140630240360192 base_runner.py:69] task.lm.stack.is_inference : NoneType I0422 04:50:54.374298 140630240360192 base_runner.py:69] task.lm.stack.is_transparent : False I0422 04:50:54.374373 140630240360192 base_runner.py:69] task.lm.stack.model_dim : 1024 I0422 04:50:54.374449 140630240360192 base_runner.py:69] task.lm.stack.name : '' I0422 04:50:54.374526 140630240360192 base_runner.py:69] task.lm.stack.num_decoder_layers : 0 I0422 04:50:54.374602 140630240360192 base_runner.py:69] task.lm.stack.num_encoder_layers : 6 I0422 04:50:54.374677 140630240360192 base_runner.py:69] task.lm.stack.num_micro_batches : 1 I0422 04:50:54.374752 140630240360192 base_runner.py:69] task.lm.stack.num_splits : 1 I0422 04:50:54.374829 140630240360192 base_runner.py:69] task.lm.stack.num_transparent_outputs : 0 I0422 04:50:54.374914 140630240360192 base_runner.py:69] task.lm.stack.packed_input : False I0422 04:50:54.374993 140630240360192 base_runner.py:69] task.lm.stack.params_init.method : 'xavier' I0422 04:50:54.375078 140630240360192 base_runner.py:69] task.lm.stack.params_init.scale : 1.000001 I0422 04:50:54.375153 140630240360192 base_runner.py:69] task.lm.stack.params_init.seed : NoneType I0422 04:50:54.375226 140630240360192 base_runner.py:69] task.lm.stack.random_seed : NoneType I0422 04:50:54.375299 140630240360192 base_runner.py:69] task.lm.stack.skip_lp_regularization : NoneType I0422 04:50:54.375372 140630240360192 
base_runner.py:69] task.lm.stack.splits : 1 I0422 04:50:54.375446 140630240360192 base_runner.py:69] task.lm.stack.state_dtype : NoneType I0422 04:50:54.375519 140630240360192 base_runner.py:69] task.lm.stack.transparent_merger_dropout_prob : 0.1 I0422 04:50:54.375592 140630240360192 base_runner.py:69] task.lm.stack.use_pipelined_embeddings : False I0422 04:50:54.375667 140630240360192 base_runner.py:69] task.lm.stack.vn.global_vn : False I0422 04:50:54.375740 140630240360192 base_runner.py:69] task.lm.stack.vn.per_step_vn : False I0422 04:50:54.375813 140630240360192 base_runner.py:69] task.lm.stack.vn.scale : NoneType I0422 04:50:54.375885 140630240360192 base_runner.py:69] task.lm.stack.vn.seed : NoneType I0422 04:50:54.375958 140630240360192 base_runner.py:69] task.lm.vn.global_vn : False I0422 04:50:54.376032 140630240360192 base_runner.py:69] task.lm.vn.per_step_vn : False I0422 04:50:54.376111 140630240360192 base_runner.py:69] task.lm.vn.scale : NoneType I0422 04:50:54.376187 140630240360192 base_runner.py:69] task.lm.vn.seed : NoneType I0422 04:50:54.376260 140630240360192 base_runner.py:69] task.lm.vocab_size : 32000 I0422 04:50:54.376333 140630240360192 base_runner.py:69] task.name : '1bwds_wpm_level_lm' I0422 04:50:54.376406 140630240360192 base_runner.py:69] task.online_encoder : NoneType I0422 04:50:54.376481 140630240360192 base_runner.py:69] task.params_init.method : 'xavier' I0422 04:50:54.376554 140630240360192 base_runner.py:69] task.params_init.scale : 1.000001 I0422 04:50:54.376629 140630240360192 base_runner.py:69] task.params_init.seed : NoneType I0422 04:50:54.376703 140630240360192 base_runner.py:69] task.random_seed : NoneType I0422 04:50:54.376777 140630240360192 base_runner.py:69] task.skip_lp_regularization : NoneType I0422 04:50:54.376852 140630240360192 base_runner.py:69] task.train.bprop_variable_filter : NoneType I0422 04:50:54.376924 140630240360192 base_runner.py:69] task.train.clip_gradient_norm_to_value : 0.0 I0422 
04:50:54.376998 140630240360192 base_runner.py:69] task.train.clip_gradient_single_norm_to_value : 0.0 I0422 04:50:54.377073 140630240360192 base_runner.py:69] task.train.colocate_gradients_with_ops : True I0422 04:50:54.377146 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.jobname : 'eval_dev' I0422 04:50:54.377221 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.local_filesystem : False I0422 04:50:54.377295 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.logdir : '' I0422 04:50:54.377368 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.metric : 'log_pplx' I0422 04:50:54.377441 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.minimize : True I0422 04:50:54.377515 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.name : 'MetricHistory' I0422 04:50:54.377588 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.tfevent_file : False I0422 04:50:54.377680 140630240360192 base_runner.py:69] task.train.early_stop.name : 'EarlyStop' I0422 04:50:54.377753 140630240360192 base_runner.py:69] task.train.early_stop.tolerance : 0.0 I0422 04:50:54.377825 140630240360192 base_runner.py:69] task.train.early_stop.verbose : True I0422 04:50:54.377897 140630240360192 base_runner.py:69] task.train.early_stop.window : 0 I0422 04:50:54.377970 140630240360192 base_runner.py:69] task.train.ema_decay : 0.0 I0422 04:50:54.378043 140630240360192 base_runner.py:69] task.train.gate_gradients : False I0422 04:50:54.378117 140630240360192 base_runner.py:69] task.train.grad_aggregation_method : 1 I0422 04:50:54.378190 140630240360192 base_runner.py:69] task.train.grad_norm_to_clip_to_zero : 0.0 I0422 04:50:54.378262 140630240360192 base_runner.py:69] task.train.grad_norm_tracker : NoneType I0422 04:50:54.378334 140630240360192 base_runner.py:69] task.train.init_from_checkpoint_rules : {} I0422 04:50:54.378407 140630240360192 
base_runner.py:69] task.train.l1_regularizer_weight : NoneType I0422 04:50:54.378479 140630240360192 base_runner.py:69] task.train.l2_regularizer_weight : 1e-06 I0422 04:50:54.378552 140630240360192 base_runner.py:69] task.train.learning_rate : 0.5 I0422 04:50:54.378626 140630240360192 base_runner.py:69] task.train.lr_schedule.allow_implicit_capture : NoneType I0422 04:50:54.378699 140630240360192 base_runner.py:69] task.train.lr_schedule.cls : type/lingvo.core.lr_schedule/TransformerLearningRateSchedule I0422 04:50:54.378772 140630240360192 base_runner.py:69] task.train.lr_schedule.decay_end : NoneType I0422 04:50:54.378845 140630240360192 base_runner.py:69] task.train.lr_schedule.dtype : float32 I0422 04:50:54.378931 140630240360192 base_runner.py:69] task.train.lr_schedule.fprop_dtype : NoneType I0422 04:50:54.379005 140630240360192 base_runner.py:69] task.train.lr_schedule.inference_driver_name : NoneType I0422 04:50:54.379080 140630240360192 base_runner.py:69] task.train.lr_schedule.is_eval : NoneType I0422 04:50:54.379158 140630240360192 base_runner.py:69] task.train.lr_schedule.is_inference : NoneType I0422 04:50:54.379235 140630240360192 base_runner.py:69] task.train.lr_schedule.model_dim : 2048 I0422 04:50:54.379307 140630240360192 base_runner.py:69] task.train.lr_schedule.name : 'LRSched' I0422 04:50:54.379380 140630240360192 base_runner.py:69] task.train.lr_schedule.params_init.method : 'xavier' I0422 04:50:54.379455 140630240360192 base_runner.py:69] task.train.lr_schedule.params_init.scale : 1.000001 I0422 04:50:54.379528 140630240360192 base_runner.py:69] task.train.lr_schedule.params_init.seed : NoneType I0422 04:50:54.379601 140630240360192 base_runner.py:69] task.train.lr_schedule.random_seed : NoneType I0422 04:50:54.379674 140630240360192 base_runner.py:69] task.train.lr_schedule.skip_lp_regularization : NoneType I0422 04:50:54.379749 140630240360192 base_runner.py:69] task.train.lr_schedule.vn.global_vn : False I0422 04:50:54.379822 
140630240360192 base_runner.py:69] task.train.lr_schedule.vn.per_step_vn : False I0422 04:50:54.379894 140630240360192 base_runner.py:69] task.train.lr_schedule.vn.scale : NoneType I0422 04:50:54.379968 140630240360192 base_runner.py:69] task.train.lr_schedule.vn.seed : NoneType I0422 04:50:54.380042 140630240360192 base_runner.py:69] task.train.lr_schedule.warmup_steps : 40000 I0422 04:50:54.380115 140630240360192 base_runner.py:69] task.train.lr_schedule.worker_replicas : 1 I0422 04:50:54.380188 140630240360192 base_runner.py:69] task.train.max_lstm_gradient_norm : 0.0 I0422 04:50:54.380261 140630240360192 base_runner.py:69] task.train.max_steps : 4000000 I0422 04:50:54.380335 140630240360192 base_runner.py:69] task.train.optimizer.allow_implicit_capture : NoneType I0422 04:50:54.380409 140630240360192 base_runner.py:69] task.train.optimizer.beta1 : 0.9 I0422 04:50:54.380481 140630240360192 base_runner.py:69] task.train.optimizer.beta2 : 0.997 I0422 04:50:54.380554 140630240360192 base_runner.py:69] task.train.optimizer.cls : type/lingvo.core.optimizer/Adam I0422 04:50:54.380628 140630240360192 base_runner.py:69] task.train.optimizer.dtype : float32 I0422 04:50:54.380701 140630240360192 base_runner.py:69] task.train.optimizer.epsilon : 1e-09 I0422 04:50:54.380774 140630240360192 base_runner.py:69] task.train.optimizer.fprop_dtype : NoneType I0422 04:50:54.380847 140630240360192 base_runner.py:69] task.train.optimizer.inference_driver_name : NoneType I0422 04:50:54.380920 140630240360192 base_runner.py:69] task.train.optimizer.is_eval : NoneType I0422 04:50:54.380995 140630240360192 base_runner.py:69] task.train.optimizer.is_inference : NoneType I0422 04:50:54.381067 140630240360192 base_runner.py:69] task.train.optimizer.name : 'Adam' I0422 04:50:54.381140 140630240360192 base_runner.py:69] task.train.optimizer.params_init.method : 'xavier' I0422 04:50:54.381213 140630240360192 base_runner.py:69] task.train.optimizer.params_init.scale : 1.000001 I0422 
04:50:54.381287 140630240360192 base_runner.py:69] task.train.optimizer.params_init.seed : NoneType I0422 04:50:54.381360 140630240360192 base_runner.py:69] task.train.optimizer.random_seed : NoneType I0422 04:50:54.381433 140630240360192 base_runner.py:69] task.train.optimizer.skip_lp_regularization : NoneType I0422 04:50:54.381505 140630240360192 base_runner.py:69] task.train.optimizer.vn.global_vn : False I0422 04:50:54.381580 140630240360192 base_runner.py:69] task.train.optimizer.vn.per_step_vn : False I0422 04:50:54.381652 140630240360192 base_runner.py:69] task.train.optimizer.vn.scale : NoneType I0422 04:50:54.381726 140630240360192 base_runner.py:69] task.train.optimizer.vn.seed : NoneType I0422 04:50:54.381799 140630240360192 base_runner.py:69] task.train.pruning_hparams_dict : NoneType I0422 04:50:54.381872 140630240360192 base_runner.py:69] task.train.save_interval_seconds : 600 I0422 04:50:54.381946 140630240360192 base_runner.py:69] task.train.start_up_delay_steps : 200 I0422 04:50:54.382018 140630240360192 base_runner.py:69] task.train.sum_loss_across_tokens_in_batch : False I0422 04:50:54.382097 140630240360192 base_runner.py:69] task.train.summary_interval_steps : 100 I0422 04:50:54.382172 140630240360192 base_runner.py:69] task.train.tpu_steps_per_loop : 100 I0422 04:50:54.382245 140630240360192 base_runner.py:69] task.train.vn_start_step : 20000 I0422 04:50:54.382318 140630240360192 base_runner.py:69] task.train.vn_std : 0.0 I0422 04:50:54.382390 140630240360192 base_runner.py:69] task.vn.global_vn : False I0422 04:50:54.382462 140630240360192 base_runner.py:69] task.vn.per_step_vn : False I0422 04:50:54.382535 140630240360192 base_runner.py:69] task.vn.scale : NoneType I0422 04:50:54.382608 140630240360192 base_runner.py:69] task.vn.seed : NoneType I0422 04:50:54.382680 140630240360192 base_runner.py:69] train.early_stop.metric_history.jobname : 'eval_dev' I0422 04:50:54.382752 140630240360192 base_runner.py:69] 
train.early_stop.metric_history.local_filesystem : False I0422 04:50:54.382826 140630240360192 base_runner.py:69] train.early_stop.metric_history.logdir : '' I0422 04:50:54.382906 140630240360192 base_runner.py:69] train.early_stop.metric_history.metric : 'log_pplx' I0422 04:50:54.382981 140630240360192 base_runner.py:69] train.early_stop.metric_history.minimize : True I0422 04:50:54.383054 140630240360192 base_runner.py:69] train.early_stop.metric_history.name : 'MetricHistory' I0422 04:50:54.383128 140630240360192 base_runner.py:69] train.early_stop.metric_history.tfevent_file : False I0422 04:50:54.383202 140630240360192 base_runner.py:69] train.early_stop.name : 'EarlyStop' I0422 04:50:54.383274 140630240360192 base_runner.py:69] train.early_stop.tolerance : 0.0 I0422 04:50:54.383347 140630240360192 base_runner.py:69] train.early_stop.verbose : True I0422 04:50:54.383419 140630240360192 base_runner.py:69] train.early_stop.window : 0 I0422 04:50:54.383492 140630240360192 base_runner.py:69] train.ema_decay : 0.0 I0422 04:50:54.383567 140630240360192 base_runner.py:69] train.init_from_checkpoint_rules : {} I0422 04:50:54.383641 140630240360192 base_runner.py:69] train.max_steps : 4000000 I0422 04:50:54.383713 140630240360192 base_runner.py:69] train.save_interval_seconds : 600 I0422 04:50:54.383785 140630240360192 base_runner.py:69] train.start_up_delay_steps : 200 I0422 04:50:54.383860 140630240360192 base_runner.py:69] train.summary_interval_steps : 100 I0422 04:50:54.383933 140630240360192 base_runner.py:69] train.tpu_steps_per_loop : 100 I0422 04:50:54.384006 140630240360192 base_runner.py:69] vn.global_vn : False I0422 04:50:54.384078 140630240360192 base_runner.py:69] vn.per_step_vn : False I0422 04:50:54.384150 140630240360192 base_runner.py:69] vn.scale : NoneType I0422 04:50:54.384223 140630240360192 base_runner.py:69] vn.seed : NoneType I0422 04:50:54.384295 140630240360192 base_runner.py:69] I0422 04:50:54.384386 140630240360192 base_runner.py:70] 
============================================================ I0422 04:50:54.386785 140630240360192 base_runner.py:115] Starting ... W0422 04:50:54.387017 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py:186: The name tf.gfile.MakeDirs is deprecated. Please use tf.io.gfile.makedirs instead. W0422 04:50:54.387264 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/base_runner.py:324: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead. W0422 04:50:54.388385 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py:192: The name tf.container is deprecated. Please use tf.compat.v1.container instead. I0422 04:50:54.388861 140630240360192 cluster.py:429] _LeastLoadedPlacer : ['/job:local/replica:0/task:0/device:CPU:0'] W0422 04:50:54.406366 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/py_utils.py:1258: The name tf.variable_scope is deprecated. Please use tf.compat.v1.variable_scope instead. W0422 04:50:54.407196 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/py_utils.py:1260: The name tf.train.get_or_create_global_step is deprecated. Please use tf.compat.v1.train.get_or_create_global_step instead. I0422 04:50:54.411585 140630240360192 cluster.py:447] Place variable global_step on /job:local/replica:0/task:0/device:CPU:0 8 W0422 04:50:54.434184 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/py_utils.py:1250: The name tf.train.get_global_step is deprecated. Please use tf.compat.v1.train.get_global_step instead. 
I0422 04:50:54.434437 140630240360192 base_model.py:1116] Training parameters for : {
  early_stop: {
    metric_history: {
      jobname: "eval_dev"
      local_filesystem: False
      logdir: "/tmp/mnist/log"
      metric: "log_pplx"
      minimize: True
      name: "MetricHistory"
      tfevent_file: False
    }
    name: "EarlyStop"
    tolerance: 0.0
    verbose: True
    window: 0
  }
  ema_decay: 0.0
  init_from_checkpoint_rules: {}
  max_steps: 4000000
  save_interval_seconds: 600
  start_up_delay_steps: 200
  summary_interval_steps: 100
  tpu_steps_per_loop: 100
}
I0422 04:50:54.462551 140630240360192 base_input_generator.py:510] bucket_batch_limit [8]
W0422 04:50:54.487662 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/lm/input_generator.py:49: The name tf.summary.histogram is deprecated. Please use tf.compat.v1.summary.histogram instead.
W0422 04:50:54.492938 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py:257: The name tf.get_collection is deprecated. Please use tf.compat.v1.get_collection instead.
W0422 04:50:54.738212 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/quant_utils.py:364: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.
I0422 04:50:54.747112 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/source_proj/var on /job:local/replica:0/task:0/device:CPU:0 16777224
I0422 04:50:54.750199 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/source_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.753550 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/source_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 16785416
I0422 04:50:54.755562 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/source_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.764199 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/query_proj/var on /job:local/replica:0/task:0/device:CPU:0 33562632
I0422 04:50:54.766542 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/query_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.769810 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/query_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 33570824
I0422 04:50:54.771819 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/query_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.780476 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_proj/var on /job:local/replica:0/task:0/device:CPU:0 50348040
I0422 04:50:54.782843 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.786111 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 50356232
I0422 04:50:54.788263 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.796864 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_post_proj/var on /job:local/replica:0/task:0/device:CPU:0 67133448
I0422 04:50:54.799236 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_post_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.802671 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_post_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 67141640
I0422 04:50:54.804682 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_post_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.809387 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var on /job:local/replica:0/task:0/device:CPU:0 67142152
I0422 04:50:54.811418 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var:0 shape=(128,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.816246 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/atten_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 67150344
I0422 04:50:54.818223 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/atten_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.821652 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/atten_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 67158536
I0422 04:50:54.823678 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/atten_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:54.835653 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/py_utils.py:1062: The name tf.logging.warning is deprecated. Please use tf.compat.v1.logging.warning instead.
W0422 04:50:54.835769 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:54.843506 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_0/w/var on /job:local/replica:0/task:0/device:CPU:0 134267400
I0422 04:50:54.845873 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_0/w/var:0 shape=(2048, 8192) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.849131 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_0/b/var on /job:local/replica:0/task:0/device:CPU:0 134300168
I0422 04:50:54.851147 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_0/b/var:0 shape=(8192,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:54.853646 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:54.861335 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_1/w/var on /job:local/replica:0/task:0/device:CPU:0 201409032
I0422 04:50:54.863692 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_1/w/var:0 shape=(8192, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.867098 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_1/b/var on /job:local/replica:0/task:0/device:CPU:0 201417224
I0422 04:50:54.869093 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_1/b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.875037 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 201425416
I0422 04:50:54.877285 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.880546 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 201433608
I0422 04:50:54.882546 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.908543 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/source_proj/var on /job:local/replica:0/task:0/device:CPU:0 218210824
I0422 04:50:54.910922 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/source_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.914330 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/source_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 218219016
I0422 04:50:54.916358 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/source_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.924988 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/query_proj/var on /job:local/replica:0/task:0/device:CPU:0 234996232
I0422 04:50:54.927372 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/query_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.930773 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/query_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 235004424
I0422 04:50:54.932823 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/query_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.941521 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_proj/var on /job:local/replica:0/task:0/device:CPU:0 251781640
I0422 04:50:54.943988 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.947274 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 251789832
I0422 04:50:54.949286 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.958019 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_post_proj/var on /job:local/replica:0/task:0/device:CPU:0 268567048
I0422 04:50:54.960479 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_post_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.963809 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_post_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 268575240
I0422 04:50:54.965837 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_post_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.970503 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var on /job:local/replica:0/task:0/device:CPU:0 268575752
I0422 04:50:54.973289 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var:0 shape=(128,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.978142 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/atten_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 268583944
I0422 04:50:54.980201 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/atten_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:54.983654 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/atten_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 268592136
I0422 04:50:54.985686 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/atten_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:54.996972 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:55.004829 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_0/w/var on /job:local/replica:0/task:0/device:CPU:0 335701000
I0422 04:50:55.007215 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_0/w/var:0 shape=(2048, 8192) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.010516 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_0/b/var on /job:local/replica:0/task:0/device:CPU:0 335733768
I0422 04:50:55.012556 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_0/b/var:0 shape=(8192,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:55.015074 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:55.022810 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_1/w/var on /job:local/replica:0/task:0/device:CPU:0 402842632
I0422 04:50:55.025194 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_1/w/var:0 shape=(8192, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.028635 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_1/b/var on /job:local/replica:0/task:0/device:CPU:0 402850824
I0422 04:50:55.030659 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_1/b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.036617 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 402859016
I0422 04:50:55.038887 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.042166 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 402867208
I0422 04:50:55.044192 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.069844 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/source_proj/var on /job:local/replica:0/task:0/device:CPU:0 419644424
I0422 04:50:55.072247 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/source_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.075659 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/source_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 419652616
I0422 04:50:55.077321 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/source_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.083909 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/query_proj/var on /job:local/replica:0/task:0/device:CPU:0 436429832
I0422 04:50:55.085725 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/query_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.088422 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/query_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 436438024
I0422 04:50:55.089953 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/query_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.096468 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_proj/var on /job:local/replica:0/task:0/device:CPU:0 453215240
I0422 04:50:55.098789 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.101243 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 453223432
I0422 04:50:55.102752 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.109296 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_post_proj/var on /job:local/replica:0/task:0/device:CPU:0 470000648
I0422 04:50:55.111073 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_post_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.113611 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_post_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 470008840
I0422 04:50:55.115133 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_post_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.118529 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var on /job:local/replica:0/task:0/device:CPU:0 470009352
I0422 04:50:55.120131 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var:0 shape=(128,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.123673 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/atten_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 470017544
I0422 04:50:55.125143 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/atten_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.127723 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/atten_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 470025736
I0422 04:50:55.129199 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/atten_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:55.137305 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:55.143070 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_0/w/var on /job:local/replica:0/task:0/device:CPU:0 537134600
I0422 04:50:55.144798 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_0/w/var:0 shape=(2048, 8192) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.147186 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_0/b/var on /job:local/replica:0/task:0/device:CPU:0 537167368
I0422 04:50:55.148667 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_0/b/var:0 shape=(8192,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:55.150489 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:55.156601 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_1/w/var on /job:local/replica:0/task:0/device:CPU:0 604276232
I0422 04:50:55.158313 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_1/w/var:0 shape=(8192, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.160806 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_1/b/var on /job:local/replica:0/task:0/device:CPU:0 604284424
I0422 04:50:55.162322 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_1/b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.166656 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 604292616
I0422 04:50:55.168306 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.170676 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 604300808
I0422 04:50:55.172171 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.190345 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/source_proj/var on /job:local/replica:0/task:0/device:CPU:0 621078024
I0422 04:50:55.192101 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/source_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.195012 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/source_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 621086216
I0422 04:50:55.196474 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/source_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.298295 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/query_proj/var on /job:local/replica:0/task:0/device:CPU:0 637863432
I0422 04:50:55.300360 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/query_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.302906 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/query_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 637871624
I0422 04:50:55.304397 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/query_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.310797 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_proj/var on /job:local/replica:0/task:0/device:CPU:0 654648840
I0422 04:50:55.312556 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.314958 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 654657032
I0422 04:50:55.316617 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.322993 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_post_proj/var on /job:local/replica:0/task:0/device:CPU:0 671434248
I0422 04:50:55.324742 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_post_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.327171 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_post_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 671442440
I0422 04:50:55.328784 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_post_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.332283 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var on /job:local/replica:0/task:0/device:CPU:0 671442952
I0422 04:50:55.333941 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var:0 shape=(128,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.337538 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/atten_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 671451144
I0422 04:50:55.339023 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/atten_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.341628 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/atten_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 671459336
I0422 04:50:55.343126 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/atten_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:55.351222 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:55.357409 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_0/w/var on /job:local/replica:0/task:0/device:CPU:0 738568200
I0422 04:50:55.359149 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_0/w/var:0 shape=(2048, 8192) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.361529 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_0/b/var on /job:local/replica:0/task:0/device:CPU:0 738600968
I0422 04:50:55.363121 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_0/b/var:0 shape=(8192,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:55.364871 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:55.370706 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_1/w/var on /job:local/replica:0/task:0/device:CPU:0 805709832
I0422 04:50:55.372519 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_1/w/var:0 shape=(8192, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.374927 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_1/b/var on /job:local/replica:0/task:0/device:CPU:0 805718024
I0422 04:50:55.376421 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_1/b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.380909 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 805726216
I0422 04:50:55.382380 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.384773 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 805734408
I0422 04:50:55.386260 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.404966 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/source_proj/var on /job:local/replica:0/task:0/device:CPU:0 822511624
I0422 04:50:55.406703 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/source_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.409187 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/source_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 822519816
I0422 04:50:55.410675 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/source_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.417826 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/query_proj/var on /job:local/replica:0/task:0/device:CPU:0 839297032
I0422 04:50:55.419656 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/query_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.422049 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/query_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 839305224
I0422 04:50:55.423544 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/query_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.429893 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_proj/var on /job:local/replica:0/task:0/device:CPU:0 856082440
I0422 04:50:55.431648 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.434041 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 856090632
I0422 04:50:55.435530 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.441880 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_post_proj/var on /job:local/replica:0/task:0/device:CPU:0 872867848
I0422 04:50:55.443716 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_post_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.446146 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_post_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 872876040
I0422 04:50:55.447757 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_post_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.451224 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var on /job:local/replica:0/task:0/device:CPU:0 872876552
I0422 04:50:55.452739 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var:0 shape=(128,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.456291 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/atten_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 872884744
I0422 04:50:55.457772 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/atten_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.460695 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/atten_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 872892936
I0422 04:50:55.462178 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/atten_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0
W0422 04:50:55.470352 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended.
I0422 04:50:55.476075 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_0/w/var on /job:local/replica:0/task:0/device:CPU:0 940001800 I0422 04:50:55.477832 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_0/w/var:0 shape=(2048, 8192) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.480249 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_0/b/var on /job:local/replica:0/task:0/device:CPU:0 940034568 I0422 04:50:55.481745 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_0/b/var:0 shape=(8192,) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.483597 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.489240 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_1/w/var on /job:local/replica:0/task:0/device:CPU:0 1007143432 I0422 04:50:55.491003 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_1/w/var:0 shape=(8192, 2048) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.493582 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_1/b/var on /job:local/replica:0/task:0/device:CPU:0 1007151624 I0422 04:50:55.495079 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_1/b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.499404 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer_ln/bias/var on 
/job:local/replica:0/task:0/device:CPU:0 1007159816 I0422 04:50:55.501059 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.503443 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 1007168008 I0422 04:50:55.504952 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.523534 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/source_proj/var on /job:local/replica:0/task:0/device:CPU:0 1023945224 I0422 04:50:55.525291 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/source_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.527803 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/source_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 1023953416 I0422 04:50:55.529289 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/source_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.535583 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/query_proj/var on /job:local/replica:0/task:0/device:CPU:0 1040730632 I0422 04:50:55.537386 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/query_proj/var:0 shape=(2048, 2048) on device 
/job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.539808 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/query_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 1040738824 I0422 04:50:55.541316 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/query_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.547764 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_proj/var on /job:local/replica:0/task:0/device:CPU:0 1057516040 I0422 04:50:55.549510 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.551933 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 1057524232 I0422 04:50:55.553431 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.560203 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_post_proj/var on /job:local/replica:0/task:0/device:CPU:0 1074301448 I0422 04:50:55.561952 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_post_proj/var:0 shape=(2048, 2048) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.564400 140630240360192 cluster.py:447] Place variable 
1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_post_proj_b/var on /job:local/replica:0/task:0/device:CPU:0 1074309640 I0422 04:50:55.566005 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_post_proj_b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.569520 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var on /job:local/replica:0/task:0/device:CPU:0 1074310152 I0422 04:50:55.571083 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var:0 shape=(128,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.574671 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/atten_ln/bias/var on /job:local/replica:0/task:0/device:CPU:0 1074318344 I0422 04:50:55.576179 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/atten_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.578716 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/atten_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 1074326536 I0422 04:50:55.580224 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/atten_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.588320 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended. 
I0422 04:50:55.594094 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_0/w/var on /job:local/replica:0/task:0/device:CPU:0 1141435400 I0422 04:50:55.595982 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_0/w/var:0 shape=(2048, 8192) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.598387 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_0/b/var on /job:local/replica:0/task:0/device:CPU:0 1141468168 I0422 04:50:55.599898 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_0/b/var:0 shape=(8192,) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.601749 140630240360192 py_utils.py:1064] WARNING!!! var w is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.607449 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_1/w/var on /job:local/replica:0/task:0/device:CPU:0 1208577032 I0422 04:50:55.609194 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_1/w/var:0 shape=(8192, 2048) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.612205 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_1/b/var on /job:local/replica:0/task:0/device:CPU:0 1208585224 I0422 04:50:55.613711 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_1/b/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.618057 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer_ln/bias/var on 
/job:local/replica:0/task:0/device:CPU:0 1208593416 I0422 04:50:55.619735 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer_ln/bias/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.622220 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer_ln/scale/var on /job:local/replica:0/task:0/device:CPU:0 1208601608 I0422 04:50:55.623727 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer_ln/scale/var:0 shape=(2048,) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.642831 140630240360192 py_utils.py:1064] WARNING!!! var weight_0 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.648403 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_0/var on /job:local/replica:0/task:0/device:CPU:0 1224985608 I0422 04:50:55.650094 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_0/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.650830 140630240360192 py_utils.py:1064] WARNING!!! var weight_1 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.656203 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_1/var on /job:local/replica:0/task:0/device:CPU:0 1241369608 I0422 04:50:55.657888 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_1/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.658621 140630240360192 py_utils.py:1064] WARNING!!! var weight_2 is using the default xavier initializer. Make sure this is intended. 
I0422 04:50:55.664046 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_2/var on /job:local/replica:0/task:0/device:CPU:0 1257753608 I0422 04:50:55.665736 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_2/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.666467 140630240360192 py_utils.py:1064] WARNING!!! var weight_3 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.671964 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_3/var on /job:local/replica:0/task:0/device:CPU:0 1274137608 I0422 04:50:55.673664 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_3/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.674397 140630240360192 py_utils.py:1064] WARNING!!! var weight_4 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.679850 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_4/var on /job:local/replica:0/task:0/device:CPU:0 1290521608 I0422 04:50:55.681550 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_4/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.682286 140630240360192 py_utils.py:1064] WARNING!!! var weight_5 is using the default xavier initializer. Make sure this is intended. 
I0422 04:50:55.687738 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_5/var on /job:local/replica:0/task:0/device:CPU:0 1306905608 I0422 04:50:55.689441 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_5/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.690177 140630240360192 py_utils.py:1064] WARNING!!! var weight_6 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.696134 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_6/var on /job:local/replica:0/task:0/device:CPU:0 1323289608 I0422 04:50:55.697928 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_6/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.698664 140630240360192 py_utils.py:1064] WARNING!!! var weight_7 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.704111 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_7/var on /job:local/replica:0/task:0/device:CPU:0 1339673608 I0422 04:50:55.705825 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_7/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.706562 140630240360192 py_utils.py:1064] WARNING!!! var weight_8 is using the default xavier initializer. Make sure this is intended. 
I0422 04:50:55.712048 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_8/var on /job:local/replica:0/task:0/device:CPU:0 1356057608 I0422 04:50:55.713749 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_8/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.714601 140630240360192 py_utils.py:1064] WARNING!!! var weight_9 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.720016 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_9/var on /job:local/replica:0/task:0/device:CPU:0 1372441608 I0422 04:50:55.721721 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_9/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.722543 140630240360192 py_utils.py:1064] WARNING!!! var weight_10 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.727982 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_10/var on /job:local/replica:0/task:0/device:CPU:0 1388825608 I0422 04:50:55.729754 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_10/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.730490 140630240360192 py_utils.py:1064] WARNING!!! var weight_11 is using the default xavier initializer. Make sure this is intended. 
I0422 04:50:55.735930 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_11/var on /job:local/replica:0/task:0/device:CPU:0 1405209608 I0422 04:50:55.737634 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_11/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.738367 140630240360192 py_utils.py:1064] WARNING!!! var weight_12 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.743828 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_12/var on /job:local/replica:0/task:0/device:CPU:0 1421593608 I0422 04:50:55.745522 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_12/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.746253 140630240360192 py_utils.py:1064] WARNING!!! var weight_13 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.751787 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_13/var on /job:local/replica:0/task:0/device:CPU:0 1437977608 I0422 04:50:55.753489 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_13/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.754220 140630240360192 py_utils.py:1064] WARNING!!! var weight_14 is using the default xavier initializer. Make sure this is intended. 
I0422 04:50:55.759649 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_14/var on /job:local/replica:0/task:0/device:CPU:0 1454361608 I0422 04:50:55.761415 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_14/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:50:55.762151 140630240360192 py_utils.py:1064] WARNING!!! var weight_15 is using the default xavier initializer. Make sure this is intended. I0422 04:50:55.767653 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/weight_15/var on /job:local/replica:0/task:0/device:CPU:0 1470745608 I0422 04:50:55.769365 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/weight_15/var:0 shape=(2048, 2000) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.771805 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_0/var on /job:local/replica:0/task:0/device:CPU:0 1470753608 I0422 04:50:55.773370 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_0/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.776310 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_1/var on /job:local/replica:0/task:0/device:CPU:0 1470761608 I0422 04:50:55.777797 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_1/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.780194 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_2/var on /job:local/replica:0/task:0/device:CPU:0 1470769608 I0422 04:50:55.781655 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_2/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 
04:50:55.784158 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_3/var on /job:local/replica:0/task:0/device:CPU:0 1470777608 I0422 04:50:55.785635 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_3/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.788031 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_4/var on /job:local/replica:0/task:0/device:CPU:0 1470785608 I0422 04:50:55.789505 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_4/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.791901 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_5/var on /job:local/replica:0/task:0/device:CPU:0 1470793608 I0422 04:50:55.793478 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_5/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.795880 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_6/var on /job:local/replica:0/task:0/device:CPU:0 1470801608 I0422 04:50:55.797353 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_6/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.799817 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_7/var on /job:local/replica:0/task:0/device:CPU:0 1470809608 I0422 04:50:55.801409 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_7/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.803802 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_8/var on /job:local/replica:0/task:0/device:CPU:0 1470817608 I0422 
04:50:55.805267 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_8/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.807662 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_9/var on /job:local/replica:0/task:0/device:CPU:0 1470825608 I0422 04:50:55.809248 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_9/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.811642 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_10/var on /job:local/replica:0/task:0/device:CPU:0 1470833608 I0422 04:50:55.813117 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_10/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.815506 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_11/var on /job:local/replica:0/task:0/device:CPU:0 1470841608 I0422 04:50:55.817137 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_11/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.819528 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_12/var on /job:local/replica:0/task:0/device:CPU:0 1470849608 I0422 04:50:55.821022 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_12/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.823412 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_13/var on /job:local/replica:0/task:0/device:CPU:0 1470857608 I0422 04:50:55.825120 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_13/var:0 shape=(2000,) on device 
/job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.827517 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_14/var on /job:local/replica:0/task:0/device:CPU:0 1470865608 I0422 04:50:55.828989 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_14/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.831370 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/softmax/bias_15/var on /job:local/replica:0/task:0/device:CPU:0 1470873608 I0422 04:50:55.832948 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/softmax/bias_15/var:0 shape=(2000,) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.847032 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/transformerlm/emb/wm/var on /job:local/replica:0/task:0/device:CPU:0 1733017608 I0422 04:50:55.848649 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/transformerlm/emb/wm/var:0 shape=(32000, 2048) on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.967817 140630240360192 py_utils.py:1277] === worker 0 === I0422 04:50:55.970683 140630240360192 py_utils.py:1267] worker 0: global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.970763 140630240360192 py_utils.py:1267] worker 0: input._tokenizer_default.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.970824 140630240360192 py_utils.py:1267] worker 0: input.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.970886 140630240360192 py_utils.py:1267] worker 0: lm.emb.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.970943 140630240360192 py_utils.py:1267] worker 0: lm.emb.wm 
/job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.970993 140630240360192 py_utils.py:1267] worker 0: lm.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0
[... roughly 250 near-identical py_utils.py:1267 variable-placement records elided; each maps one model variable onto the same device, /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0. The variables covered are:
  lm.input_dropout.global_step, lm.position_emb.global_step;
  lm.softmax.global_step, lm.softmax.bias_{0..15}, lm.softmax.weight_{0..15};
  and, for each transformer layer lm.stack.cell_0.encoder_{0..5}:
    encoder_N.global_step;
    fflayer.global_step, fflayer.fflayer.global_step,
    fflayer.fflayer.dropout[{0,1}].global_step,
    fflayer.fflayer.fc[{0,1}].{b, w, global_step},
    fflayer.layer_norm.{bias, scale, global_step},
    fflayer.residual_dropout.global_step;
    self_atten.global_step, self_atten.atten.global_step,
    self_atten.atten.atten.{global_step, per_dim_scale},
    self_atten.atten.{ctx_proj, ctx_proj_b, ctx_post_proj, ctx_post_proj_b, query_proj, query_proj_b, source_proj, source_proj_b},
    self_atten.layer_norm.{bias, scale, global_step},
    self_atten.residual_dropout.global_step ...]
I0422 04:50:55.981043 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.atten.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0
I0422 04:50:55.981091 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.atten.per_dim_scale /job:local/replica:0/task:0/device:CPU:0 ->
/job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981137 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_post_proj /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981185 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_post_proj_b /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981235 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_proj /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981285 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_proj_b /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981332 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981379 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.query_proj /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981426 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.query_proj_b /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981475 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.source_proj /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981522 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.atten.source_proj_b /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981570 140630240360192 py_utils.py:1267] worker 0: 
lm.stack.cell_0.encoder_5.self_atten.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981617 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.layer_norm.bias /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981666 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.layer_norm.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981713 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.layer_norm.scale /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981760 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.encoder_5.self_atten.residual_dropout.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981807 140630240360192 py_utils.py:1267] worker 0: lm.stack.cell_0.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981856 140630240360192 py_utils.py:1267] worker 0: lm.stack.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981904 140630240360192 py_utils.py:1267] worker 0: lr_schedule.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.981951 140630240360192 py_utils.py:1267] worker 0: optimizer.global_step /job:local/replica:0/task:0/device:CPU:0 -> /job:local/replica:0/task:0/device:CPU:0 I0422 04:50:55.982002 140630240360192 py_utils.py:1283] ========== I0422 04:50:56.112569 140630240360192 layers.py:1272] input shape = (1024, 8, 2048) I0422 04:50:56.143778 140630240360192 layers.py:1283] input_embs shape = (1024, 8, 2048) I0422 04:50:56.155425 140630240360192 gpipe.py:378] cell 0 input [, , None, None, None, 
None] I0422 04:50:57.801775 140630240360192 gpipe.py:469] pipeline output = [, , None, None, None, None] I0422 04:50:57.801980 140630240360192 layers_with_gpipe.py:643] Tensor("fprop/1bwds_wpm_level_lm/tower_0_0/Reshape_6:0", shape=(1024, 8, 2048), dtype=float32, device=/job:local/replica:0/task:0/device:CPU:0) I0422 04:50:57.802086 140630240360192 layers.py:1286] layer_out shape = (1024, 8, 2048) W0422 04:50:58.048001 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/summary_utils.py:36: The name tf.summary.scalar is deprecated. Please use tf.compat.v1.summary.scalar instead. I0422 04:50:59.292220 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.emb.wm: I0422 04:50:59.292478 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_0: I0422 04:50:59.292754 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_1: I0422 04:50:59.293147 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_10: I0422 04:50:59.293376 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_11: I0422 04:50:59.293589 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_12: I0422 04:50:59.293790 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_13: I0422 04:50:59.293983 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_14: I0422 04:50:59.294173 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_15: I0422 04:50:59.294362 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_2: I0422 04:50:59.294552 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_3: I0422 04:50:59.294739 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_4: I0422 04:50:59.294954 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_5: I0422 04:50:59.295146 
140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_6: I0422 04:50:59.295336 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_7: I0422 04:50:59.295525 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_8: I0422 04:50:59.295711 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.bias_9: I0422 04:50:59.295909 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_0: I0422 04:50:59.296127 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_1: I0422 04:50:59.296328 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_10: I0422 04:50:59.296526 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_11: I0422 04:50:59.296725 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_12: I0422 04:50:59.296926 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_13: I0422 04:50:59.297127 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_14: I0422 04:50:59.297327 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_15: I0422 04:50:59.297527 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_2: I0422 04:50:59.297728 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_3: I0422 04:50:59.297926 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_4: I0422 04:50:59.298127 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_5: I0422 04:50:59.298326 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_6: I0422 04:50:59.298526 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_7: I0422 04:50:59.298724 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_8: I0422 04:50:59.298959 140630240360192 
py_utils.py:1730] AdjustGradientsWithLpLoss: lm.softmax.weight_9: I0422 04:50:59.299211 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.fflayer.fflayer.fc_0.b: I0422 04:50:59.299408 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.fflayer.fflayer.fc_0.w: I0422 04:50:59.299622 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.fflayer.fflayer.fc_1.b: I0422 04:50:59.299815 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.fflayer.fflayer.fc_1.w: I0422 04:50:59.300018 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.fflayer.layer_norm.bias: I0422 04:50:59.300208 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.fflayer.layer_norm.scale: I0422 04:50:59.300395 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.atten.per_dim_scale: I0422 04:50:59.300589 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.ctx_post_proj: I0422 04:50:59.300795 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.ctx_post_proj_b: I0422 04:50:59.300986 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.ctx_proj: I0422 04:50:59.301188 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.ctx_proj_b: I0422 04:50:59.301381 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.query_proj: I0422 04:50:59.301584 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.query_proj_b: I0422 04:50:59.301775 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.source_proj: I0422 
04:50:59.301976 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.atten.source_proj_b: I0422 04:50:59.302166 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.layer_norm.bias: I0422 04:50:59.302356 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_0.self_atten.layer_norm.scale: I0422 04:50:59.302553 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.fflayer.fflayer.fc_0.b: I0422 04:50:59.302745 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.fflayer.fflayer.fc_0.w: I0422 04:50:59.302967 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.fflayer.fflayer.fc_1.b: I0422 04:50:59.303159 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.fflayer.fflayer.fc_1.w: I0422 04:50:59.303360 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.fflayer.layer_norm.bias: I0422 04:50:59.303549 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.fflayer.layer_norm.scale: I0422 04:50:59.303734 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.atten.per_dim_scale: I0422 04:50:59.303925 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.ctx_post_proj: I0422 04:50:59.304128 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.ctx_post_proj_b: I0422 04:50:59.304318 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.ctx_proj: I0422 04:50:59.304521 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.ctx_proj_b: I0422 04:50:59.304713 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: 
lm.stack.cell_0.encoder_1.self_atten.atten.query_proj: I0422 04:50:59.304915 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.query_proj_b: I0422 04:50:59.305107 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.source_proj: I0422 04:50:59.305310 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.atten.source_proj_b: I0422 04:50:59.305515 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.layer_norm.bias: I0422 04:50:59.305708 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_1.self_atten.layer_norm.scale: I0422 04:50:59.305897 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.fflayer.fflayer.fc_0.b: I0422 04:50:59.306092 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.fflayer.fflayer.fc_0.w: I0422 04:50:59.306297 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.fflayer.fflayer.fc_1.b: I0422 04:50:59.306484 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.fflayer.fflayer.fc_1.w: I0422 04:50:59.306684 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.fflayer.layer_norm.bias: I0422 04:50:59.306885 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.fflayer.layer_norm.scale: I0422 04:50:59.307085 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.atten.per_dim_scale: I0422 04:50:59.307264 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.ctx_post_proj: I0422 04:50:59.307454 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.ctx_post_proj_b: I0422 
04:50:59.307631 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.ctx_proj: I0422 04:50:59.307817 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.ctx_proj_b: I0422 04:50:59.307993 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.query_proj: I0422 04:50:59.308181 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.query_proj_b: I0422 04:50:59.308367 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.source_proj: I0422 04:50:59.308554 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.atten.source_proj_b: I0422 04:50:59.308731 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.layer_norm.bias: I0422 04:50:59.308906 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_2.self_atten.layer_norm.scale: I0422 04:50:59.309081 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.fflayer.fflayer.fc_0.b: I0422 04:50:59.309254 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.fflayer.fflayer.fc_0.w: I0422 04:50:59.309442 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.fflayer.fflayer.fc_1.b: I0422 04:50:59.309617 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.fflayer.fflayer.fc_1.w: I0422 04:50:59.309823 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.fflayer.layer_norm.bias: I0422 04:50:59.309999 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.fflayer.layer_norm.scale: I0422 04:50:59.310173 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: 
lm.stack.cell_0.encoder_3.self_atten.atten.atten.per_dim_scale: I0422 04:50:59.310352 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.ctx_post_proj: I0422 04:50:59.310542 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.ctx_post_proj_b: I0422 04:50:59.310728 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.ctx_proj: I0422 04:50:59.311031 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.ctx_proj_b: I0422 04:50:59.311266 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.query_proj: I0422 04:50:59.311465 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.query_proj_b: I0422 04:50:59.311645 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.source_proj: I0422 04:50:59.311836 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.atten.source_proj_b: I0422 04:50:59.312012 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.layer_norm.bias: I0422 04:50:59.312190 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_3.self_atten.layer_norm.scale: I0422 04:50:59.312365 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.fflayer.fflayer.fc_0.b: I0422 04:50:59.312540 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.fflayer.fflayer.fc_0.w: I0422 04:50:59.312726 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.fflayer.fflayer.fc_1.b: I0422 04:50:59.312901 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.fflayer.fflayer.fc_1.w: I0422 
04:50:59.313087 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.fflayer.layer_norm.bias: I0422 04:50:59.313262 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.fflayer.layer_norm.scale: I0422 04:50:59.313436 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.atten.per_dim_scale: I0422 04:50:59.313613 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.ctx_post_proj: I0422 04:50:59.313812 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.ctx_post_proj_b: I0422 04:50:59.313992 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.ctx_proj: I0422 04:50:59.314182 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.ctx_proj_b: I0422 04:50:59.314361 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.query_proj: I0422 04:50:59.314548 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.query_proj_b: I0422 04:50:59.314728 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.source_proj: I0422 04:50:59.314930 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.atten.source_proj_b: I0422 04:50:59.315112 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.layer_norm.bias: I0422 04:50:59.315289 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_4.self_atten.layer_norm.scale: I0422 04:50:59.315464 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.fflayer.fflayer.fc_0.b: I0422 04:50:59.315638 140630240360192 py_utils.py:1730] 
AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.fflayer.fflayer.fc_0.w: I0422 04:50:59.315824 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.fflayer.fflayer.fc_1.b: I0422 04:50:59.315999 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.fflayer.fflayer.fc_1.w: I0422 04:50:59.316189 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.fflayer.layer_norm.bias: I0422 04:50:59.316364 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.fflayer.layer_norm.scale: I0422 04:50:59.316545 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.atten.per_dim_scale: I0422 04:50:59.316724 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_post_proj: I0422 04:50:59.316915 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_post_proj_b: I0422 04:50:59.317092 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_proj: I0422 04:50:59.317282 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.ctx_proj_b: I0422 04:50:59.317461 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.query_proj: I0422 04:50:59.317652 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.query_proj_b: I0422 04:50:59.317830 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.source_proj: I0422 04:50:59.318018 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.atten.source_proj_b: I0422 04:50:59.318193 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: 
lm.stack.cell_0.encoder_5.self_atten.layer_norm.bias: I0422 04:50:59.318371 140630240360192 py_utils.py:1730] AdjustGradientsWithLpLoss: lm.stack.cell_0.encoder_5.self_atten.layer_norm.scale: W0422 04:51:03.257190 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/optimizer.py:179: The name tf.train.AdamOptimizer is deprecated. Please use tf.compat.v1.train.AdamOptimizer instead. I0422 04:51:03.259912 140630240360192 cluster.py:447] Place variable beta1_power on /job:local/replica:0/task:0/device:CPU:0 1733017612 I0422 04:51:03.262640 140630240360192 cluster.py:447] Place variable beta2_power on /job:local/replica:0/task:0/device:CPU:0 1733017616 I0422 04:51:04.736705 140630240360192 cluster.py:447] Place variable 1bwds_wpm_level_lm/total_samples/var on /job:local/replica:0/task:0/device:CPU:0 1733017624 I0422 04:51:04.738738 140630240360192 py_utils.py:1220] Creating var 1bwds_wpm_level_lm/total_samples/var:0 shape=() on device /job:local/replica:0/task:0/device:CPU:0 I0422 04:51:04.747314 140630240360192 cluster.py:447] Place variable total_nan_gradients/var on /job:local/replica:0/task:0/device:CPU:0 1733017632 I0422 04:51:04.749025 140630240360192 py_utils.py:1220] Creating var total_nan_gradients/var:0 shape=() on device /job:local/replica:0/task:0/device:CPU:0 W0422 04:51:04.785166 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/base_runner.py:156: The name tf.train.Saver is deprecated. Please use tf.compat.v1.train.Saver instead. W0422 04:51:05.088041 140630240360192 deprecation_wrapper.py:119] From /tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py:198: The name tf.summary.merge_all is deprecated. Please use tf.compat.v1.summary.merge_all instead. 
I0422 04:51:05.346396 140630240360192 py_utils.py:1267] MODEL ANALYSIS:
I0422 04:51:05.346535 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.emb.wm (32000, 2048) 65536000 1bwds_wpm_level_lm/transformerlm/emb/wm/var
I0422 04:51:05.346601 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_0 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_0/var
I0422 04:51:05.346657 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_1 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_1/var
I0422 04:51:05.346710 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_10 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_10/var
I0422 04:51:05.346764 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_11 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_11/var
I0422 04:51:05.346815 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_12 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_12/var
I0422 04:51:05.346867 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_13 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_13/var
I0422 04:51:05.346926 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_14 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_14/var
I0422 04:51:05.346977 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_15 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_15/var
I0422 04:51:05.347028 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_2 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_2/var
I0422 04:51:05.347079 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_3 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_3/var
I0422 04:51:05.347129 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_4 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_4/var
I0422 04:51:05.347178 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_5 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_5/var
I0422 04:51:05.347229 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_6 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_6/var
I0422 04:51:05.347280 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_7 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_7/var
I0422 04:51:05.347342 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_8 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_8/var
I0422 04:51:05.347393 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.bias_9 (2000,) 2000 1bwds_wpm_level_lm/transformerlm/softmax/bias_9/var
I0422 04:51:05.347444 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_0 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_0/var
I0422 04:51:05.347493 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_1 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_1/var
I0422 04:51:05.347543 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_10 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_10/var
I0422 04:51:05.347593 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_11 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_11/var
I0422 04:51:05.347644 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_12 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_12/var
I0422 04:51:05.347693 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_13 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_13/var
I0422 04:51:05.347743 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_14 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_14/var
I0422 04:51:05.347795 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_15 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_15/var
I0422 04:51:05.347845 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_2 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_2/var
I0422 04:51:05.347896 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_3 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_3/var
I0422 04:51:05.347980 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_4 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_4/var
I0422 04:51:05.348037 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_5 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_5/var
I0422 04:51:05.348092 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_6 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_6/var
I0422 04:51:05.348146 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_7 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_7/var
I0422 04:51:05.348201 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_8 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_8/var
I0422 04:51:05.348261 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.softmax.weight_9 (2048, 2000) 4096000 1bwds_wpm_level_lm/transformerlm/softmax/weight_9/var
I0422 04:51:05.348345 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.fflayer.fflayer.fc[0].b (8192,) 8192 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_0/b/var
I0422 04:51:05.348396 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.fflayer.fflayer.fc[0].w (2048, 8192) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_0/w/var
I0422 04:51:05.348447 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.fflayer.fflayer.fc[1].b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_1/b/var
I0422 04:51:05.348496 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.fflayer.fflayer.fc[1].w (8192, 2048) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer/fflayer_1/w/var
I0422 04:51:05.348546 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.fflayer.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer_ln/bias/var
I0422 04:51:05.348597 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.fflayer.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/tr_fflayer/fflayer_ln/scale/var
I0422 04:51:05.348647 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.atten.per_dim_scale (128,) 128 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var
I0422 04:51:05.348696 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.ctx_post_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_post_proj/var
I0422 04:51:05.348747 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.ctx_post_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_post_proj_b/var
I0422 04:51:05.348797 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.ctx_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_proj/var
I0422 04:51:05.348848 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.ctx_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/ctx_proj_b/var
I0422 04:51:05.348897 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.query_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/query_proj/var
I0422 04:51:05.348948 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.query_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/query_proj_b/var
I0422 04:51:05.348999 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.source_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/source_proj/var
I0422 04:51:05.349049 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.atten.source_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/multihead_atten/source_proj_b/var
I0422 04:51:05.349102 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/atten_ln/bias/var
I0422 04:51:05.349154 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_0.self_atten.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_0/multihead_self_atten/atten_ln/scale/var
I0422 04:51:05.349205 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.fflayer.fflayer.fc[0].b (8192,) 8192 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_0/b/var
I0422
04:51:05.349256 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.fflayer.fflayer.fc[0].w (2048, 8192) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_0/w/var I0422 04:51:05.349306 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.fflayer.fflayer.fc[1].b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_1/b/var I0422 04:51:05.349355 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.fflayer.fflayer.fc[1].w (8192, 2048) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer/fflayer_1/w/var I0422 04:51:05.349406 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.fflayer.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer_ln/bias/var I0422 04:51:05.349457 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.fflayer.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/tr_fflayer/fflayer_ln/scale/var I0422 04:51:05.349507 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.atten.per_dim_scale (128,) 128 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var I0422 04:51:05.349555 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.ctx_post_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_post_proj/var I0422 04:51:05.349606 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.ctx_post_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_post_proj_b/var I0422 04:51:05.349656 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.ctx_proj 
(2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_proj/var I0422 04:51:05.349705 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.ctx_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/ctx_proj_b/var I0422 04:51:05.349756 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.query_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/query_proj/var I0422 04:51:05.349805 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.query_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/query_proj_b/var I0422 04:51:05.349854 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.source_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/source_proj/var I0422 04:51:05.349910 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.atten.source_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/multihead_atten/source_proj_b/var I0422 04:51:05.349961 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/atten_ln/bias/var I0422 04:51:05.350012 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_1.self_atten.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_1/multihead_self_atten/atten_ln/scale/var I0422 04:51:05.350061 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.fflayer.fflayer.fc[0].b (8192,) 8192 
1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_0/b/var I0422 04:51:05.350111 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.fflayer.fflayer.fc[0].w (2048, 8192) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_0/w/var I0422 04:51:05.350162 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.fflayer.fflayer.fc[1].b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_1/b/var I0422 04:51:05.350212 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.fflayer.fflayer.fc[1].w (8192, 2048) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer/fflayer_1/w/var I0422 04:51:05.350261 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.fflayer.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer_ln/bias/var I0422 04:51:05.350311 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.fflayer.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/tr_fflayer/fflayer_ln/scale/var I0422 04:51:05.350362 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.atten.per_dim_scale (128,) 128 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var I0422 04:51:05.350411 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.ctx_post_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_post_proj/var I0422 04:51:05.350461 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.ctx_post_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_post_proj_b/var I0422 04:51:05.350512 140630240360192 
py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.ctx_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_proj/var I0422 04:51:05.350563 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.ctx_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/ctx_proj_b/var I0422 04:51:05.350613 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.query_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/query_proj/var I0422 04:51:05.350666 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.query_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/query_proj_b/var I0422 04:51:05.350718 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.source_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/source_proj/var I0422 04:51:05.350768 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.atten.source_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/multihead_atten/source_proj_b/var I0422 04:51:05.350820 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/atten_ln/bias/var I0422 04:51:05.350874 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_2.self_atten.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_2/multihead_self_atten/atten_ln/scale/var I0422 04:51:05.350928 140630240360192 py_utils.py:1267] MODEL ANALYSIS: 
_task.lm.stack.cell_0.encoder_3.fflayer.fflayer.fc[0].b (8192,) 8192 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_0/b/var
I0422 04:51:05.350979 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.fflayer.fflayer.fc[0].w (2048, 8192) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_0/w/var
I0422 04:51:05.351030 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.fflayer.fflayer.fc[1].b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_1/b/var
I0422 04:51:05.351078 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.fflayer.fflayer.fc[1].w (8192, 2048) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer/fflayer_1/w/var
I0422 04:51:05.351129 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.fflayer.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer_ln/bias/var
I0422 04:51:05.351178 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.fflayer.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/tr_fflayer/fflayer_ln/scale/var
I0422 04:51:05.351228 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.atten.per_dim_scale (128,) 128 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var
I0422 04:51:05.351279 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.ctx_post_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_post_proj/var
I0422 04:51:05.351329 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.ctx_post_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_post_proj_b/var
I0422 04:51:05.351380 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.ctx_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_proj/var
I0422 04:51:05.351429 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.ctx_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/ctx_proj_b/var
I0422 04:51:05.351484 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.query_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/query_proj/var
I0422 04:51:05.351535 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.query_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/query_proj_b/var
I0422 04:51:05.351586 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.source_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/source_proj/var
I0422 04:51:05.351635 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.atten.source_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/multihead_atten/source_proj_b/var
I0422 04:51:05.351686 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/atten_ln/bias/var
I0422 04:51:05.351737 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_3.self_atten.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_3/multihead_self_atten/atten_ln/scale/var
I0422 04:51:05.351787 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.fflayer.fflayer.fc[0].b (8192,) 8192 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_0/b/var
I0422 04:51:05.351836 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.fflayer.fflayer.fc[0].w (2048, 8192) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_0/w/var
I0422 04:51:05.351886 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.fflayer.fflayer.fc[1].b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_1/b/var
I0422 04:51:05.351937 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.fflayer.fflayer.fc[1].w (8192, 2048) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer/fflayer_1/w/var
I0422 04:51:05.351986 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.fflayer.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer_ln/bias/var
I0422 04:51:05.352035 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.fflayer.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/tr_fflayer/fflayer_ln/scale/var
I0422 04:51:05.352086 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.atten.per_dim_scale (128,) 128 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var
I0422 04:51:05.352135 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.ctx_post_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_post_proj/var
I0422 04:51:05.352185 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.ctx_post_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_post_proj_b/var
I0422 04:51:05.352241 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.ctx_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_proj/var
I0422 04:51:05.352292 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.ctx_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/ctx_proj_b/var
I0422 04:51:05.352341 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.query_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/query_proj/var
I0422 04:51:05.352391 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.query_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/query_proj_b/var
I0422 04:51:05.352442 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.source_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/source_proj/var
I0422 04:51:05.352492 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.atten.source_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/multihead_atten/source_proj_b/var
I0422 04:51:05.352541 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/atten_ln/bias/var
I0422 04:51:05.352592 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_4.self_atten.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_4/multihead_self_atten/atten_ln/scale/var
I0422 04:51:05.352642 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.fflayer.fflayer.fc[0].b (8192,) 8192 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_0/b/var
I0422 04:51:05.352693 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.fflayer.fflayer.fc[0].w (2048, 8192) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_0/w/var
I0422 04:51:05.352742 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.fflayer.fflayer.fc[1].b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_1/b/var
I0422 04:51:05.352792 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.fflayer.fflayer.fc[1].w (8192, 2048) 16777216 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer/fflayer_1/w/var
I0422 04:51:05.352843 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.fflayer.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer_ln/bias/var
I0422 04:51:05.352894 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.fflayer.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/tr_fflayer/fflayer_ln/scale/var
I0422 04:51:05.352943 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.atten.per_dim_scale (128,) 128 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/inner_att/per_dim_scale/var
I0422 04:51:05.352993 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.ctx_post_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_post_proj/var
I0422 04:51:05.353048 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.ctx_post_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_post_proj_b/var
I0422 04:51:05.353099 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.ctx_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_proj/var
I0422 04:51:05.353148 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.ctx_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/ctx_proj_b/var
I0422 04:51:05.353199 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.query_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/query_proj/var
I0422 04:51:05.353250 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.query_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/query_proj_b/var
I0422 04:51:05.353298 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.source_proj (2048, 2048) 4194304 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/source_proj/var
I0422 04:51:05.353348 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.atten.source_proj_b (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/multihead_atten/source_proj_b/var
I0422 04:51:05.353399 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.layer_norm.bias (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/atten_ln/bias/var
I0422 04:51:05.353449 140630240360192 py_utils.py:1267] MODEL ANALYSIS: _task.lm.stack.cell_0.encoder_5.self_atten.layer_norm.scale (2048,) 2048 1bwds_wpm_level_lm/transformerlm/encoder_5/multihead_self_atten/atten_ln/scale/var
I0422 04:51:05.353498 140630240360192 py_utils.py:1267] MODEL ANALYSIS: ====================================================================================================
I0422 04:51:05.353549 140630240360192 py_utils.py:1267] MODEL ANALYSIS: total #params: 433254400
I0422 04:51:05.353600 140630240360192 py_utils.py:1267] MODEL ANALYSIS:
I0422 04:51:08.865504 140630240360192 trainer.py:1261] Job trainer_client start
I0422 04:51:08.877777 140630240360192 base_runner.py:67] ============================================================
I0422 04:51:08.884105 140630240360192 base_runner.py:69] allow_implicit_capture : NoneType
I0422 04:51:08.884196 140630240360192 base_runner.py:69] cls : type/lingvo.core.base_model/SingleTaskModel
I0422 04:51:08.884263 140630240360192 base_runner.py:69] cluster.add_summary : NoneType
I0422 04:51:08.884322 140630240360192 base_runner.py:69] cluster.cls : type/lingvo.core.cluster/_Cluster
I0422 04:51:08.884380 140630240360192 base_runner.py:69] cluster.controller.cpus_per_replica : 1
I0422 04:51:08.884437 140630240360192 base_runner.py:69] cluster.controller.devices_per_split : 1
I0422 04:51:08.884491 140630240360192 base_runner.py:69] cluster.controller.gpus_per_replica : 0
I0422 04:51:08.884546 140630240360192 base_runner.py:69] cluster.controller.name : '/job:local'
I0422 04:51:08.884602 140630240360192 base_runner.py:69] cluster.controller.num_tpu_hosts : 0
I0422 04:51:08.884655 140630240360192 base_runner.py:69] cluster.controller.replicas : 1
I0422 04:51:08.884727 140630240360192 base_runner.py:69] cluster.controller.tpus_per_replica : 0
I0422 04:51:08.884783 140630240360192 base_runner.py:69] cluster.decoder.cpus_per_replica : 1
I0422 04:51:08.884839 140630240360192 base_runner.py:69] cluster.decoder.devices_per_split : 1
I0422 04:51:08.884892 140630240360192 base_runner.py:69] cluster.decoder.gpus_per_replica : 1
I0422 04:51:08.884948 140630240360192 base_runner.py:69] cluster.decoder.name : '/job:local'
I0422 04:51:08.885003 140630240360192 base_runner.py:69] cluster.decoder.num_tpu_hosts : 0
I0422 04:51:08.885057 140630240360192 base_runner.py:69] cluster.decoder.replicas : 1
I0422 04:51:08.885113 140630240360192 base_runner.py:69] cluster.decoder.tpus_per_replica : 0
I0422 04:51:08.885168 140630240360192 base_runner.py:69] cluster.evaler.cpus_per_replica : 1
I0422 04:51:08.885221 140630240360192 base_runner.py:69] cluster.evaler.devices_per_split : 1
I0422 04:51:08.885277 140630240360192 base_runner.py:69] cluster.evaler.gpus_per_replica : 1
I0422 04:51:08.885332 140630240360192 base_runner.py:69] cluster.evaler.name : '/job:local'
I0422 04:51:08.885385 140630240360192 base_runner.py:69] cluster.evaler.num_tpu_hosts : 0
I0422 04:51:08.885440 140630240360192 base_runner.py:69] cluster.evaler.replicas : 1
I0422 04:51:08.885494 140630240360192 base_runner.py:69] cluster.evaler.tpus_per_replica : 0
I0422 04:51:08.885550 140630240360192 base_runner.py:69] cluster.input.cpus_per_replica : 1
I0422 04:51:08.885603 140630240360192 base_runner.py:69] cluster.input.devices_per_split : 1
I0422 04:51:08.885658 140630240360192 base_runner.py:69] cluster.input.gpus_per_replica : 0
I0422 04:51:08.885714 140630240360192 base_runner.py:69] cluster.input.name : '/job:local'
I0422 04:51:08.885767 140630240360192 base_runner.py:69] cluster.input.num_tpu_hosts : 0
I0422 04:51:08.885823 140630240360192 base_runner.py:69] cluster.input.replicas : 0
I0422 04:51:08.885878 140630240360192 base_runner.py:69] cluster.input.tpus_per_replica : 0
I0422 04:51:08.885931 140630240360192 base_runner.py:69] cluster.job : 'trainer_client'
I0422 04:51:08.885986 140630240360192 base_runner.py:69] cluster.mode : 'sync'
I0422 04:51:08.886039 140630240360192 base_runner.py:69] cluster.ps.cpus_per_replica : 1
I0422 04:51:08.886095 140630240360192 base_runner.py:69] cluster.ps.devices_per_split : 1
I0422 04:51:08.886149 140630240360192 base_runner.py:69] cluster.ps.gpus_per_replica : 0
I0422 04:51:08.886204 140630240360192 base_runner.py:69] cluster.ps.name : '/job:local'
I0422 04:51:08.886260 140630240360192 base_runner.py:69] cluster.ps.num_tpu_hosts : 0
I0422 04:51:08.886313 140630240360192 base_runner.py:69] cluster.ps.replicas : 1
I0422 04:51:08.886368 140630240360192 base_runner.py:69] cluster.ps.tpus_per_replica : 0
I0422 04:51:08.886423 140630240360192 base_runner.py:69] cluster.task : 0
I0422 04:51:08.886476 140630240360192 base_runner.py:69] cluster.worker.cpus_per_replica : 1
I0422 04:51:08.886531 140630240360192 base_runner.py:69] cluster.worker.devices_per_split : 4
I0422 04:51:08.886584 140630240360192 base_runner.py:69] cluster.worker.gpus_per_replica : 1
I0422 04:51:08.886640 140630240360192 base_runner.py:69] cluster.worker.name : '/job:local'
I0422 04:51:08.886693 140630240360192 base_runner.py:69] cluster.worker.num_tpu_hosts : 0
I0422 04:51:08.886748 140630240360192 base_runner.py:69] cluster.worker.replicas : 1
I0422 04:51:08.886805 140630240360192 base_runner.py:69] cluster.worker.tpus_per_replica : 0
I0422 04:51:08.886858 140630240360192 base_runner.py:69] dtype : float32
I0422 04:51:08.886923 140630240360192 base_runner.py:69] fprop_dtype : NoneType
I0422 04:51:08.886979 140630240360192 base_runner.py:69] inference_driver_name : NoneType
I0422 04:51:08.887032 140630240360192 base_runner.py:69] input.allow_implicit_capture : NoneType
I0422 04:51:08.887088 140630240360192 base_runner.py:69] input.bucket_adjust_every_n : 0
I0422 04:51:08.887141 140630240360192 base_runner.py:69] input.bucket_batch_limit : [8]
I0422 04:51:08.887202 140630240360192 base_runner.py:69] input.bucket_upper_bound : [100]
I0422 04:51:08.887258 140630240360192 base_runner.py:69] input.cls : type/lingvo.tasks.lm.input_generator/LmInput
I0422 04:51:08.887312 140630240360192 base_runner.py:69] input.dtype : float32
I0422 04:51:08.887367 140630240360192 base_runner.py:69] input.file_buffer_size : 10000000
I0422 04:51:08.887423 140630240360192 base_runner.py:69] input.file_parallelism : 10
I0422 04:51:08.887478 140630240360192 base_runner.py:69] input.file_pattern : 'text:/tmp/lm1b/1-billion-word-language-modeling-benchmark-r13output/training-monolingual.tokenized.shuffled/news.en*'
I0422 04:51:08.887532 140630240360192 base_runner.py:69] input.file_random_seed : 301
I0422 04:51:08.887588 140630240360192 base_runner.py:69] input.fixed_input_shape : True
I0422 04:51:08.887643 140630240360192 base_runner.py:69] input.flush_every_n : 0
I0422 04:51:08.887696 140630240360192 base_runner.py:69] input.fprop_dtype : NoneType
I0422 04:51:08.887751 140630240360192 base_runner.py:69] input.inference_driver_name : NoneType
I0422 04:51:08.887804 140630240360192 base_runner.py:69] input.is_eval : NoneType
I0422 04:51:08.887860 140630240360192 base_runner.py:69] input.is_inference : NoneType
I0422 04:51:08.887914 140630240360192 base_runner.py:69] input.name : '1bwds_train_set'
I0422 04:51:08.887969 140630240360192 base_runner.py:69] input.num_batcher_threads : 16
I0422 04:51:08.888024 140630240360192 base_runner.py:69] input.num_samples : 0
I0422 04:51:08.888077 140630240360192 base_runner.py:69] input.pad_to_max_seq_length : False
I0422 04:51:08.888132 140630240360192 base_runner.py:69] input.params_init.method : 'xavier'
I0422 04:51:08.888185 140630240360192 base_runner.py:69] input.params_init.scale : 1.000001
I0422 04:51:08.888240 140630240360192 base_runner.py:69] input.params_init.seed : NoneType
I0422 04:51:08.888293 140630240360192 base_runner.py:69] input.random_seed : NoneType
I0422 04:51:08.888348 140630240360192 base_runner.py:69] input.require_sequential_order : False
I0422 04:51:08.888401 140630240360192 base_runner.py:69] input.skip_lp_regularization : NoneType
I0422 04:51:08.888456 140630240360192 base_runner.py:69] input.source_max_length : NoneType
I0422 04:51:08.888509 140630240360192 base_runner.py:69] input.target_max_length : 1024
I0422 04:51:08.888564 140630240360192 base_runner.py:69] input.tokenizer.allow_implicit_capture : NoneType
I0422 04:51:08.888617 140630240360192 base_runner.py:69] input.tokenizer.append_eos : True
I0422 04:51:08.888672 140630240360192 base_runner.py:69] input.tokenizer.cls : type/lingvo.core.tokenizers/VocabFileTokenizer
I0422 04:51:08.888727 140630240360192 base_runner.py:69] input.tokenizer.dtype : float32
I0422 04:51:08.888782 140630240360192 base_runner.py:69] input.tokenizer.fprop_dtype : NoneType
I0422 04:51:08.888835 140630240360192 base_runner.py:69] input.tokenizer.inference_driver_name : NoneType
I0422 04:51:08.888890 140630240360192 base_runner.py:69] input.tokenizer.is_eval : NoneType
I0422 04:51:08.888945 140630240360192 base_runner.py:69] input.tokenizer.is_inference : NoneType
I0422 04:51:08.888998 140630240360192 base_runner.py:69] input.tokenizer.load_token_ids_from_vocab : True
I0422 04:51:08.889053 140630240360192 base_runner.py:69] input.tokenizer.name : 'tokenizer'
I0422 04:51:08.889106 140630240360192 base_runner.py:69] input.tokenizer.ngram_separator : ''
I0422 04:51:08.889159 140630240360192 base_runner.py:69] input.tokenizer.ngram_vocab_filepath : NoneType
I0422 04:51:08.889214 140630240360192 base_runner.py:69] input.tokenizer.pad_to_max_length : True
I0422 04:51:08.889267 140630240360192 base_runner.py:69] input.tokenizer.params_init.method : 'xavier'
I0422 04:51:08.889321 140630240360192 base_runner.py:69] input.tokenizer.params_init.scale : 1.000001
I0422 04:51:08.889374 140630240360192 base_runner.py:69] input.tokenizer.params_init.seed : NoneType
I0422 04:51:08.889429 140630240360192 base_runner.py:69] input.tokenizer.random_seed : NoneType
I0422 04:51:08.889482 140630240360192 base_runner.py:69] input.tokenizer.skip_lp_regularization : NoneType
I0422 04:51:08.889543 140630240360192 base_runner.py:69] input.tokenizer.target_eos_id : 2
I0422 04:51:08.889597 140630240360192 base_runner.py:69] input.tokenizer.target_sos_id : 1
I0422 04:51:08.889652 140630240360192 base_runner.py:69] input.tokenizer.target_unk_id : 3
I0422 04:51:08.889707 140630240360192 base_runner.py:69] input.tokenizer.token_vocab_filepath : '/tmp/lm1b/1-billion-word-language-modeling-benchmark-r13output/vocab.txt'
I0422 04:51:08.889761 140630240360192 base_runner.py:69] input.tokenizer.tokens_delimiter : ' '
I0422 04:51:08.889816 140630240360192 base_runner.py:69] input.tokenizer.vn.global_vn : False
I0422 04:51:08.889870 140630240360192 base_runner.py:69] input.tokenizer.vn.per_step_vn : False
I0422 04:51:08.889925 140630240360192 base_runner.py:69] input.tokenizer.vn.scale : NoneType
I0422 04:51:08.889978 140630240360192 base_runner.py:69] input.tokenizer.vn.seed : NoneType
I0422 04:51:08.890033 140630240360192 base_runner.py:69] input.tokenizer.vocab_size : 32000
I0422 04:51:08.890085 140630240360192 base_runner.py:69] input.tokenizer_dict : {}
I0422 04:51:08.890140 140630240360192 base_runner.py:69] input.tpu_infeed_parallism : 1
I0422 04:51:08.890194 140630240360192 base_runner.py:69] input.use_per_host_infeed : False
I0422 04:51:08.890248 140630240360192 base_runner.py:69] input.use_within_batch_mixing : False
I0422 04:51:08.890301 140630240360192 base_runner.py:69] input.vn.global_vn : False
I0422 04:51:08.890388 140630240360192 base_runner.py:69] input.vn.per_step_vn : False
I0422 04:51:08.890450 140630240360192 base_runner.py:69] input.vn.scale : NoneType
I0422 04:51:08.890510 140630240360192 base_runner.py:69] input.vn.seed : NoneType
I0422 04:51:08.890569 140630240360192 base_runner.py:69] is_eval : NoneType
I0422 04:51:08.890629 140630240360192 base_runner.py:69] is_inference : NoneType
I0422 04:51:08.890688 140630240360192 base_runner.py:69] model : 'lm.one_billion_wds.OneBWdsGPipeTransformer@/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/lm/params/one_billion_wds.py:186'
I0422 04:51:08.890779 140630240360192 base_runner.py:69] name : ''
I0422 04:51:08.890851 140630240360192 base_runner.py:69] params_init.method : 'xavier'
I0422 04:51:08.890924 140630240360192 base_runner.py:69] params_init.scale : 1.000001
I0422 04:51:08.890980 140630240360192 base_runner.py:69] params_init.seed : NoneType
I0422 04:51:08.891035 140630240360192 base_runner.py:69] random_seed : NoneType
I0422 04:51:08.891088 140630240360192 base_runner.py:69] skip_lp_regularization : NoneType
I0422 04:51:08.891143 140630240360192 base_runner.py:69] task.allow_implicit_capture : NoneType
I0422 04:51:08.891197 140630240360192 base_runner.py:69] task.cls : type/lingvo.tasks.lm.model/FixedShapeInputLanguageModel
I0422 04:51:08.891252 140630240360192 base_runner.py:69] task.decoder : NoneType
I0422 04:51:08.891307 140630240360192 base_runner.py:69] task.dtype : float32
I0422 04:51:08.891360 140630240360192 base_runner.py:69] task.encoder : NoneType
I0422 04:51:08.891415 140630240360192 base_runner.py:69] task.eval.decoder_samples_per_summary : 0
I0422 04:51:08.891469 140630240360192 base_runner.py:69] task.eval.samples_per_summary : 0
I0422 04:51:08.891522 140630240360192 base_runner.py:69] task.fprop_dtype : NoneType
I0422 04:51:08.891577 140630240360192 base_runner.py:69] task.inference_driver_name : NoneType
I0422 04:51:08.891630 140630240360192 base_runner.py:69] task.input : NoneType
I0422 04:51:08.891685 140630240360192 base_runner.py:69] task.is_eval : NoneType
I0422 04:51:08.891738 140630240360192 base_runner.py:69] task.is_inference : NoneType
I0422 04:51:08.891793 140630240360192 base_runner.py:69] task.lm.allow_implicit_capture : NoneType
I0422 04:51:08.891848 140630240360192 base_runner.py:69] task.lm.atten_dropout_prob : 0.1
I0422 04:51:08.891901 140630240360192 base_runner.py:69] task.lm.cls : type/lingvo.tasks.lm.layers/GPipeTransformerLm
I0422 04:51:08.891956 140630240360192 base_runner.py:69] task.lm.dtype : float32
I0422 04:51:08.892009 140630240360192 base_runner.py:69]
task.lm.emb.allow_implicit_capture : NoneType I0422 04:51:08.892067 140630240360192 base_runner.py:69] task.lm.emb.apply_pruning : False I0422 04:51:08.892123 140630240360192 base_runner.py:69] task.lm.emb.cls : type/lingvo.core.layers/SimpleEmbeddingLayer I0422 04:51:08.892178 140630240360192 base_runner.py:69] task.lm.emb.dtype : float32 I0422 04:51:08.892232 140630240360192 base_runner.py:69] task.lm.emb.embedding_dim : 2048 I0422 04:51:08.892287 140630240360192 base_runner.py:69] task.lm.emb.fprop_dtype : NoneType I0422 04:51:08.892340 140630240360192 base_runner.py:69] task.lm.emb.fprop_mode : NoneType I0422 04:51:08.892395 140630240360192 base_runner.py:69] task.lm.emb.inference_driver_name : NoneType I0422 04:51:08.892448 140630240360192 base_runner.py:69] task.lm.emb.is_eval : NoneType I0422 04:51:08.892502 140630240360192 base_runner.py:69] task.lm.emb.is_inference : NoneType I0422 04:51:08.892555 140630240360192 base_runner.py:69] task.lm.emb.name : '' I0422 04:51:08.892610 140630240360192 base_runner.py:69] task.lm.emb.params_init.method : 'gaussian' I0422 04:51:08.892663 140630240360192 base_runner.py:69] task.lm.emb.params_init.scale : 0.0220970869121 I0422 04:51:08.892718 140630240360192 base_runner.py:69] task.lm.emb.params_init.seed : NoneType I0422 04:51:08.892771 140630240360192 base_runner.py:69] task.lm.emb.qdomain.default : NoneType I0422 04:51:08.892826 140630240360192 base_runner.py:69] task.lm.emb.random_seed : NoneType I0422 04:51:08.892879 140630240360192 base_runner.py:69] task.lm.emb.skip_lp_regularization : NoneType I0422 04:51:08.892934 140630240360192 base_runner.py:69] task.lm.emb.use_3d_weight_tensor : False I0422 04:51:08.892987 140630240360192 base_runner.py:69] task.lm.emb.use_matmul : False I0422 04:51:08.893042 140630240360192 base_runner.py:69] task.lm.emb.vn.global_vn : False I0422 04:51:08.893095 140630240360192 base_runner.py:69] task.lm.emb.vn.per_step_vn : False I0422 04:51:08.893150 140630240360192 base_runner.py:69] 
task.lm.emb.vn.scale : NoneType I0422 04:51:08.893203 140630240360192 base_runner.py:69] task.lm.emb.vn.seed : NoneType I0422 04:51:08.893258 140630240360192 base_runner.py:69] task.lm.emb.vocab_size : 32000 I0422 04:51:08.893312 140630240360192 base_runner.py:69] task.lm.fprop_dtype : NoneType I0422 04:51:08.893367 140630240360192 base_runner.py:69] task.lm.inference_driver_name : NoneType I0422 04:51:08.893419 140630240360192 base_runner.py:69] task.lm.input_dropout_prob : 0.0 I0422 04:51:08.893475 140630240360192 base_runner.py:69] task.lm.is_eval : NoneType I0422 04:51:08.893528 140630240360192 base_runner.py:69] task.lm.is_inference : NoneType I0422 04:51:08.893583 140630240360192 base_runner.py:69] task.lm.label_smoother : NoneType I0422 04:51:08.893637 140630240360192 base_runner.py:69] task.lm.model_dim : 2048 I0422 04:51:08.893691 140630240360192 base_runner.py:69] task.lm.name : 'transformerlm' I0422 04:51:08.893744 140630240360192 base_runner.py:69] task.lm.params_init.method : 'xavier' I0422 04:51:08.893799 140630240360192 base_runner.py:69] task.lm.params_init.scale : 1.000001 I0422 04:51:08.893853 140630240360192 base_runner.py:69] task.lm.params_init.seed : NoneType I0422 04:51:08.893907 140630240360192 base_runner.py:69] task.lm.position_emb.allow_implicit_capture : NoneType I0422 04:51:08.893960 140630240360192 base_runner.py:69] task.lm.position_emb.cls : type/lingvo.core.layers/PositionalEmbeddingLayer I0422 04:51:08.894016 140630240360192 base_runner.py:69] task.lm.position_emb.dtype : float32 I0422 04:51:08.894069 140630240360192 base_runner.py:69] task.lm.position_emb.embedding_dim : 2048 I0422 04:51:08.894124 140630240360192 base_runner.py:69] task.lm.position_emb.fprop_dtype : NoneType I0422 04:51:08.894176 140630240360192 base_runner.py:69] task.lm.position_emb.inference_driver_name : NoneType I0422 04:51:08.894231 140630240360192 base_runner.py:69] task.lm.position_emb.is_eval : NoneType I0422 04:51:08.894284 140630240360192 
base_runner.py:69] task.lm.position_emb.is_inference : NoneType I0422 04:51:08.894351 140630240360192 base_runner.py:69] task.lm.position_emb.max_timescale : 10000 I0422 04:51:08.894407 140630240360192 base_runner.py:69] task.lm.position_emb.min_timescale : 1 I0422 04:51:08.894460 140630240360192 base_runner.py:69] task.lm.position_emb.name : '' I0422 04:51:08.894515 140630240360192 base_runner.py:69] task.lm.position_emb.params_init.method : 'xavier' I0422 04:51:08.894568 140630240360192 base_runner.py:69] task.lm.position_emb.params_init.scale : 1.000001 I0422 04:51:08.894623 140630240360192 base_runner.py:69] task.lm.position_emb.params_init.seed : NoneType I0422 04:51:08.894676 140630240360192 base_runner.py:69] task.lm.position_emb.random_seed : NoneType I0422 04:51:08.894731 140630240360192 base_runner.py:69] task.lm.position_emb.skip_lp_regularization : NoneType I0422 04:51:08.894783 140630240360192 base_runner.py:69] task.lm.position_emb.trainable_scaling : False I0422 04:51:08.894838 140630240360192 base_runner.py:69] task.lm.position_emb.trainable_scaling_init : 1.0 I0422 04:51:08.894902 140630240360192 base_runner.py:69] task.lm.position_emb.vn.global_vn : False I0422 04:51:08.894958 140630240360192 base_runner.py:69] task.lm.position_emb.vn.per_step_vn : False I0422 04:51:08.895011 140630240360192 base_runner.py:69] task.lm.position_emb.vn.scale : NoneType I0422 04:51:08.895066 140630240360192 base_runner.py:69] task.lm.position_emb.vn.seed : NoneType I0422 04:51:08.895119 140630240360192 base_runner.py:69] task.lm.random_seed : NoneType I0422 04:51:08.895173 140630240360192 base_runner.py:69] task.lm.relu_dropout_prob : 0.0 I0422 04:51:08.895226 140630240360192 base_runner.py:69] task.lm.residual_dropout_prob : 0.1 I0422 04:51:08.895281 140630240360192 base_runner.py:69] task.lm.skip_lp_regularization : NoneType I0422 04:51:08.895334 140630240360192 base_runner.py:69] task.lm.softmax.allow_implicit_capture : NoneType I0422 04:51:08.895389 
140630240360192 base_runner.py:69] task.lm.softmax.apply_pruning : False I0422 04:51:08.895442 140630240360192 base_runner.py:69] task.lm.softmax.chunk_size : 4194 I0422 04:51:08.895497 140630240360192 base_runner.py:69] task.lm.softmax.cls : type/lingvo.core.layers/SimpleFullSoftmax I0422 04:51:08.895551 140630240360192 base_runner.py:69] task.lm.softmax.dtype : float32 I0422 04:51:08.895606 140630240360192 base_runner.py:69] task.lm.softmax.fprop_dtype : NoneType I0422 04:51:08.895659 140630240360192 base_runner.py:69] task.lm.softmax.inference_driver_name : NoneType I0422 04:51:08.895713 140630240360192 base_runner.py:69] task.lm.softmax.input_dim : 0 I0422 04:51:08.895766 140630240360192 base_runner.py:69] task.lm.softmax.is_eval : NoneType I0422 04:51:08.895819 140630240360192 base_runner.py:69] task.lm.softmax.is_inference : NoneType I0422 04:51:08.895873 140630240360192 base_runner.py:69] task.lm.softmax.logits_abs_max : NoneType I0422 04:51:08.895926 140630240360192 base_runner.py:69] task.lm.softmax.name : '' I0422 04:51:08.895979 140630240360192 base_runner.py:69] task.lm.softmax.num_classes : 32000 I0422 04:51:08.896034 140630240360192 base_runner.py:69] task.lm.softmax.num_sampled : 0 I0422 04:51:08.896087 140630240360192 base_runner.py:69] task.lm.softmax.num_shards : 16 I0422 04:51:08.896141 140630240360192 base_runner.py:69] task.lm.softmax.params_init.method : 'xavier' I0422 04:51:08.896194 140630240360192 base_runner.py:69] task.lm.softmax.params_init.scale : 1.000001 I0422 04:51:08.896249 140630240360192 base_runner.py:69] task.lm.softmax.params_init.seed : NoneType I0422 04:51:08.896301 140630240360192 base_runner.py:69] task.lm.softmax.qdomain.default : NoneType I0422 04:51:08.896356 140630240360192 base_runner.py:69] task.lm.softmax.random_seed : NoneType I0422 04:51:08.896409 140630240360192 base_runner.py:69] task.lm.softmax.skip_lp_regularization : NoneType I0422 04:51:08.896462 140630240360192 base_runner.py:69] task.lm.softmax.vn.global_vn 
: False I0422 04:51:08.896517 140630240360192 base_runner.py:69] task.lm.softmax.vn.per_step_vn : False I0422 04:51:08.896569 140630240360192 base_runner.py:69] task.lm.softmax.vn.scale : NoneType I0422 04:51:08.896627 140630240360192 base_runner.py:69] task.lm.softmax.vn.seed : NoneType I0422 04:51:08.896682 140630240360192 base_runner.py:69] task.lm.stack.allow_implicit_capture : NoneType I0422 04:51:08.896737 140630240360192 base_runner.py:69] task.lm.stack.apply_dropout_every_n : 1 I0422 04:51:08.896790 140630240360192 base_runner.py:69] task.lm.stack.batch_dim : 1 I0422 04:51:08.896843 140630240360192 base_runner.py:69] task.lm.stack.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerStack I0422 04:51:08.896898 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.allow_implicit_capture : NoneType I0422 04:51:08.896951 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerLayer I0422 04:51:08.897006 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.dtype : float32 I0422 04:51:08.897059 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.fprop_dtype : NoneType I0422 04:51:08.897114 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.has_aux_atten : True I0422 04:51:08.897166 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.inference_driver_name : NoneType I0422 04:51:08.897221 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_decoder : False I0422 04:51:08.897274 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_eval : NoneType I0422 04:51:08.897329 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_inference : NoneType I0422 04:51:08.897382 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.is_transparent : False I0422 04:51:08.897435 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.mask_self_atten : True I0422 04:51:08.897490 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.name : '' I0422 04:51:08.897543 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.num_transparent_outputs : 0 I0422 04:51:08.897597 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.output_dim : 0 I0422 04:51:08.897650 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.packed_input : False I0422 04:51:08.897703 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.params_init.method : 'xavier' I0422 04:51:08.897758 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.params_init.scale : 1.000001 I0422 04:51:08.897811 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.params_init.seed : NoneType I0422 04:51:08.897866 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.random_seed : NoneType I0422 04:51:08.897922 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.skip_lp_regularization : NoneType I0422 04:51:08.897975 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.source_dim : 0 I0422 04:51:08.898030 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.add_unnormalized_input : False I0422 04:51:08.898083 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.allow_implicit_capture : NoneType I0422 04:51:08.898138 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_dropout_prob : 0.0 I0422 04:51:08.898191 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_hidden_dim : 0 I0422 04:51:08.898246 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.allow_implicit_capture : NoneType I0422 04:51:08.898299 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_deterministic : False I0422 04:51:08.898354 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_prob : 0.0 I0422 04:51:08.898407 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.cls : type/lingvo.core.attention/MultiHeadedAttention I0422 04:51:08.898463 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.context_dim : 0 I0422 04:51:08.898516 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.ctx_post_proj_dim : 0 I0422 04:51:08.898575 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.dtype : float32 I0422 04:51:08.898632 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_post_proj : True I0422 04:51:08.898684 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_pre_proj : False I0422 04:51:08.898739 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_query_proj : True I0422 04:51:08.898792 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.enable_source_proj : True I0422 04:51:08.898864 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.fprop_dtype : NoneType I0422 04:51:08.898933 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.hidden_dim : 0 I0422 04:51:08.898992 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inference_driver_name : NoneType I0422 04:51:08.899049 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.allow_implicit_capture : NoneType I0422 04:51:08.899105 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_deterministic : False I0422 04:51:08.899161 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_prob : 0.0 I0422 04:51:08.899231 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.cls : 
type/lingvo.core.attention/DotProductAttention I0422 04:51:08.899287 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.dtype : float32 I0422 04:51:08.899343 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.fprop_dtype : NoneType I0422 04:51:08.899396 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.hidden_dim : 0 I0422 04:51:08.899451 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.inference_driver_name : NoneType I0422 04:51:08.899504 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_eval : NoneType I0422 04:51:08.899559 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_inference : NoneType I0422 04:51:08.899612 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.name : '' I0422 04:51:08.899667 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.packed_input : False I0422 04:51:08.899722 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.method : 'xavier' I0422 04:51:08.899775 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.scale : 1.000001 I0422 04:51:08.899830 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.seed : NoneType I0422 04:51:08.899883 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.default : NoneType I0422 04:51:08.899938 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.fullyconnected : NoneType I0422 
04:51:08.899991 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.softmax : NoneType I0422 04:51:08.900044 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.query_dim : 0 I0422 04:51:08.900099 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.random_seed : NoneType I0422 04:51:08.900152 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.skip_lp_regularization : NoneType I0422 04:51:08.900209 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.source_dim : 0 I0422 04:51:08.900264 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.global_vn : False I0422 04:51:08.900317 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.per_step_vn : False I0422 04:51:08.900371 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.scale : NoneType I0422 04:51:08.900424 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.seed : NoneType I0422 04:51:08.900477 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.is_eval : NoneType I0422 04:51:08.900532 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.is_inference : NoneType I0422 04:51:08.900584 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.name : '' I0422 04:51:08.900640 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.num_attention_heads : 2 I0422 04:51:08.900692 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.packed_input : False I0422 04:51:08.900748 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.params_init.method : 'xavier' I0422 04:51:08.900801 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.params_init.scale : 1.0 I0422 04:51:08.900856 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.params_init.seed : NoneType I0422 04:51:08.900911 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.atten_context : NoneType I0422 04:51:08.900964 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.default : NoneType I0422 04:51:08.901017 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.fullyconnected : NoneType I0422 04:51:08.901072 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.qdomain.softmax : NoneType I0422 04:51:08.901125 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.query_dim : 0 I0422 04:51:08.901180 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.random_seed : NoneType I0422 04:51:08.901233 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.skip_lp_regularization : NoneType I0422 04:51:08.901287 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.source_dim : 0 I0422 04:51:08.901340 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.use_source_vec_as_attention_value : False I0422 04:51:08.901395 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.global_vn : False I0422 04:51:08.901448 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.per_step_vn : False I0422 04:51:08.901504 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.scale : NoneType I0422 04:51:08.901557 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_atten_tpl.atten_tpl.vn.seed : NoneType I0422 04:51:08.901612 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.cls : type/lingvo.core.layers_with_attention/TransformerAttentionLayer I0422 04:51:08.901665 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.dtype : float32 I0422 04:51:08.901721 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.fprop_dtype : NoneType I0422 04:51:08.901774 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.inference_driver_name : NoneType I0422 04:51:08.901833 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.is_eval : NoneType I0422 04:51:08.901887 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.is_inference : NoneType I0422 04:51:08.901942 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.is_masked : False I0422 04:51:08.901997 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.allow_implicit_capture : NoneType I0422 04:51:08.902050 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm I0422 04:51:08.902105 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.dtype : float32 I0422 04:51:08.902158 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.epsilon : 1e-06 I0422 04:51:08.902213 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.fprop_dtype : NoneType I0422 04:51:08.902266 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.inference_driver_name : NoneType I0422 04:51:08.902321 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.input_dim : 0 I0422 04:51:08.902374 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.is_eval : NoneType I0422 04:51:08.902429 
140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.is_inference : NoneType I0422 04:51:08.902482 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.name : '' I0422 04:51:08.902537 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.params_init.method : 'xavier' I0422 04:51:08.902590 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.params_init.scale : 1.000001 I0422 04:51:08.902645 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.params_init.seed : NoneType I0422 04:51:08.902698 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.random_seed : NoneType I0422 04:51:08.902754 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.skip_lp_regularization : NoneType I0422 04:51:08.902806 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.global_vn : False I0422 04:51:08.902861 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.per_step_vn : False I0422 04:51:08.902932 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.scale : NoneType I0422 04:51:08.902987 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.ln_tpl.vn.seed : NoneType I0422 04:51:08.903042 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.name : '' I0422 04:51:08.903096 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.num_attention_heads : 8 I0422 04:51:08.903151 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.packed_input : False I0422 04:51:08.903204 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.params_init.method : 'xavier' I0422 04:51:08.903259 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.params_init.scale : 1.000001 I0422 04:51:08.903314 
140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.params_init.seed : NoneType I0422 04:51:08.903367 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.random_seed : NoneType I0422 04:51:08.903422 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_prob : 0.0 I0422 04:51:08.903475 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType I0422 04:51:08.903531 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer I0422 04:51:08.903584 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.dropout_at_eval : False I0422 04:51:08.903642 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.dtype : float32 I0422 04:51:08.903697 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.fprop_dtype : NoneType I0422 04:51:08.903752 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.inference_driver_name : NoneType I0422 04:51:08.903805 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_eval : NoneType I0422 04:51:08.903861 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_inference : NoneType I0422 04:51:08.903914 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.keep_prob : 1.0 I0422 04:51:08.903969 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.name : '' I0422 04:51:08.904023 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape : NoneType I0422 04:51:08.904078 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType I0422 04:51:08.904131 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.method : 'xavier' I0422 04:51:08.904186 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.scale : 1.000001 I0422 04:51:08.904241 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.seed : NoneType I0422 04:51:08.904294 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.random_seed : NoneType I0422 04:51:08.904349 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType I0422 04:51:08.904402 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.global_vn : False I0422 04:51:08.904455 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.per_step_vn : False I0422 04:51:08.904510 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.scale : NoneType I0422 04:51:08.904563 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.seed : NoneType I0422 04:51:08.904618 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.skip_lp_regularization : NoneType I0422 04:51:08.904670 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.source_dim : 0 I0422 04:51:08.904725 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.global_vn : False I0422 04:51:08.904778 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.per_step_vn : False I0422 04:51:08.904833 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.scale : NoneType I0422 04:51:08.904886 
140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_atten_tpl.vn.seed : NoneType I0422 04:51:08.904941 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_aux_atten_tpl : NoneType I0422 04:51:08.904994 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.activation : 'RELU' I0422 04:51:08.905047 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.allow_implicit_capture : NoneType I0422 04:51:08.905102 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.cls : type/lingvo.core.layers_with_attention/TransformerFeedForwardLayer I0422 04:51:08.905157 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.dtype : float32 I0422 04:51:08.905210 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.activation : ['RELU', 'NONE'] I0422 04:51:08.905267 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.allow_implicit_capture : NoneType I0422 04:51:08.905323 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.batch_norm : False I0422 04:51:08.905378 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.bn_fold_weights : NoneType I0422 04:51:08.905431 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.cls : type/lingvo.core.layers/FeedForwardNet I0422 04:51:08.905486 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.allow_implicit_capture : NoneType I0422 04:51:08.905539 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.cls : type/lingvo.core.layers/DropoutLayer I0422 04:51:08.905594 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dropout_at_eval : False I0422 04:51:08.905647 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dtype : float32 I0422 04:51:08.905702 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.fprop_dtype : NoneType I0422 04:51:08.905755 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.inference_driver_name : NoneType I0422 04:51:08.905811 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_eval : NoneType I0422 04:51:08.905864 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_inference : NoneType I0422 04:51:08.905919 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.keep_prob : 1.0 I0422 04:51:08.905972 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.name : '' I0422 04:51:08.906028 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape : NoneType I0422 04:51:08.906083 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape_broadcast_dims : NoneType I0422 04:51:08.906136 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.method : 'xavier' I0422 04:51:08.906192 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.scale : 1.000001 I0422 04:51:08.906245 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.seed : NoneType I0422 04:51:08.906301 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.random_seed : NoneType I0422 04:51:08.906354 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.skip_lp_regularization : NoneType I0422 04:51:08.906409 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.global_vn : False I0422 04:51:08.906462 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.per_step_vn : False I0422 04:51:08.906517 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.scale : NoneType I0422 04:51:08.906572 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.seed : NoneType I0422 04:51:08.906625 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.dtype : float32 I0422 04:51:08.906680 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.fprop_dtype : NoneType I0422 04:51:08.906733 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.inference_driver_name : NoneType I0422 04:51:08.906788 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.input_dim : 0 I0422 04:51:08.906846 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_eval : NoneType I0422 04:51:08.906908 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_inference : NoneType I0422 04:51:08.906963 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.name : '' I0422 04:51:08.907017 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.method : 'xavier' I0422 04:51:08.907072 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.scale : 1.000001 I0422 04:51:08.907126 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.seed : NoneType I0422 04:51:08.907181 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.qdomain.default : NoneType I0422 04:51:08.907234 140630240360192 
base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.random_seed : NoneType I0422 04:51:08.907289 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_connections : NoneType I0422 04:51:08.907342 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_lp_regularization : NoneType I0422 04:51:08.907397 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.global_vn : False I0422 04:51:08.907452 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.per_step_vn : False I0422 04:51:08.907505 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.scale : NoneType I0422 04:51:08.907560 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.seed : NoneType I0422 04:51:08.907614 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.fprop_dtype : NoneType I0422 04:51:08.907669 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.hidden_dim : 2048 I0422 04:51:08.907722 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.inference_driver_name : NoneType I0422 04:51:08.907778 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.input_dim : 0 I0422 04:51:08.907833 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.is_eval : NoneType I0422 04:51:08.907886 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.is_inference : NoneType I0422 04:51:08.907941 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.allow_implicit_capture : NoneType I0422 04:51:08.907994 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm I0422 04:51:08.908049 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.dtype : float32 I0422 04:51:08.908103 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.epsilon : 1e-06 I0422 04:51:08.908158 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.fprop_dtype : NoneType I0422 04:51:08.908211 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.inference_driver_name : NoneType I0422 04:51:08.908266 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.input_dim : 0 I0422 04:51:08.908319 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.is_eval : NoneType I0422 04:51:08.908375 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.is_inference : NoneType I0422 04:51:08.908428 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.name : '' I0422 04:51:08.908483 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.method : 'xavier' I0422 04:51:08.908535 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.scale : 1.000001 I0422 04:51:08.908593 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.seed : NoneType I0422 04:51:08.908648 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.random_seed : NoneType I0422 04:51:08.908703 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.skip_lp_regularization : NoneType I0422 04:51:08.908756 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.global_vn : False I0422 04:51:08.908811 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.per_step_vn : False I0422 04:51:08.908864 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.scale : NoneType I0422 
04:51:08.908919 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.ln_tpl.vn.seed : NoneType I0422 04:51:08.908972 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.name : '' I0422 04:51:08.909027 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.output_dim : 0 I0422 04:51:08.909080 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.params_init.method : 'xavier' I0422 04:51:08.909135 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.params_init.scale : 1.000001 I0422 04:51:08.909188 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.params_init.seed : NoneType I0422 04:51:08.909241 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.random_seed : NoneType I0422 04:51:08.909296 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.relu_dropout_prob : 0.0 I0422 04:51:08.909351 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.activation : 'RELU' I0422 04:51:08.909404 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.affine_last : False I0422 04:51:08.909459 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.allow_implicit_capture : NoneType I0422 04:51:08.909512 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.batch_norm : True I0422 04:51:08.909565 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.bias_init : 0.0 I0422 04:51:08.909620 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.bn_fold_weights : NoneType I0422 04:51:08.909673 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.cls : type/lingvo.core.layers/ProjectionLayer I0422 04:51:08.909728 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.dtype : float32 I0422 04:51:08.909781 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.fprop_dtype : NoneType I0422 04:51:08.909836 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.has_bias : False I0422 04:51:08.909889 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.inference_driver_name : NoneType I0422 04:51:08.909943 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.input_dim : 0 I0422 04:51:08.909996 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_eval : NoneType I0422 04:51:08.910051 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_inference : NoneType I0422 04:51:08.910104 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.name : '' I0422 04:51:08.910157 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.output_dim : 0 I0422 04:51:08.910212 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.method : 'xavier' I0422 04:51:08.910269 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.scale : 1.000001 I0422 04:51:08.910324 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.seed : NoneType I0422 04:51:08.910473 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.qdomain.default : NoneType I0422 04:51:08.910535 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.random_seed : NoneType I0422 04:51:08.910592 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.skip_lp_regularization : NoneType I0422 04:51:08.910645 140630240360192 
base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.global_vn : False I0422 04:51:08.910701 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.per_step_vn : False I0422 04:51:08.910753 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.scale : NoneType I0422 04:51:08.910808 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.seed : NoneType I0422 04:51:08.910862 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.res_proj_tpl.weight_norm : False I0422 04:51:08.910923 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_prob : 0.0 I0422 04:51:08.910979 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType I0422 04:51:08.911032 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer I0422 04:51:08.911087 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dropout_at_eval : False I0422 04:51:08.911140 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dtype : float32 I0422 04:51:08.911195 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.fprop_dtype : NoneType I0422 04:51:08.911250 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.inference_driver_name : NoneType I0422 04:51:08.911303 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_eval : NoneType I0422 04:51:08.911358 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_inference : NoneType I0422 04:51:08.911411 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.keep_prob : 1.0 I0422 04:51:08.911464 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.name : '' I0422 04:51:08.911520 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape : NoneType I0422 04:51:08.911573 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType I0422 04:51:08.911628 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.method : 'xavier' I0422 04:51:08.911681 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.scale : 1.000001 I0422 04:51:08.911736 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.seed : NoneType I0422 04:51:08.911789 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.random_seed : NoneType I0422 04:51:08.911843 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType I0422 04:51:08.911895 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.global_vn : False I0422 04:51:08.911956 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.per_step_vn : False I0422 04:51:08.912010 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.scale : NoneType I0422 04:51:08.912065 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.seed : NoneType I0422 04:51:08.912118 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.skip_lp_regularization : NoneType I0422 04:51:08.912173 140630240360192 base_runner.py:69] 
task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.global_vn : False I0422 04:51:08.912228 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.per_step_vn : False I0422 04:51:08.912281 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.scale : NoneType I0422 04:51:08.912334 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.tr_fflayer_tpl.vn.seed : NoneType I0422 04:51:08.912389 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.transparent_merger_tpl : NoneType I0422 04:51:08.912442 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.global_vn : False I0422 04:51:08.912496 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.per_step_vn : False I0422 04:51:08.912549 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.scale : NoneType I0422 04:51:08.912604 140630240360192 base_runner.py:69] task.lm.stack.decoder_tpl.vn.seed : NoneType I0422 04:51:08.912657 140630240360192 base_runner.py:69] task.lm.stack.dtype : float32 I0422 04:51:08.912712 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.add_tgt_embedding_layer : False I0422 04:51:08.912765 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.allow_implicit_capture : NoneType I0422 04:51:08.912820 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerEmbeddingLayer I0422 04:51:08.912873 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.allow_implicit_capture : NoneType I0422 04:51:08.912928 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer I0422 04:51:08.912981 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.dropout_at_eval : False I0422 04:51:08.913034 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.dtype : float32 I0422 04:51:08.913089 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.fprop_dtype : 
NoneType I0422 04:51:08.913142 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.inference_driver_name : NoneType I0422 04:51:08.913197 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.is_eval : NoneType I0422 04:51:08.913250 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.is_inference : NoneType I0422 04:51:08.913305 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.keep_prob : 1.0 I0422 04:51:08.913357 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.name : '' I0422 04:51:08.913412 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.noise_shape : NoneType I0422 04:51:08.913465 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.noise_shape_broadcast_dims : NoneType I0422 04:51:08.913518 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.params_init.method : 'xavier' I0422 04:51:08.913573 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.params_init.scale : 1.000001 I0422 04:51:08.913626 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.params_init.seed : NoneType I0422 04:51:08.913681 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.random_seed : NoneType I0422 04:51:08.913734 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.skip_lp_regularization : NoneType I0422 04:51:08.913789 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.global_vn : False I0422 04:51:08.913847 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.per_step_vn : False I0422 04:51:08.913902 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.scale : NoneType I0422 04:51:08.913955 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dropout_tpl.vn.seed : NoneType I0422 04:51:08.914010 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.dtype : float32 I0422 04:51:08.914063 140630240360192 
base_runner.py:69] task.lm.stack.emb_tpl.fprop_dtype : NoneType I0422 04:51:08.914118 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.inference_driver_name : NoneType I0422 04:51:08.914170 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.input_dropout_prob : 0.0 I0422 04:51:08.914225 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.is_eval : NoneType I0422 04:51:08.914278 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.is_inference : NoneType I0422 04:51:08.914333 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.is_transparent : False I0422 04:51:08.914386 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.max_seq_len : 300 I0422 04:51:08.914439 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.name : '' I0422 04:51:08.914493 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.packed_input : False I0422 04:51:08.914546 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.params_init.method : 'xavier' I0422 04:51:08.914599 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.params_init.scale : 1.000001 I0422 04:51:08.914654 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.params_init.seed : NoneType I0422 04:51:08.914706 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.allow_implicit_capture : NoneType I0422 04:51:08.914761 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.cls : type/lingvo.core.layers/PositionalEmbeddingLayer I0422 04:51:08.914814 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.dtype : float32 I0422 04:51:08.914871 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.embedding_dim : 0 I0422 04:51:08.914927 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.fprop_dtype : NoneType I0422 04:51:08.914983 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.inference_driver_name : NoneType I0422 04:51:08.915036 140630240360192 base_runner.py:69] 
task.lm.stack.emb_tpl.position_emb.is_eval : NoneType I0422 04:51:08.915090 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.is_inference : NoneType I0422 04:51:08.915144 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.max_timescale : 10000 I0422 04:51:08.915198 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.min_timescale : 1 I0422 04:51:08.915251 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.name : '' I0422 04:51:08.915306 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.params_init.method : 'xavier' I0422 04:51:08.915359 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.params_init.scale : 1.000001 I0422 04:51:08.915412 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.params_init.seed : NoneType I0422 04:51:08.915466 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.random_seed : NoneType I0422 04:51:08.915519 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.skip_lp_regularization : NoneType I0422 04:51:08.915574 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.trainable_scaling : False I0422 04:51:08.915627 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.trainable_scaling_init : 1.0 I0422 04:51:08.915682 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.global_vn : False I0422 04:51:08.915735 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.per_step_vn : False I0422 04:51:08.915807 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.scale : NoneType I0422 04:51:08.915903 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.position_emb.vn.seed : NoneType I0422 04:51:08.915962 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.random_seed : NoneType I0422 04:51:08.916022 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.skip_lp_regularization : 
NoneType I0422 04:51:08.916081 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.allow_implicit_capture : NoneType I0422 04:51:08.916141 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.apply_pruning : False I0422 04:51:08.916229 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.cls : type/lingvo.core.layers/SimpleEmbeddingLayer I0422 04:51:08.916297 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.dtype : float32 I0422 04:51:08.916352 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.embedding_dim : 0 I0422 04:51:08.916407 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.fprop_dtype : NoneType I0422 04:51:08.916460 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.fprop_mode : NoneType I0422 04:51:08.916513 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.inference_driver_name : NoneType I0422 04:51:08.916568 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.is_eval : NoneType I0422 04:51:08.916621 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.is_inference : NoneType I0422 04:51:08.916676 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.name : '' I0422 04:51:08.916729 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.params_init.method : 'xavier' I0422 04:51:08.916783 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.params_init.scale : 1.000001 I0422 04:51:08.916836 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.params_init.seed : NoneType I0422 04:51:08.916889 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.qdomain.default : NoneType I0422 04:51:08.916944 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.random_seed : NoneType I0422 04:51:08.916997 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.skip_lp_regularization : NoneType I0422 04:51:08.917052 
140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.use_3d_weight_tensor : False I0422 04:51:08.917105 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.use_matmul : False I0422 04:51:08.917160 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.global_vn : False I0422 04:51:08.917212 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.per_step_vn : False I0422 04:51:08.917265 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.scale : NoneType I0422 04:51:08.917320 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vn.seed : NoneType I0422 04:51:08.917372 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.token_emb.vocab_size : 0 I0422 04:51:08.917427 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.global_vn : False I0422 04:51:08.917479 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.per_step_vn : False I0422 04:51:08.917532 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.scale : NoneType I0422 04:51:08.917587 140630240360192 base_runner.py:69] task.lm.stack.emb_tpl.vn.seed : NoneType I0422 04:51:08.917640 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.allow_implicit_capture : NoneType I0422 04:51:08.917695 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.cls : type/lingvo.core.layers_with_gpipe/GPipeTransformerLayer I0422 04:51:08.917748 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.dtype : float32 I0422 04:51:08.917803 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.fprop_dtype : NoneType I0422 04:51:08.917856 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.has_aux_atten : False I0422 04:51:08.917915 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.inference_driver_name : NoneType I0422 04:51:08.917969 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.is_decoder : False I0422 04:51:08.918024 140630240360192 base_runner.py:69] 
task.lm.stack.encoder_tpl.is_eval : NoneType I0422 04:51:08.918077 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.is_inference : NoneType I0422 04:51:08.918131 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.is_transparent : False I0422 04:51:08.918184 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.mask_self_atten : True I0422 04:51:08.918237 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.name : '' I0422 04:51:08.918292 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.num_transparent_outputs : 0 I0422 04:51:08.918345 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.output_dim : 0 I0422 04:51:08.918400 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.packed_input : False I0422 04:51:08.918453 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.params_init.method : 'xavier' I0422 04:51:08.918508 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.params_init.scale : 1.000001 I0422 04:51:08.918561 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.params_init.seed : NoneType I0422 04:51:08.918616 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.random_seed : NoneType I0422 04:51:08.918669 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.skip_lp_regularization : NoneType I0422 04:51:08.918724 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.source_dim : 0 I0422 04:51:08.918777 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.add_unnormalized_input : False I0422 04:51:08.918832 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.allow_implicit_capture : NoneType I0422 04:51:08.918899 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_dropout_prob : 0.0 I0422 04:51:08.918956 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_hidden_dim : 0 I0422 04:51:08.919009 140630240360192 base_runner.py:69] 
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.allow_implicit_capture : NoneType I0422 04:51:08.919064 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_deterministic : False I0422 04:51:08.919117 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.atten_dropout_prob : 0.0 I0422 04:51:08.919171 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.cls : type/lingvo.core.attention/MultiHeadedAttention I0422 04:51:08.919224 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.context_dim : 0 I0422 04:51:08.919280 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.ctx_post_proj_dim : 0 I0422 04:51:08.919333 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.dtype : float32 I0422 04:51:08.919388 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_post_proj : True I0422 04:51:08.919441 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_ctx_pre_proj : True I0422 04:51:08.919496 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_query_proj : True I0422 04:51:08.919548 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.enable_source_proj : True I0422 04:51:08.919603 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.fprop_dtype : NoneType I0422 04:51:08.919657 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.hidden_dim : 0 I0422 04:51:08.919711 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inference_driver_name : NoneType I0422 04:51:08.919764 140630240360192 base_runner.py:69] task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.allow_implicit_capture : NoneType I0422 04:51:08.919823 
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_deterministic : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.atten_dropout_prob : 0.0
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.cls : type/lingvo.core.attention/DotProductAttention
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.dtype : float32
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.hidden_dim : 0
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.name : ''
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.packed_input : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.default : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.fullyconnected : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.qdomain.softmax : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.query_dim : 0
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.source_dim : 0
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.inner_atten_params.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.name : ''
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.num_attention_heads : 2
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.packed_input : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.params_init.scale : 1.0
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.atten_context : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.default : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.fullyconnected : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.qdomain.softmax : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.query_dim : 0
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.source_dim : 0
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.use_source_vec_as_attention_value : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.atten_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.cls : type/lingvo.core.layers_with_attention/TransformerAttentionLayer
task.lm.stack.encoder_tpl.tr_atten_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_atten_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.is_masked : True
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.epsilon : 1e-06
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.input_dim : 0
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.name : ''
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.ln_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.name : ''
task.lm.stack.encoder_tpl.tr_atten_tpl.num_attention_heads : 16
task.lm.stack.encoder_tpl.tr_atten_tpl.packed_input : False
task.lm.stack.encoder_tpl.tr_atten_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_atten_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_atten_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_prob : 0.0
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.dropout_at_eval : False
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.is_inference : NoneType
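The attention template logged here wires a TransformerAttentionLayer with 16 heads (tr_atten_tpl.num_attention_heads : 16) around a MultiHeadedAttention whose inner attention is DotProductAttention. As a rough illustration only, not Lingvo's implementation, the head splitting and scaled dot-product can be sketched as follows; the random projection matrices are stand-ins for the Xavier-initialized variables (params_init.method : 'xavier') created by the enable_*_proj flags:

```python
import numpy as np

def multi_head_attention(query, source, num_heads=16, model_dim=1024):
    """Minimal multi-headed dot-product attention sketch (not Lingvo's code)."""
    assert model_dim % num_heads == 0
    d_head = model_dim // num_heads  # e.g. 1024 / 16 = 64 dims per head
    rng = np.random.default_rng(0)
    # Stand-ins for the learned query/source/context projections.
    w_q, w_k, w_v = (rng.standard_normal((model_dim, model_dim)) / np.sqrt(model_dim)
                     for _ in range(3))
    # Project, then split the feature axis into heads.
    q = (query @ w_q).reshape(query.shape[0], num_heads, d_head)
    k = (source @ w_k).reshape(source.shape[0], num_heads, d_head)
    v = (source @ w_v).reshape(source.shape[0], num_heads, d_head)
    # Scaled dot-product logits per head, then softmax over source positions.
    logits = np.einsum('qhd,khd->hqk', q, k) / np.sqrt(d_head)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)
    # Weighted sum of values, heads re-merged into the model dimension.
    return np.einsum('hqk,khd->qhd', probs, v).reshape(query.shape[0], model_dim)
```

With is_masked : True the real layer additionally applies a causal mask so each position only attends to earlier ones; that step is omitted above.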
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.keep_prob : 1.0
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.name : ''
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.residual_dropout_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.source_dim : 0
task.lm.stack.encoder_tpl.tr_atten_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_atten_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_atten_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_aux_atten_tpl : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.activation : 'RELU'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.cls : type/lingvo.core.layers_with_attention/TransformerFeedForwardLayer
task.lm.stack.encoder_tpl.tr_fflayer_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.activation : ['RELU', 'NONE']
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.batch_norm : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.bn_fold_weights : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.cls : type/lingvo.core.layers/FeedForwardNet
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.cls : type/lingvo.core.layers/DropoutLayer
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dropout_at_eval : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.dtype : float32
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.keep_prob : 1.0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.name : ''
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.noise_shape_broadcast_dims : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dropout.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.input_dim : 0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.name : ''
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.qdomain.default : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_connections : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fflayer_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.hidden_dim : 8192
task.lm.stack.encoder_tpl.tr_fflayer_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.input_dim : 0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.cls : type/lingvo.core.layers/LayerNorm
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.epsilon : 1e-06
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.input_dim : 0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.name : ''
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.ln_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.name : ''
task.lm.stack.encoder_tpl.tr_fflayer_tpl.output_dim : 0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_fflayer_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.relu_dropout_prob : 0.0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.activation : 'RELU'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.affine_last : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.batch_norm : True
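Both the attention and feed-forward templates pin ln_tpl.epsilon to 1e-06 for their LayerNorm sublayers. A generic layer-norm sketch under that setting (Lingvo's LayerNorm may parameterize its learned scale and bias differently; this shows only the normalization rule itself):

```python
import numpy as np

def layer_norm(x, scale=1.0, bias=0.0, epsilon=1e-6):
    # Normalize each row over the feature axis, then apply an elementwise
    # scale and bias; epsilon matches ln_tpl.epsilon : 1e-06 above and
    # guards against division by zero for near-constant inputs.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + epsilon) * scale + bias
```

After normalization each feature vector has (approximately) zero mean and unit variance, which keeps activations in a stable range across the 6 stacked layers.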
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.bias_init : 0.0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.bn_fold_weights : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.cls : type/lingvo.core.layers/ProjectionLayer
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.has_bias : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.input_dim : 0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.name : ''
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.output_dim : 0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.qdomain.default : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.res_proj_tpl.weight_norm : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_prob : 0.0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.allow_implicit_capture : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.cls : type/lingvo.core.layers/DropoutLayer
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dropout_at_eval : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.dtype : float32
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.fprop_dtype : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.inference_driver_name : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_eval : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.is_inference : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.keep_prob : 1.0
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.name : ''
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.noise_shape_broadcast_dims : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.method : 'xavier'
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.scale : 1.000001
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.params_init.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.random_seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.residual_dropout_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.skip_lp_regularization : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.tr_fflayer_tpl.vn.seed : NoneType
task.lm.stack.encoder_tpl.transparent_merger_tpl : NoneType
task.lm.stack.encoder_tpl.vn.global_vn : False
task.lm.stack.encoder_tpl.vn.per_step_vn : False
task.lm.stack.encoder_tpl.vn.scale : NoneType
task.lm.stack.encoder_tpl.vn.seed : NoneType
task.lm.stack.fprop_dtype : NoneType
task.lm.stack.inference_driver_name : NoneType
task.lm.stack.is_eval : NoneType
task.lm.stack.is_inference : NoneType
task.lm.stack.is_transparent : False
task.lm.stack.model_dim : 1024
task.lm.stack.name : ''
task.lm.stack.num_decoder_layers : 0
task.lm.stack.num_encoder_layers : 6
task.lm.stack.num_micro_batches : 1
task.lm.stack.num_splits : 1
task.lm.stack.num_transparent_outputs : 0
task.lm.stack.packed_input : False
task.lm.stack.params_init.method : 'xavier'
task.lm.stack.params_init.scale : 1.000001
task.lm.stack.params_init.seed : NoneType
task.lm.stack.random_seed : NoneType
task.lm.stack.skip_lp_regularization : NoneType
task.lm.stack.splits : 1
task.lm.stack.state_dtype : NoneType
task.lm.stack.transparent_merger_dropout_prob : 0.1
task.lm.stack.use_pipelined_embeddings : False
task.lm.stack.vn.global_vn : False
task.lm.stack.vn.per_step_vn : False
task.lm.stack.vn.scale : NoneType
task.lm.stack.vn.seed : NoneType
task.lm.vn.global_vn : False
task.lm.vn.per_step_vn : False I0422 04:51:08.935142 140630240360192 base_runner.py:69] task.lm.vn.scale : NoneType I0422 04:51:08.935198 140630240360192 base_runner.py:69] task.lm.vn.seed : NoneType I0422 04:51:08.935267 140630240360192 base_runner.py:69] task.lm.vocab_size : 32000 I0422 04:51:08.935322 140630240360192 base_runner.py:69] task.name : '1bwds_wpm_level_lm' I0422 04:51:08.935375 140630240360192 base_runner.py:69] task.online_encoder : NoneType I0422 04:51:08.935431 140630240360192 base_runner.py:69] task.params_init.method : 'xavier' I0422 04:51:08.935484 140630240360192 base_runner.py:69] task.params_init.scale : 1.000001 I0422 04:51:08.935539 140630240360192 base_runner.py:69] task.params_init.seed : NoneType I0422 04:51:08.935591 140630240360192 base_runner.py:69] task.random_seed : NoneType I0422 04:51:08.935646 140630240360192 base_runner.py:69] task.skip_lp_regularization : NoneType I0422 04:51:08.935699 140630240360192 base_runner.py:69] task.train.bprop_variable_filter : NoneType I0422 04:51:08.935758 140630240360192 base_runner.py:69] task.train.clip_gradient_norm_to_value : 0.0 I0422 04:51:08.935811 140630240360192 base_runner.py:69] task.train.clip_gradient_single_norm_to_value : 0.0 I0422 04:51:08.935867 140630240360192 base_runner.py:69] task.train.colocate_gradients_with_ops : True I0422 04:51:08.935920 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.jobname : 'eval_dev' I0422 04:51:08.935975 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.local_filesystem : False I0422 04:51:08.936028 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.logdir : '' I0422 04:51:08.936083 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.metric : 'log_pplx' I0422 04:51:08.936136 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.minimize : True I0422 04:51:08.936189 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.name : 
'MetricHistory' I0422 04:51:08.936244 140630240360192 base_runner.py:69] task.train.early_stop.metric_history.tfevent_file : False I0422 04:51:08.936297 140630240360192 base_runner.py:69] task.train.early_stop.name : 'EarlyStop' I0422 04:51:08.936351 140630240360192 base_runner.py:69] task.train.early_stop.tolerance : 0.0 I0422 04:51:08.936404 140630240360192 base_runner.py:69] task.train.early_stop.verbose : True I0422 04:51:08.936458 140630240360192 base_runner.py:69] task.train.early_stop.window : 0 I0422 04:51:08.936511 140630240360192 base_runner.py:69] task.train.ema_decay : 0.0 I0422 04:51:08.936564 140630240360192 base_runner.py:69] task.train.gate_gradients : False I0422 04:51:08.936618 140630240360192 base_runner.py:69] task.train.grad_aggregation_method : 1 I0422 04:51:08.936671 140630240360192 base_runner.py:69] task.train.grad_norm_to_clip_to_zero : 0.0 I0422 04:51:08.936724 140630240360192 base_runner.py:69] task.train.grad_norm_tracker : NoneType I0422 04:51:08.936778 140630240360192 base_runner.py:69] task.train.init_from_checkpoint_rules : {} I0422 04:51:08.936831 140630240360192 base_runner.py:69] task.train.l1_regularizer_weight : NoneType I0422 04:51:08.936885 140630240360192 base_runner.py:69] task.train.l2_regularizer_weight : 1e-06 I0422 04:51:08.936938 140630240360192 base_runner.py:69] task.train.learning_rate : 0.5 I0422 04:51:08.936990 140630240360192 base_runner.py:69] task.train.lr_schedule.allow_implicit_capture : NoneType I0422 04:51:08.937045 140630240360192 base_runner.py:69] task.train.lr_schedule.cls : type/lingvo.core.lr_schedule/TransformerLearningRateSchedule I0422 04:51:08.937098 140630240360192 base_runner.py:69] task.train.lr_schedule.decay_end : NoneType I0422 04:51:08.937153 140630240360192 base_runner.py:69] task.train.lr_schedule.dtype : float32 I0422 04:51:08.937206 140630240360192 base_runner.py:69] task.train.lr_schedule.fprop_dtype : NoneType I0422 04:51:08.937258 140630240360192 base_runner.py:69] 
task.train.lr_schedule.inference_driver_name : NoneType I0422 04:51:08.937313 140630240360192 base_runner.py:69] task.train.lr_schedule.is_eval : NoneType I0422 04:51:08.937366 140630240360192 base_runner.py:69] task.train.lr_schedule.is_inference : NoneType I0422 04:51:08.937419 140630240360192 base_runner.py:69] task.train.lr_schedule.model_dim : 2048 I0422 04:51:08.937474 140630240360192 base_runner.py:69] task.train.lr_schedule.name : 'LRSched' I0422 04:51:08.937526 140630240360192 base_runner.py:69] task.train.lr_schedule.params_init.method : 'xavier' I0422 04:51:08.937580 140630240360192 base_runner.py:69] task.train.lr_schedule.params_init.scale : 1.000001 I0422 04:51:08.937633 140630240360192 base_runner.py:69] task.train.lr_schedule.params_init.seed : NoneType I0422 04:51:08.937688 140630240360192 base_runner.py:69] task.train.lr_schedule.random_seed : NoneType I0422 04:51:08.937741 140630240360192 base_runner.py:69] task.train.lr_schedule.skip_lp_regularization : NoneType I0422 04:51:08.937794 140630240360192 base_runner.py:69] task.train.lr_schedule.vn.global_vn : False I0422 04:51:08.937849 140630240360192 base_runner.py:69] task.train.lr_schedule.vn.per_step_vn : False I0422 04:51:08.937910 140630240360192 base_runner.py:69] task.train.lr_schedule.vn.scale : NoneType I0422 04:51:08.937963 140630240360192 base_runner.py:69] task.train.lr_schedule.vn.seed : NoneType I0422 04:51:08.938018 140630240360192 base_runner.py:69] task.train.lr_schedule.warmup_steps : 40000 I0422 04:51:08.938071 140630240360192 base_runner.py:69] task.train.lr_schedule.worker_replicas : 1 I0422 04:51:08.938124 140630240360192 base_runner.py:69] task.train.max_lstm_gradient_norm : 0.0 I0422 04:51:08.938179 140630240360192 base_runner.py:69] task.train.max_steps : 4000000 I0422 04:51:08.938231 140630240360192 base_runner.py:69] task.train.optimizer.allow_implicit_capture : NoneType I0422 04:51:08.938286 140630240360192 base_runner.py:69] task.train.optimizer.beta1 : 0.9 I0422 
04:51:08.938339 140630240360192 base_runner.py:69] task.train.optimizer.beta2 : 0.997 I0422 04:51:08.938395 140630240360192 base_runner.py:69] task.train.optimizer.cls : type/lingvo.core.optimizer/Adam I0422 04:51:08.938448 140630240360192 base_runner.py:69] task.train.optimizer.dtype : float32 I0422 04:51:08.938502 140630240360192 base_runner.py:69] task.train.optimizer.epsilon : 1e-09 I0422 04:51:08.938555 140630240360192 base_runner.py:69] task.train.optimizer.fprop_dtype : NoneType I0422 04:51:08.938610 140630240360192 base_runner.py:69] task.train.optimizer.inference_driver_name : NoneType I0422 04:51:08.938663 140630240360192 base_runner.py:69] task.train.optimizer.is_eval : NoneType I0422 04:51:08.938716 140630240360192 base_runner.py:69] task.train.optimizer.is_inference : NoneType I0422 04:51:08.938771 140630240360192 base_runner.py:69] task.train.optimizer.name : 'Adam' I0422 04:51:08.938823 140630240360192 base_runner.py:69] task.train.optimizer.params_init.method : 'xavier' I0422 04:51:08.938884 140630240360192 base_runner.py:69] task.train.optimizer.params_init.scale : 1.000001 I0422 04:51:08.938941 140630240360192 base_runner.py:69] task.train.optimizer.params_init.seed : NoneType I0422 04:51:08.938994 140630240360192 base_runner.py:69] task.train.optimizer.random_seed : NoneType I0422 04:51:08.939049 140630240360192 base_runner.py:69] task.train.optimizer.skip_lp_regularization : NoneType I0422 04:51:08.939101 140630240360192 base_runner.py:69] task.train.optimizer.vn.global_vn : False I0422 04:51:08.939156 140630240360192 base_runner.py:69] task.train.optimizer.vn.per_step_vn : False I0422 04:51:08.939209 140630240360192 base_runner.py:69] task.train.optimizer.vn.scale : NoneType I0422 04:51:08.939263 140630240360192 base_runner.py:69] task.train.optimizer.vn.seed : NoneType I0422 04:51:08.939316 140630240360192 base_runner.py:69] task.train.pruning_hparams_dict : NoneType I0422 04:51:08.939371 140630240360192 base_runner.py:69] 
task.train.save_interval_seconds : 600 I0422 04:51:08.939424 140630240360192 base_runner.py:69] task.train.start_up_delay_steps : 200 I0422 04:51:08.939477 140630240360192 base_runner.py:69] task.train.sum_loss_across_tokens_in_batch : False I0422 04:51:08.939531 140630240360192 base_runner.py:69] task.train.summary_interval_steps : 100 I0422 04:51:08.939584 140630240360192 base_runner.py:69] task.train.tpu_steps_per_loop : 100 I0422 04:51:08.939637 140630240360192 base_runner.py:69] task.train.vn_start_step : 20000 I0422 04:51:08.939692 140630240360192 base_runner.py:69] task.train.vn_std : 0.0 I0422 04:51:08.939744 140630240360192 base_runner.py:69] task.vn.global_vn : False I0422 04:51:08.939798 140630240360192 base_runner.py:69] task.vn.per_step_vn : False I0422 04:51:08.939851 140630240360192 base_runner.py:69] task.vn.scale : NoneType I0422 04:51:08.939905 140630240360192 base_runner.py:69] task.vn.seed : NoneType I0422 04:51:08.939959 140630240360192 base_runner.py:69] train.early_stop.metric_history.jobname : 'eval_dev' I0422 04:51:08.940012 140630240360192 base_runner.py:69] train.early_stop.metric_history.local_filesystem : False I0422 04:51:08.940066 140630240360192 base_runner.py:69] train.early_stop.metric_history.logdir : '' I0422 04:51:08.940123 140630240360192 base_runner.py:69] train.early_stop.metric_history.metric : 'log_pplx' I0422 04:51:08.940177 140630240360192 base_runner.py:69] train.early_stop.metric_history.minimize : True I0422 04:51:08.940231 140630240360192 base_runner.py:69] train.early_stop.metric_history.name : 'MetricHistory' I0422 04:51:08.940284 140630240360192 base_runner.py:69] train.early_stop.metric_history.tfevent_file : False I0422 04:51:08.940339 140630240360192 base_runner.py:69] train.early_stop.name : 'EarlyStop' I0422 04:51:08.940392 140630240360192 base_runner.py:69] train.early_stop.tolerance : 0.0 I0422 04:51:08.940445 140630240360192 base_runner.py:69] train.early_stop.verbose : True I0422 04:51:08.940500 
140630240360192 base_runner.py:69] train.early_stop.window : 0 I0422 04:51:08.940553 140630240360192 base_runner.py:69] train.ema_decay : 0.0 I0422 04:51:08.940608 140630240360192 base_runner.py:69] train.init_from_checkpoint_rules : {} I0422 04:51:08.940661 140630240360192 base_runner.py:69] train.max_steps : 4000000 I0422 04:51:08.940716 140630240360192 base_runner.py:69] train.save_interval_seconds : 600 I0422 04:51:08.940768 140630240360192 base_runner.py:69] train.start_up_delay_steps : 200 I0422 04:51:08.940823 140630240360192 base_runner.py:69] train.summary_interval_steps : 100 I0422 04:51:08.940876 140630240360192 base_runner.py:69] train.tpu_steps_per_loop : 100 I0422 04:51:08.940931 140630240360192 base_runner.py:69] vn.global_vn : False I0422 04:51:08.940984 140630240360192 base_runner.py:69] vn.per_step_vn : False I0422 04:51:08.941036 140630240360192 base_runner.py:69] vn.scale : NoneType I0422 04:51:08.941091 140630240360192 base_runner.py:69] vn.seed : NoneType I0422 04:51:08.941144 140630240360192 base_runner.py:69] I0422 04:51:08.941211 140630240360192 base_runner.py:70] ============================================================ I0422 04:51:08.942976 140630240360192 base_runner.py:115] Starting ... 
I0422 04:51:08.943193 140630240360192 cluster.py:429] _LeastLoadedPlacer : ['/job:local/replica:0/task:0/device:CPU:0']
I0422 04:51:08.952148 140630240360192 cluster.py:447] Place variable global_step on /job:local/replica:0/task:0/device:CPU:0 8
I0422 04:51:08.964440 140630240360192 base_model.py:1116] Training parameters for : {
  early_stop: {
    metric_history: {
      jobname: "eval_dev"
      local_filesystem: False
      logdir: "/tmp/mnist/log"
      metric: "log_pplx"
      minimize: True
      name: "MetricHistory"
      tfevent_file: False
    }
    name: "EarlyStop"
    tolerance: 0.0
    verbose: True
    window: 0
  }
  ema_decay: 0.0
  init_from_checkpoint_rules: {}
  max_steps: 4000000
  save_interval_seconds: 600
  start_up_delay_steps: 200
  summary_interval_steps: 100
  tpu_steps_per_loop: 100
}
Traceback (most recent call last):
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1554, in <module>
    tf.app.run(main)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/usr/local/lib/python2.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1550, in main
    RunnerManager(FLAGS.model).Start()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1543, in Start
    self.StartRunners(self.CreateRunners(FLAGS.job.split(','), FLAGS.logdir))
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1311, in CreateRunners
    trial)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 1274, in _CreateRunner
    return self.Trainer(cfg, *common_args)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/trainer.py", line 384, in __init__
    self._model = self.params.cls(self.params)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_layer.py", line 118, in wrapper
    func(self, *args, **kwargs)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 1221, in __init__
    self.CreateChild('_task', p.task)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_layer.py", line 602, in CreateChild
    child = p.cls(p)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_layer.py", line 118, in wrapper
    func(self, *args, **kwargs)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/lm/model.py", line 62, in __init__
    super(LanguageModel, self).__init__(params)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_layer.py", line 118, in wrapper
    func(self, *args, **kwargs)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_model.py", line 239, in __init__
    self.CreateChild('input', p.input)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_layer.py", line 602, in CreateChild
    child = p.cls(p)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/lm/input_generator.py", line 46, in __init__
    text, self._word_count = self._BuildDataSource()
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_input_generator.py", line 408, in _BuildDataSource
    return self._DataSourceFromFilePattern(input_file_pattern)
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/tasks/lm/input_generator.py", line 83, in _DataSourceFromFilePattern
    **self.CommonInputOpArgs())
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_input_generator.py", line 289, in CommonInputOpArgs
    args.update(self._InputOpBucketingArgs())
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_input_generator.py", line 509, in _InputOpBucketingArgs
    bucket_batch_limit = self.scaled_bucket_batch_limit
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/base_input_generator.py", line 478, in scaled_bucket_batch_limit
    b * cluster.num_splits_per_client for b in p.bucket_batch_limit
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/cluster.py", line 281, in num_splits_per_client
    return self.num_splits_per_replica * self.num_replicas
  File "/tmp/lingvo/bazel-bin/lingvo/trainer.runfiles/__main__/lingvo/core/cluster.py", line 273, in num_splits_per_replica
    assert self.num_devices_per_replica % self.num_devices_per_split == 0
AssertionError