Bug when creating a Dataset from a DataFrame whose index is not ordered #20347

marcelino-m · 2018-06-27T15:28:03Z

This code generates a segmentation fault:

import pandas     as pd
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()

train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[2, 1, 4])
label = pd.Series([7, 8, 9], index=[2, 1, 4])

ds = tf.data.Dataset.from_tensor_slices((dict(train), label))

But this one does not (only the index is changed, from [2, 1, 4] to [0, 1, 4]):

import pandas     as pd
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()


train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[0, 1, 4])
label = pd.Series([7, 8, 9], index=[0, 1, 4])

ds = tf.data.Dataset.from_tensor_slices((dict(train), label))
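The thread does not state the root cause, so the following is only an illustration of how the two indexes differ on the pandas side, not a diagnosis. It uses pandas alone (no TensorFlow), and the variable names `crashing`, `working`, and `lookup_failed` are made up for this sketch. With an integer index, `series[0]` is a label lookup, so it only succeeds when the label 0 is present:

```python
import pandas as pd

# Same values as the report; only the index differs.
crashing = pd.Series([7, 8, 9], index=[2, 1, 4])  # index from the crashing snippet
working = pd.Series([7, 8, 9], index=[0, 1, 4])   # index from the working snippet

# On an integer index, an integer key is treated as a label, not a position.
print(working[0])            # label 0 exists in this index
print(0 in crashing.index)   # label 0 is absent here

# A label lookup for 0 on the first Series raises KeyError.
lookup_failed = False
try:
    crashing[0]
except KeyError:
    lookup_failed = True
print(lookup_failed)

# Positional access does not depend on the index labels.
print(crashing.iloc[0])
```

Any code that walks a Series with `series[i]` for `i = 0, 1, 2, ...` would therefore behave differently for the two inputs. Whether TensorFlow 1.8.0 does this internally is an assumption that the thread does not confirm.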

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Kubuntu 17.10 (Artful Aardvark), Kernel 4.13.0-21-generic
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 1.8.0
Python version: 3.6.3
Bazel version (if compiling from source): N/A
GCC/Compiler version (if compiling from source): N/A
CUDA/cuDNN version: N/A
GPU model and memory: N/A
Exact command to reproduce: N/A

tensorflowbutler · 2018-06-28T07:33:01Z

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
Have I written custom code
OS Platform and Distribution
Bazel version
CUDA/cuDNN version
GPU model and memory
Exact command to reproduce

marcelino-m · 2018-06-28T14:09:58Z

I updated the missing information.

asimshankar · 2018-06-29T01:40:46Z

Thanks for the report @marcelino-m - it surely shouldn't crash, so I'll look into that.

In the meantime, a workaround would be to explicitly convert to tensors before creating the dataset, with something like this:

import tensorflow as tf
import pandas as pd
import numpy as np
tf.enable_eager_execution()

train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[2, 1, 4])
label = pd.Series([7, 8, 9], index=[2, 1, 4])

np_train = {k: np.array(v) for k, v in dict(train).items()}
np_label = np.array(label)
ds = tf.data.Dataset.from_tensor_slices((np_train, np_label))
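As a small check of what the conversion in the workaround above produces, the sketch below runs only the pandas and NumPy part (TensorFlow is left out on purpose). It suggests that `np.array` keeps the values in row order and drops the pandas index, so the objects handed to `from_tensor_slices` are plain arrays. The printed checks are illustrative additions, not part of the original workaround.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[2, 1, 4])
label = pd.Series([7, 8, 9], index=[2, 1, 4])

# dict(train) maps each column name to a pandas Series that still
# carries the [2, 1, 4] index; np.array copies out only the values.
np_train = {k: np.array(v) for k, v in dict(train).items()}
np_label = np.array(label)

# The results are plain ndarrays, indexed by position only.
print(type(np_label).__name__)
print(np_label.tolist())
print({k: v.tolist() for k, v in np_train.items()})
```

The row order matches the DataFrame, so features and labels stay aligned after the conversion.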

marcelino-m · 2018-06-29T02:03:11Z

Thanks for the tip. I did something like that.

tensorflowbutler assigned asimshankar Jun 28, 2018

tensorflowbutler added the "stat:awaiting response" label Jun 28, 2018

asimshankar assigned jsimsa and rachellim and unassigned asimshankar Jun 28, 2018

tensorflowbutler removed the "stat:awaiting response" label Jun 29, 2018

asimshankar assigned asimshankar and unassigned jsimsa and rachellim Jun 29, 2018

drpngx closed this as completed in 92f13f2 Jul 3, 2018
