Bug when creating a Dataset from a DataFrame whose index is not ordered #20347

Closed
marcelino-m opened this issue Jun 27, 2018 · 4 comments

marcelino-m commented Jun 27, 2018

This code generates a segmentation fault:

import pandas     as pd
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()

train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[2, 1, 4])
label = pd.Series([7, 8, 9], index=[2, 1, 4])

ds = tf.data.Dataset.from_tensor_slices((dict(train), label))

But this one does not (only the index is changed, from [2, 1, 4] to [0, 1, 4]):

import pandas     as pd
import tensorflow as tf
import tensorflow.contrib.eager as tfe
tf.enable_eager_execution()


train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[0, 1, 4])
label = pd.Series([7, 8, 9], index=[0, 1, 4])

ds = tf.data.Dataset.from_tensor_slices((dict(train), label))
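
The thread does not state the root cause of the crash, so the following is only a sketch of one observable difference between the two indexes, which may or may not be related. With an integer index, pandas treats `series[0]` as a label lookup, so it fails for the index [2, 1, 4] and succeeds for [0, 1, 4]. The helper name `lookup_by_label` is hypothetical and exists only for this illustration.

```python
import pandas as pd

# The two label Series from the report; only the index differs.
unordered = pd.Series([7, 8, 9], index=[2, 1, 4])
zero_based = pd.Series([7, 8, 9], index=[0, 1, 4])


def lookup_by_label(series, label):
    """Return series[label], or None if the label is missing (hypothetical helper)."""
    try:
        # For an integer index this is a label-based lookup, not a positional one.
        return series[label]
    except KeyError:
        return None


# Label 0 does not exist in the index [2, 1, 4], so the lookup fails.
first_unordered = lookup_by_label(unordered, 0)

# Label 0 exists in the index [0, 1, 4], so the lookup succeeds.
first_zero_based = lookup_by_label(zero_based, 0)

# Positional access works regardless of the index labels.
first_by_position = unordered.iloc[0]
```

Code that needs the first row irrespective of labels would use `.iloc`, as in the last line.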

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Kubuntu 17.10 (Artful Aardvark), Kernel 4.13.0-21-generic
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): 1.8.0
  • Python version: 3.6.3
  • Bazel version (if compiling from source): N/A
  • GCC/Compiler version (if compiling from source): N/A
  • CUDA/cuDNN version: N/A
  • GPU model and memory: N/A
  • Exact command to reproduce: N/A
tensorflowbutler added the stat:awaiting response label (Status - Awaiting response from author) Jun 28, 2018
@tensorflowbutler
Member

Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
  • Have I written custom code
  • OS Platform and Distribution
  • Bazel version
  • CUDA/cuDNN version
  • GPU model and memory
  • Exact command to reproduce

@marcelino-m
Author

I updated the missing information.

asimshankar assigned jsimsa and rachellim and unassigned asimshankar Jun 28, 2018
tensorflowbutler removed the stat:awaiting response label (Status - Awaiting response from author) Jun 29, 2018
@asimshankar
Contributor

Thanks for the report @marcelino-m - it surely shouldn't crash, so I'll look into that.

In the meantime, a workaround would be to explicitly convert the pandas objects to NumPy arrays before creating the dataset, with something like this:

import tensorflow as tf
import pandas as pd
import numpy as np
tf.enable_eager_execution()

train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[2, 1, 4])
label = pd.Series([7, 8, 9], index=[2, 1, 4])

# Convert each column and the labels to plain NumPy arrays, dropping the pandas index.
np_train = {k: np.array(v) for k, v in dict(train).items()}
np_label = np.array(label)
ds = tf.data.Dataset.from_tensor_slices((np_train, np_label))
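
The conversion step of the workaround above can be wrapped in a small function. This is a minimal sketch, not part of TensorFlow or pandas: the name `to_numpy_slices` is hypothetical, and the snippet leaves out the TensorFlow call so that it runs with only NumPy and pandas installed.

```python
import numpy as np
import pandas as pd


def to_numpy_slices(features, labels):
    """Convert a DataFrame and a Series to plain NumPy arrays (hypothetical helper).

    The pandas index is discarded; rows keep their original order.
    """
    feature_arrays = {name: np.asarray(column) for name, column in features.items()}
    label_array = np.asarray(labels)
    return feature_arrays, label_array


train = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=[2, 1, 4])
label = pd.Series([7, 8, 9], index=[2, 1, 4])

np_train, np_label = to_numpy_slices(train, label)
# The resulting tuple can then be passed on, as in the workaround:
# ds = tf.data.Dataset.from_tensor_slices((np_train, np_label))
```

Because the arrays carry no index, the values reach the dataset in row order whatever labels the DataFrame used.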

@marcelino-m
Author

Thanks for the tip. I did something like that.

asimshankar assigned asimshankar and unassigned jsimsa and rachellim Jun 29, 2018
drpngx closed this as completed in 92f13f2 Jul 3, 2018