Go: SIGABRT when executing the same node more than once #11132

Closed
galeone opened this issue Jun 29, 2017 · 2 comments

galeone commented Jun 29, 2017

Problem

In Go, passing the same node to the fetches list more than once raises SIGABRT.

Source code / logs

package poc_test

import (
        "fmt"
        tf "github.com/tensorflow/tensorflow/tensorflow/go"
        "github.com/tensorflow/tensorflow/tensorflow/go/op"
        "testing"
)

func TestFunc(t *testing.T) {
        // Create root scope
        root := op.NewScope()

        // Define graph

        // Create a constant matrix
        A := op.Const(root.SubScope("A"), [2][2]int32{{1, 2}, {-1, -2}})
        // Create a constant column vector
        b := op.Const(root.SubScope("b"), [2][1]int32{{10}, {100}})
        // Create a matmul operation
        mul := op.MatMul(root.SubScope("MatMul"), A, b)

        // Finalize the graph
        graph, _ := root.Finalize()

        // Create the session
        var sess *tf.Session
        sess, _ = tf.NewSession(graph, &tf.SessionOptions{})
        // Run
        var results []*tf.Tensor
        var err error
        if results, err = sess.Run(nil, []tf.Output{mul, mul}, nil); err != nil {
                t.Fatal(err) // stop here: results is nil when Run fails
        }
        fmt.Println(results[0].Value())
}

Here's the output:

go test poc_test.go 
2017-06-29 10:46:09.154744: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-29 10:46:09.155330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 0 with properties: 
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:03:00.0
Total memory: 10.91GiB
Free memory: 249.38MiB
2017-06-29 10:46:09.267778: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x1f22910 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2017-06-29 10:46:09.268001: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-06-29 10:46:09.268357: I tensorflow/core/common_runtime/gpu/gpu_device.cc:938] Found device 1 with properties: 
name: GeForce GTX 1060 6GB
major: 6 minor: 1 memoryClockRate (GHz) 1.7845
pciBusID 0000:01:00.0
Total memory: 5.93GiB
Free memory: 5.34GiB
2017-06-29 10:46:09.268390: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 0 and 1
2017-06-29 10:46:09.268399: I tensorflow/core/common_runtime/gpu/gpu_device.cc:830] Peer access not supported between device ordinals 1 and 0
2017-06-29 10:46:09.268409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:959] DMA: 0 1 
2017-06-29 10:46:09.268415: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 0:   Y N 
2017-06-29 10:46:09.268421: I tensorflow/core/common_runtime/gpu/gpu_device.cc:969] 1:   N Y 
2017-06-29 10:46:09.268433: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:03:00.0)
2017-06-29 10:46:09.268440: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1028] Creating TensorFlow device (/gpu:1) -> (device: 1, name: GeForce GTX 1060 6GB, pci bus id: 0000:01:00.0)
2017-06-29 10:46:09.290295: F tensorflow/c/c_api.cc:488] Check failed: nelems == 0 (2 vs. 0)
SIGABRT: abort
PC=0x7fa8684dc670 m=0 sigcode=18446744073709551610
signal arrived during cgo execution

goroutine 5 [syscall, locked to thread]:
runtime.cgocall(0x50d580, 0xc420043d68, 0x530100)
        /usr/lib/go/src/runtime/cgocall.go:131 +0xe2 fp=0xc420043d20 sp=0xc420043ce0
github.com/tensorflow/tensorflow/tensorflow/go._Cfunc_TF_SessionRun(0x2beb8a0, 0x0, 0x0, 0x0, 0x0, 0xc42000ce40, 0xc420011040, 0xc400000002, 0x0, 0x0, ...)
        github.com/tensorflow/tensorflow/tensorflow/go/_obj/_cgo_gotypes.go:703 +0x45 fp=0xc420043d68 sp=0xc420043d20
github.com/tensorflow/tensorflow/tensorflow/go.(*Session).Run.func1(0x2beb8a0, 0x0, 0x0, 0x0, 0x0, 0xc42000ce40, 0xc420011040, 0xc400000002, 0x0, 0x0, ...)
        /home/pgaleone/projects/go/src/github.com/tensorflow/tensorflow/tensorflow/go/session.go:87 +0x23a fp=0xc420043dd8 sp=0xc420043d68
github.com/tensorflow/tensorflow/tensorflow/go.(*Session).Run(0xc42000ce20, 0x0, 0xc420043f50, 0x2, 0x2, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
        /home/pgaleone/projects/go/src/github.com/tensorflow/tensorflow/tensorflow/go/session.go:91 +0x243 fp=0xc420043e70 sp=0xc420043dd8
command-line-arguments_test.TestFunc(0xc4200665b0)
        /home/pgaleone/projects/go/src/github.com/galeone/asd/poc_test.go:32 +0x35d fp=0xc420043fa8 sp=0xc420043e70
testing.tRunner(0xc4200665b0, 0x562278)
        /usr/lib/go/src/testing/testing.go:657 +0x96 fp=0xc420043fd0 sp=0xc420043fa8
runtime.goexit()
        /usr/lib/go/src/runtime/asm_amd64.s:2197 +0x1 fp=0xc420043fd8 sp=0xc420043fd0
created by testing.(*T).Run
        /usr/lib/go/src/testing/testing.go:697 +0x2ca

goroutine 1 [chan receive]:
testing.(*T).Run(0xc4200664e0, 0x559659, 0x8, 0x562278, 0xc420053d20)
        /usr/lib/go/src/testing/testing.go:698 +0x2f4
testing.runTests.func1(0xc4200664e0)
        /usr/lib/go/src/testing/testing.go:882 +0x67
testing.tRunner(0xc4200664e0, 0xc420053de0)
        /usr/lib/go/src/testing/testing.go:657 +0x96
testing.runTests(0xc42000cd80, 0x7ecf80, 0x1, 0x1, 0x4131ac)
        /usr/lib/go/src/testing/testing.go:888 +0x2c1
testing.(*M).Run(0xc420053f20, 0xc420053f20)
        /usr/lib/go/src/testing/testing.go:822 +0xfc
main.main()
        command-line-arguments/_test/_testmain.go:42 +0xf7

goroutine 17 [syscall, locked to thread]:
runtime.goexit()
        /usr/lib/go/src/runtime/asm_amd64.s:2197 +0x1

rax    0x0
rbx    0x6
rcx    0x7fa8684dc670
rdx    0x0
rdi    0x2
rsi    0x7ffe4515bf50
rbp    0x7ffe4515c1a0
rsp    0x7ffe4515bf50
r8     0x0
r9     0x7ffe4515bf50
r10    0x8
r11    0x246
r12    0x2
r13    0x2
r14    0x2bf8160
r15    0x20
rip    0x7fa8684dc670
rflags 0x246
cs     0x33
fs     0x0
gs     0x0
FAIL    command-line-arguments  0.544s

The same logic in Python works without any issue:

import tensorflow as tf

A = tf.constant([[1,2], [-1, -2]])
b = tf.constant([[10], [100]])

mul = tf.matmul(A, b)

with tf.Session() as sess:
    print(sess.run([mul, mul]))

outputs

[array([[ 210],
       [-210]], dtype=int32), array([[ 210],
       [-210]], dtype=int32)]

as expected.
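
A possible caller-side workaround, until the Go binding handles repeated fetches, is to deduplicate the fetch list before calling `Session.Run` and then expand the results back to the requested order. The sketch below shows only that bookkeeping. `output` is a hypothetical stand-in for `tf.Output` (assumed here to be a comparable struct, so it can serve as a map key), and `dedupe` and `expand` are illustrative names, not part of the TensorFlow Go API.

```go
package main

import "fmt"

// output is a hypothetical stand-in for tf.Output: an operation plus the
// index of one of its outputs.
type output struct {
	op    string
	index int
}

// dedupe returns the distinct fetches in first-seen order and, for each
// requested fetch, the position of its value in the deduplicated list.
func dedupe(fetches []output) (unique []output, pos []int) {
	seen := make(map[output]int, len(fetches))
	pos = make([]int, len(fetches))
	for i, f := range fetches {
		j, ok := seen[f]
		if !ok {
			j = len(unique)
			seen[f] = j
			unique = append(unique, f)
		}
		pos[i] = j
	}
	return unique, pos
}

// expand maps results computed for the deduplicated fetches back onto the
// originally requested order.
func expand(results []string, pos []int) []string {
	out := make([]string, len(pos))
	for i, j := range pos {
		out[i] = results[j]
	}
	return out
}

func main() {
	mul := output{op: "MatMul", index: 0}
	unique, pos := dedupe([]output{mul, mul})
	// In real code, unique would be passed to the session; the result is faked here.
	results := []string{"[[210] [-210]]"}
	fmt.Println(len(unique), expand(results, pos))
}
```

With this approach the session only ever sees each node once, while the caller still receives one result per requested fetch, which matches what the Python example returns.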

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Archlinux
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.2.0
  • Bazel version (if compiling from source): 0.5.1
  • CUDA/cuDNN version: cuda 8, cudnn 5.1
  • GPU model and memory: GeForce GTX 1080
  • Exact command to reproduce: go test

asimshankar (Contributor) commented

Thanks for the report, it certainly shouldn't fail with a SIGABRT.

(That said, I am curious about the use case for fetching the same value multiple times.)

asimshankar added the type:bug label Jun 29, 2017
asimshankar self-assigned this Jun 29, 2017

galeone commented Jun 29, 2017

I was showing that in tfgo, when one assigns a Go variable to another Go variable, it needs to "clone" it before the assignment in order to create a new, distinct node in the graph.
Otherwise, the assignment exists only in Go, and both variables refer to the same node in the graph.

(In short, I was showing how to use tf.assign rather than the assignment operator of the host language.)

To emphasize this, I'd like to show that those two Go variables contain the same value when evaluated.
But I can't because of this bug, so I fall back to showing it in another way. See the first example here: https://github.com/galeone/tfgo#getting-started
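
The aliasing described above can be illustrated without TensorFlow. In this minimal sketch, `node`, `handle`, and `clone` are hypothetical stand-ins (not tfgo or TensorFlow APIs): copying a Go value that wraps a pointer yields a second variable referring to the same node, while an explicit clone creates a new one.

```go
package main

import "fmt"

// node is a hypothetical stand-in for an operation in a graph.
type node struct{ name string }

// handle is a hypothetical stand-in for a Go value that refers to a graph node.
type handle struct{ op *node }

// clone copies the node behind h into a new node and returns a handle to it.
func clone(h handle) handle {
	n := *h.op
	return handle{op: &n}
}

func main() {
	a := handle{op: &node{name: "A"}}
	b := a        // plain Go assignment: copies the handle, not the node
	c := clone(a) // explicit clone: a distinct node with the same contents

	fmt.Println(a.op == b.op) // a and b share one underlying node
	fmt.Println(a.op == c.op) // a and c refer to different nodes
}
```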
