
Implementation of a classifier based on recursive trees #13

Open
Yan-Huang-Cam opened this issue Feb 21, 2017 · 5 comments

@Yan-Huang-Cam

Yan-Huang-Cam commented Feb 21, 2017

Hi,

Thank you very much for sharing this wonderful tool!

I tried to implement a classifier based on recursive trees. In general, I defined a recursive block that turns a tree into a predicted label, and then used a record block to pair the prediction from the recursive block with the correct label (y_). The record block was then compiled and a cross-entropy loss was calculated based on the output (y and y_) of the compiler. However, when I connected the loss to an optimizer, I got the following message:

../gradients_impl.py:92: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
"Converting sparse IndexedSlices to a dense Tensor of unknown shape. "

The program then crashed when it tried to process the result from compiler.build_loom_inputs:

Segmentation fault (core dumped)

I was wondering if it had something to do with the way I paired the output of the recursive block with the correct label, or with some other mistake in defining the model. (The computation graph shows that the 'outputgather' node sends a tensor of unknown shape (?, 39) (39 is the dimension of y and y_) to the 'gradient' node.)

My code is as follows:

    # Define the recursive block of 'process tree'
    expr_fwd = td.ForwardDeclaration(td.PyObjectType(), td.TensorType([word_dim,]))
    word_embedding_layer = td.Embedding(len(word_embedding_model) + 1, word_dim, initializer = we_values, trainable = False)
    leaf_case = td.InputTransform(lambda node: we_keys.get(node['w'].lower(), 0), name = 'leaf_input_transform') >> td.Scalar('int32') >> td.Function(word_embedding_layer, name = 'leaf_Function')
    dep_embedding_layer = td.Embedding(len(dep_dict), param['dep_dim'], name = 'dep_embedding_layer')
    get_dep_embedding = (td.InputTransform(lambda d_label: dep_dict.get(d_label), name = 'dep_input_transform') >> td.Scalar('int32') >> td.Function(dep_embedding_layer, name = 'dep_embedding'))
    fclayer = td.FC(word_dim, name = 'process_tree_FC')
    non_leaf_case = (td.Record({'child': expr_fwd(), 'me': expr_fwd(), 'd': get_dep_embedding}, name = 'non-leaf_record') >> td.Concat() >> td.Function(fclayer, name = 'non_leaf_function'))
    process_tree = td.OneOf(lambda node: node['is_leaf'], {True : leaf_case, False: non_leaf_case}, name = 'process_tree_one_of')
    expr_fwd.resolve_to(process_tree)

    # Define the block which pairs the label ('y_') with the prediction from the recursive tree
    fcplayer_hidden = td.FC(len(y_classes))
    block = td.Record({'x': process_tree >> td.Function(fcplayer_hidden), 'y_': td.Vector(len(y_classes), name = 'label_vector')}, name = 'my_block')

    # Compile the block
    compiler = td.Compiler.create(block)
    (y, y_) = compiler.output_tensors
    with tf.name_scope('cross_entropy') as scope:
      cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits = y, labels = y_)
    train_step = tf.train.AdamOptimizer(0.5).minimize(cross_entropy)
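
Side note (a sketch, not part of the code above): softmax_cross_entropy_with_logits returns one loss value per example, and a common pattern is to reduce those values to a single scalar before handing the loss to the optimizer, e.g.:

    # Hypothetical variant of the last two lines above: average the
    # per-example losses into one scalar objective before minimizing.
    mean_loss = tf.reduce_mean(cross_entropy)
    train_step = tf.train.AdamOptimizer(0.5).minimize(mean_loss)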

Thank you very much for your attention!

@delesley
Contributor

delesley commented Feb 21, 2017 via email

@moshelooks

Hrm, sorry you're getting a segfault! The code does seem fine; I don't know what's going wrong based on what you've provided. Maybe you could share the code that actually generated the segfault?

Also, you could try running some of our example code (and some TF examples) to see whether the problem is with your particular model, or with TF or Fold not running well in general on your machine.

@moshelooks

moshelooks commented Feb 22, 2017

P.S. One more thing to check is to make sure that the segfault is actually being generated while the code is running, e.g. by adding a print statement at the very end of your code. I ask this because during development we encountered some issues, due to the way TF does dynamic library loading, that could cause a segfault when unlinking the library (which happens when the python interpreter exits). FWIW we never encountered any segfaults while code was being run, although ipython would occasionally segfault during tab completion (this was not a Fold problem per se; TF did the same thing in some cases for unclear reasons).
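
For example, a minimal sentinel (purely illustrative) would be:

    # If this prints before the segfault appears, the crash is happening at
    # interpreter exit (library unload), not while the model code is running.
    print('model code finished')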

@Yan-Huang-Cam
Author

Thank you very much for your prompt replies and help! Following your advice, I found that the problem could be solved by using a virtualenv (as suggested by the installation document -- sorry I omitted this at the beginning). So the problem seems to have come from a conflict between some python modules and TF or TF Fold.

However, another problem came up. While the code runs now, the batched cross-entropy loss turns out to be the same for every example across all batches during training. I was wondering whether there is anything wrong with the training code (a continuation of the code for defining and compiling the blocks in the original post):

    # Initialize variables and write the graph for TensorBoard
    init = tf.global_variables_initializer()
    sess = tf.Session()
    sess.run(init)
    tf.summary.FileWriter('./tf_graph', graph = sess.graph)
    batch_size = 30

    # Build loom inputs once, then feed batches of them per training step
    train_set = compiler.build_loom_inputs(Input_train_tf)
    train_feed_dict = {}
    dev_feed_dict = compiler.build_feed_dict(Input_dev_tf)
    for epoch, shuffled in enumerate(td.epochs(train_set, epochs), 1):
      train_loss = 0.0
      for batch in td.group_by_batches(shuffled, batch_size):
        train_feed_dict[compiler.loom_input_tensor] = batch
        _, batch_loss = sess.run([train_step, cross_entropy], train_feed_dict)
        print batch_loss
        train_loss += np.sum(batch_loss)
      dev_loss = np.average(sess.run(cross_entropy, dev_feed_dict))
      print dev_loss

If not, would it be possible for you to indicate how to diagnose this problem? Thanks a lot for your attention and time!
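
(Purely as an illustration of one possible sanity check for the constant-loss symptom, assuming the session and graph defined above are in scope: watch whether the trainable variables change at all between steps.)

    # Sketch: if this norm is identical after every training step, the
    # optimizer is not actually updating the parameters (e.g. the loss does
    # not depend on them, or the wrong tensors are being fed).
    weight_norm = tf.global_norm(tf.trainable_variables())
    print(sess.run(weight_norm))  # compare the value across successive batches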

@moshelooks

The training code looks OK. What I would recommend doing here is breaking the code that defines your model down into pieces and putting each piece inside a function. Then, if you have e.g. a foo_block() function, you can write unit tests against it and/or interactively debug it with

foo_block().eval(foo_input)

and see that each piece does what you expect.
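
For instance, a toy sketch of that pattern (the block and dimensions here are made up, not taken from your model):

    import tensorflow_fold as td

    def toy_fc_block(in_dim=3, out_dim=2):
      # A minimal block: a fixed-size input vector passed through one FC layer.
      return td.Vector(in_dim) >> td.Function(td.FC(out_dim))

    # Interactively check that the block produces the expected output shape.
    print(toy_fc_block().eval([1.0, 2.0, 3.0]))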
