Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TensorFlow type system refactoring #139

Closed
wants to merge 11 commits into from

Conversation

karllessard
Copy link
Collaborator

@karllessard karllessard commented Nov 2, 2020

This is another attempt to refactor and simplify the type system of TensorFlow Java, in reflection to what was previously discussed in the preview pull request #92 plus the ongoing discussion on #115 .

I think this new version supports most of the optimizations we were targeting and I plan to clean it up and, if possible, make smaller PR out of it for merging in our current snapshot, if everyone agrees with this approach. Breaking changes compared to the actual version (0.2.0) include:

  • Leverages as much as possible Java standard type system instead of a custom one
  • Tensors instances now inherit directly from a type class, under org.tensorflow.types
  • tensor.data() to convert a tensor handle to a data structure is no longer required
  • DataType has been removed and the DataType proto, which is just an enum, is now used whenever needed
  • Adding autocloseable TensorList and TensorMap that also supports tensor type auto-casting (in progress)

Here are a few before&after examples:

// Before
Operand<TInt32> c = tf.cast(node, TInt32.DTYPE);
// After
Operand<TInt32> c = tf.cast(node, TInt32.class);

// Before
try (Tensor<TInt32> t = session.run().get(0).expect(TInt32.DTYPE)) {
    t.data().getInt();
}
// After
try (TInt32 t = session.run().single()) {
    t.getInt();
}

// Before
DataType<? extends TType> dtype = tensor.dataType();
if (dtype.isNumeric()) {
    ...
}
// After
Class<? extends TType> type = tensor.type();
if (TNumber.class.isAssignableFrom(type)) {
    ...
}

// Before
Tensor<?> t =  ...;
if (t.dataType().equals(TInt32.DTYPE)) {
    ((TInt32)t.data()).getInt();
}
// After
Tensor t = ...;
if (t instanceof TInt32) {
    ((TInt32)t).getInt();
}

CC\ @deansher , @Craigacp

@karllessard karllessard changed the title Tensor ttype TensorFlow type system refactoring Nov 2, 2020
@deansher
Copy link
Contributor

deansher commented Nov 3, 2020

I love the direction! I think you've created a package org.tensorflow.proto that you didn't commit.

@karllessard
Copy link
Collaborator Author

Oh, it looks like the whole generated source folder was missing, I've just pushed it back

@deansher
Copy link
Contributor

deansher commented Nov 3, 2020

Perhaps it would be worthwhile to carefully write down the intended semantics in the javadocs for a few cornerstone classes like Output? For example, interesting semantic issues are raised by this method:

  // in class Output<T>

  public <U extends TType> Output<U> expect(Class<U> tensorType) {
    if (tensorType != type()) {
      throw new IllegalArgumentException(
          "Cannot cast from output of " + type().getSimpleName() + " to output of type " + tensorType.getSimpleName());
    }
    return ((Output<U>) this);
  }

At compile time, would it be reasonable to have an Output<TFloating>? Surely yes. At runtime, that Output's type() method would return an exact tensor-type class like TFloat32.class. I think this reflects useful and sensible semantics, but perhaps surprising semantics for the unprepared.

Now suppose I have an Output<TNumber> and I want to expect(TFloating.class). The above method doesn't support this. It could be changed to support it by using tensorType.isAssignableFrom(type()). I suspect this is the right thing to do, but it depends on one's mental model of the tensorType. If one thinks of tensorType as "the java runtime representation of a TensorFlow data type", then maybe not.

We have a method whose implementation looks at it exactly that way, but whose javadoc does not:

  // in class TypeRegistry

  /**
   * Find a type registration from a type class
   *
   * @param typeClass class implementing {@link Tensor}
   * @return type registration
   * @throws IllegalArgumentException if the code matches no registered data type
   */
  public static <T extends TType> Type<T> find(Class<T> typeClass) {
    Type<?> entry = TYPES_BY_CLASS.get(typeClass);
    if (entry == null) {
      throw new IllegalArgumentException("Class \"" + typeClass.getName() + "\" is not a valid datatype class");
    }
    return (Type<T>)entry;
  }

The javadoc accurately describes typeClass's compile-time type (if we overlook Tensor versus TType), but does not accurately describe its legal values.

Perhaps if we carefully write down the intended semantics in the javadocs for a few cornerstone classes like Output, we could make sure we are comfortable with them and then make sure we consistently implement them? This might also help us discover additional ways to capture our semantics in the type system. For example, I find myself wondering whether we will want to define a ConcreteTType that is implemented only by concrete subclasses like TFloat32.

@karllessard
Copy link
Collaborator Author

karllessard commented Nov 3, 2020

Hi @deansher , this PR is still a draft and the Javadoc is something I did not went through yet, as I wanted to collect more comments and have a common agreement that we want to move in that direction before completing it. So right now, please only look at it at a high-level.

For example, I think what you mentioned about supporting Output.expect(TFloating.class) is pretty relevant and yes, we should to that using isAssignableFrom as you suggested.

@deansher
Copy link
Contributor

deansher commented Nov 3, 2020

Yeah, I'm just poking around for interesting corner cases to see if I can spot any reasons that using a Java class as the runtime representation of a tensor type is going to go badly.

@Craigacp
Copy link
Collaborator

Craigacp commented Jan 3, 2021

Is there anything left in this PR that hasn't landed in the others? As otherwise we should close this one.

@karllessard
Copy link
Collaborator Author

I think we are good to close it, yes

@karllessard karllessard closed this Jan 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants