
PEP 484 Type Annotations (feature request) #12345

Open
ed-alertedh opened this issue Aug 17, 2017 · 36 comments
Labels
stat:contributions welcome Status - Contributions welcome type:feature Feature requests

Comments

@ed-alertedh

ed-alertedh commented Aug 17, 2017

System information

N/A

Describe the problem

Background

PEP 484 [1] added support for type hints in Python. These are purely annotations and are not enforced by the interpreter; however, there are tools such as mypy [2] that can be run to check the annotations for consistency. The typeshed initiative [3] has started building external collections of type annotations for commonly used libraries.

When adding type annotations to a codebase, it is best if you can achieve near-100% coverage; otherwise uncertainty propagates out from everywhere the "untyped" code is called. A codebase using TF would likely struggle to gain much benefit from type-checking in any of the core code built on top of TF.

Benefits of Adding Type Annotations

  • The expected inputs and outputs of functions become much clearer
  • Code completion is able to provide more useful suggestions, boosting productivity by reducing the amount of time spent referring to docs
  • Static analysis can uncover latent bugs (case study here[5])

Difficulties/Drawbacks

  • People may be encouraged to overly constrain types, removing some of the flexibility of a dynamic language. But given that Google's Python style guide discourages "Power Features" [4], I would argue that striving towards explicit code follows a similar philosophy
  • The protobuf compiler would need to be augmented to generate type annotations.
  • The TensorFlow Python codebase is huge, so at this point adding the annotations would be a major undertaking.
  • TensorFlow still supports Python 2.7, 3.3 and 3.4, which do not have the type annotation syntax. So if this were implemented, it would probably have to be done in external *.pyi files, which are harder to maintain than inline type annotations in the source code.
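To make the external-stub option concrete, here is a minimal sketch of what a *.pyi entry might look like. `Tensor` and `matmul` below are illustrative placeholders, not the real TensorFlow signatures:

```python
# Hypothetical sketch of an external stub (*.pyi) file for a TF-like API.
# Stub files use ordinary Python 3 annotation syntax; bodies are elided
# with `...` because only the signatures matter to the type checker.
from typing import Any, List, Optional

class Tensor:
    dtype: Any
    shape: List[Optional[int]]  # e.g. [None, 128] for an unknown batch dim

def matmul(a: Tensor, b: Tensor) -> Tensor: ...
```

A checker like mypy would pick this file up alongside the untyped package, so the library source itself can stay compatible with Python 2.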

Final thoughts

I realise that this would be a major undertaking and isn't likely to ship any time soon, but I'm curious to gauge Google's thoughts on this new feature in Python. I'm about to start building a new codebase from scratch and am keen to use it as a chance to try out type annotations. I'll probably still give it a shot, but I suspect that unless most of the common data science libraries adopt this standard, its usefulness will be quite limited.

[1] https://www.python.org/dev/peps/pep-0484/
[2] http://mypy-lang.org/
[3] https://github.com/python/typeshed
[4] https://google.github.io/styleguide/pyguide.html#Power_Features
[5] http://blog.zulip.org/2016/10/13/static-types-in-python-oh-mypy/

@skye
Member

skye commented Aug 17, 2017

This sounds like a good idea to me. We would of course need to add the type annotations incrementally, not all at once. This seems like the kind of thing that would benefit from community contributions as well.

@martinwicke am I missing anything? Please unmark "contributions welcome" if this is actually not feasible or prudent for some reason.

@skye skye added stat:contributions welcome Status - Contributions welcome type:feature Feature requests labels Aug 17, 2017
@skye
Member

skye commented Aug 17, 2017

@aselle

@martinwicke
Member

martinwicke commented Aug 17, 2017

@ed-alertedh
Author

ed-alertedh commented Aug 18, 2017

Awesome to hear you're open to the idea :) One minor detail to add to the above: instead of external *.pyi files, another option to maintain syntax compatibility with earlier Python versions could be to embed "type comments" as defined in PEP 484. E.g. here is a snippet I just pulled from Zulip:

def add_request_metadata(report, request):
    # type: (Dict[str, Any], HttpRequest) -> None
    ...  # method body starts here
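For context, a self-contained version of the same pattern (with `Any` standing in here for Django's `HttpRequest`) shows that type comments are inert at runtime and only read by tools like mypy:

```python
from typing import Any, Dict

def add_request_metadata(report, request):
    # type: (Dict[str, Any], Any) -> None
    # The comment above is read only by type checkers; the interpreter
    # ignores it entirely, so this runs on Python 2.7 as well.
    report["path"] = getattr(request, "path", "/")

report = {}  # type: Dict[str, Any]
add_request_metadata(report, object())
```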

@martinwicke
Member

martinwicke commented Aug 24, 2017

I prefer the comments over the extra files (if only because I don't like to have separately maintained files, they tend to rot).

We'll have to test whether all our infrastructure (doc generation and whatnot) works with this, but in principle this is a good idea.

@ngc92
Contributor

ngc92 commented Sep 14, 2017

I think the most significant benefit of specifying types, improved auto-completion, can be achieved by specifying just the return types.
These are usually already documented, but not in the correct format (at least for PyCharm). Simply changing

Returns:
    A `Tensor`...

to

Returns:
    Tensor: A `Tensor` ...

would make these hints accessible to tools.

Finally, automatic type inference from the structure of the code should not be underestimated.
At least in PyCharm, `if not isinstance(A, Type): raise TypeError()` is enough for code completion to know that following this line, A is of type Type. It also follows types through attribute assignments in classes and through function calls. This significantly reduces the number of places where type hints become necessary.
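The narrowing pattern described above can be sketched like this (`float` is just an arbitrary example type):

```python
def scale(a):
    if not isinstance(a, float):
        raise TypeError("expected a float")
    # From here on, checkers and IDEs know `a` is a float, even though the
    # parameter itself carries no explicit annotation.
    return a * 2.0
```

The guard doubles as a runtime check and as a hint to static analysis, so no annotation is needed for completion on `a` to work.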

While playing around with it, I found, for example, that for `Tensor().dtype` to be recognized correctly, it was sufficient to annotate the `as_dtype` function correctly.

@huan
Contributor

huan commented Jan 25, 2018

It would be great if we could support Python 3 typing in TensorFlow.

We will benefit from both

  1. Static Type Checking, and
  2. Hint Linting.

@jakubLangr

jakubLangr commented May 9, 2018

Agreed. I see there's some stuff in contrib and a couple of TODOs, but has anything been agreed on? It would make it a lot easier to integrate in production systems.

@darthdeus

darthdeus commented May 22, 2018

Probably a silly question, but is there currently any way to integrate tensorflow with mypy? (or any other way to get typechecking with TF)

@shoyer
Contributor

shoyer commented May 22, 2018

I think external type-stubs for TensorFlow would make sense, along the lines of what we've started doing for numpy with numpy-stubs. This would allow for integrating tensorflow with mypy (which is currently not supported).

@shuttle1987

shuttle1987 commented Jun 8, 2018

We started writing some stubs for our project: https://github.com/persephone-tools/persephone/tree/master/stubs/tensorflow. This works well enough for our needs, but something more comprehensive would be a lot better.

@shuttle1987

shuttle1987 commented Jun 14, 2018

We decided to move our attempt at making stubs into an external library to make it easier to collaborate: https://github.com/persephone-tools/tensorflow-stubs

@shoyer
Contributor

shoyer commented Jun 14, 2018

@shuttle1987

shuttle1987 commented Jun 14, 2018

@shoyer thanks for that suggestion, I created an issue for it persephone-tools/tensorflow-stubs#2

@MInner

MInner commented Aug 12, 2018

Just for the information of @shuttle1987 and others in the thread: there is a project called MonkeyType that aims to generate stubs by tracing actual code execution. Considering the vast TensorFlow test suites and official repositories, encountering all possible output types for the majority of function calls seems likely. One could even hold out a validation split and check whether stubs inferred from 90% of the code cover the remaining 10% to get an estimate of the overall coverage :)
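The tracing idea can be illustrated with a toy decorator. This is a simplified sketch of what MonkeyType does, not its actual API: record the concrete types observed at runtime, then summarize them for a stub generator.

```python
from collections import defaultdict

# function name -> set of (argument type names, return type name) pairs
observed = defaultdict(set)

def trace(fn):
    def wrapper(*args):
        result = fn(*args)
        observed[fn.__name__].add(
            (tuple(type(a).__name__ for a in args), type(result).__name__)
        )
        return result
    return wrapper

@trace
def double(x):
    return x * 2

double(2)
double(2.0)
# observed["double"] now contains (('int',), 'int') and (('float',), 'float'),
# which a stub generator could render as overloads or a Union.
```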

@darthdeus

darthdeus commented Aug 14, 2018

Just as a small note regarding MonkeyType: it does work quite well. The only "problem" I've encountered so far is when you pass various tuples as Iterable[T] (don't ask me why); it instead infers Union[Tuple[T], Tuple[T, T], ...].

@shuttle1987

shuttle1987 commented Aug 15, 2018

@MInner if you want to make some test cases that should pass and some that are expected to fail, a PR would be very welcome over on https://github.com/persephone-tools/tensorflow-stubs/. I see this as a really good use case for the MonkeyType project you mentioned.

@malmaud
Contributor

malmaud commented Feb 27, 2019

PyTorch just implemented this (pytorch/pytorch#12500) via stubs, and the code completion it enables is a huge convenience. No pressure though :)

@huan
Contributor

huan commented Feb 28, 2019

It would be awesome if we could support this feature via stubs, and even better if we could support typing natively!

@ngc92
Contributor

ngc92 commented Mar 29, 2019

Wouldn't it be possible to add Python 3 style type hints and just automatically strip them when building Python 2 wheels?

@srilman

srilman commented Mar 13, 2020

Since TF2.1 is the last version to support Python 2, shouldn't all later versions support at least 3.5? Then the native syntax can be used instead of the comments.

Seems like https://github.com/persephone-tools/tensorflow-stubs/ hasn't been updated in a while. Would love to see (or even start making) some progress on this front!

@martinwicke
Member

martinwicke commented Mar 14, 2020

@srilman

srilman commented Mar 14, 2020

Funnily enough, @martinwicke, I had just found that RFC right when you commented. Looks great to me! But from my understanding, the RFC is about consolidating custom types into a single module rather than adding type hints to the public API. Is that true @mdanatg?

@mdanatg

mdanatg commented Mar 14, 2020

@bhack
Contributor

bhack commented Apr 8, 2020

I'd also add https://github.com/microsoft/pyright to the resources, since it was not mentioned in the RFC.

@hoefling

hoefling commented Aug 2, 2020

Now that tensorflow/community#208 is accepted, may we start submitting pull requests that add type hints to the code base?

@martinwicke
Member

martinwicke commented Aug 3, 2020

@mdanatg I think type hints should be acceptable now? Or do we need to wait for the type classes?

@mdanatg

mdanatg commented Aug 3, 2020

Yes, we can start adding type annotations, with a few caveats (see below). In addition, there is now TensorLike (also unparameterized for now), which captures some of the types that TensorFlow autoboxes. More will be added to the list.

The caveats:

  • some directories may need to be upgraded, but that's a straightforward change we can make as PRs are sent; this means initial PRs may take a bit longer to merge
  • only plain tf.Tensor annotations can currently be added. We're still working on letting you specify tf.Tensor[tf.Int32] or tf.Tensor[Any], but some compatibility issues are delaying that. In the future, tf.Tensor will be equivalent to tf.Tensor[Any]
  • all of this is still in its early stages. Expect a few bugs until we've stabilized everything; we'd appreciate new issues being opened when you come across such bugs
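For readers following along, a parameterized Tensor along the lines of the tf.Tensor[tf.Int32] syntax above could be sketched with typing.Generic. This is purely illustrative and not how TensorFlow implements it; `Int32` and `zeros` are stand-ins:

```python
from typing import Generic, TypeVar

DT = TypeVar("DT")

class Int32: ...

class Tensor(Generic[DT]):
    """Toy generic tensor: Tensor[Int32] is now a valid annotation."""
    def __init__(self, dtype):
        self.dtype = dtype

def zeros() -> "Tensor[Int32]":
    return Tensor(Int32())

t = zeros()
```

Subclassing `Generic` is what makes the subscript syntax `Tensor[Int32]` legal both at runtime and for type checkers; an unparameterized `Tensor` then behaves like `Tensor[Any]`.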

@EPronovost

EPronovost commented Sep 5, 2020

@mdanatg are there any examples of adding type annotations so far? I'd be interested in helping.

@mdanatg

mdanatg commented Sep 5, 2020

@EPronovost I'm still working to sort out some blockers to making Tensor a generic class (#40921, which was temporarily rolled back as an internal test broke), and to extend DType so that it can be used in annotations (#40696). But until those land, we can still use a plain tf.Tensor as type annotation. There isn't an example yet, but would you be interested in creating one, for example by choosing a simple op (like tf.Identity) and trying to add annotations for it? It's mainly a matter of adding the annotation and making sure build/tests still work.

@EPronovost

EPronovost commented Sep 6, 2020

@mdanatg I'd be interested in helping. Is mypy included as part of the TF test suite? If Tensor annotations are still a WIP, would it be easier to start with some of the higher-level APIs (e.g. Dataset or keras.Model methods)? Sorry if I'm missing something; I wasn't able to find much just searching.

@mdanatg

mdanatg commented Sep 8, 2020

@EPronovost working on other types first sounds like a good idea too. Note that Keras is moving to a stand-alone repository, and TF will not depend on it. But things like Dataset, TensorShape, etc. all sound good to me. I think it would be a good idea to do some experimentation first.
Currently, TF only has internal tests for type annotations, which use pytype. Setting up a set of tests which use mypy would also be a good initial step.

@hmc-cs-mdrissi

hmc-cs-mdrissi commented Jan 25, 2022

@mdanatg Would it be reasonable to work on type stub (.pyi) files in typeshed that approximate the public API while the issues around tensor genericness/runtime get worked out?

Right now it's unclear what the expected progress of this issue is. I think runtime genericness, while nice, isn't needed initially. The primary motivation I have for types is a better IDE/type checker experience. Worst case, I can use either from __future__ import annotations or strings for generic tensors.
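The from __future__ import annotations workaround mentioned above works because annotations are then stored as strings and never evaluated at runtime, so a class that isn't subscriptable yet can still be written as if it were generic (`Tensor` here is a stand-in, not the real class):

```python
from __future__ import annotations

class Tensor:
    """Stand-in class; defines no __class_getitem__, so Tensor[float]
    would raise TypeError if actually evaluated."""

def scale(t: Tensor[float]) -> Tensor[float]:
    # With postponed evaluation, Tensor[float] is never executed; it is
    # kept as the string 'Tensor[float]' for type checkers to interpret.
    return t
```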

The other complicating aspect is how accurate the first set of types needs to be: which functions handle tf.Tensor vs SparseTensor vs EagerTensor vs etc., plus genericness of shape/data type/device. I think imperfect type stubs (as long as they lean towards false negatives rather than false positives) would be much better than no types, given the probable complexity of getting the types correct on the first try.

@mdanatg

mdanatg commented Jan 25, 2022

Working on stubs sounds reasonable to me, but I think it's orthogonal to fixing the type hierarchy. Whether you add stubs or put the annotations directly in the code (which is possible now), the main factor is how accurate those annotations will be. I think it's important for a type checker to be accurate and to have neither false positives nor false negatives; otherwise users will just start to ignore it.

@hmc-cs-mdrissi

hmc-cs-mdrissi commented Jan 26, 2022

I view false negatives as an acceptable evil here. The current lack of types is essentially an extreme false negative: never report an error. If we add type stubs that detect some type errors but miss others, any information they give a user will at least help them fix something. They will miss things a user might expect a type checker to catch, but the current untyped status quo misses everything. If you add fully correct types one file at a time, you can call the codebase partly typed, but it is still full of false negatives for the other files. So I can't see a situation where types with false negatives are worse than the current status quo.

False positives are very bad as noisy fake error messages condition users to treat all errors as mistakes from the type checker.

Long term, I agree the goal is minimal false negatives/positives (ideally zero, within the constraints of the type system).

@nyngwang

nyngwang commented Mar 6, 2022

@mdanatg [...] But things like Dataset, TensorShape, etc. all sound good to me. [...] link

Linking my related issue: tensorflow/datasets#3822
