Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update and align numpy versions in TensorFlow #48918

Closed
wants to merge 2 commits into from

Conversation

bhack
Copy link
Contributor

@bhack bhack commented May 5, 2021

This PR is just to explore CI error when we align and upgrade numpy in TF.

When finalized it will fix #47691

@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label May 5, 2021
@google-ml-butler
Copy link

Thanks for contributing to TensorFlow Lite Micro.

To keep this process moving along, we'd like to make sure that you have completed the items on this list:

We would like to have a discussion on the Github issue first to determine the best path forward, and then proceed to the PR review.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@google-cla google-cla bot added the cla: yes label May 5, 2021
@bhack bhack marked this pull request as ready for review May 5, 2021 20:22
@bhack
Copy link
Contributor Author

bhack commented May 5, 2021

Many of the Docker images with python < 3.7 will fail cause we don't have numpy v1.20 for older python version.
Python upgrade for the devel and devel derivated images is at #48371.

@NeilGirdhar
Copy link
Contributor

I don't know what the TensorFlow team is planning, but according to NEP 29, "all projects across the Scientific Python ecosystem" were supposed to drop Python 3.6 last June.

@bnavigator
Copy link
Contributor

This is wrong. You don't have to specify a minimum Numpy 1.20. Just don't require a maximum numpy, by not relying on features deprecated long ago and finally removed in 1.19 or 1.20.

@mihaimaruseac
Copy link
Collaborator

After the next release, we can drop py3.6 and then revisit this.

@bhack
Copy link
Contributor Author

bhack commented May 5, 2021

@bnavigator It is not the logic of this PR. This is just a CI exploration to expose public logs for all the users that are pushing for 1.20 in the ticket. I am forcing 1.20.x to see the effect on our VMs, Community VM, Docker images, and TF tests on all the platforms.

More in general, but probably @mihaimaruseac can comment better then me, I suppose that currently as we are under release, we don't have the bandwidth to control all the python env matrix for all the numpy versions and probably it will be easier to evaluate python 3.6 removal and python 2.7 minimal version to work with an updated version of numpy.
Then probably we could finetune what will be the minimal numpy version that we want to support considering also NEP29.

@bnavigator
Copy link
Contributor

I disagree. Comments like #48918 (comment) and #48918 (comment) show that this diverts energy into the wrong direction. Supporting NumPy 1.20 does have nothing to do with dropping 1.19 or python 3.6. Python 3.6 along with a compatible older NumPy (and SciPy and many more packages) are still the mainline on some major LTS Linux distros. Just relaxing the upper bounds on NumPy should suffice to let any reasonable resolver take the newest available version.

And why would you want to drop python 3.6 support but still talk about python 2.7? That one is really EOL.

@bhack
Copy link
Contributor Author

bhack commented May 5, 2021

2.7 it was just a typo as it is EOL it was 3.7 of course.

If you think that you have a better solution for your scope and you have the best min/max numpy ranges, feel free to open a new Pull request we we will review that without any problems.

Here I just want to verify CI VM and CI results with 1.20 and aligned numpy versions (they diverted in many places) as in the PR description:

This PR is just to explore CI error when we align and upgrade numpy in TF.

Then I want to point out that for many of us, activities like this are only voluntary activities as the TF internal teams commit directly from the internal system and generally they don't use PRs.

The code is available for everyone and I believe it is appropriate to have a more collaborative approach when we comment on tickets as it is defined in our community code of conduct.

@mihaimaruseac
Copy link
Collaborator

@bnavigator: Given how deep TF integrates with numpy, having a wide range on the dependency results in broken CI in the current build system. Easiest fix at the numpy 1.20 release time has been to force TF to use older numpy. @bhack is currently trying to unblock this / see what are the remaining breakages.

@gbaned gbaned self-assigned this May 6, 2021
@gbaned gbaned added the comp:micro Related to TensorFlow Lite Microcontrollers label May 6, 2021
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation May 6, 2021
@gbaned gbaned requested a review from advaitjain May 6, 2021 06:53
@fsx950223 fsx950223 mentioned this pull request May 8, 2021
copybara-service bot pushed a commit that referenced this pull request May 13, 2021
Imported from GitHub PR #49008

Related #48918
Copybara import of the project:

--
e9924bf by fsx950223 <fsx950223@outlook.com>:

fix numpy 1.20

--
509a10c by fsx950223 <fsx950223@outlook.com>:

add test case

PiperOrigin-RevId: 373618749
Change-Id: I925da021db2bdb5a1b35f0d187d0007b7802ff1c
@gbaned gbaned added the awaiting review Pull request awaiting review label May 20, 2021
@advaitjain advaitjain removed the comp:micro Related to TensorFlow Lite Microcontrollers label May 20, 2021
@tupui
Copy link

tupui commented May 25, 2021

It's not because you still want to support python3.6 that your CI must run everything on 3.6 and then you are not able to instal NumPy's latest.

Have a look at our CI for SciPy. We test a bunch of configurations and test the develop version of NumPy only in one particular workflow. This way we can ensure forward compatibility while still being able to test on older systems on other workflows. You could be fully compatible with NEP29 if wanted... Now you're just breaking everyones workflows.

@shoyer
Copy link
Contributor

shoyer commented Jun 11, 2021

I'll just note that it's also possible to test against the development version of upstream dependencies like NumPy. This is a pretty common practice for core projects in the scientific Python ecosystem and makes a huge difference, often identifying upstream bugs even before there's a release.

@tupui
Copy link

tupui commented Jun 12, 2021

Most of Google's CI is still on Python 3.6. Since there's no numpy 1.20 for that version, we cannot use it.

OC you can as you could have one workflow dedicated to forward compatibility check. You would test latest Python, NumPy, etc. Nothing complicated here, just willingness to do so... Again, cf SciPy's CI for example.

@bhack
Copy link
Contributor Author

bhack commented Jun 12, 2021

Most of Google's CI is still on Python 3.6. Since there's no numpy 1.20 for that version, we cannot use it.

OC you can as you could have one workflow dedicated to forward compatibility check. You would test latest Python, NumPy, etc. Nothing complicated here, just willingness to do so... Again, cf SciPy's CI for example.

If you are interested to contribute to the general discussion I've started a new thread in our forum: https://discuss.tensorflow.org/t/extend-the-collaboration-on-the-ci/1917

@bnavigator
Copy link
Contributor

If you are interested to contribute to the general discussion I've started a new thread in our forum:

Thanks for the initiative. I would comment there, but somehow the device 2FA for the google login does not succeed at them moment. 🤷‍♂️ (Plus, I actually don't want to use my personal gmail address for a public forum)

Most of Google's CI is still on Python 3.6. Since there's no numpy 1.20 for that version, we cannot use it.

I must say I am a bit confused. In the release notes about TF 2.5.0 the first bullet point is that you now support Python 3.9. How do you do that if you don't have a CI for it?

@bhack
Copy link
Contributor Author

bhack commented Jun 12, 2021

Thanks for the initiative. I would comment there, but somehow the device 2FA for the google login does not succeed at them moment. 🤷‍♂️ (Plus, I actually don't want to use my personal gmail address for a public forum)

If you need support with 2FA issue you could send a message to forum-help@tensorflow.org
Also your gmail address will not be disclosed.

/cc @theadactyl

@mihaimaruseac
Copy link
Collaborator

mihaimaruseac commented Jun 13, 2021

I'll just note that it's also possible to test against the development version of upstream dependencies like NumPy. This is a pretty common practice for core projects in the scientific Python ecosystem and makes a huge difference, often identifying upstream bugs even before there's a release.

Yes, it is possible, but very costly. Our support matrix is already not manageable, unfortunately.

@mihaimaruseac
Copy link
Collaborator

I must say I am a bit confused. In the release notes about TF 2.5.0 the first bullet point is that you now support Python 3.9. How do you do that if you don't have a CI for it?

Internal CI is locked to py3.6. Google has a one version policy: all tools, libraries and binaries used anywhere inside Google are pinned to the exact same policy, regardless of team and product.

Externally, we are building py3.6-py3.9, as you can see here

@tupui
Copy link

tupui commented Jun 14, 2021

I must say I am a bit confused. In the release notes about TF 2.5.0 the first bullet point is that you now support Python 3.9. How do you do that if you don't have a CI for it?

Internal CI is locked to py3.6. Google has a one version policy: all tools, libraries and binaries used anywhere inside Google are pinned to the exact same policy, regardless of team and product.

Externally, we are building py3.6-py3.9, as you can see here

Building for something else than 3.6 is not the same as having a CI for something else than 3.6 🤔
Sounds like a weird policy to prevent you to have a better CI... Again, matrices are for that.

@bnavigator
Copy link
Contributor

Again, matrices are for that.

Granted, if a single matrix entry takes half a day to build, you want a small matrix. On the other hand, having "internal" and "public" CIs doing duplicate work is a disservice to the planet's climate, too.

@tupui
Copy link

tupui commented Jun 14, 2021

Again, matrices are for that.

Granted, if a single matrix entry takes half a day to build, you want a small matrix. On the other hand, having "internal" and "public" CIs doing duplicate work is a disservice to the planet's climate, too.

This is why there are mechanism to have conditional CI or even skip some workflows (things like [skip ci] or [skip actions] in the commit message). The classic example being, if you only push docs, you don't run the tests.

If there is no discipline with CI, of course it's intractable. But well executed, you can add lot of exotic plateformes or setups at no extra cost.

@adriangb
Copy link
Contributor

Also caching of the C++ compilation, which I think @bhack was working on, would help a lot. In my experience, running the Python tests takes a fraction of the time of compilation (and even that could be sped up eg. by using pytest-xdist).

@mihaimaruseac
Copy link
Collaborator

mihaimaruseac commented Jun 14, 2021

I must say I am a bit confused. In the release notes about TF 2.5.0 the first bullet point is that you now support Python 3.9. How do you do that if you don't have a CI for it?

Internal CI is locked to py3.6. Google has a one version policy: all tools, libraries and binaries used anywhere inside Google are pinned to the exact same policy, regardless of team and product.
Externally, we are building py3.6-py3.9, as you can see here

Building for something else than 3.6 is not the same as having a CI for something else than 3.6 🤔
Sounds like a weird policy to prevent you to have a better CI... Again, matrices are for that.

Internal CI only supports 3.6
External CI runs on py3.6..py3.9

Matrices work but also result in combinatorial explosion.

Internal CI is needed due to differences between internal TF and OSS TF. Also because Google monorepo.

@tupui
Copy link

tupui commented Jun 14, 2021

External CI runs on py3.6..py3.9

I thought you said this is just for building. Which we all seem to interpret (thumbs up) as not running unit tests.

Matrices work but also result in combinatorial explosion.

Sure but nothing you cannot manage if conditionally skipped on some particular events. But I agree this adds some complexity.

As such a prominent package, I guess we all expect more rigour here. I mean, here we are just asking that you support latest NumPy and maybe check these in advance in your CI. Nothing esoteric which is not already done by all big packages.

@mihaimaruseac
Copy link
Collaborator

External CI runs on py3.6..py3.9

I thought you said this is just for building. Which we all seem to interpret (thumbs up) as not running unit tests.

My bad, I was not clear (though the linked CI scripts would have shown that we run tests). We have multiple CI types:

  1. PR presubmits: one single version of Python, just spreading on multiple OSes and configs (although not all of them are blocking)
  2. Continuous builds: whenever a new change lands and there is no continuous build running a new one is triggered at HEAD. This again covers only one version of Python and is spread across multiple OSes but is supposed to have buildcops monitoring these and immediately surfacing breakages.
  3. Nightly builds: 2 versions, all supported Python versions and OSes. Supposed to have buildcops to surface breakages too. The 2 versions are:
    • nonpip: just bazel test of the code base, just like all the builds above
    • pip: bazel build for the pip package, install the pip package, run tests against it
  4. Release builds: same as nightly builds but run on the release process, after branch cut (multiple times, at least ~2 times for each RC + final)

Between 1 and 2, during the PR import process there are presubmits builds inside Google that test the internal version of TF, on a single version of Python, single OS (Linux)

Step 2 is actually duplicated: we build from HEAD of OSS and we also build from HEAD of internal code exposing that to a local OSS repo.

Steps 3 and 4 have a few builds that are outside the pip/nonpip separation, but not relevant here. See the build scrips linked before for all details

Matrices work but also result in combinatorial explosion.

Sure but nothing you cannot manage if conditionally skipped on some particular events. But I agree this adds some complexity.

Some of these don't depend on our team. Changing the Google behemoth is very slow, but we're working on this.

As such a prominent package, I guess we all expect more rigour here. I mean, here we are just asking that you support latest NumPy and maybe check these in advance in your CI. Nothing esoteric which is not already done by all big packages.

This is understood and we're working on making things easier for OSS contributors. But these are processes that take months given the whole CI structure and org setup

@adriangb
Copy link
Contributor

adriangb commented Jun 14, 2021

We understand that Google is a large complex machine, and it is great that TF is even FOSS in the first place. I hope I speak for everyone in saying that we appreciate the work you do and just want to help however we can. As long as we keep the discussions going and things moving forward, we'll all benefit.

@gbaned
Copy link
Contributor

gbaned commented Jun 25, 2021

@bhack Can you please resolve conflicts? Thanks!

@gbaned gbaned added stat:awaiting response Status - Awaiting response from author and removed awaiting review Pull request awaiting review labels Jun 25, 2021
@bhack
Copy link
Contributor Author

bhack commented Jun 25, 2021

By now this is just a monitor PR. I am waiting for @mihaimaruseac feedback on when I need to resync.

@gbaned gbaned added stat:awaiting tensorflower Status - Awaiting response from tensorflower and removed stat:awaiting response Status - Awaiting response from author labels Jul 8, 2021
@gbaned gbaned added awaiting review Pull request awaiting review and removed stat:awaiting tensorflower Status - Awaiting response from tensorflower labels Aug 10, 2021
@bhack bhack closed this Oct 4, 2021
PR Queue automation moved this from Assigned Reviewer to Closed/Rejected Oct 4, 2021
@google-ml-butler google-ml-butler bot removed the awaiting review Pull request awaiting review label Oct 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla: yes size:M CL Change Size: Medium
Projects
PR Queue
  
Closed/Rejected
Development

Successfully merging this pull request may close these issues.

Numpy v1.20+ compatibility
9 participants