
Add support for Python 3.x #1

Closed · mrry opened this issue Nov 9, 2015 · 48 comments

@mrry (Contributor) commented Nov 9, 2015

Currently we only support Python 2.7, but we should support Python 3.

@girving (Contributor) commented Nov 9, 2015

The main things this involves: converting print statements to print() calls, and handling __floordiv__ / __truediv__ / __div__ correctly.
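
A minimal sketch of the division point (illustrative only, not TensorFlow's actual code): a tensor-like class has to define all three hooks so that `/` and `//` behave identically under 2.7 and 3.x. The `Tensor` class here is hypothetical.

```
# Hypothetical tensor-like class showing 2/3-compatible division handling.
from __future__ import division  # makes `/` mean true division on 2.7 as well


class Tensor(object):
    def __init__(self, value):
        self.value = value

    def __truediv__(self, other):
        # `/` in Python 3, and in Python 2 under the division __future__ import.
        return Tensor(self.value / other.value)

    def __floordiv__(self, other):
        # `//` in both Python 2 and Python 3.
        return Tensor(self.value // other.value)

    # Plain Python 2 (without the __future__ import) dispatches `/` to
    # __div__; aliasing keeps the two interpreters consistent.
    __div__ = __truediv__


print((Tensor(7) / Tensor(2)).value)   # 3.5 on both 2.7 and 3.x
print((Tensor(7) // Tensor(2)).value)  # 3
```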

@cdnsteve commented Nov 9, 2015

👍 to this issue

@dvbuntu (Contributor) commented Nov 9, 2015

👍

@ashwin31 commented Nov 9, 2015

👍

@girving self-assigned this Nov 9, 2015
@joestepp commented Nov 9, 2015

👍

@girving (Contributor) commented Nov 9, 2015

We're working on it.

@kevinaloys

Python 3 is a must have. 👍

@mgcdanny commented Nov 9, 2015

How do we contribute towards Python 3 support? Are there any specific tickets open? Is there a Python 3 development branch?

@cdnsteve commented Nov 9, 2015

@mgcdanny It seems they require contributors to sign an agreement first:
https://github.com/tensorflow/tensorflow/blob/master/CONTRIBUTING.md

@MikalaiDrabovich (Contributor)

👍

@girving (Contributor) commented Nov 10, 2015

I'm running futurize on the code at the moment; once that's done it'll be easier to parallelize the remaining work. So far futurize --stage1 is checked in and I'm working through futurize --stage2 now (checking each of our divisions very carefully :)).

Unfortunately our contribution process needs a bit of improvement (we're working on streamlining it), but I'll see if there are natural chunks of work to break off and ask for help in fixing once the initial futurize push is done. Before that it's tricky to parallelize.

The only non-obvious question (so far) is what to do with unicode, and where. Cc'ing @mrry since he was taking a look at that. Most of the "names" for things, such as names for ops and names for tensors in a graph, are already restricted to be roughly alphanumeric, so hopefully we can leave them as C++ strings while still accepting unicode input from Python.
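
A sketch of that idea (a hypothetical helper, not the actual API): accept either unicode or byte strings for names on the Python side, normalize to UTF-8 bytes at the boundary, and let C++ keep a plain string.

```
import six


def as_bytes(name):
    """Normalize a graph-element name to UTF-8 encoded bytes."""
    if isinstance(name, six.text_type):  # unicode on 2.7, str on 3.x
        return name.encode("utf-8")
    if isinstance(name, bytes):
        return name
    raise TypeError("Expected a string name, got %r" % (name,))


assert as_bytes(u"MatMul_1") == b"MatMul_1"
assert as_bytes(b"MatMul_1") == b"MatMul_1"
```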

@girving (Contributor) commented Nov 10, 2015

Also, a question for people who have done such conversions before / recently: is six still the recommended way to make code transparently support both? In particular, we need things like xrange and iteritems as symbols that exist in both 2.7 and 3.x.
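
For reference, the six idioms in question look like this (a minimal sketch; six.moves.range and six.iteritems are the documented spellings):

```
from __future__ import print_function

import six
from six.moves import range  # xrange on Python 2, range on Python 3

d = {"a": 1, "b": 2}

for i in range(3):  # a lazy iterator under both interpreters
    print(i)

for key, value in six.iteritems(d):  # d.iteritems() on 2.x, d.items() on 3.x
    print(key, value)
```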

@madisonmay

@girving I would follow the guide written by Python 3 core dev Brett Cannon: https://docs.python.org/3.5/howto/pyporting.html. He's been heavily involved with the push to move existing libraries to Python 3.

@girving (Contributor) commented Nov 10, 2015

@madisonmay: Thanks, I should have asked you earlier. Modernize probably would have been a better choice, since after reading that page I am essentially taking the output of futurize and then rewriting it into what modernize might have already spit out. I'm mostly done with that phase, though, so I'll probably just finish it up using futurize.

For the record: futurize --stage1 was mostly print_function and a few other safe trivialities. I'll write a detailed comment on what futurize --stage2 involved for us once I'm done with it (I'm still in the middle of reading diffs).
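
Roughly what the two stages leave behind (illustrative only; the exact rewrites depend on the futurize version):

```
# After stage1 + stage2, a file runs unmodified on 2.7 and 3.x.
from __future__ import absolute_import, division, print_function

tensors = {"weights": 1.0, "bias": 2.0}

# stage1: `print x` statements become print() calls.
print("num tensors:", len(tensors))

# stage2: dict.iteritems() becomes items() (which 2.7 also supports;
# it just materializes a list there).
for name, value in tensors.items():
    print(name, value)

# stage2 flags every `/`: with the division __future__ import, `/` is true
# division everywhere, so floor division must be spelled `//` explicitly.
lo, hi = 0, 7
mid = (lo + hi) // 2
```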

@yuncliu commented Nov 10, 2015

+1

@Peque commented Nov 10, 2015

I think it is pretty clear that there are many people interested in this issue. However, I would suggest that users use the Subscribe button instead of adding a new 👍.

[screenshot of the issue page's Subscribe button]

This way developers (and all subscribed users) can get notifications on useful future updates. 😉

PS: If you already posted a 👍 you might as well consider removing it to keep the discussion cleaner.

@girving (Contributor) commented Nov 10, 2015

Thank you @Peque! It would be great if further comments were limited to technical discussions about Python 3 support.

@goodmami

From the README:

The TensorFlow Python API currently requires Python 2.7:
we are working on adding support for Python 3.0.

If you require 2.7 for the 2.x line, then why restrict yourselves to 3.0 (which was released around the same time as 2.6)? If you target 3.3+, you'll save yourselves a lot of headache, especially if you're using u'あ'-style unicode string literals, which 3.3 re-allows (also see unicode_literals).

And 👍 for six for more complicated things.
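
Concretely (a small sketch of the 3.3+ argument): PEP 414 re-allowed the u'' prefix in 3.3, so the same literals parse everywhere, and unicode_literals makes bare literals unicode on 2.7 too.

```
# -*- coding: utf-8 -*-
# (the coding line is needed for Python 2 source files with non-ASCII text)
from __future__ import unicode_literals

a = u"あ"  # SyntaxError on 3.0-3.2, but valid on 2.7 and 3.3+
b = "あ"   # unicode under both interpreters thanks to unicode_literals

assert a == b
```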

@vrv commented Nov 11, 2015

Sorry, I meant Python 3, not 3.0 -- was in a rush to fix :P. I'll fix this today.

@girving changed the title from "Add support for Python 3" to "Add support for Python 3.x" on Nov 11, 2015
@girving (Contributor) commented Nov 11, 2015

@goodmami: Yes, we will likely only support 3.3+, possibly 3.4+ (I'm not sure what the differences are, but will look that up). I also changed the issue title.

@jli05 commented Nov 11, 2015

I'd really like you to support everything from 3.3 up to the latest release of Python (currently 3.5; then 4.0 in the future). This piece of software is going to be at the heart of many things in human life in the future.

@girving (Contributor) commented Nov 12, 2015

@jli05: https://docs.python.org/3/whatsnew/3.4.html makes it look like supporting 3.3 won't be any harder than 3.4, so we should be good to go. It is possible our internal tests will be run only in 3.4 (and 2.7), but if anything ever breaks we'll be happy to accept patches.

@6iloveyou

@girving Does this version support Python 3.4 now?
