- Google Inc.
- Kirkland, WA
- http://yibozhu.com
Stars
Official Repo for Open-Reasoner-Zero
Latency and Memory Analysis of Transformer Models for Training and Inference
A model compilation solution for various hardware
A high-performance, extensible Python AOT compiler.
CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.
The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.
PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications
A performant and modular runtime for TensorFlow
Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
Enabling PyTorch on XLA Devices (e.g. Google TPU)
User space software for Intel(R) Resource Director Technology
A high performance and generic framework for distributed DNN training
bytedance / incubator-mxnet
Forked from apache/mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
bytedance / ps-lite
Forked from dmlc/ps-lite
A lightweight parameter server interface
Slim: OS Kernel Support for a Low-Overhead Container Overlay Network
Keras implementation of BERT with pre-trained weights
Implementation of BERT that could load official pre-trained models for feature extraction and prediction
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Disseminated, Distributed OS for Hardware Resource Disaggregation. USENIX OSDI 2018 Best Paper.
High performance container overlay networks on Linux. Enabling RDMA (on both InfiniBand and RoCE) and accelerating TCP to bare metal performance. Freeflow requires zero modification on application …
run any binary and augment its output and periods of inactivity with memory usage differentials (LD_PRELOAD hax)