bobzhuyb

Organizations

@bytedance


Official Repo for Open-Reasoner-Zero

Python 1,841 91 Updated Apr 8, 2025
Python 4,148 336 Updated Mar 12, 2025

Running BERT without Padding

C++ 471 54 Updated Mar 18, 2022
Python 64 8 Updated Feb 13, 2022

Latency and Memory Analysis of Transformer Models for Training and Inference

Python 403 46 Updated Mar 4, 2025

A model compilation solution for various hardware

MLIR 419 47 Updated Apr 9, 2025

A high-performance, extensible Python AOT compiler.

C++ 422 39 Updated Sep 26, 2023

CV-CUDA™ is an open-source, GPU accelerated library for cloud-scale image processing and computer vision.

C++ 2,489 227 Updated Mar 26, 2025

The Torch-MLIR project aims to provide first class support from the PyTorch ecosystem to the MLIR ecosystem.

C++ 1,491 532 Updated Apr 11, 2025

Arbitrary offloads for RDMA NICs

C 89 20 Updated Apr 25, 2022
Python 214 24 Updated Aug 17, 2023

PipeSwitch: Fast Pipelined Context Switching for Deep Learning Applications

Python 126 34 Updated May 9, 2022

A performant and modular runtime for TensorFlow

C++ 759 122 Updated Feb 21, 2025

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

Python 3,137 316 Updated Mar 18, 2025

Enabling PyTorch on XLA Devices (e.g. Google TPU)

Python 2,584 504 Updated Apr 12, 2025

User space software for Intel(R) Resource Director Technology

C 711 187 Updated Mar 25, 2025

A high performance and generic framework for distributed DNN training

Python 3,675 491 Updated Oct 3, 2023

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

C++ 4 4 Updated Nov 9, 2019

A lightweight parameter server interface

C++ 76 24 Updated Jan 13, 2023

Slim: OS Kernel Support for a Low-Overhead Container Overlay Network

C 114 20 Updated Oct 30, 2020

Keras implementation of BERT with pre-trained weights

Python 814 196 Updated Jul 26, 2019

Implementation of BERT that could load official pre-trained models for feature extraction and prediction

Python 2,421 511 Updated Jan 22, 2022

Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Python 14,447 2,255 Updated Feb 1, 2025

Disseminated, Distributed OS for Hardware Resource Disaggregation. USENIX OSDI 2018 Best Paper.

C 487 75 Updated May 6, 2021

Efficient RPCs for datacenter networks

C++ 879 140 Updated May 9, 2024

Run *any* binary in *any* container.

C 50 9 Updated Dec 22, 2015

High performance container overlay networks on Linux. Enabling RDMA (on both InfiniBand and RoCE) and accelerating TCP to bare metal performance. Freeflow requires zero modification on application …

C 620 93 Updated Jun 12, 2023

run any binary and augment its output and periods of inactivity with memory usage differentials (LD_PRELOAD hax)

C 35 8 Updated Oct 2, 2024

Yet Another BGP Python Implementation

Python 239 70 Updated Apr 10, 2025