Issues · Search results · repo:pytorch/ao language:Python

432 results (69 ms)

Hi, I am following this example and want to save the INT8 static quantization result, but it’s failing. Could you take a look, thanks! ... # quantized linear represented as an nn.Linear with ...
  • yiliu30
  • Opened 2 days ago
  • #1950
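
A minimal sketch of the save/load pattern this issue is about, with the caveat that the excerpt doesn't show the static-quant config from the linked example; `int8_weight_only` is used here purely as a stand-in torchao config:

```python
# Sketch only: int8_weight_only stands in for the static-quant config
# referenced in the issue; the exact config used there is an assumption.
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int8_weight_only

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 64))
quantize_(model, int8_weight_only())  # replaces Linear weights with quantized tensors in-place

# Saving the quantized model's state dict; loading it back typically
# requires the same quantization setup on the target model, which is
# where reports like this one tend to fail.
torch.save(model.state_dict(), "int8_model.pt")
```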

placeholder, TODO fill me out
float8
  • vkuzo
  • Opened 3 days ago
  • #1945

Hi folks, not a bug. In torchtune, importing the library takes ~7s. When I profile it, the majority comes from torchao imports. Just a simple `import torchao` takes ~4s: import time start = time.perf_counter() ...
  • felipemello1
  • 1
  • Opened 3 days ago
  • #1944
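
The timing snippet the report starts is roughly the following; a minimal, self-contained version:

```python
import time

start = time.perf_counter()
import torchao  # noqa: F401  -- the import being measured
print(f"import torchao took {time.perf_counter() - start:.2f}s")

# For a per-module breakdown, Python's built-in import profiler helps:
#   python -X importtime -c "import torchao"
```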

When using FSDP2 for Float8 training, an issue occurs when the number of GPUs exceeds the out_features of an nn.Linear layer. Specifically, FSDP2 splits the weight tensor into a shape of [0, in_features] ...
float8
  • HIT-cwh
  • 3
  • Opened 4 days ago
  • #1938
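
A rough sketch of the setup being described, under the assumptions that torchao's `convert_to_float8_training` and the FSDP2 `fully_shard` API are the pieces involved, and that this runs under torchrun with more ranks than `out_features`:

```python
# Sketch of the reported setup, not a reproduction: a tiny out_features so
# that world_size > out_features and FSDP2 hands some ranks a
# [0, in_features] shard of the weight.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard  # FSDP2 API; import path may differ by torch version
from torchao.float8 import convert_to_float8_training

dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

model = nn.Sequential(nn.Linear(1024, 2), nn.Linear(2, 1024)).cuda()  # out_features=2 < world_size
convert_to_float8_training(model)
for layer in model:
    fully_shard(layer)
fully_shard(model)

out = model(torch.randn(8, 1024, device="cuda"))
out.sum().backward()
```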

In meta kernels in torchao/experimental/ops, we do things like: return torch::empty({num_out, k}).to("meta"); We should create meta tensors directly if possible.
  • metascroy
  • 4
  • Opened 6 days ago
  • #1936
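
For illustration, the Python equivalent of the pattern in question (the actual code in torchao/experimental/ops is C++):

```python
import torch

num_out, k = 16, 64

# Pattern the issue points at: allocate real storage, then move to meta.
a = torch.empty((num_out, k)).to("meta")

# Proposed alternative: construct the tensor on the meta device directly,
# so no real memory is ever allocated.
b = torch.empty((num_out, k), device="meta")

assert a.shape == b.shape and b.device.type == "meta"
```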

I am having a strange issue with the low-bit optimizer and the combination of FSDP2 and CPU offloading: torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run FX node with fake tensors: call_method ...
  • psinger
  • 1
  • Opened 6 days ago
  • #1931
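
A sketch of the combination described, under assumptions: the 8-bit AdamW lives in `torchao.optim` (older releases expose it as `torchao.prototype.low_bit_optim`), and FSDP2's CPU offloading is enabled via `CPUOffloadPolicy`:

```python
# Sketch of the low-bit optimizer + FSDP2 + CPU offloading combination;
# import paths are assumptions and vary across torch/torchao versions.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard, CPUOffloadPolicy
from torchao.optim import AdamW8bit  # older torchao: torchao.prototype.low_bit_optim

dist.init_process_group("nccl")
model = nn.Sequential(nn.Linear(512, 512), nn.Linear(512, 512)).cuda()
fully_shard(model, offload_policy=CPUOffloadPolicy())

optim = AdamW8bit(model.parameters(), lr=1e-4)  # quantized optimizer state
loss = model(torch.randn(4, 512, device="cuda")).sum()
loss.backward()
optim.step()  # the report hits a dynamo fake-tensor error around here
```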

Hi, I did some benchmarks on LLM models with int4_weight_only on CPU/GPU/XPU and expected to see E2E speedups compared with pure bf16/fp16. From the kernel perspective, int4 GEMM kernels are ...
  • LuFinch
  • 11
  • Opened 7 days ago
  • #1930
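
A minimal sketch of the kind of comparison described: torchao's int4 weight-only quantization timed against a bf16 baseline (crude wall-clock timing, CUDA assumed; not the reporter's actual benchmark):

```python
import time
import torch
import torch.nn as nn
from torchao.quantization import quantize_, int4_weight_only

@torch.no_grad()
def bench(model, x, iters=50, warmup=10):
    for _ in range(warmup):       # warm up (and trigger compilation if compiled)
        model(x)
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        model(x)
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(4)]).to(torch.bfloat16).cuda()
x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")

baseline = bench(torch.compile(model), x)
quantize_(model, int4_weight_only())      # in-place int4 weight-only quantization
quantized = bench(torch.compile(model), x)
print(f"bf16 {baseline * 1e3:.2f} ms vs int4 {quantized * 1e3:.2f} ms")
```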

When I fp8 quantize a model and then shard it using FSDP2, it reports an error: [rank1]: Traceback (most recent call last): [rank1]: File /mnt/teams/algo-teams/shared/code/wanx-inference/generate.py ...
  • happynear
  • 2
  • Opened 8 days ago
  • #1929
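
A sketch of the reported flow, assuming an inference-style float8 weight quantization via torchao's `float8_weight_only` followed by FSDP2 sharding (the exact config in the report isn't shown):

```python
# Quantize first, then shard: the ordering the report describes as breaking.
import torch
import torch.nn as nn
import torch.distributed as dist
from torch.distributed.fsdp import fully_shard
from torchao.quantization import quantize_, float8_weight_only

dist.init_process_group("nccl")
model = nn.Sequential(nn.Linear(1024, 1024), nn.Linear(1024, 1024)).cuda()

quantize_(model, float8_weight_only())  # fp8 weight quantization
fully_shard(model)                      # then FSDP2 sharding

out = model(torch.randn(2, 1024, device="cuda"))
```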

Grouped GEMM kernels (https://github.com/fanshiqing/grouped_gemm) are used in many MoE models. I wonder whether torchao supports FP8 kernels for grouped GEMM, such as the three commonly used ops: ...
float8
  • zigzagcai
  • 5
  • Opened 8 days ago
  • #1928
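
For context, "grouped GEMM" here means one kernel launch that performs many independent matmuls with varying group sizes, as in MoE token routing. A plain-PyTorch illustration of the semantics only (not FP8 and not a torchao API):

```python
# Reference semantics of a grouped GEMM; the FP8 kernels the issue asks
# about would fuse this per-expert loop into a single kernel call.
import torch

def grouped_mm_reference(tokens, expert_weights, group_sizes):
    # tokens: [total_tokens, hidden]; one weight matrix per expert/group
    outputs, offset = [], 0
    for w, n in zip(expert_weights, group_sizes):
        outputs.append(tokens[offset:offset + n] @ w)  # independent GEMM per group
        offset += n
    return torch.cat(outputs)

hidden, ffn = 64, 128
weights = [torch.randn(hidden, ffn) for _ in range(3)]
out = grouped_mm_reference(torch.randn(10, hidden), weights, group_sizes=[3, 5, 2])
print(out.shape)  # torch.Size([10, 128])
```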

We've come up with a training recipe for 2:4 activation sparsity, which is outlined in this paper: https://openreview.net/pdf?id=O5feVk7p6Y. The gist of the approach is that: 1) we find a high level of ...
good first issue
  • jcaip
  • Opened 9 days ago
  • #1920
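
For context, the 2:4 pattern keeps at most two non-zero values in every contiguous group of four. A toy illustration of enforcing it on an activation tensor (not the recipe from the paper):

```python
# Keep the 2 largest-magnitude values in every group of 4, zero the rest.
import torch

def prune_2_to_4(x):
    groups = x.reshape(-1, 4)                               # view as groups of 4
    idx = groups.abs().topk(2, dim=-1).indices              # 2 largest per group
    mask = torch.zeros_like(groups, dtype=torch.bool).scatter_(-1, idx, True)
    return (groups * mask).reshape_as(x)

x = torch.randn(8, 16)
sparse_x = prune_2_to_4(x)
assert (sparse_x.reshape(-1, 4) != 0).sum(dim=-1).max() <= 2
```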