
FEAT: support fp4 and int8 quantization for pytorch model #238

Merged: 4 commits from pangyoki:pytorch_support_quantization into xorbitsai:main on Jul 26, 2023

Conversation

@pangyoki (Contributor) commented Jul 24, 2023

Addresses issues #230 and #239.
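
For context, below is a minimal sketch of how fp4 and int8 quantization are typically enabled for a PyTorch model via transformers + bitsandbytes. It illustrates the feature named in the PR title, not necessarily the exact code added in this PR; the model name is only a placeholder.

```python
# Sketch only: assumes transformers, accelerate, and bitsandbytes are
# installed; this is a common pattern, not the code from this PR.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_NAME = "facebook/opt-1.3b"  # placeholder model for illustration

# int8: weights quantized to 8-bit (LLM.int8())
int8_config = BitsAndBytesConfig(load_in_8bit=True)

# fp4: weights quantized to a 4-bit float format,
# with computation performed in fp16
fp4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=fp4_config,  # or int8_config
    device_map="auto",
)
```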

@XprobeBot added this to the v0.1.0 milestone on Jul 24, 2023
@pangyoki force-pushed the pytorch_support_quantization branch from f752c49 to 6c79197 on July 25, 2023 04:11
@pangyoki force-pushed the pytorch_support_quantization branch from 37b69d1 to 5be5e5a on July 25, 2023 06:30
Two review threads on xinference/model/llm/pytorch/core.py (outdated, resolved)
@UranusSeven merged commit e4115d1 into xorbitsai:main on Jul 26, 2023 (9 checks passed)

3 participants