This repository was archived by the owner on Oct 9, 2024. It is now read-only.

Add configs to run int4 inference #37

Open: wants to merge 5 commits into base: main
change the quantization config format to work with the new style at DeepSpeed
Reza Yazdani committed Nov 19, 2022
commit 99cd7c9d8b2f0228145d0134d0a9570a6ac8cf71
5 changes: 4 additions & 1 deletion bloom-inference-scripts/bloom-ds-inference.py
@@ -174,7 +174,10 @@ def write_checkponts_json():
     kwargs = dict(replace_with_kernel_inject=True)
     # specify number of bits to choose between in4/int8
     if args.dtype == 'int8' or args.dtype == 'int4':
-        kwargs.update({'quantization_bits': 8 if args.dtype == 'int8' else 4})
+        quant_config = "{'quant': {'enabled':True, 'weight':{'num_bits': 8}}}"
+        kwargs.update(eval(quant_config))
+        if args.dtype == 'int4':
+            kwargs['quant']['weight']['num_bits'] = 4
 else:
     kwargs = dict(injection_policy={BloomBlock: ("self_attention.dense", "mlp.dense_4h_to_h")})
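
Review note: the added lines build the nested `quant` dictionary by calling `eval` on a string literal and then patching `num_bits` for int4. The same result could probably be reached with a plain dictionary, which avoids `eval` altogether. The sketch below is an illustration only, not the author's code: `DEFAULT_QUANT_CONFIG` and `build_quant_kwargs` are hypothetical names, and the `quant` / `weight` / `num_bits` layout is copied from the diff rather than checked against the DeepSpeed API.

```python
import copy

# Hypothetical default, mirroring the dictionary layout used in the diff.
DEFAULT_QUANT_CONFIG = {"quant": {"enabled": True, "weight": {"num_bits": 8}}}


def build_quant_kwargs(dtype):
    """Return kernel-injection kwargs, adding a 'quant' section for int8/int4.

    Hypothetical helper: it only assembles a dictionary and does not call
    DeepSpeed itself.
    """
    kwargs = dict(replace_with_kernel_inject=True)
    if dtype in ("int8", "int4"):
        # Deep-copy so the module-level default is never mutated.
        quant_config = copy.deepcopy(DEFAULT_QUANT_CONFIG)
        if dtype == "int4":
            quant_config["quant"]["weight"]["num_bits"] = 4
        kwargs.update(quant_config)
    return kwargs
```

In the script this would be used as `kwargs = build_quant_kwargs(args.dtype)` inside the kernel-injection branch. If the configuration really has to arrive as a string, `ast.literal_eval` from the standard library parses Python literals without executing arbitrary code.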