float16 quantization + granular quantization #3340

JesseFarebro · 2020-05-28T05:06:29Z

As discussed in #3221 this enables support for quantizing to float16 over the wire. Some models perform better with float16 quantization versus the current integer-based affine scaling method.

I also needed the ability so support a form of granular quantization where I want to quantize only certain values. This PR also includes these changes to the converter, no changes are required to the core in order to support this.

To recap the major changes are:

Converter

--quantization_bytes is no longer a flag in the converter. It didn't make sense as we support two methods for 2 bytes. I added three new flags: --quantize_float16, --quantize_uint8, --quantize_uint16. These flags can be supplied on their own or you can specify specific weight names or groups of weights using a wildcard expansion. For example,

--quantize_float16=conv/*/weights
--quantize_uint16=conv/*/biases
--quantize_uint8

Would quantize weights that match the prefix conv/ to float16, all biases with the prefix conv/ using 2 byte affine quantization, and all remaining weights with 1 byte affine quantization.

Weight Decoding

The only other addition revolves around decoding the float16 values in Javascript to float32 values. JS doesn't support any form of Float16Array at this time so I manually decode the weights using a table-based lookup method that is referenced from http://www.fox-toolkit.org/ftp/fasthalffloatconversion.pdf

Testing

I tested this myself with my own models and was able to deterministically reproduce my own results from Python TF. I wrote some tests for the float16 decoding edge cases and a couple of basic decoding test cases.
All the converter test cases are passing and I wrote a couple more tests for the new quantization scheme.

I haven't really tested the converter wizard yet, it doesn't run locally for me (tf.app is not supported?) and I haven't gotten around to building the pip package to test it out. I think the wizard quantization flow is kind of weird and will probably require some modifications.

This change is

pyu10055

Awesome work, thank you for the contribution. I did a quick round of review, overall the PR looks good, there are couple questions, and there are some conflicts you need to resolve. Thanks!

Reviewable status: 0 of 1 approvals obtained (waiting on @JesseFarebro and @pyu10055)

tfjs-converter/python/tensorflowjs/quantization.py, line 117 at r1 (raw file):

def dequantize_weights(

can you add support for float16 dequantization?

tfjs-converter/python/tensorflowjs/quantization_test.py, line 50 at r1 (raw file):

    d = np.ones(5, dtype=np.float32)
    q, metadata = quantization.quantize_weights(d, np.uint8)
    assert 'scale' in metadata and 'min' in metadata

Can you add test for float16 quantization?

tfjs-converter/python/tensorflowjs/read_weights.py, line 194 at r1 (raw file):

        offset += dtype.itemsize * value.size
      if quant_info and dtype in [np.uint8, np.uint16]:
        value = quantization.dequantize_weights(

float16 weight should also convert back to the original dtype

tfjs-converter/python/tensorflowjs/write_weights_test.py, line 701 at r1 (raw file):

    manifest = write_weights.write_weights(
        groups, TMP_DIR, shard_size_bytes=1024,
        quantization_dtype_map={'uint8': '*'})

can you add test for float16?

tfjs-converter/python/tensorflowjs/converters/converter.py, line 421 at r1 (raw file):

  if float16 is not None:
    quantization_dtype_map[quantization.QUANTIZATION_DTYPE_FLOAT16] = \
      float16.split(',')

if any of the entry is empty, it will generate an empty list, will it translate to match all weights?

tfjs-converter/python/tensorflowjs/converters/wizard.py, line 471 at r1 (raw file):

          'default': False
      },
      {

this is bit confusing, should user need all three types? maybe for wizard flow, have the user choose one compression type first, then specify the name in the second question? what do you think?

tfjs-core/src/io/io_utils.ts, line 144 at r1 (raw file):

            `Weight ${spec.name} has unknown ` +
            `quantization dtype ${quantization.dtype}. ` +
            `Supported quantization dtypes are: 'uint8' and 'uint16'.`);

add 'float16'

tfjs-core/src/io/io_utils_test.ts, line 630 at r1 (raw file):

});

describe('float16', () => {

can you add couple conversion tests for non-zero numbers?

JesseFarebro

Reviewable status: 0 of 1 approvals obtained (waiting on @JesseFarebro and @pyu10055)

tfjs-converter/python/tensorflowjs/quantization.py, line 117 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can you add support for float16 dequantization?

Done.

tfjs-converter/python/tensorflowjs/quantization_test.py, line 50 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

Can you add test for float16 quantization?

Done. I also refactored these test cases to be more clear on what we're testing here.

tfjs-converter/python/tensorflowjs/read_weights.py, line 194 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

float16 weight should also convert back to the original dtype

Done. float16 is supported in dequantize_weights and I removed the check for uint8 and uint16 here. I also added a new test case to make sure we decode to the proper dtype.

tfjs-converter/python/tensorflowjs/write_weights_test.py, line 701 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can you add test for float16?

Done. I just added on to this test case so there is an example of quantizing different weights.

tfjs-converter/python/tensorflowjs/converters/converter.py, line 421 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

if any of the entry is empty, it will generate an empty list, will it translate to match all weights?

I'm not quite sure what you mean here, are you referring to supplying the flag --quantize_float16 without providing any weight names? If so, this is handled by argparse which would provide the default value of * if nargs == 0.

tfjs-converter/python/tensorflowjs/converters/wizard.py, line 471 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

this is bit confusing, should user need all three types? maybe for wizard flow, have the user choose one compression type first, then specify the name in the second question? what do you think?

I think this is a good idea, if one requires more fine-grained control they can supply these arguments using tensorflowjs_converter. I feel like the wizard flow for quantization will catch most use cases.

tfjs-core/src/io/io_utils.ts, line 144 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

add 'float16'

Done.

JesseFarebro · 2020-05-29T07:24:35Z

Hi @pyu10055,

Thanks for taking a look, I resolved most of the issues you outlined. I need to resolve a couple more things before it's ready to be looked at again. More specifically, I need a better way of resolving the case where you want to catch all other weights with a specific quantization value, i.e.,

--quantize_float16=conv/*/weights
--quantize_uint16=conv/*/biases
--quantize_uint8

won't catch all remaining variables and quantize to uint8. I do want to support this though. The way I invasion this is if you supply two patterns that resolve to the same weight it will fail. But if you supply a pattern that matches a weight and you supply a fallthrough like --quantize_uint8 (without any pattern supplied) it won't fail and will quantize all remaining weights with affine uint8.

It won't take much work but I do want to provide sufficient tests for this mapping scheme. I'll let you know when it's ready to be looked at again.

pyu10055

Thank you, this is great. One suggestion, can you keep the current quantization flag, since when we deprecate flag, we want to give user some time, for example couple releases before we finally delete them.
In this case, we shall show a warning message if the old qunatization flag is used, and I think those old flag can be mapped to your new quantization map, and leave the real code the same. Only the converter.py wrapper code need to be changed. For wizard, since it is interactive, we can remove the flag without a problem.

Reviewable status: 0 of 1 approvals obtained (waiting on @JesseFarebro)

tfjs-converter/python/tensorflowjs/converters/converter.py, line 421 at r1 (raw file):

Previously, JesseFarebro (Jesse Farebrother) wrote…

I'm not quite sure what you mean here, are you referring to supplying the flag --quantize_float16 without providing any weight names? If so, this is handled by argparse which would provide the default value of * if nargs == 0.

got it, thanks for explaining.

JesseFarebro

I rebased off the latest changes and re-added the ---quantization_bytes flag that just converts to the new quantization map. I added some tests for the fallthrough method I discussed above. These changes are accompanied by more tests for the quantization map parsing. I also added some more float16 decoding tests and fixed up the failing WASM tests.

@pyu10055 if you could take another look now I think this is in a good shape.

Reviewable status: 0 of 1 approvals obtained (waiting on @pyu10055)

tfjs-core/src/io/io_utils_test.ts, line 630 at r1 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

can you add couple conversion tests for non-zero numbers?

Done.

pyu10055

Looking really good, I have approved this PR with couple minor comments. Thanks, this is awesome.

Reviewed 2 of 18 files at r1, 16 of 16 files at r2.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @JesseFarebro)

tfjs-converter/README.md, line 158 at r3 (raw file):

|`--strip_debug_ops`   | Strips out TensorFlow debug operations `Print`, `Assert`, `CheckNumerics`. Defaults to `True`.|
|`--quantization_bytes`  | (Deprecated) How many bytes to optionally quantize/compress the weights to. Valid values are 1 and 2. which will quantize int32 and float32 to 1 or 2 bytes respectively. The default (unquantized) size is 4 bytes.|
|`--quantize_float16`  | Comma separated list of node names to apply foat16 quantization. You can use the wildcard symbol (*) to apply quantization to multiple nodes. (e.g., conv/*/weights) |

float16

tfjs-converter/README.md, line 158 at r3 (raw file):

|`--strip_debug_ops`   | Strips out TensorFlow debug operations `Print`, `Assert`, `CheckNumerics`. Defaults to `True`.|
|`--quantization_bytes`  | (Deprecated) How many bytes to optionally quantize/compress the weights to. Valid values are 1 and 2. which will quantize int32 and float32 to 1 or 2 bytes respectively. The default (unquantized) size is 4 bytes.|
|`--quantize_float16`  | Comma separated list of node names to apply foat16 quantization. You can use the wildcard symbol (*) to apply quantization to multiple nodes. (e.g., conv/*/weights) |

maybe say, you can also use wildcard symbol, since the default behavior is to match all weights.

tfjs-converter/README.md, line 160 at r3 (raw file):

|`--quantize_float16`  | Comma separated list of node names to apply foat16 quantization. You can use the wildcard symbol (*) to apply quantization to multiple nodes. (e.g., conv/*/weights) |
|`--quantize_uint8`  | Comma separated list of node names to apply 1 byte affine integer quantization. You can use the wildcard symbol (*) to apply quantization to multiple nodes. (e.g., conv/*/weights) |
|`--quantize_uint16`  | Comma separated list of node names to apply affine integer quantization. You can use the wildcard symbol (*) to apply quantization to multiple nodes. (e.g., conv/*/weights) |

2 bytes affine integer quantization

tfjs-converter/README.md, line 411 at r3 (raw file):

This will exclude biases and any weights that don't begin with conv/.
This can be a powerful tool to reduce model size while trying to maximize
performance.

this is awesome new feature, thank you!

tfjs-converter/python/tensorflowjs/quantization.py, line 109 at r3 (raw file):

    data: A numpy array of dtype 'float32' or 'int32'.
    quantization_dtype: A numpy dtype to quantize weights to. Only np.uint8 and
      np.uint16 are supported.

add np.float32

tfjs-converter/python/tensorflowjs/quantization.py, line 114 at r3 (raw file):

    quantized_data: The quantized weights as a numpy array with dtype
      `quantization_dtype`.
    scale: The linearly scaling constant used for quantization.

add description for scale and min_val will be empty for np.float32

tfjs-converter/python/tensorflowjs/converters/converter.py, line 424 at r3 (raw file):

    print(
        'Warning: --quantization_bytes will be deprecated in a future release\n'
        'Please consider using --quantization_uint16, --quantization_uint8, '

nitpick: move --quantization_uint8 before --quantization_uint16

tfjs-converter/python/tensorflowjs/converters/converter.py, line 431 at r3 (raw file):

    dtype = quantization.QUANTIZATION_BYTES_TO_DTYPES[quantization_bytes]
    quantization_dtype_map[dtype] = True

should this be True or '*'?

tfjs-converter/python/tensorflowjs/converters/converter.py, line 509 at r3 (raw file):

      '--%s' % common.QUANTIZATION_TYPE_FLOAT16,
      type=str,
      default=None,

should all of these args default to '*' ?

JesseFarebro

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @pyu10055)

tfjs-converter/README.md, line 158 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

float16

Done.

tfjs-converter/README.md, line 158 at r3 (raw file):

you can also use wildcard symbol, since the default behavior is to match all weights.

I revised the wording for the flags, hopefully we can both agree on the new format. I wanted to try and mention that it can be provided as a flag without any nodes.

tfjs-converter/README.md, line 160 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

2 bytes affine integer quantization

Done.

tfjs-converter/README.md, line 411 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

this is awesome new feature, thank you!

Done.

tfjs-converter/python/tensorflowjs/quantization.py, line 109 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

add np.float32

Done.

tfjs-converter/python/tensorflowjs/quantization.py, line 114 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

add description for scale and min_val will be empty for np.float32

Reformatted the docstring to be more concise with the return value.

tfjs-converter/python/tensorflowjs/converters/converter.py, line 424 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

nitpick: move --quantization_uint8 before --quantization_uint16

I like the nitpick 😛 . I changed it!

tfjs-converter/python/tensorflowjs/converters/converter.py, line 431 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

should this be True or '*'?

It could be either but I'll stick with True as with the new map parsing it signifies that dtype will be the fallthrough value for all weights.

tfjs-converter/python/tensorflowjs/converters/converter.py, line 509 at r3 (raw file):

Previously, pyu10055 (Ping Yu) wrote…

should all of these args default to '*' ?

I don't think so. Isn't default the value in the absence of the flag. The value of const=True below is what we use when only the flag is provided, i.e., --quantize_float16 (fallthrough case). Defaulting to None essentially means that if no quantization flag is provided they will default to None, I have this check in _parse_quantization_dtype_map where you can see the is not None check.

JesseFarebro · 2020-06-05T15:31:44Z

Thanks for the careful reviews @pyu10055. Do you have any idea what the timeframe for a new release will look like?

googlebot added the cla: yes label May 28, 2020

pyu10055 self-requested a review May 28, 2020 22:52

pyu10055 requested changes May 28, 2020

View reviewed changes

JesseFarebro commented May 29, 2020

View reviewed changes

pyu10055 reviewed May 29, 2020

View reviewed changes

JesseFarebro commented Jun 4, 2020

View reviewed changes

pyu10055 approved these changes Jun 4, 2020

View reviewed changes

JesseFarebro added 2 commits June 4, 2020 18:53

Converter float16 quantization support

0b30984

IO decode support for float16 quantization

2fea4ee

JesseFarebro commented Jun 5, 2020

View reviewed changes

pyu10055 merged commit c9cedf4 into tensorflow:master Jun 5, 2020

josephrocca mentioned this pull request Nov 14, 2021

float16 support in web runtimes microsoft/onnxruntime#9758

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

float16 quantization + granular quantization #3340

float16 quantization + granular quantization #3340

Uh oh!

JesseFarebro commented May 28, 2020 •

edited by nsthorat

Loading

Uh oh!

pyu10055 left a comment

Uh oh!

JesseFarebro left a comment

Uh oh!

JesseFarebro commented May 29, 2020

Uh oh!

pyu10055 left a comment

Uh oh!

JesseFarebro left a comment

Uh oh!

pyu10055 left a comment

Uh oh!

JesseFarebro left a comment

Uh oh!

JesseFarebro commented Jun 5, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

float16 quantization + granular quantization #3340

float16 quantization + granular quantization #3340

Uh oh!

Conversation

JesseFarebro commented May 28, 2020 • edited by nsthorat Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Converter

Weight Decoding

Testing

Uh oh!

pyu10055 left a comment

Choose a reason for hiding this comment

Uh oh!

JesseFarebro left a comment

Choose a reason for hiding this comment

Uh oh!

JesseFarebro commented May 29, 2020

Uh oh!

pyu10055 left a comment

Choose a reason for hiding this comment

Uh oh!

JesseFarebro left a comment

Choose a reason for hiding this comment

Uh oh!

pyu10055 left a comment

Choose a reason for hiding this comment

Uh oh!

JesseFarebro left a comment

Choose a reason for hiding this comment

Uh oh!

JesseFarebro commented Jun 5, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JesseFarebro commented May 28, 2020 •

edited by nsthorat

Loading