[FEATURE] Fuse dequantize with convolution #20816
Conversation
Hey @DominikaJedynak, thanks for submitting the PR.
CI supported jobs: [centos-cpu, unix-cpu, clang, miscellaneous, website, windows-cpu, windows-gpu, sanity, unix-gpu, centos-gpu, edge]
Note:
@mxnet-bot run ci [centos-gpu, windows-gpu]
Jenkins CI successfully triggered: [windows-gpu, centos-gpu]
@mxnet-bot run ci [all]
Jenkins CI successfully triggered: [unix-cpu, windows-cpu, windows-gpu, centos-cpu, website, centos-gpu, unix-gpu, sanity, clang, edge, miscellaneous]
// When only fused quantized operator and requantize, set min/max_calib_range,
// When fused quantized operator + requantize + dequantize, set dequantize flag to true.
if (dequantize_node != nullptr) {
if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
Doubled brackets.
if ((dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0))) {
if (dequantize_node != nullptr && (no_enable_float_output.count(fuse_node->op()) == 0)) {
@mx.util.use_np
@pytest.mark.parametrize('data_shape', DATA_SHAPE)
@pytest.mark.parametrize('no_bias', [True, False])
@pytest.mark.parametrize('out_type', ['int8', 'auto'])
Following the settings of the other tests in this file, I do not test it, as it is a scenario which is not used.
2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
(dnnl_param.with_sum ? 1 : 0) +
(dnnl_param.quantized ? 2 + (full_conv_param.dnnl_param.with_sum ? 2 : 0) : 0);
size_t input_size = 2 + (conv_param.no_bias ? 0 : 1) + (dnnl_param.with_bn ? 4 : 0) +
I think we can skip this calculation and use only the value calculated from idx below. If we wish to double-check it, it will be enough to do so in the assert, so we can replace the CHECK_EQ in line 167 with an assert using the calculation from here.
if (param.full_conv_param.dnnl_param.quantized) {
  if (param.full_conv_param.dnnl_param.enable_float_output)
    return std::vector<std::string>{"output"};
  else
    return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
It could be simplified:
if (param.full_conv_param.dnnl_param.quantized) {
  if (param.full_conv_param.dnnl_param.enable_float_output)
    return std::vector<std::string>{"output"};
  else
    return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
if (param.full_conv_param.dnnl_param.quantized &&
    !param.full_conv_param.dnnl_param.enable_float_output) {
  return std::vector<std::string>{"output", "output_min", "output_max"};
} else {
  return std::vector<std::string>{"output"};
}
net = ConvAdd(use_bias=True)
check_quantize(net, data_shape, out_type)
What about a test with convolution, activation and sum?
They are already there: ConvActAdd, ConvBNSumAct.
@DominikaJedynak could you resolve the conflict?
Force-pushed: 6cb7574 to 07c80f4
Force-pushed: 07c80f4 to f0c1a0f
@mxnet-bot run ci [unix-cpu, centos-gpu]
Jenkins CI successfully triggered: [unix-cpu, centos-gpu]
@mxnet-bot run ci [centos-gpu]
Jenkins CI successfully triggered: [centos-gpu]
@mxnet-bot run ci [centos-gpu]
Jenkins CI successfully triggered: [centos-gpu]
Description
This PR adds the possibility to fuse a dequantize node with a convolution node, which in practice lets us avoid unnecessarily multiplying and then dividing all entries of a convolution output by the same scaling factor.
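To illustrate the redundancy this fusion removes, here is a minimal NumPy sketch (illustrative only; the names `q_out` and `scale` are hypothetical and do not come from the MXNet code): a quantized convolution followed by a separate dequantize op makes a second full pass over the int32 accumulator to apply the scale, whereas a fused kernel can apply the same scale while writing the output. The arithmetic result is identical either way.

```python
import numpy as np

# Hypothetical int32 accumulator, as produced by a quantized convolution
q_out = np.array([[1200, -3400], [560, 780]], dtype=np.int32)
scale = 0.05  # combined dequantization scale (illustrative value)

# Unfused: a separate dequantize pass reads and rewrites the whole tensor
dequantized = q_out.astype(np.float32) * scale

# Fused: the same scaling applied while producing the output,
# saving an extra full-tensor read/write
fused = (q_out * scale).astype(np.float32)

assert np.allclose(dequantized, fused)
```

The saving is purely in memory traffic and kernel launches, not in numerics, which is why the fusion can be applied without affecting accuracy.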
Speedup on various data sizes:
Measured on a c6i.12xlarge instance (Intel Xeon Platinum 8375C), AMI ami-04505e74c0741db8d (Canonical, Ubuntu 20.04 LTS).
Script: