
[2.0] Gluon2.0: switch to use forward interface #20262

Merged
leezu merged 85 commits into apache:master from barry-jin:issue-19138 on Jun 21, 2021

Conversation

@barry-jin (Contributor) commented May 12, 2021

Description

#19138

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Adopt packed_func-based FFI for npx.group_norm
  • Gluon2.0 upgrade: use forward interface in blocks
    • gluon/data/vision/*
    • gluon/loss.py
    • gluon/model_zoo/*
    • gluon/nn/*
    • gluon/rnn/*
  • use np/npx interface in gluon/metric.py
  • Implement infer_shape method so parameter shapes can be deferred until the first forward call (see the sketch after this list)
    • gluon/nn/basic_layers.py::Dense
    • gluon/nn/basic_layers.py::BatchNorm
    • gluon/nn/basic_layers.py::InstanceNorm
    • gluon/nn/basic_layers.py::LayerNorm
    • gluon/nn/basic_layers.py::GroupNorm
    • gluon/nn/conv_layers.py::_Conv
    • gluon/nn/conv_layers.py::DeformableConvolution
    • gluon/nn/conv_layers.py::ModulatedDeformableConvolution
  • Remove hybrid mode with F in gluon/probability/*
  • gluon/rnn/*
    • conv_rnn_cell.py: implement forward, infer_shape, np/npx
    • rnn_cell.py: implement forward, np/npx; use a specialized infer_shape method based on the layer, input_size, and whether the cell is bidirectional.
    • rnn_layer.py: implement forward, infer_shape, np/npx
  • Fix issue where np.average returns an ADT type; fix npx.pooling
  • Copy control flow ops (while_loop, cond, foreach) from ndarray.contrib to mx.npx.control_flow
  • Register some legacy ops in npx
    • stes_op
    • sync_batch_norm
    • legacy pad (np.pad doesn't have backward computation for 'reflect' mode)
    • Some rnn related: sequence_last, sequence_reverse, slice_channel, broadcast_greater, softsign
  • Use forward, np/npx for all the gluon related tests; remove gluon tests with symbol inputs
    • remove test_gluon_data_vision.py and test_gluon_probability_v1.py as related tests are covered in test_numpy_gluon_data_vision.py and test_gluon_probability_v2.py
    • Test test_numpy_op.py::test_np_nan_to_num only when the copy argument is set to True, since in-place operations are not supported when recording in deferred compute mode
  • Remove hybrid_forward interface in gluon/block.py
  • Remove hybrid_forward interface from documentation and docstrings
    • update docs/python_docs/python/tutorials/packages/gluon/blocks/custom_layers
    • remove python_tutorials/packages/gluon/blocks/custom_layers_beginners.md as it duplicates custom_layers
    • remove docs/python_docs/python/tutorials/packages/legacy/ndarray/sparse/train_gluon as Gluon 2.0 does not support sparse
    • update docs/python_docs/python/tutorials/packages/gluon/blocks/custom_loss
    • update docs/python_docs/python/tutorials/packages/gluon/blocks/hybridize
  • Turn on NumPy mode by default ([NumPy] turn on set_np #18631)
  • Fix gluon2.0 reference leak.
  • Migrate control flow operators to npx namespace
    • Foreach
    • while_loop
    • cond
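
As a rough illustration of the switch (a minimal sketch, not code from this PR; ScaledDense and its shapes are hypothetical), a custom HybridBlock now implements forward with mx.np/mx.npx operators instead of hybrid_forward(self, F, ...), and can implement infer_shape so that parameter shapes are deferred until the first input is seen:

import mxnet as mx
from mxnet import np
from mxnet.gluon import HybridBlock, Parameter

class ScaledDense(HybridBlock):
    """Hypothetical layer illustrating the Gluon 2.0 forward + infer_shape pattern."""
    def __init__(self, units):
        super().__init__()
        self._units = units
        # Input dimension is unknown (-1) until infer_shape fills it in on the first call.
        self.weight = Parameter('weight', shape=(units, -1), allow_deferred_init=True)

    def infer_shape(self, x, *args):
        # Called with the first real inputs while parameter shapes are still unknown.
        self.weight.shape = (self._units, x.shape[-1])

    def forward(self, x):
        # np/npx operators act directly on the input arrays; no F argument, no hybrid_forward.
        return np.dot(x, self.weight.data().T)

net = ScaledDense(16)
net.initialize()
net.hybridize()                  # tracing now relies on deferred compute, not symbol inputs
y = net(mx.np.ones((2, 8)))      # weight shape becomes (16, 8) via infer_shape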

Some skipped tests

  • tests/python/mkl/subgraphs/test_conv_subgraph.py::test_pos_concat_scale_align
    • Reason: Scale doesn't align in numpy for numpy operators
    def check_qsym_scale_align(qsym):
      assert ''.join(qsym.attr_dict().keys()).find('quantized_sg_mkldnn_conv') != -1
      init = False
      for k, v in qsym.attr_dict().items():
        if k.find('quantized_sg_mkldnn_conv') != -1:
          assert 'min_calib_range' in v
          assert 'max_calib_range' in v
          if not init:
            min_calib_range = v['min_calib_range']
            max_calib_range = v['max_calib_range']
            init = True
          else:
>           assert min_calib_range == v['min_calib_range']
E           AssertionError
  • tests/python/mkl/subgraphs/test_fc_subgraph.py::test_fc_eltwise
    • Reason: Operators square, square_root, abs, exp cannot be found in NumPy mode
    def check_fusion(net_original, data_shape, attrs_dict, check_fp32_fusion=True, check_quantization=True,
                     out_types=['uint8', 'int8', 'auto'], dedup_subgraph=True):
      net_original.initialize()
      net_original.hybridize(static_alloc=False, static_shape=False)
      data = mx.np.random.uniform(size=data_shape, dtype='float32', ctx=mx.current_context())
      net_original(data)
      net_fusion = copy.copy(net_original)
      sym, params = net_original.export(None)
    
      if check_fp32_fusion:
        data_min = -1.0
        data_max = 1.0
        if ''.join(sym.get_internals().list_outputs()).find('sqrt') != -1:
          check_quantization = False
          data_min = 0
    
        sym_sg = sym.optimize_for(SG_PASS_NAME, dedup_subgraph=dedup_subgraph, skip_infer=True)
        for name, attrs in attrs_dict.items():
          if name in config:
            op_name = config[name][OP_NAME]
          else:
            op_name = name
          assert ''.join(sym_sg.get_internals().list_outputs()).find(op_name) != -1
          if len(attrs):
              found = False
              for k, v in sym_sg.attr_dict().items():
                if k.find(op_name) != -1:
                  found = True
                  for attr_name, attr_value in attrs.items():
                    assert v[attr_name].lower() == attr_value.lower()
>             assert found
E             AssertionError

@barry-jin barry-jin requested a review from szha as a code owner May 12, 2021 19:29
@mxnet-bot

Hey @barry-jin, thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, windows-cpu, windows-gpu, miscellaneous, centos-gpu, edge, unix-gpu, sanity, unix-cpu, clang, centos-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added the pr-work-in-progress PR is still work in progress label May 12, 2021
out = super().__call__(*args)
flatten_out, self._out_format = _flatten(out, "output")
symbol_outputs = dc.get_symbol(flatten_out, sym_cls=type(symbol_inputs[0]))
dc.clear(flatten_out)
Contributor

dc.clear here is called on the copied symbol? Previously you mentioned that the copied graph does not contain DCInfo. If so, dc.clear here would be a no-op. If the copied symbol however contains DCInfo, then you could clear it already before returning the symbol. WDYT?

Contributor Author

The dc.clear here is called on the ndarray outputs, which contain the node entry (deferredcompute_entry_) with DCInfo. The copied symbol doesn't contain DCInfo, so I try to clear DCInfo from the ndarray outputs.

Contributor

Sorry, I misread flatten_out as symbol_outputs. "so I try to clear DCInfo from ndarray outputs" still shouldn't be needed, and garbage collection of the ndarray should clear the entry. So it would still be helpful to point out the temporary nature of the clear call.

Contributor Author

Yes, the current dc.clear is just a workaround. I investigated the node destructor and it looks like there is no clear call for node->info.

rnn_param_concat = F.np._internal.rnn_param_concat if is_np_array()\
else F._internal._rnn_param_concat
params = rnn_param_concat(*params, dim=0)
params = np.concatenate(params, axis=0)
Member

just to be clear, there still is a performance penalty for this call. it's ok to address it in a separate PR.

@ptrendx (Member) commented Jun 17, 2021

@leezu @barry-jin A small general question - how does this deferred compute flow work when I have a network which has some static and some dynamic elements (e.g. I have 1 part that is static, then I do something on the output that can't be hybridized, then I have another hybridizable portion)? In the previous interface I could do a Block containing 2 HybridBlocks and a non-hybridized portion in between. Can I have something similar now, or am I left with not being able to hybridize anything (or having 2 models entirely)?

@szha (Member) commented Jun 17, 2021

The dynamic part needs to be expressed with control flow ops if possible. Otherwise, a Block containing 2 HybridBlocks should continue to work.

@barry-jin (Contributor Author) commented Jun 17, 2021

@leezu @barry-jin A small general question - how does this deferred compute flow work when I have a network which has some static and some dynamic elements (e.g. I have 1 part that is static, then I do something on the output that can't be hybridized, then I have another hybridizable portion)? In the previous interface I could do a Block containing 2 HybridBlocks and a non-hybridized portion in between. Can I have something similar now, or am I left with not being able to hybridize anything (or having 2 models entirely)?

Hi @ptrendx. I think the scenario you're talking about is like this:

import mxnet as mx
from mxnet.gluon import HybridBlock, Block
from mxnet import autograd as ag

class Block1(HybridBlock):
    def __init__(self):
        super(Block1, self).__init__()

    def forward(self, a, b):
        return a - b

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        if a > b:
            ret = a
        else:
            ret = b
        return ret

class Block3(HybridBlock):
    def __init__(self):
        super(Block3, self).__init__()

    def forward(self, a, b):
        return a * b

class MixedBlock(Block):
    def __init__(self, block1, block2, block3):
        super(MixedBlock, self).__init__()
        self.block1 = block1
        self.block2 = block2
        self.block3 = block3

    def forward(self, a, b):
        out1 = self.block1(a, b)
        out2 = self.block2(out1, b)
        out3 = self.block3(out2, a)
        return out3

block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
block1.hybridize()
block3.hybridize()
x_list = [mx.np.array([i]) for i in range(5)]
y_list = [mx.np.array([1])]*5
for x, y in zip(x_list, y_list):
    x.attach_grad()
    y.attach_grad()
    with ag.record():
        out = mix(x, y)
    out.backward()
    print(out, x.grad, y.grad)

And the output is

[0.] [1.] [0.]
[1.] [1.] [1.]
[2.] [1.] [2.]
[6.] [5.] [-3.]
[12.] [7.] [-4.]

Only block1 and block3 are hybridized; block2 has if-else statements, so I do not hybridize it.
Deferred compute and tracing work on a HybridBlock and its child HybridBlocks. So, when your model is a Block that contains a mix of hybridized and non-hybridized HybridBlocks, deferred compute and tracing will only be applied to the hybridized child blocks. However, if your model itself is a HybridBlock and its children are a mix of hybrid and non-hybrid blocks, then it doesn't work, because deferred compute and tracing create a CachedOp based on the data flow of the first input data. The following situation will not work.

import mxnet as mx
from mxnet.gluon import HybridBlock
from mxnet import autograd as ag

class Block1(HybridBlock):
    def __init__(self):
        super(Block1, self).__init__()

    def forward(self, a, b):
        return a - b

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        if a > b:
            ret = a
        else:
            ret = b
        return ret

class Block3(HybridBlock):
    def __init__(self):
        super(Block3, self).__init__()

    def forward(self, a, b):
        return a * b

class MixedBlock(HybridBlock):
    def __init__(self, block1, block2, block3):
        super(MixedBlock, self).__init__()
        self.block1 = block1
        self.block2 = block2
        self.block3 = block3

    def forward(self, a, b):
        out1 = self.block1(a, b)
        out2 = self.block2(out1, b)
        out3 = self.block3(out2, a)
        return out3

block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
mix.hybridize()
# block1.hybridize()
# block3.hybridize()
block2.hybridize(active=False)
x_list = [mx.np.array([i]) for i in range(5)]
y_list = [mx.np.array([1])]*5
for x, y in zip(x_list, y_list):
    x.attach_grad()
    y.attach_grad()
    with ag.record():
        out = mix(x, y)
    out.backward()
    print(out, x.grad, y.grad)

The output is

[0.] [1.] [0.]
[1.] [1.] [1.]
[2.] [1.] [2.]
[3.] [1.] [3.]
[4.] [1.] [4.]

Then you will need to use the npx.cond control flow operator for Block2, like this:

from mxnet import npx  # npx provides the migrated control flow ops

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        return npx.cond(lambda a, b: a > b,
                        lambda a, b: a,
                        lambda a, b: b,
                        [a, b])
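
A minimal usage sketch under that assumption (hypothetical follow-up, reusing Block1, Block3 and the HybridBlock MixedBlock from the second example above): once Block2 goes through npx.cond, the branch is captured in the traced graph, so hybridizing the whole MixedBlock should give the expected per-input results:

# Hypothetical continuation of the second example, with Block2 rewritten via npx.cond.
block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
mix.hybridize()                                       # the conditional is now part of the traced graph
print(mix(mx.np.array([3.0]), mx.np.array([1.0])))    # expect [6.], matching the non-hybridized run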

@ptrendx (Member) commented Jun 17, 2021

@barry-jin Thanks for the examples, I just tested changing MixedBlock and Block2 to gluon.Block in your second example and then it works (so it actually looks the same as in 1.x) :-).

@barry-jin (Contributor Author)

@mxnet-bot run ci [centos-cpu, centos-gpu, unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-gpu, centos-cpu, unix-cpu]

@barry-jin (Contributor Author)

@mxnet-bot run ci [all]

@mxnet-bot

Jenkins CI successfully triggered : [clang, centos-gpu, unix-cpu, website, sanity, edge, centos-cpu, unix-gpu, windows-cpu, miscellaneous, windows-gpu]

@TristonC (Contributor) left a comment

I vote for the forward interface over the hybrid_forward interface. Thanks.

import mxnet as mx
import numpy as np
from mxnet import gluon
import numpy as onp
Contributor

Does the 'o' in onp stand for 'original', as in the original numpy? It could be confusing, as np is well known as the short name for numpy.

@barry-jin (Contributor Author) Jun 19, 2021

Yes, the 'o' in onp stands for 'official'; it is used to distinguish the official numpy from MXNet numpy. Usually, users will do `from mxnet import np` and build their models with numpy operators from MXNet. This provides a numpy-compatible coding experience in MXNet.
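
As a small illustration (not part of this diff), the two imports coexist like this:

import numpy as onp          # official NumPy
from mxnet import np         # MXNet's NumPy-compatible module

a = np.ones((2, 3))          # mxnet.numpy.ndarray: device-aware and usable with autograd/Gluon
b = onp.ones((2, 3))         # plain numpy.ndarray
print(type(a), type(b))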

times.append((tock - tick) * 1000.0)
times = times[args.warmup_rounds: ]
print("Time used: mean = %.3f ms, std = %.3f ms" % (np.mean(times), np.std(times)))
print("Time used: mean = %.3f ms, std = %.3f ms" % (onp.mean(times), onp.std(times)))
Contributor

Will mxnet np provide the mean and std functions?

Contributor Author

Yes. The mean and std operators are implemented as mxnet.np.mean and mxnet.np.std.
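
For example:

from mxnet import np

times = np.array([1.2, 1.5, 1.1])
print("mean = %.3f ms, std = %.3f ms" % (float(np.mean(times)), float(np.std(times))))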

Looking into implementation of [existing layers](https://mxnet.apache.org/api/python/gluon/nn.html), one may find that more often a block inherits from a [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428), instead of directly inheriting from `Block`.

The reason for that is that `HybridBlock` allows to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convinient to support both ways, because the imperative programming eases the debugging of the code and the symbolic one provides faster execution speed. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/api/architecture/overview.html).
The reason for that is that `HybridBlock` allows to write custom layers in imperative programming style, while computing in a symbolic way. It unifies the flexibility of imperative programming with the performance benefits of symbolic programming. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/api/architecture/overview.html).
Contributor

Use "between ... and" for "the difference between symbolic vs. imperative programming".

Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass.

All parameters of a block are stored and accessed via [ParameterDict](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/parameter.py#L508) class. This class helps with initialization, updating, saving and loading of the parameters. Each layer can have multiple set of parameters, and all of them can be stored in a single instance of the `ParameterDict` class. On a block level, the instance of the `ParameterDict` class is accessible via `self.params` field, and outside of a block one can access all parameters of the network via [collect_params()](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Block.collect_params) method called on a `container`. `ParameterDict` uses [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class to represent parameters inside of Apache MxNet neural network. If parameter doesn't exist, trying to get a parameter via `self.params` will create it automatically.
Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass. The parameters are usually represented as [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class inside of Apache MxNet neural network.
Contributor

MxNet -> MXNet

net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
scales = np.array([2]))) # Add our custom layer
Contributor

What about just saying # Add a custom layer?

SymbolHandle *out);

/*!
* \brief Clear the info node associated with the arrays.
Contributor

The brief isn't obvious from the function name; it's more about how the deferred compute info is handled.

* \brief Clear the info node associated with the arrays.
* \param arrays array handles of arrays
* \param num number of arrays
* \return 0 when success, -1 when failure happens
Contributor

-1 otherwise

super(HybridBlock, self).__init__()
self._v2 = inspect.unwrap(self.hybrid_forward.__func__) is HybridBlock.hybrid_forward
assert hasattr(self, "hybrid_forward") is False, (
"Starting from MXNet2.0, Gluon2.0 with forward interface will be used instead of "
Contributor

Do both MXNet 2.0 and Gluon 2.0 need to be mentioned at the same time? Proposal:
The 'forward' interface needs to be used instead of 'hybrid_forward' starting from Gluon 2.0. ......

@barry-jin (Contributor Author)

@TristonC Thanks for your suggestions on improving the documentation!

@barry-jin (Contributor Author)

@mxnet-bot run ci [centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu]


Labels

pr-awaiting-review PR is waiting for code review


7 participants