
[2.0] Gluon2.0: switch to use forward interface #20262

Merged
leezu merged 85 commits into apache:master from barry-jin:issue-19138 on Jun 21, 2021

Conversation

@barry-jin (Contributor) commented May 12, 2021

Description

#19138

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Adopt packed_func-based FFI for npx.group_norm
  • Gluon2.0 upgrade: use forward interface in blocks
    • gluon/data/vision/*
    • gluon/loss.py
    • gluon/model_zoo/*
    • gluon/nn/*
    • gluon/rnn/*
  • use np/npx interface in gluon/metric.py
  • Implement infer_shape method so parameter shapes can be deferred until the first forward call (see the sketch after this list)
    • gluon/nn/basic_layers.py::Dense
    • gluon/nn/basic_layers.py::BatchNorm
    • gluon/nn/basic_layers.py::InstanceNorm
    • gluon/nn/basic_layers.py::LayerNorm
    • gluon/nn/basic_layers.py::GroupNorm
    • gluon/nn/conv_layers.py::_Conv
    • gluon/nn/conv_layers.py::DeformableConvolution
    • gluon/nn/conv_layers.py::ModulatedDeformableConvolution
  • Remove hybrid mode with F in gluon/probability/*
  • gluon/rnn/*
    • conv_rnn_cell.py: implement forward, infer_shape, np/npx
    • rnn_cell.py: implement forward, np/npx; use a specialized infer_shape method based on the layer, input_size, and whether the cell is bidirectional.
    • rnn_layer.py: implement forward, infer_shape, np/npx
  • Fix issue where np.average returns an ADT type; fix npx.pooling
  • Copy control flow ops (while_loop, cond, foreach) from ndarray.contrib to mx.npx.control_flow
  • Register some legacy ops in npx
    • stes_op
    • sync_batch_norm
    • legacy pad (np.pad doesn't have backward computation for 'reflect' mode)
    • Some rnn related: sequence_last, sequence_reverse, slice_channel, broadcast_greater, softsign
  • Use forward, np/npx for all the gluon related tests; remove gluon tests with symbol inputs
    • remove test_gluon_data_vision.py and test_gluon_probability_v1.py as related tests are covered in test_numpy_gluon_data_vision.py and test_gluon_probability_v2.py
    • Test test_numpy_op.py::test_np_nan_to_num only when the copy argument is set to True, since in-place operations are not supported when recording in deferred compute mode
  • Remove hybrid_forward interface in gluon/block.py
  • Remove hybrid_forward interface from documentation and docstrings
    • update docs/python_docs/python/tutorials/packages/gluon/blocks/custom_layers
    • remove python_tutorials/packages/gluon/blocks/custom_layers_beginners.md as it duplicates custom_layers
    • remove docs/python_docs/python/tutorials/packages/legacy/ndarray/sparse/train_gluon as Gluon 2.0 does not support sparse
    • update docs/python_docs/python/tutorials/packages/gluon/blocks/custom_loss
    • update docs/python_docs/python/tutorials/packages/gluon/blocks/hybridize
  • Turn on NumPy mode by default ([NumPy] turn on set_np #18631)
  • Fix gluon2.0 reference leak.
  • Migrate control flow operators to npx namespace
    • Foreach
    • while_loop
    • cond
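
As a rough illustration of the switch (a minimal sketch, not code from this PR; ScaledDense and its shapes are hypothetical), a custom HybridBlock now implements forward with mx.np/mx.npx operators instead of hybrid_forward(self, F, ...), and can implement infer_shape so that parameter shapes are deferred until the first input is seen:

import mxnet as mx
from mxnet import np
from mxnet.gluon import HybridBlock, Parameter

class ScaledDense(HybridBlock):
    """Hypothetical layer illustrating the Gluon 2.0 forward + infer_shape pattern."""
    def __init__(self, units):
        super().__init__()
        self._units = units
        # Input dimension is unknown (-1) until infer_shape fills it in on the first call.
        self.weight = Parameter('weight', shape=(units, -1), allow_deferred_init=True)

    def infer_shape(self, x, *args):
        # Called with the first real inputs while parameter shapes are still unknown.
        self.weight.shape = (self._units, x.shape[-1])

    def forward(self, x):
        # np/npx operators act directly on the input arrays; no F argument, no hybrid_forward.
        return np.dot(x, self.weight.data().T)

net = ScaledDense(16)
net.initialize()
net.hybridize()                  # tracing now relies on deferred compute, not symbol inputs
y = net(mx.np.ones((2, 8)))      # weight shape becomes (16, 8) via infer_shape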

Some skipped tests

  • tests/python/mkl/subgraphs/test_conv_subgraph.py::test_pos_concat_scale_align
    • Reason: Scale doesn't align in numpy for numpy operators
    def check_qsym_scale_align(qsym):
      assert ''.join(qsym.attr_dict().keys()).find('quantized_sg_mkldnn_conv') != -1
      init = False
      for k, v in qsym.attr_dict().items():
        if k.find('quantized_sg_mkldnn_conv') != -1:
          assert 'min_calib_range' in v
          assert 'max_calib_range' in v
          if not init:
            min_calib_range = v['min_calib_range']
            max_calib_range = v['max_calib_range']
            init = True
          else:
>           assert min_calib_range == v['min_calib_range']
E           AssertionError
  • tests/python/mkl/subgraphs/test_fc_subgraph.py::test_fc_eltwise
    • Reason: Operators square, square_root, abs, exp cannot be found in NumPy mode
    def check_fusion(net_original, data_shape, attrs_dict, check_fp32_fusion=True, check_quantization=True,
                     out_types=['uint8', 'int8', 'auto'], dedup_subgraph=True):
      net_original.initialize()
      net_original.hybridize(static_alloc=False, static_shape=False)
      data = mx.np.random.uniform(size=data_shape, dtype='float32', ctx=mx.current_context())
      net_original(data)
      net_fusion = copy.copy(net_original)
      sym, params = net_original.export(None)
    
      if check_fp32_fusion:
        data_min = -1.0
        data_max = 1.0
        if ''.join(sym.get_internals().list_outputs()).find('sqrt') != -1:
          check_quantization = False
          data_min = 0
    
        sym_sg = sym.optimize_for(SG_PASS_NAME, dedup_subgraph=dedup_subgraph, skip_infer=True)
        for name, attrs in attrs_dict.items():
          if name in config:
            op_name = config[name][OP_NAME]
          else:
            op_name = name
          assert ''.join(sym_sg.get_internals().list_outputs()).find(op_name) != -1
          if len(attrs):
              found = False
              for k, v in sym_sg.attr_dict().items():
                if k.find(op_name) != -1:
                  found = True
                  for attr_name, attr_value in attrs.items():
                    assert v[attr_name].lower() == attr_value.lower()
>             assert found
E             AssertionError

@barry-jin barry-jin requested a review from szha as a code owner May 12, 2021 19:29
@mxnet-bot

Hey @barry-jin, thanks for submitting the PR
All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:

  • To trigger all jobs: @mxnet-bot run ci [all]
  • To trigger specific jobs: @mxnet-bot run ci [job1, job2]

CI supported jobs: [website, windows-cpu, windows-gpu, miscellaneous, centos-gpu, edge, unix-gpu, sanity, unix-cpu, clang, centos-cpu]


Note:
Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
All CI tests must pass before the PR can be merged.

@mseth10 mseth10 added the pr-work-in-progress PR is still work in progress label May 12, 2021
out = super().__call__(*args)
flatten_out, self._out_format = _flatten(out, "output")
symbol_outputs = dc.get_symbol(flatten_out, sym_cls=type(symbol_inputs[0]))
dc.clear(flatten_out)
Contributor

dc.clear here is called on the copied symbol? Previously you mentioned that the copied graph does not contain DCInfo. If so, dc.clear here would be a no-op. If the copied symbol however contains DCInfo, then you could clear it already before returning the symbol. WDYT?

Contributor Author

The dc.clear here is called on the ndarray outputs, which contain the node entry (deferredcompute_entry_) with DCInfo. The copied symbol doesn't contain DCInfo, so I try to clear DCInfo from the ndarray outputs.

Contributor

Sorry, I misread flatten_out as symbol_outputs. "so I try to clear DCInfo from ndarray outputs" still shouldn't be needed, and garbage collection of the ndarray should clear the entry. So it would still be helpful to point out the temporary nature of the clear call.

Contributor Author

Yes, the current dc.clear is just a workaround. I investigated the node destructor and it looks like there is no clear call for node->info.

rnn_param_concat = F.np._internal.rnn_param_concat if is_np_array()\
else F._internal._rnn_param_concat
params = rnn_param_concat(*params, dim=0)
params = np.concatenate(params, axis=0)
Member

just to be clear, there still is a performance penalty for this call. it's ok to address it in a separate PR.

@ptrendx (Member) commented Jun 17, 2021

@leezu @barry-jin A small general question - how does this deferred compute flow work when I have a network which has some static and some dynamic elements (e.g. I have 1 part that is static, then I do something on the output that can't be hybridized, then I have another hybridizable portion)? In the previous interface I could do a Block containing 2 HybridBlocks and a non-hybridized portion in between. Can I have something similar now, or am I left with not being able to hybridize anything (or having 2 models entirely)?

@szha (Member) commented Jun 17, 2021

The dynamic part needs to be expressed with control flow ops if possible. Otherwise, a Block containing 2 HybridBlocks should continue to work.

@barry-jin (Contributor Author) commented Jun 17, 2021

@leezu @barry-jin A small general question - how does this deferred compute flow work when I have a network which has some static and some dynamic elements (e.g. I have 1 part that is static, then I do something on the output that can't be hybridized, then I have another hybridizable portion)? In the previous interface I could do a Block containing 2 HybridBlocks and a non-hybridized portion in between. Can I have something similar now, or am I left with not being able to hybridize anything (or having 2 models entirely)?

Hi @ptrendx. I think the scenario you're talking about is like this:

import mxnet as mx
from mxnet.gluon import HybridBlock, Block
from mxnet import autograd as ag

class Block1(HybridBlock):
    def __init__(self):
        super(Block1, self).__init__()

    def forward(self, a, b):
        return a - b

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        if a > b:
            ret = a
        else:
            ret = b
        return ret

class Block3(HybridBlock):
    def __init__(self):
        super(Block3, self).__init__()

    def forward(self, a, b):
        return a * b

class MixedBlock(Block):
    def __init__(self, block1, block2, block3):
        super(MixedBlock, self).__init__()
        self.block1 = block1
        self.block2 = block2
        self.block3 = block3

    def forward(self, a, b):
        out1 = self.block1(a, b)
        out2 = self.block2(out1, b)
        out3 = self.block3(out2, a)
        return out3

block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
block1.hybridize()
block3.hybridize()
x_list = [mx.np.array([i]) for i in range(5)]
y_list = [mx.np.array([1])]*5
for x, y in zip(x_list, y_list):
    x.attach_grad()
    y.attach_grad()
    with ag.record():
        out = mix(x, y)
    out.backward()
    print(out, x.grad, y.grad)

And the output is

[0.] [1.] [0.]
[1.] [1.] [1.]
[2.] [1.] [2.]
[6.] [5.] [-3.]
[12.] [7.] [-4.]

Only block1 and block3 are hybridized; block2 has if-else statements, so I do not hybridize it.
Deferred compute and tracing work on a HybridBlock and its child HybridBlocks. So, when your model is a Block that contains a mix of hybridized and non-hybridized HybridBlocks, deferred compute and tracing will only be applied to the hybridized child blocks. However, if your model itself is a HybridBlock and its children are a mix of hybrid and non-hybrid blocks, then it doesn't work, because deferred compute and tracing create a CachedOp based on the data flow of the first input data. The following situation will not work.

import mxnet as mx
from mxnet.gluon import HybridBlock
from mxnet import autograd as ag

class Block1(HybridBlock):
    def __init__(self):
        super(Block1, self).__init__()

    def forward(self, a, b):
        return a - b

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        if a > b:
            ret = a
        else:
            ret = b
        return ret

class Block3(HybridBlock):
    def __init__(self):
        super(Block3, self).__init__()

    def forward(self, a, b):
        return a * b

class MixedBlock(HybridBlock):
    def __init__(self, block1, block2, block3):
        super(MixedBlock, self).__init__()
        self.block1 = block1
        self.block2 = block2
        self.block3 = block3

    def forward(self, a, b):
        out1 = self.block1(a, b)
        out2 = self.block2(out1, b)
        out3 = self.block3(out2, a)
        return out3

block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
mix.hybridize()
# block1.hybridize()
# block3.hybridize()
block2.hybridize(active=False)
x_list = [mx.np.array([i]) for i in range(5)]
y_list = [mx.np.array([1])]*5
for x, y in zip(x_list, y_list):
    x.attach_grad()
    y.attach_grad()
    with ag.record():
        out = mix(x, y)
    out.backward()
    print(out, x.grad, y.grad)

The output is

[0.] [1.] [0.]
[1.] [1.] [1.]
[2.] [1.] [2.]
[3.] [1.] [3.]
[4.] [1.] [4.]

Then you will need to use the npx.cond control flow operator for Block2, like this:

from mxnet import npx  # npx provides the migrated control flow ops

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        return npx.cond(lambda a, b: a > b,
                        lambda a, b: a,
                        lambda a, b: b,
                        [a, b])
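
A minimal usage sketch under that assumption (hypothetical follow-up, reusing Block1, Block3 and the HybridBlock MixedBlock from the second example above): once Block2 goes through npx.cond, the branch is captured in the traced graph, so hybridizing the whole MixedBlock should give the expected per-input results:

# Hypothetical continuation of the second example, with Block2 rewritten via npx.cond.
block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
mix.hybridize()                                       # the conditional is now part of the traced graph
print(mix(mx.np.array([3.0]), mx.np.array([1.0])))    # expect [6.], matching the non-hybridized run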

@ptrendx (Member) commented Jun 17, 2021

@barry-jin Thanks for the examples, I just tested changing MixedBlock and Block2 to gluon.Block in your second example and then it works (so it actually looks the same as in 1.x) :-).

@barry-jin (Contributor Author)

@mxnet-bot run ci [centos-cpu, centos-gpu, unix-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-gpu, centos-cpu, unix-cpu]

@barry-jin (Contributor Author)

@mxnet-bot run ci [all]

@mxnet-bot

Jenkins CI successfully triggered : [clang, centos-gpu, unix-cpu, website, sanity, edge, centos-cpu, unix-gpu, windows-cpu, miscellaneous, windows-gpu]

@TristonC (Contributor) left a comment

I vote for the forward interface over the hybrid_forward interface. Thanks.

import mxnet as mx
import numpy as np
from mxnet import gluon
import numpy as onp
Contributor

Does the 'o' in onp stand for 'original', as in the original numpy? It could be confusing, as np is well known as the short name for numpy.

@barry-jin (Contributor Author) Jun 19, 2021

Yes, the 'o' in onp stands for 'official'; it is used to distinguish the official numpy from MXNet numpy. Usually, users will do `from mxnet import np` and build their models with numpy operators from MXNet. This provides a numpy-compatible coding experience in MXNet.
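
As a small illustration (not part of this diff), the two imports coexist like this:

import numpy as onp          # official NumPy
from mxnet import np         # MXNet's NumPy-compatible module

a = np.ones((2, 3))          # mxnet.numpy.ndarray: device-aware and usable with autograd/Gluon
b = onp.ones((2, 3))         # plain numpy.ndarray
print(type(a), type(b))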

times.append((tock - tick) * 1000.0)
times = times[args.warmup_rounds: ]
print("Time used: mean = %.3f ms, std = %.3f ms" % (np.mean(times), np.std(times)))
print("Time used: mean = %.3f ms, std = %.3f ms" % (onp.mean(times), onp.std(times)))
Contributor

Will mxnet np provide the mean and std functions?

Contributor Author

Yes. The mean and std operators are implemented as mxnet.np.mean and mxnet.np.std.
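
For example:

from mxnet import np

times = np.array([1.2, 1.5, 1.1])
print("mean = %.3f ms, std = %.3f ms" % (float(np.mean(times)), float(np.std(times))))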

Looking into implementation of [existing layers](https://mxnet.apache.org/api/python/gluon/nn.html), one may find that more often a block inherits from a [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428), instead of directly inheriting from `Block`.

The reason for that is that `HybridBlock` allows to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convinient to support both ways, because the imperative programming eases the debugging of the code and the symbolic one provides faster execution speed. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/api/architecture/overview.html).
The reason for that is that `HybridBlock` allows to write custom layers in imperative programming style, while computing in a symbolic way. It unifies the flexibility of imperative programming with the performance benefits of symbolic programming. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/api/architecture/overview.html).
Contributor

Use "between ... and" for "the difference between symbolic vs. imperative programming".

Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass.

All parameters of a block are stored and accessed via [ParameterDict](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/parameter.py#L508) class. This class helps with initialization, updating, saving and loading of the parameters. Each layer can have multiple set of parameters, and all of them can be stored in a single instance of the `ParameterDict` class. On a block level, the instance of the `ParameterDict` class is accessible via `self.params` field, and outside of a block one can access all parameters of the network via [collect_params()](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Block.collect_params) method called on a `container`. `ParameterDict` uses [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class to represent parameters inside of Apache MxNet neural network. If parameter doesn't exist, trying to get a parameter via `self.params` will create it automatically.
Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass. The parameters are usually represented as [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class inside of Apache MxNet neural network.
Contributor

MxNet -> MXNet

net.add(Dense(5)) # Add Dense layer with 5 neurons
net.add(NormalizationHybridLayer(hidden_units=5,
scales = nd.array([2]))) # Add our custom layer
scales = np.array([2]))) # Add our custom layer
Contributor

What about just saying # Add a custom layer?

SymbolHandle *out);

/*!
* \brief Clear the info node associated with the arrays.
Contributor

The brief isn't obvious from the function name; it's more about how the deferred compute info is handled.

* \brief Clear the info node associated with the arrays.
* \param arrays array handles of arrays
* \param num number of arrays
* \return 0 when success, -1 when failure happens
Contributor

-1 otherwise

super(HybridBlock, self).__init__()
self._v2 = inspect.unwrap(self.hybrid_forward.__func__) is HybridBlock.hybrid_forward
assert hasattr(self, "hybrid_forward") is False, (
"Starting from MXNet2.0, Gluon2.0 with forward interface will be used instead of "
Contributor

Do both MXNet 2.0 and Gluon 2.0 need to be mentioned at the same time? Proposal:
The 'forward' interface needs to be used instead of 'hybrid_forward' starting from Gluon 2.0. ......

@barry-jin (Contributor Author)

@TristonC Thanks for your suggestions on improving the documentation!

@barry-jin (Contributor Author)

@mxnet-bot run ci [centos-cpu]

@mxnet-bot

Jenkins CI successfully triggered : [centos-cpu]


Labels

pr-awaiting-review PR is waiting for code review


7 participants