[2.0] Gluon2.0: switch to use forward interface #20262
Conversation
Hey @barry-jin, thanks for submitting the PR.
CI supported jobs: [website, windows-cpu, windows-gpu, miscellaneous, centos-gpu, edge, unix-gpu, sanity, unix-cpu, clang, centos-cpu]
Note:
…x && adopt forward interface in contrib tests
```python
out = super().__call__(*args)
flatten_out, self._out_format = _flatten(out, "output")
symbol_outputs = dc.get_symbol(flatten_out, sym_cls=type(symbol_inputs[0]))
dc.clear(flatten_out)
```
dc.clear here is called on the copied symbol? Previously you mentioned that the copied graph does not contain DCInfo. If so, dc.clear here would be a no-op. If the copied symbol however contains DCInfo, then you could clear it already before returning the symbol. WDYT?
The dc.clear here is called on the ndarray outputs, which contain the node_entry(deferredcompute_entry_) with DCInfo. The copied symbol doesn't contain DCInfo, so I try to clear DCInfo from ndarray outputs.
Sorry, I misread flatten_out as symbol_outputs. Clearing DCInfo from the ndarray outputs still shouldn't be needed; garbage collection of the ndarray should clear the entry. So it would still be helpful to point out the temporary nature of the clear call.
Yes, the current dc.clear is just a workaround. I investigated the node destructor, and it looks like there is no clear call for node->info.
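To make the lifetime issue concrete, here is a plain-Python sketch (a hypothetical `Node`/`clear`, not MXNet's actual C++ classes) of why an explicit clear call is needed when the destructor never resets the attached info:

```python
# Hypothetical sketch, not MXNet code: a node whose destructor does not
# reset its attached info, so the info must be cleared explicitly.
class Node:
    def __init__(self):
        self.info = {"deferred_compute": "DCInfo"}  # extra info attached to the node
    # No __del__ resetting self.info: without an explicit clear(), the
    # attached info lives exactly as long as the node itself.

def clear(nodes):
    """Explicit workaround, mirroring the role of dc.clear."""
    for n in nodes:
        n.info = None

nodes = [Node(), Node()]
clear(nodes)
print(all(n.info is None for n in nodes))  # True
```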
```diff
- rnn_param_concat = F.np._internal.rnn_param_concat if is_np_array()\
-     else F._internal._rnn_param_concat
- params = rnn_param_concat(*params, dim=0)
+ params = np.concatenate(params, axis=0)
```
Just to be clear, there is still a performance penalty for this call. It's OK to address it in a separate PR.
@leezu @barry-jin A small general question: how does this deferred compute flow work when I have a network which has some static and some dynamic elements (e.g. I have one part that is static, then I do something on the output that can't be hybridized, then I have another hybridizable portion)? In the previous interface I could do a
The dynamic part needs to be expressed with control flow ops if possible. Otherwise, a Block containing two HybridBlocks should continue to work.
Hi @ptrendx. I think the scenario you talked about is like this:

```python
import mxnet as mx
from mxnet.gluon import HybridBlock, Block
from mxnet import autograd as ag

class Block1(HybridBlock):
    def __init__(self):
        super(Block1, self).__init__()

    def forward(self, a, b):
        return a - b

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        if a > b:
            ret = a
        else:
            ret = b
        return ret

class Block3(HybridBlock):
    def __init__(self):
        super(Block3, self).__init__()

    def forward(self, a, b):
        return a * b

class MixedBlock(Block):
    def __init__(self, block1, block2, block3):
        super(MixedBlock, self).__init__()
        self.block1 = block1
        self.block2 = block2
        self.block3 = block3

    def forward(self, a, b):
        out1 = self.block1(a, b)
        out2 = self.block2(out1, b)
        out3 = self.block3(out2, a)
        return out3

block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
block1.hybridize()
block3.hybridize()

x_list = [mx.np.array([i]) for i in range(5)]
y_list = [mx.np.array([1])] * 5
for x, y in zip(x_list, y_list):
    x.attach_grad()
    y.attach_grad()
    with ag.record():
        out = mix(x, y)
    out.backward()
    print(out, x.grad, y.grad)
```

And the output is:

Only block1 and block3 are hybridized, while block2 has if-else statements, so I do not hybridize it. Alternatively, you can make MixedBlock itself a HybridBlock, hybridize the whole model, and deactivate hybridization for block2:

```python
import mxnet as mx
from mxnet.gluon import HybridBlock
from mxnet import autograd as ag

class Block1(HybridBlock):
    def __init__(self):
        super(Block1, self).__init__()

    def forward(self, a, b):
        return a - b

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        if a > b:
            ret = a
        else:
            ret = b
        return ret

class Block3(HybridBlock):
    def __init__(self):
        super(Block3, self).__init__()

    def forward(self, a, b):
        return a * b

class MixedBlock(HybridBlock):
    def __init__(self, block1, block2, block3):
        super(MixedBlock, self).__init__()
        self.block1 = block1
        self.block2 = block2
        self.block3 = block3

    def forward(self, a, b):
        out1 = self.block1(a, b)
        out2 = self.block2(out1, b)
        out3 = self.block3(out2, a)
        return out3

block1, block2, block3 = Block1(), Block2(), Block3()
mix = MixedBlock(block1, block2, block3)
mix.initialize()
mix.hybridize()
# block1.hybridize()
# block3.hybridize()
block2.hybridize(active=False)

x_list = [mx.np.array([i]) for i in range(5)]
y_list = [mx.np.array([1])] * 5
for x, y in zip(x_list, y_list):
    x.attach_grad()
    y.attach_grad()
    with ag.record():
        out = mix(x, y)
    out.backward()
    print(out, x.grad, y.grad)
```

The output is:

Then you will need to use the npx.cond control flow operator for block2, like this:

```python
from mxnet import npx  # needed for npx.cond

class Block2(HybridBlock):
    def __init__(self):
        super(Block2, self).__init__()

    def forward(self, a, b):
        return npx.cond(lambda a, b: a > b,
                        lambda a, b: a,
                        lambda a, b: b,
                        [a, b])
```
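For intuition, the eager behavior of such a cond operator can be sketched in plain Python (this is an illustration of the semantics only, not MXNet code; when hybridized, the predicate and branches are captured symbolically instead of being run eagerly):

```python
# Plain-Python sketch of the cond semantics used above (illustration only,
# not MXNet's implementation): pred picks which branch runs on the inputs.
def cond(pred, then_func, else_func, inputs):
    return then_func(*inputs) if pred(*inputs) else else_func(*inputs)

larger = cond(lambda a, b: a > b,   # predicate
              lambda a, b: a,       # then-branch
              lambda a, b: b,       # else-branch
              [3, 1])
print(larger)  # 3
```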
@barry-jin Thanks for the examples, I just tested changing
@mxnet-bot run ci [centos-cpu, centos-gpu, unix-cpu]

Jenkins CI successfully triggered: [centos-gpu, centos-cpu, unix-cpu]

@mxnet-bot run ci [all]

Jenkins CI successfully triggered: [clang, centos-gpu, unix-cpu, website, sanity, edge, centos-cpu, unix-gpu, windows-cpu, miscellaneous, windows-gpu]
TristonC left a comment:
I vote for the forward interface over the hybrid_forward interface. Thanks.
```diff
  import mxnet as mx
- import numpy as np
  from mxnet import gluon
+ import numpy as onp
```
Does the 'o' in onp stand for 'original', as in the original numpy? It could be confusing, since np is well known as the short alias for numpy.
Yes, the 'o' in onp stands for 'official'; it is used to distinguish the official numpy from MXNet numpy. Usually, users will do `from mxnet import np` and build their models with numpy operators from MXNet. This provides a numpy-compatible coding experience in MXNet for users.
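As a minimal sketch of the aliasing convention described above (assuming only the official NumPy is installed here; in MXNet code, `from mxnet import np` would bind the MXNet counterpart to `np`):

```python
# 'onp' is the official NumPy; MXNet users typically bind MXNet's
# numpy-compatible module to 'np' via `from mxnet import np` (not done here).
import numpy as onp

print(onp.mean([1.0, 2.0, 3.0]))  # 2.0
```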
```diff
      times.append((tock - tick) * 1000.0)
  times = times[args.warmup_rounds: ]
- print("Time used: mean = %.3f ms, std = %.3f ms" % (np.mean(times), np.std(times)))
+ print("Time used: mean = %.3f ms, std = %.3f ms" % (onp.mean(times), onp.std(times)))
```
Will mxnet np provide the mean and std functions?
Yes. The mean and std operators are implemented as mxnet.np.mean and mxnet.np.std.
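As a small self-contained sketch of the timing pattern in the diff above (using the official NumPy only; `warmup_rounds` and the workload are placeholders for the script's `args.warmup_rounds` and benchmarked model):

```python
import time
import numpy as onp

times = []
for _ in range(5):
    tick = time.perf_counter()
    sum(range(10000))                     # stand-in workload, not the real benchmark
    tock = time.perf_counter()
    times.append((tock - tick) * 1000.0)  # milliseconds

warmup_rounds = 2                         # placeholder for args.warmup_rounds
times = times[warmup_rounds:]             # drop warmup measurements
print("Time used: mean = %.3f ms, std = %.3f ms" % (onp.mean(times), onp.std(times)))
```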
```diff
  Looking into implementation of [existing layers](https://mxnet.apache.org/api/python/gluon/nn.html), one may find that more often a block inherits from a [HybridBlock](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/block.py#L428), instead of directly inheriting from `Block`.

- The reason for that is that `HybridBlock` allows to write custom layers that can be used in imperative programming as well as in symbolic programming. It is convinient to support both ways, because the imperative programming eases the debugging of the code and the symbolic one provides faster execution speed. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/api/architecture/overview.html).
+ The reason for that is that `HybridBlock` allows to write custom layers in imperative programming style, while computing in a symbolic way. It unifies the flexibility of imperative programming with the performance benefits of symbolic programming. You can learn more about the difference between symbolic vs. imperative programming from [this article](https://mxnet.apache.org/api/architecture/overview.html).
```
Use "between ... and ..." for "the difference between symbolic vs. imperative programming".
```diff
- Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass.
-
- All parameters of a block are stored and accessed via [ParameterDict](https://github.com/apache/incubator-mxnet/blob/master/python/mxnet/gluon/parameter.py#L508) class. This class helps with initialization, updating, saving and loading of the parameters. Each layer can have multiple set of parameters, and all of them can be stored in a single instance of the `ParameterDict` class. On a block level, the instance of the `ParameterDict` class is accessible via `self.params` field, and outside of a block one can access all parameters of the network via [collect_params()](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Block.collect_params) method called on a `container`. `ParameterDict` uses [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class to represent parameters inside of Apache MxNet neural network. If parameter doesn't exist, trying to get a parameter via `self.params` will create it automatically.
+ Usually, a layer has a set of associated parameters, sometimes also referred as weights. This is an internal state of a layer. Most often, these parameters are the ones, that we want to learn during backpropogation step, but sometimes these parameters might be just constants we want to use during forward pass. The parameters are usually represented as [Parameter](https://mxnet.apache.org/api/python/gluon/gluon.html#mxnet.gluon.Parameter) class inside of Apache MxNet neural network.
```
```diff
  net.add(Dense(5))  # Add Dense layer with 5 neurons
  net.add(NormalizationHybridLayer(hidden_units=5,
-                                  scales = nd.array([2])))  # Add our custom layer
+                                  scales = np.array([2])))  # Add our custom layer
```
What about just saying "# Add a custom layer"?
include/mxnet/c_api.h (Outdated)

```c
                  SymbolHandle *out);

/*!
 * \brief Clear the info node associated with the arrays.
```
The brief doesn't match the function name well; it should say more about how the deferred compute is handled.
include/mxnet/c_api.h (Outdated)

```c
 * \brief Clear the info node associated with the arrays.
 * \param arrays array handles of arrays
 * \param num number of arrays
 * \return 0 when success, -1 when failure happens
```
python/mxnet/gluon/block.py (Outdated)

```python
super(HybridBlock, self).__init__()
self._v2 = inspect.unwrap(self.hybrid_forward.__func__) is HybridBlock.hybrid_forward
assert hasattr(self, "hybrid_forward") is False, (
    "Starting from MXNet2.0, Gluon2.0 with forward interface will be used instead of "
```
Do both MXNet 2.0 and Gluon 2.0 need to be named at the same time? Proposal:
"The 'forward' interface needs to be used instead of 'hybrid_forward' starting from Gluon 2.0. ......"
@TristonC Thanks for your suggestions on improving the documentation!
@mxnet-bot run ci [centos-cpu]

Jenkins CI successfully triggered: [centos-cpu]
Description
#19138
Checklist
Essentials
Changes
Some skipped tests