-
-
Notifications
You must be signed in to change notification settings - Fork 11.8k
WIP: additional revision for NEP-18 (__array_function__)
#11374
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Only a persistent note about types. Given what you write, a set is more logical, but I still would favour a dict with types as the keys and list of arguments of that type as a value... I.e., in your notebook, I would do
from collections import defaultdict
import numpy as np
ndarray = np.ndarray
ndarray_array_function = ndarray.ndarray.__array_function__
def get_overloaded_types_and_args(relevant_args):
"""Returns a list of arguments on which to call __array_function__.
__array_function__ implementations should be called in order on the return
values from this function.
"""
# Runtime is O(num_arguments * num_unique_types)
overloaded_types = defaultdict(list)
overloaded_args = []
for arg in relevant_args:
arg_type = type(arg)
if arg_type is ndarray or (getattr(arg_type, '__array_function__',
ndarray_array_function) is
ndarray_array_function):
continue
if arg_type not in overloaded_types:
# Are we a subclass of something already present?
for i, old_arg in enumerate(overloaded_args):
if issubclass(arg_type, type(old_arg)):
# We are a true subclass here, given upper if statement.
overloaded_args.insert(i, arg)
break
else:
overloaded_args.append(arg)
overloaded_types[arg_type].append(arg)
return overloaded_types, overloaded_args
| As a convenience for ``__array_function__`` implementors, ``types`` contains a | ||
| list of argument types with an ``'__array_function__'`` attribute. This allows | ||
| downstream implementations to quickly determine if they are likely able to | ||
| support the operation. There are no guarantees about the order of argument types in ``types``: ``__array_funtion__`` implementations that rely on such behavior |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should types be a set if we don't guarantee order? Would be easier to implement (no checking needed for whether a given class was already seen).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that eventually we'll implement try_array_function_override in C, so we're definitely not making a types a Python set/list object until we're sure we need to call __array_function__. So convenient implementation isn't really a concern, but I agree that it's a better for our interface to enforce invariants, as long as we don't pay a performance penalty. And indeed, the speed difference for creating a small set vs a list vs a tuple is pretty small:
class A:
pass
class B:
pass
x = [A, B]
%timeit tuple(x) # 158 ns per loop
%timeit list(x) # 186 ns per loop
%timeit set(x) # 225 ns per loop
Purely for speed of construction of Python objects, a
I agree, this would definitely be maximally useful to My concern is that this would be difficult to implement without a (small) performance penalty:
So instead, I'll propose that we should pass dispatcher functions as part of the |
|
Hmm, perhaps I was overthinking this; I'd guess in most implementations one can would call one's own routine and things are in the right place already. My sense would be not to pass on the dispatcher, though it might be good if there was a programmatic way to retrieve it for those that would like it. For instance, perhaps the decorated function could have a |
If implementations need access to the disptacher (it would be good to see use-cases), then my inclination would be to include it directly. Most but not all functions will use the decorator (see the updated NEP for details), so it may not always be easy to attach a Separate question: should I rename the |
| if success: | ||
| return value | ||
| return func(*args, **kwargs) | ||
| return new_func |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To attach the dispatcher, wouldn't one just do
new_func.dispatcher = dispatcher
return new_func
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't work for "builtin" functions written in C like np.concatenate:
>>> np.concatenate.foo = 1
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-a0d69ae14b06> in <module>()
----> 1 np.concatenate.foo = 1
AttributeError: 'builtin_function_or_method' object has no attribute 'foo'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But in __array_ufunc__, aren't we passing on a wrapped concatenate? One can definitely assign to the wrapped function which would become the np.concatenate that people actually see:
def new_concat(*args, **kwargs):
return np.concatenate(*args, **kwargs)
nc = functools.wraps(np.concatenate)(new_concat)
nc.dispatch = 'a'
(obviously, the same would be possible in C)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on how we choose to implement np.concatenate with dispatching. Potentially we could do it either way.
I'm sure it's also possible to define a normal Python function from C directly rather than a builtin. I don't know the trade-offs there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be wise to keep the wrapped and unwrapped versions separate, so that, e.g., __array_function__ can directly call the unwrapped one. (I still hope to do the same for the ufuncs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. one of the points of the wrapping in the example is to have access to dispatch without actually having to pass that in; it therefore must always be possible to expose it!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
p.s. one of the points of the wrapping in the example is to have access to dispatch without actually having to pass that in; it therefore must always be possible to expose it!
I'm not sure I follow here: what is "it"? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dispatch: your wrapped function must be able to call dispatch and therefore has access to it, and must thus be able to expose it to the outside. Of course, this does rely on having the wrapped function be the one in the numpy namespace and identical to the one passed on in __array_function__.
Maybe most relevant to the NEP: maybe good to state a preference for C functions being wrapped, not having dispatch cooked inside.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By dispatch do you mean the operation specific function that returns arguments to check (what I call dispatcher) or the decorator @array_function_dispatch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, yes, I meant the dispatcher used inside new_func
|
See in-line comment about attaching (and, no, I'm not sure about use cases either - I'd need to start writing a About the name: yes, |
OK, in that case I'll probably put a note about passing |
| resembling the `decorator library <https://github.com/micheles/decorator>`_ | ||
| to automatically write an overloaded function like the manually written | ||
| implemenations above with the exact same signature as the original. | ||
| Unfortunately, using the ``inspect`` module instead of code generation would |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably the inspect module would be needed for code generation anyway, so these are not mutually exclusive. Perhaps "Using the inspect module at runtime instead of at code-generation time"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I haven't gotten to revising this section yet (I just moved it unchanged down to "Alternatives")
|
Is there a clear need for |
|
@mattip - the difference with But I'm not sure the dispatcher is necessarily all that helpful indeed, since it will still not tell me where the argument came from and hence I cannot easily, say, replace it with a properly adjusted As I don't have a good use case, let's just mention it somewhere and move on - I'm quite excited by the idea of actually being able to try this! |
|
CC @skrah also. |
|
@mhvk surfacing your inline comment, we could potentially add the "non-dispatching" version of NumPy functions to the protocol, e.g., I'm inclined to put these in the same category of "possible future extensions" as exposing |
|
@shoyer - thanks for resurfacing - I think the non-override function has clearer use than the dispatcher - there should be a fair group of classes that work via Maybe things like these can be in a section "to be decided based on actual implementation" or "possible enhancements"? |
I agree, but just like case for ufuncs, it also works to call the numpy API function directly again, after handling all inputs to remove arguments handled by a particular
Will do. |
|
|
||
| We propose to do so by writing "dispatcher" functions for each overloaded | ||
| NumPy function: | ||
| - These functions will be called with the exact same arguments that were passed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Empty line needed before first list item
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still needs fixing.
hameerabbasi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth sending to the mailing list soon-ish.
| docstring to note which arguments are checked for overloads. | ||
|
|
||
| It's particularly worth calling out the decorator's use of | ||
| ``functools.wraps``: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be worth mentioning that this isn't a catch-all thing, it doesn't always work on arbitrary callables if they aren't Python functions. I ran into this trying to nicely wrap ufuncs. Here's a quick demo, it shows that repr and string don't work, but __doc__ does. It's worth noting that this can be worked around, see https://github.com/pydata/sparse/blob/b51d74924d62ff6537b15ce4e1dd4e56080a3b6f/sparse/utils.py#L187-L191
It might be worth investigating this as a Python issue, but I'm not well-versed enough in Python internals to say if this is actually a bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think this only works well Python functions. For classes/methods, we would need another solution.
But I think functions are mostly what we want to cover in this NEP.
| to work well with static analysis tools, which could report invalid arguments. | ||
| Likewise, there is a price in readability: these optional arguments won't be | ||
| included in the docstrings for NumPy functions. These downsides mean that users | ||
| should make use of this feature with care. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wrote this section to justify passing on unrecognized **kwargs, but now that I've considered the downsides I no longer think it's a good idea! I will probably move this to the section on "Alternatives".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually agree with you. If someone needs implementation-specific kwargs, they can just call the library's implementation directly, instead of NumPy's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also agreed - better to keep it simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| ~~~~~~~~~~~~~ | ||
|
|
||
| Within NumPy | ||
| '''''''''''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're skipping a heading level here, which is making CircleCI fail. Above, you use ~, then ^ and then '. Here, you're skipping ^. The order of the heading levels must be consistent.
|
After the latest revisions, I think this is basically ready to be sent back to the list. Did I miss anything? |
hameerabbasi
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor typo/clarity comment. Feel free to ignore. This is good to go from my side.
| optional arguments to NumPy functions without breaking working code already | ||
| relying on ``__array_function__``. | ||
| optional arguments to NumPy functions without breaking code that already | ||
| relyies on ``__array_function__``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/relyies/relies
| def __array_function__(self, func, types, args, kwargs): | ||
| if func not in HANDLED_FUNCTIONS: | ||
| return NotImplemented | ||
| # Note: we use MyArray instead of type(self) for issubclass to allow |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of type(self) -> instead of comparison with type(self)
|
|
||
| An important virtue of this approach is that it allows for adding new | ||
| optional arguments to NumPy functions without breaking code that already | ||
| relyies on ``__array_function__``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
s/relyies/relies
|
@mrocklin please take a look when you have the chance |
Everything here looks good to me. My apologies for being absent for much of this conversation. I would be fine if you all wanted to replace my name in the authors list with the names of @hameerabbasi and @mhvk, both of whom have been much more active here than I have been. |
|
@hameerabbasi @mhvk @eric-wieser I would love to include all of you as co-authors on this proposal! I did most of the writing, but the design has been a real collaborative effort. Please let me know if you're OK with that (e.g., 👍on this comment). @mrocklin You made significant contributions to the first draft of this, so I would prefer to keep you on, too. |
|
For a future PR, I'd be willing to do the writing. |
|
@shoyer waiting for additional author attributions change before merging |
|
@eric-wieser are you OK with being included as an author? Please let me know either way :). |
|
Sure! |
|
OK, merging. I'll send another email to the list shortly :). |
|
I had an idea of how to handle array construction methods in this NEP. Just commenting here to be reminded on the list later. |
__array_function__)
Adjusts the text to describe a variant of @eric-wieser's "explicit decorator" proposal from #11303 (comment)
There are a still a number of TODOs from the last PR, and I haven't even gotten to revising the "Alternatives" or "Drawbacks" sections yet, but I thought I might as well put this up. I promise we'll send this back to the mailing list for discussion soon!
CCing the usual @mrocklin @mhvk @hameerabbasi @ngoldbaum @mattip