-
-
Notifications
You must be signed in to change notification settings - Fork 33.9k
bpo-21145: Add cached_property decorator in functools #6982
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
(Failing CI builds are asyncio flakiness, not related.) |
Lib/functools.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When cls is None? When __get__ is called without cls parameter?
Btw in docs for descriptors, the third argument is called owner IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK every call to __get__ generated by attribute lookup machinery will include the second (not including self) argument, but both the descriptor documentation (https://docs.python.org/3/reference/datamodel.html#descriptors) and the descriptor howto guide (https://docs.python.org/3/howto/descriptor.html) explicitly include an example of manually calling a descriptor's __get__ method providing only one argument. Also, all of the code examples in the HOWTO show the second argument as optional. Given that the method can be called directly and cls is unused in this implementation, it seems reasonable to follow that lead and make it optional.
Regarding the argument name, considering both the Data Model documentation and the HOWTO, we see examples in Python's own documentation of descriptors whose __get__ second argument is named owner, type, or objtype. All three of these are found in the standard library, along with cls, klass, ownerclass, and _type. There doesn't seem to be a standard usage. Personally I find cls or klass or objtype clearer, as owner doesn't clarify the key difference from the other argument, which is that it's a type. Perhaps naming the two arguments obj and objtype does the best job of clearly indicating their relationship. That said, I don't feel strongly and will rename to owner if you prefer that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I doubt if howto is 100% correct.
datamodel doesn't allow None for owner, AFAIK Python aways pass a class type.
Using default for always specified argument confuses me.
datamodel uses owner, please do it as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to change recommended argument name -- let's discuss it in another issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed signature of __get__ to (self, instance, owner) to match Data Model documentation.
Lib/functools.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
__getitem__ / setdefault are atomic operations.
The code could be modified in the following way:
- Get attribute value without locking.
- If cached attr is not found -- acquire the lock, get the value again under the lock and calculate it if not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will update. Thanks for the review!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good
|
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
|
I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @asvetlov: please review the changes made to this pull request. |
|
I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @asvetlov: please review the changes made to this pull request. |
Lib/test/test_functools.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the method is never called
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that’s expected. The point of this class is just to test the error message; cached property on classes with slots is not supported. Would you find it clearer if I used pass as the body? I just wrote it the way someone realistically might, to make it clear that the error is not due to a mistake in the implementation of the method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise RuntimeError('never executed') is even more clear.
|
I have made the requested changes; please review again. |
|
Thanks for making the requested changes! @asvetlov: please review the changes made to this pull request. |
asvetlov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Would be nice to get confirmation from code owners, @rhettinger and @ncoghlan
|
Ooop. @carljm please resolve merging conflicts. |
sir-sigurd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't work properly in this case:
In [75]: class A:
...: def f(self, x=1):
...: return x
...:
...: cached_f = cached_property(f)
In [76]: a = A()
In [77]: a.f(2)
Out[77]: 2
In [78]: a.cached_f
Out[78]: 1
In [79]: a.f(2)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-79-3226073480ca> in <module>()
----> 1 a.f(2)
TypeError: 'int' object is not callable
So __set_name__() should be use to determine attrname.
Lib/test/test_functools.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed? It seems to be always shadowed in __init__().
Lib/test/test_functools.py
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant line?
Doc/library/functools.rst
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer _data to imply dataset.data = newdata is not supported.
|
Thanks for the comments! Will update next week. |
|
@carljm ping? |
|
I have made the requested changes; please review again. |
Lib/functools.py
Outdated
| class cached_property: | ||
| def __init__(self, func): | ||
| self.func = func | ||
| self.attrname = func.__name__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks redundant since attrname is set in __set_name__()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you suggest to do instead? This is the most reasonable fallback in case __set_name__ is not called in some unusual use of this class, and it avoids the object ever being in an incompletely initialized state.
Lib/functools.py
Outdated
| self.lock = RLock() | ||
|
|
||
| def __set_name__(self, owner, name): | ||
| self.attrname = name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might make sense to raise error in this case, since a will never be cached:
class C:
@cached_property
def a(self):
pass
b = a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I guess that also answers the above question; we will raise error in case __set_name__ is ever called twice, and also I guess raise error if __get__ is called and __set_name__ was not.
|
@sir-sigurd How does that look? |
|
@rhettinger Would you review this? |
Lib/functools.py
Outdated
| else: | ||
| raise TypeError( | ||
| "Cannot assign the same cached_property to two names " | ||
| f"({self.attrname!r} and {name!r}) on the same class." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is inaccurate because the exception is also raised when it's assigned on the different classes.
I'm not sure that we should disallow to assign it on the different classes under the same name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. I'd like to avoid tracking owners, but it should be sufficient to just allow repeated __set_name__ as long as the name remains the same.
|
@sir-sigurd Better? |
|
@carljm 👍 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of minor comments and error checking suggestion related to the technicalities of why __dict__ may not exist, and why it may not support __setitem__, but otherwise this looks good to me.
Doc/library/functools.rst
Outdated
|
|
||
| .. note:: | ||
|
|
||
| This decorator does not support classes which define ``__slots__``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd word this note slightly differently, as there are multiple ways to end up with instances that don't have a mutable mapping as their __dict__ attribute, and you can also include __dict__ in a __slots__ list if you want the fast C level access for selected fields, but aren't worried about the per-instance dict memory overhead.
Something like:
This decorator requires that the
__dict__attribute on each instance be a mutable mapping. This means it will not work with some types, such as metaclasses (since the__dict__attributes ontypeinstances are read-only proxies for the class namespace), and those that specify__slots__without including__dict__as one of the defined slots (as such classes don't provide a__dict__attribute at all).
Lib/functools.py
Outdated
| "Cannot use cached_property instance without calling __set_name__ on it.") | ||
| try: | ||
| cache = instance.__dict__ | ||
| except AttributeError: # objects with __slots__ have no __dict__ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tweak the comment here along similar lines to the note in the docs
Lib/functools.py
Outdated
| # check if another thread filled cache while we awaited lock | ||
| val = cache.get(self.attrname, _NOT_FOUND) | ||
| if val is _NOT_FOUND: | ||
| val = cache[self.attrname] = self.func(instance) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the AttributeError above, it's likely worth wrapping TypeError here to account for cases like:
>>> str.__dict__["x"] = 1
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'mappingproxy' object does not support item assignment
* master: (107 commits) bpo-22057: Clarify eval() documentation (pythonGH-8812) bpo-34318: Convert deprecation warnings to errors in assertRaises() etc. (pythonGH-8623) bpo-22602: Raise an exception in the UTF-7 decoder for ill-formed sequences starting with "+". (pythonGH-8741) bpo-34415: Updated logging.Formatter docstring. (pythonGH-8811) bpo-34432: doc Mention complex and decimal.Decimal on str.format not about locales (pythonGH-8808) bpo-34381: refer to 'Running & Writing Tests' in README.rst (pythonGH-8797) Improve error message when mock.assert_has_calls fails (pythonGH-8205) Warn not to set SIGPIPE to SIG_DFL (python#6773) bpo-34419: selectmodule.c does not compile on HP-UX due to bpo-31938 (pythonGH-8796) bpo-34418: Fix HTTPErrorProcessor documentation (pythonGH-8793) bpo-34391: Fix ftplib test for TLS 1.3 (pythonGH-8787) bpo-34217: Use lowercase for windows headers (pythonGH-8472) bpo-34395: Fix memory leaks caused by incautious usage of PyMem_Resize(). (pythonGH-8756) bpo-34405: Updated to OpenSSL 1.1.0i for Windows builds. (pythonGH-8775) bpo-34384: Fix os.readlink() on Windows (pythonGH-8740) closes bpo-34400: Fix undefined behavior in parsetok(). (pythonGH-4439) bpo-34399: 2048 bits RSA keys and DH params (python#8762) Make regular expressions in test_tasks.py raw strings. (pythonGH-8759) smtplib documentation fixes (pythonGH-8708) Fix misindented yaml in logging how to example (pythonGH-8604) ...
|
@ncoghlan Thanks for the comments! I think they are all addressed in the latest update. |
|
@ncoghlan: Please replace |
|
haha, why not remove the underscore or rename it into |
|
Thanks @carljm for this one, sensible to get it in stdlib since every man and his dog were writing their own versions. Is there any PyPI backport avail for people that want to use this stdlib implementation on older Python runtimes? Does the code use any datamodel features that may make copy-pasting the implementation unsafe (apart from the obvious f-strings)? |
|
@wimglenn It's not exactly a backport, since (like many other versions) it predated this PR, and it has some additional bells and whistles not found here, but I'd probably use https://pypi.org/project/cached-property/ |
https://bugs.python.org/issue21145