I've always tried to declare instance attributes inside __init__ for clarity and organizational reasons. Recently, I've learned that strictly following this practice has non-aesthetic benefits too, thanks to PEP 412, which was implemented in Python 3.3. Specifically, if all attributes are defined in __init__, then the objects can save space by sharing their keys and hashes.

My question is, does object key-sharing happen when attributes are declared in a function that is called by __init__?

Here's an example:

class Dog:
    def __init__(self):
        self.height = 5
        self.weight = 25

class Cat:
    def __init__(self):
        self.set_shape()

    def set_shape(self):
        self.height = 2
        self.weight = 10

In this case, all instances of Dog would share the keys height and weight. Do instances of Cat also share the keys height and weight (among each other, not with Dogs of course)?

As an aside, how would you test this?

Note that Brandon Rhodes said this about key-sharing in his Dictionary Even Mightier talk:

If a single key is added that is not in the prototypical set of keys, you lose the key sharing

    Not answering your question directly, but PyCharm has inspections which warn if you create new attributes outside of __init__. So they, perhaps, consider it a bad practice. Commented Jun 22, 2017 at 17:16

2 Answers

I think you are referring to the following paragraph of the PEP (in the Split-Table dictionaries section):

When resizing a split dictionary it is converted to a combined table. If resizing is as a result of storing an instance attribute, and there is only [one] instance of a class, then the dictionary will be re-split immediately. Since most OO code will set attributes in the __init__ method, all attributes will be set before a second instance is created and no more resizing will be necessary as all further instance dictionaries will have the correct size.

So a dictionary's keys will remain shared, no matter what attributes are added, as long as they are added before a second instance is created. Setting them in __init__ is the most logical way to achieve this.

This does not mean that attributes set at a later time are not shared. They can still be shared between instances, so long as you don't cause any of the dictionaries to be combined. After you create a second instance, the keys stop being shared only if one of the following happens:

  • a new attribute causes the dictionary to be resized.
  • a new attribute is not a string attribute (dictionaries are highly optimised for the common all-keys-are-strings case).
  • an attribute is inserted in a different order; for example a.foo = None is set first, and then second instance b sets b.bar = None first, here b has an incompatible insertion order, as the shared dictionary has foo first.
  • an attribute is deleted. This kills sharing even for one instance. Don't delete attributes if you care about shared dictionaries.
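The insertion-order condition in the list above can be made visible from plain Python. This is a minimal sketch using a hypothetical empty class Foo; it only shows that the two instances end up with differently ordered keys, which is the situation the bullet describes. It does not detect whether the keys are actually shared.

```python
class Foo:
    pass

a, b = Foo(), Foo()

# a sets foo first, then bar
a.foo = None
a.bar = None

# b sets the same attributes in the opposite order
b.bar = None
b.foo = None

# Instance dictionaries preserve insertion order (Python 3.7+),
# so the two key sequences differ even though the key sets are equal.
print(list(vars(a)))
print(list(vars(b)))
print(set(vars(a)) == set(vars(b)))
```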

Python 3.11 improved shared-key dictionaries considerably, however. The values for the shared dictionary are inlined into an array as part of the instance, as long as there are fewer than 30 unique attributes (across all instances of a class), and deleting an attribute or inserting keys in a different order no longer affects dictionary key sharing.

So from the moment you have two instances (and two dictionaries sharing keys), a combined dictionary won't be re-split. As long as you don't trigger any of the above cases, however, your instances will continue to share keys.

It also means that delegating the setting of attributes to a helper method called from __init__ does not affect the above scenario: those attributes are still set before a second instance is created. After all, __init__ can't return before that helper method has returned.
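A small sketch of that ordering argument, reusing the Cat class from the question. The names first and second are just illustrative. It shows that by the time the constructor call returns, the helper has already run, so every instance starts out with the same keys in the same order. It says nothing by itself about how CPython stores them.

```python
class Cat:
    def __init__(self):
        # The helper runs to completion before __init__ returns,
        # so the attributes exist before Cat() hands back the instance.
        self.set_shape()

    def set_shape(self):
        self.height = 2
        self.weight = 10

first = Cat()   # attributes are set before this call returns
second = Cat()  # the second instance is created only afterwards

print(list(vars(first)))
print(list(vars(second)))
```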

In other words, you should not worry too much about where you set your attributes. Setting them in the __init__ method lets you avoid combining scenarios more easily, but any attribute set before a second instance is created is guaranteed to be part of the shared keys.

There is no good way to detect from Python code whether a dictionary is split or combined, at least not reliably across versions. We can, however, access the C implementation details by using the ctypes module. Given a pointer to a dictionary and the C header definition of a dictionary, you can test if the ma_values field is NULL. If it is not, it is a shared dictionary:

import ctypes

class PyObject(ctypes.Structure):
    """Python object header"""
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),  # PyTypeObject*
    ]

class PyDictObject(ctypes.Structure):
    """A dictionary object."""
    _fields_ = [
        ("ob_base", PyObject),
        ("ma_used", ctypes.c_ssize_t),
        ("ma_version_tag", ctypes.c_uint64),
        ("ma_keys", ctypes.c_void_p),  # PyDictKeysObject*
        ("ma_values", ctypes.c_void_p),  # PyObject** or PyDictValues*
    ]

Py_TPFLAGS_MANAGED_DICT = 1 << 4

def has_inlined_attributes(obj):
    """Test if an instance has inlined attributes (Python 3.11)"""
    if not type(obj).__flags__ & Py_TPFLAGS_MANAGED_DICT:
        return False
    # the (inlined) values pointer is stored in the pre-header at offset -4
    # (-3 is the dict pointer, remainder is the GC header)
    return bool(ctypes.cast(id(obj), ctypes.POINTER(ctypes.c_void_p))[-4])

def is_shared(d):
    """Test if the __dict__ of an instance is a PEP 412 shared dictionary"""
    # Python 3.11 inlines the (shared dictionary) values as an array, unless you
    # access __dict__. Don't clobber the inlined values.
    if has_inlined_attributes(d):
        return True
    cdict = ctypes.cast(id(d.__dict__), ctypes.POINTER(PyDictObject)).contents
    # if the ma_values pointer is not null, it's a shared dictionary
    return bool(cdict.ma_values)

A quick demo (using Python 3.10):

>>> class Foo:
...     pass
...
>>> a, b = Foo(), Foo()  # two instances
>>> is_shared(a), is_shared(b)  # they both share the keys
(True, True)
>>> a.bar = 'baz'  # adding a single key
>>> is_shared(a), is_shared(b)  # no change, the keys are still shared!
(True, True)
>>> a.spam, a.ham, a.monty, a.eric = (
...     'eggs', 'eggs and spam', 'python',
...     'idle')  # more keys still
>>> is_shared(a), is_shared(b)  # no change, the keys are still shared!
(True, True)
>>> a.holy, a.bunny, a.life = (
...     'grail', 'of caerbannog',
...     'of brian')  # more keys, resize time
>>> is_shared(a), is_shared(b)  # oops, we killed it
(False, True)

Only when the threshold was reached (for an empty dictionary with 8 slots, the resize takes place when you add a 6th key) did the dictionary for instance a lose the shared property. (Later Python releases may push that resize point out further.)

Dictionaries are resized when they are about 2/3rd full, and a resize generally doubles the table size. So the next resize will take place when the 11th key is added, then at 22, then 43, etc. So for a large instance dictionary, you have a lot more breathing room.
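The numbers in that paragraph follow from simple arithmetic. The sketch below is not CPython's actual growth logic; it only assumes what the paragraph states: a table of a given size holds at most two thirds of that size in keys (rounded down), the first table has 8 slots, and each resize doubles the table. The function name resize_points is made up for this illustration.

```python
def resize_points(count, initial_size=8):
    """Return the key counts at which the first `count` resizes would occur.

    Assumption: a table of `size` slots can hold (2 * size) // 3 keys,
    and every resize doubles the table size.
    """
    points = []
    size = initial_size
    for _ in range(count):
        usable = (2 * size) // 3   # keys that fit before a resize is needed
        points.append(usable + 1)  # adding this key triggers the resize
        size *= 2
    return points

print(resize_points(4))  # [6, 11, 22, 43]
```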

For Python 3.11, it takes a little longer still before is_shared() returns False; you need to insert 30 attributes:

>>> import sys, secrets
>>> sys.version_info
sys.version_info(major=3, minor=11, micro=0, releaselevel='final', serial=0)
>>> class Foo: pass
...
>>> a = Foo()
>>> count = 0
>>> while is_shared(a):
...     count += 1
...     setattr(a, secrets.token_urlsafe(), 42)
...
>>> count
30

6 Comments

@NathanielSaul: I've clarified my answer. The point is that the 'cut-off point' is creating a second instance, at which point all keys that already exist in the shared table are seen as the 'low water mark'. Any new keys added to this can cause the shared dictionary to be turned into a regular (combined) dictionary, but only if it is resized.
Thinking about the 'cut-off point' like that makes it much clearer. Thank you!
Thanks Martijn. Running the code in the demo, I got different results in Python 3.10.6. I think things have changed, right?
@S.B: a lot changed in 3.10 and especially in 3.11, and the shared() function is no longer an accurate detector of shared dictionaries. I'm looking into finding other methods. One thing I noticed in 3.11, for example, is that the Foo instance __dict__ object is empty but larger than a plain {} or dict(vars(instance)) object. That's almost certainly the new bitvector added to allow attributes to be set in arbitrary order.
@S.B: Right. The sys.getsizeof() method was never reliable. I've replaced it with ctypes-facilitated access to the C implementation details instead. The new version of is_shared() now correctly reports on a dictionary being shared across Python versions up to and including 3.11 (I'm not making any claims about future 3.12 compatibility at this point).

does object key-sharing happen when attributes are declared in a function that is called by __init__?

Yes. Regardless of where you set the attributes from, provided that after initialization both instances have the same set of keys, instance dictionaries use a shared-key dictionary implementation. Both cases presented have a reduced memory footprint.
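One way to compare the footprint of the two cases without relying on dictionary internals is to measure allocations with tracemalloc. This is a rough sketch, not a definitive benchmark: bytes_per_instance is a helper invented for this example, the figures depend on the Python version and platform, and the measurement includes the list that holds the instances. If both classes benefit equally, the two numbers should come out about the same.

```python
import tracemalloc

class Dog:
    def __init__(self):
        self.height = 5
        self.weight = 25

class Cat:
    def __init__(self):
        self.set_shape()

    def set_shape(self):
        self.height = 2
        self.weight = 10

def bytes_per_instance(cls, n=10_000):
    """Approximate memory allocated per instance, averaged over n instances."""
    cls()  # warm-up, so one-time allocations are not attributed to the loop
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    instances = [cls() for _ in range(n)]  # kept alive until measured
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return (after - before) / n

dog_bytes = bytes_per_instance(Dog)
cat_bytes = bytes_per_instance(Cat)
print(f"Dog: {dog_bytes:.1f} bytes, Cat: {cat_bytes:.1f} bytes per instance")
```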

You can test this by using sys.getsizeof to grab the size of the instance dictionary and then compare it with a similar dict created from it. dict.__sizeof__'s implementation discriminates based on this to return different sizes:

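# on 64bit version of Python 3.6.1
import sys

c = Cat()  # assuming c is an instance of the question's Cat class
print(sys.getsizeof(vars(c)))        # 112: the shared-key instance dict
print(sys.getsizeof(dict(vars(c))))  # 240: an ordinary dict with the same contents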

so, to find out, all you need to do is compare these.
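Wrapped up as a helper, the comparison might look like the sketch below. The function name dict_sizes is hypothetical, and the sizes it reports vary between Python versions; as the comments under the other answer point out, sys.getsizeof is not a reliable detector on newer releases, so treat the output as a hint only.

```python
import sys

class Cat:
    def __init__(self):
        self.set_shape()

    def set_shape(self):
        self.height = 2
        self.weight = 10

def dict_sizes(obj):
    """Return (size of obj's instance dict, size of a plain dict copy of it)."""
    instance_dict = vars(obj)
    plain_copy = dict(instance_dict)  # always an ordinary, non-shared dict
    return sys.getsizeof(instance_dict), sys.getsizeof(plain_copy)

instance_size, copy_size = dict_sizes(Cat())
print(f"instance __dict__: {instance_size} bytes, plain copy: {copy_size} bytes")
```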

As for your edit:

"If a single key is added that is not in the prototypical set of keys, you lose the key sharing"

Correct, this is one of the two things I've (currently) found that break the shared-key usage:

  1. Using a non-string key in the instance dict. This can only be done in silly ways. (You could do it using vars(inst).update)
  2. The contents of the dictionaries of two instances of the same class deviating. This can be done by altering instance dictionaries (a single key is added that is not in the prototypical set of keys).

    I'm not certain whether this happens when only a single key is added; this is an implementation detail that might change. (addendum: see Martijn's comments)
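For the first item, this is roughly what putting a non-string key into an instance dictionary looks like. It is only a sketch with a hypothetical class Foo: it shows that the key ends up in the dictionary even though it can never be reached as an attribute, and it does not itself check whether key sharing was lost.

```python
class Foo:
    pass

inst = Foo()
inst.name = "spam"  # a normal, string-keyed attribute

# Attribute syntax cannot create a non-string key, but updating the
# instance dictionary directly can.
vars(inst).update({1: "non-string key"})

print(vars(inst))
print(hasattr(inst, "1"))  # the integer key is not reachable as an attribute
```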

For a related discussion on this see a Q&A I did here: Why is the __dict__ of instances so small in Python 3?

Both of these things will cause CPython to use a 'normal' dictionary instead. This, of course, is an implementation detail that shouldn't be relied upon. You might or might not find it in other implementations of Python and/or future versions of CPython.

8 Comments

You do still have to add the attribute in __init__ first though, right?
@NathanielSaul if you want to. Point is, it makes no difference where you add them from as long as they have string keys and the dictionary of two instances doesn't deviate much in content, you get a shared-key dict.
@NathanielSaul it all has to do with the contents for two given instances. After __init__, if you don't have any conditional branches that might lead to different attributes getting set, you get a shared-key dict. If you deviate, you'll fall-back to a plain dict (which actually isn't that bad memory wise with 3.6 that gives us a memory efficient implementation)
It will happen with a single key. What happens is: any mutation to the set of keys for the dictionary will cause it to be 'combined' (separated from the per-type shared keys). But if that dictionary is the only __dict__ instance namespace, it'll immediately be re-split. So as long as there is only one instance, it doesn't matter how often you mutate the keys (add or remove), nor does the type of key matter. It'll be re-split each time.
Hah, indeed, adding a non-string key can kill sharing instantly. But only for that one instance. A resize kills it for all instances.
