Class with changing hash still works with dictionary access

Question

What I did is obviously not something that one would want to do, rather, I was just testing out implementing __hash__ for a given class.

I wanted to see if adding a phony 'hashable' class to a dictionary, then changing it's hash value would then result in it not being able to access it.

My class looks like this:

class PhonyHash:

    def __hash__(self):
        val = list("A string")
        return id(val)  # always different

Executing the following in my IPython console:

>>> p = PhonyHash()
>>> d = { p: "a value"}
>>> hash(p)  # changes hash

and then trying to access the element with d[p] works:

>>> d[p]
"a value"

I understand, this is not something that should be done, I'm really just curious as to why it works. Doesn't dict use the hash() of an object to store/retrieve it? Why is this working?

edit: as noted in the comments by @VPfB sets behave as expected, for some reason:

>>> p = PhonyHash()
>>> s = {p}
>>> p in s
False

The id of p does not change. That's also something to consider — Moses Koledoye
– Moses Koledoye, Commented Jul 14, 2016 at 12:16
That I am aware of but is not something that troubles me. ids of mutable objects also doesn't change after you mutate them. — root
– root, Commented Jul 14, 2016 at 12:20
It looks like this does not work reliable. If you add many other objects to the dict you will get an KeyError. I would assume that the mechanism that deals with hash-collisions "accidentally" makes this work. — syntonym
– syntonym, Commented Jul 14, 2016 at 12:21
It behaves as expected in a set. s={p} and then p in s returns False. — VPfB
– VPfB, Commented Jul 14, 2016 at 15:26

Community · Accepted Answer · 2020-06-20 09:12:55Z

5

This is a strange bit of fate. Several bits of CPython machinery have thwarted you. The three issues at play are:

The initial size of the array that backs a dict is 8 ^[1]
All objects in CPython have memory addresses that are modulo 8 ^[2]
The dict class has an optimisation that checks keys are the same object and stops there if true (otherwise it will check if they are equal according to the __eq__ method) ^[3]

What this means is that despite your object always producing different hash values, the first slot of the backing array that will be examined is the same. If it wasn't you'd get a key error because the slot was empty. The dict then decides it has the right key because it has the exact same object, not just an equal object.

class PhonyHash:
    _hash = 1
    def __hash__(self):
        return self._hash

p = PhonyHash()
d = {p: "val"}
print(p in d) # True
p._hash = 2
print(p in d) # False
p._hash = 9 # 9 % 8 == 1
print(p in d) # True

Links to CPython sources

The dict struct defines ma_table, which starts as ma_smalltable, which is of length PyDict_MinSize.
This is documented in Objects/obmalloc.c
Can be seen in the lookup function here and here

edited Jun 20, 2020 at 9:12

CommunityBot

11 silver badge

answered Jul 14, 2016 at 14:17

Dunes

42.5k7 gold badges89 silver badges107 bronze badges

Sign up to request clarification or add additional context in comments.

7 Comments

Moses Koledoye Over a year ago

You should see what OP's case: id(list('a string')) returns with multiple calls

Dunes Over a year ago

Hmm, very true. Just found out why this weird fluke works though.

Ilja Everilä Over a year ago

The id does actually change between calls in OP's case, at least on my python 3 installation, but: it doesn't change enough and with such a small dictionary the differences don't matter, as the larger hash is compacted to the dictionary's smaller size. Try using the PhonyHash object as key with a dictionary created with d = dict.fromkeys(range(1406185)).

Dunes Over a year ago

I found the reason why it changes and still works. The minimum size of the table that backs a dict is 8. And more often than not, objects have the same modulo 8 result. So despite the hash being different, the first place the dict looks is where the object is, they are equal by reference so the dict doesn't bother checking for hash equality or __eq__ equality.

wim Over a year ago

How about some cpython source links for 1. - 3. ?

|

VPfB · Accepted Answer · 2016-07-14 16:20:25Z

3

I have a possible explanation:

According to this source: http://www.laurentluce.com/posts/python-dictionary-implementation/ only few last bits of the hash are used when the table holding the dict elements is small.

The id() number is ususally a machine address and is very probably alligned to some memory address boundary. So the last few bits are always zero and not random at all. In the result, you are always hitting table[0] element.

Trying a different source of a random phony hash changes the situation and the expected KeyError is thrown.

EDIT: Dunes answered the question the same way and he was quicker than me.

edited Jul 14, 2016 at 16:20

answered Jul 14, 2016 at 15:49

VPfB

18.2k8 gold badges56 silver badges99 bronze badges

Collectives™ on Stack Overflow

Class with changing hash still works with dictionary access

2 Answers 2

Links to CPython sources

7 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

Links to CPython sources

7 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related