10

What I did is obviously not something that one would want to do, rather, I was just testing out implementing __hash__ for a given class.

I wanted to see if adding a phony 'hashable' class to a dictionary, then changing it's hash value would then result in it not being able to access it.

My class looks like this:

class PhonyHash:

    def __hash__(self):
        val = list("A string")
        return id(val)  # always different

Executing the following in my IPython console:

>>> p = PhonyHash()
>>> d = { p: "a value"}
>>> hash(p)  # changes hash

and then trying to access the element with d[p] works:

>>> d[p]
"a value"

I understand, this is not something that should be done, I'm really just curious as to why it works. Doesn't dict use the hash() of an object to store/retrieve it? Why is this working?

edit: as noted in the comments by @VPfB sets behave as expected, for some reason:

>>> p = PhonyHash()
>>> s = {p}
>>> p in s
False
8
  • The id of p does not change. That's also something to consider Commented Jul 14, 2016 at 12:16
  • That I am aware of but is not something that troubles me. ids of mutable objects also doesn't change after you mutate them. Commented Jul 14, 2016 at 12:20
  • 1
    It looks like this does not work reliable. If you add many other objects to the dict you will get an KeyError. I would assume that the mechanism that deals with hash-collisions "accidentally" makes this work. Commented Jul 14, 2016 at 12:21
  • 1
    It behaves as expected in a set. s={p} and then p in s returns False. Commented Jul 14, 2016 at 15:26
  • 1
    What is p.hash I don't see this method defined Commented Jul 14, 2016 at 15:42

2 Answers 2

5

This is a strange bit of fate. Several bits of CPython machinery have thwarted you. The three issues at play are:

  1. The initial size of the array that backs a dict is 8 [1]
  2. All objects in CPython have memory addresses that are modulo 8 [2]
  3. The dict class has an optimisation that checks keys are the same object and stops there if true (otherwise it will check if they are equal according to the __eq__ method) [3]

What this means is that despite your object always producing different hash values, the first slot of the backing array that will be examined is the same. If it wasn't you'd get a key error because the slot was empty. The dict then decides it has the right key because it has the exact same object, not just an equal object.

class PhonyHash:
    _hash = 1
    def __hash__(self):
        return self._hash

p = PhonyHash()
d = {p: "val"}
print(p in d) # True
p._hash = 2
print(p in d) # False
p._hash = 9 # 9 % 8 == 1
print(p in d) # True

Links to CPython sources

  1. The dict struct defines ma_table, which starts as ma_smalltable, which is of length PyDict_MinSize.
  2. This is documented in Objects/obmalloc.c
  3. Can be seen in the lookup function here and here
Sign up to request clarification or add additional context in comments.

7 Comments

You should see what OP's case: id(list('a string')) returns with multiple calls
Hmm, very true. Just found out why this weird fluke works though.
The id does actually change between calls in OP's case, at least on my python 3 installation, but: it doesn't change enough and with such a small dictionary the differences don't matter, as the larger hash is compacted to the dictionary's smaller size. Try using the PhonyHash object as key with a dictionary created with d = dict.fromkeys(range(1406185)).
I found the reason why it changes and still works. The minimum size of the table that backs a dict is 8. And more often than not, objects have the same modulo 8 result. So despite the hash being different, the first place the dict looks is where the object is, they are equal by reference so the dict doesn't bother checking for hash equality or __eq__ equality.
How about some cpython source links for 1. - 3. ?
|
3

I have a possible explanation:

According to this source: http://www.laurentluce.com/posts/python-dictionary-implementation/ only few last bits of the hash are used when the table holding the dict elements is small.

The id() number is ususally a machine address and is very probably alligned to some memory address boundary. So the last few bits are always zero and not random at all. In the result, you are always hitting table[0] element.

Trying a different source of a random phony hash changes the situation and the expected KeyError is thrown.


EDIT: Dunes answered the question the same way and he was quicker than me.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.