I have a class with both an __iter__ and a __len__ method. The latter uses the former to count all elements.
It works like the following:
class A:
    def __iter__(self):
        print("iter")
        for _ in range(5):
            yield "something"

    def __len__(self):
        print("len")
        n = 0
        for _ in self:
            n += 1
        return n
Now if we take e.g. the length of an instance it prints len and iter, as expected:
>>> len(A())
len
iter
5
But if we call list() it calls both __iter__ and __len__:
>>> list(A())
len
iter
iter
['something', 'something', 'something', 'something', 'something']
It works as expected if we make a generator expression:
>>> list(x for x in A())
iter
['something', 'something', 'something', 'something', 'something']
I would assume list(A()) and list(x for x in A()) to work the same but they don’t.
Note that it appears to first call __iter__, then __len__, then loop over the iterator:
class B:
    def __iter__(self):
        print("iter")

        def gen():
            print("gen")
            yield "something"

        return gen()

    def __len__(self):
        print("len")
        return 1

print(list(B()))
Output:
iter
len
gen
['something']
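The order above is consistent with list() first obtaining the iterator and then asking the original object for a size estimate so it can preallocate. As far as I can tell this is a CPython implementation detail rather than a language guarantee. A minimal sketch of the same lookup using operator.length_hint (the class name Probe and its calls attribute are made up for illustration):

```python
import operator


class Probe:
    """Hypothetical class that records which special methods get called."""

    def __init__(self):
        self.calls = []

    def __iter__(self):
        # Generator function: the body only runs once iteration starts.
        self.calls.append("iter")
        yield "something"

    def __len__(self):
        self.calls.append("len")
        return 1


p = Probe()

# length_hint uses __len__ when the object defines it.
hint = operator.length_hint(p)

# A bare generator has neither __len__ nor __length_hint__,
# so the supplied default comes back and p.__len__ is not touched.
gen_hint = operator.length_hint(iter(p), 8)

print(hint, gen_hint, p.calls)
```

The second call suggests why handing list() an iterator instead of the object itself leaves __len__ alone.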
How can I get list() not to call __len__ so that my instance’s iterator is not consumed twice? I could define e.g. a length or size method and one would then call A().size() but that’s less pythonic.
I tried to compute the length in __iter__ and cache it so that subsequent calls to __len__ don’t need to iterate again, but list() calls __len__ before it starts iterating, so that doesn’t work.
Note that in my case I work on very large data collections so caching all items is not an option.
__len__ should only be defined for types for which it can be made idempotent. That you have a generator means that perhaps you shouldn't really have __len__.
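If you do want to keep __len__, one possible workaround is to pass list() an iterator rather than the instance, e.g. list(iter(obj)). The generator returned by __iter__ has no __len__, so nothing triggers the extra counting pass. A rough sketch, where Counted is a hypothetical stand-in for class A that logs calls into a list instead of printing:

```python
class Counted:
    """Hypothetical stand-in for class A; logs calls instead of printing."""

    def __init__(self):
        self.calls = []

    def __iter__(self):
        self.calls.append("iter")
        for _ in range(5):
            yield "something"

    def __len__(self):
        self.calls.append("len")
        return sum(1 for _ in self)


obj = Counted()

# iter(obj) returns the generator; list() only ever sees that generator,
# so obj.__len__ is never consulted and the data is traversed once.
items = list(iter(obj))

print(obj.calls)
print(len(items))
```

The cost is that callers have to remember the iter() wrapper; a helper method on the class that returns list(iter(self)) would hide it, though that is a design choice rather than anything list() requires.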