18

I have a class with both an __iter__ and a __len__ method. The latter uses the former to count all elements.

It works like the following:

class A:
    def __iter__(self):
        print("iter")
        for _ in range(5):
            yield "something"

    def __len__(self):
        print("len")
        n = 0
        for _ in self:
            n += 1
        return n

Now if we take, for example, the length of an instance, it prints len and iter, as expected:

>>> len(A())
len
iter
5

But if we call list() it calls both __iter__ and __len__:

>>> list(A())
len
iter
iter
['something', 'something', 'something', 'something', 'something']

It works as expected if we make a generator expression:

>>> list(x for x in A())
iter
['something', 'something', 'something', 'something', 'something']

I would expect list(A()) and list(x for x in A()) to behave the same, but they don’t.

Note that it appears to first call __iter__, then __len__, then loop over the iterator:

class B:
    def __iter__(self):
        print("iter")

        def gen():
            print("gen")
            yield "something"

        return gen()

    def __len__(self):
        print("len")
        return 1

print(list(B()))

Output:

iter
len
gen
['something']

How can I get list() not to call __len__, so that my instance’s iterator is not consumed twice? I could define, for example, a length or size method, and one would then call A().size(), but that’s less Pythonic.

I tried to compute the length in __iter__ and cache it so that subsequent calls to __len__ don’t need to iterate again, but list() calls __len__ before it starts iterating, so that doesn’t work.

Note that in my case I work on very large data collections so caching all items is not an option.

11
  • Why must the len implementation call iter? Does iter generate new data each time it's called? Commented May 12, 2016 at 15:06
  • @Daniel No it’s always the same data but it must iterate over it to get its length; we don’t know it in advance. Commented May 12, 2016 at 15:14
  • At what stage of the A instance's life do you know the length? On init? In a setter method? Commented May 12, 2016 at 15:18
  • @Daniel I won’t know until I iterate over the data. Each class instance parses one file; I can’t know how many elements there are in that file without parsing it. Commented May 12, 2016 at 15:26
  • __len__ should only be defined for types for which it can be made idempotent. That you have a generator means that perhaps you shouldn't really have __len__. Commented May 12, 2016 at 15:44

2 Answers

13

It's a safe bet that the list() constructor is detecting that len() is available and calling it in order to pre-allocate storage for the list.

Your implementation is pretty much completely backwards. You are implementing __len__() by using __iter__(), which is not what Python expects. The expectation is that len() is a fast, efficient way to determine the length in advance.

I don't think you can convince list(A()) not to call len. As you have already observed, you can create an intermediate step that prevents len from being called.
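A minimal sketch of such an intermediate step, assuming CPython: wrapping the instance in iter() hands list() a plain generator, which has no __len__, so the data is only traversed once. The class mirrors A from the question; the calls attribute is a hypothetical addition that records method calls instead of printing them.

```python
class A:
    """Same shape as the question's class A, but records calls instead of printing."""

    def __init__(self):
        self.calls = []  # hypothetical helper attribute, for illustration only

    def __iter__(self):
        self.calls.append("iter")
        for _ in range(5):
            yield "something"

    def __len__(self):
        self.calls.append("len")
        return sum(1 for _ in self)


direct = A()
items_direct = list(direct)          # list() may ask the object itself for its length

wrapped = A()
items_wrapped = list(iter(wrapped))  # list() only sees the generator, which has no __len__

print(direct.calls)
print(wrapped.calls)
```

The generator expression in the question works for the same reason: list() receives an object that exposes no length.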

You should definitely cache the result, if the sequence is immutable. If there are as many items as you speculate, there's no sense computing len more than once.
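One way the caching could look, as a sketch under assumptions rather than a drop-in solution: the class name CachedLen, the _length attribute and the passes counter are all hypothetical, and range() stands in for parsing the file. The length is stored whenever a full pass completes, so __len__ only iterates if no complete pass has happened yet.

```python
class CachedLen:
    """Iterable whose length is only known after one complete pass."""

    def __init__(self, n_items=5):
        self._n_items = n_items  # stand-in for the file being parsed
        self._length = None      # unknown until a full pass has finished
        self.passes = 0          # counts how many times the data was traversed

    def __iter__(self):
        self.passes += 1
        count = 0
        for _ in range(self._n_items):
            count += 1
            yield "something"
        # Only reached when the pass runs to completion.
        self._length = count

    def __len__(self):
        if self._length is None:
            for _ in self:       # one full pass, which fills the cache
                pass
        return self._length


c = CachedLen()
first = len(c)    # triggers one pass
second = len(c)   # served from the cache
print(first, second, c.passes)
```

This does not stop the very first list(c) on a fresh instance from traversing the data twice, as noted in the question; it only ensures that every later len() or list() call avoids the extra pass.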

3 Comments

Thank you that makes sense.
Anecdote: I once implemented __len__ as return len(list(iter(self))), and discovered this was a Very Bad Idea when my test coverage tracking stopped working. Turns out, list(foo) calls __len__ which calls list(), which calls __len__ etc. until the maximum recursion depth is exceeded -- which shuts down coverage tracking -- and then list() suppresses that error and assumes __len__ is not available. Slow and with unexpected side effects!
Just call sys.setrecursionlimit() with a low number to make it go faster. :-) :-) :-)
-2

You don't have to implement __len__. For a class to be iterable, it only needs to implement one of the following:

  • __iter__, which returns an iterator, or a generator as in your class A & B
  • __getitem__, as long as it raises IndexError when the index is out of range

The code below still works:

class A:
    def __iter__(self):
        print("iter")
        for _ in range(5):
            yield "something"

print(list(A()))

Which outputs:

iter
['something', 'something', 'something', 'something', 'something']
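The second option in the list above, __getitem__ without __iter__ or __len__, can be sketched the same way. Squares is a hypothetical class used only for illustration; iteration calls __getitem__ with 0, 1, 2, ... until IndexError is raised.

```python
class Squares:
    """Iterable through the legacy sequence protocol: no __iter__, no __len__."""

    def __getitem__(self, index):
        if index >= 3:
            # Signals the end of the sequence to the iteration machinery.
            raise IndexError(index)
        return index * index


squares = list(Squares())
print(squares)  # [0, 1, 4]
```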

4 Comments

I don’t have to, but I want to be able to get the size of my data, and defining __len__ allows me to use len(A()) instead of e.g. len(list(A())). (ETA: also, list(A()) won’t work if my instances yield millions of large objects; I need to write some code that consumes the iterator and increments a counter for that. __len__ seemed like a good place to do that.)
@bfontaine, then I'm a little confused by your requirement. Do you just want to get the size of the data, or also iterate over/consume the data (at some point)?
Both. Iterating over the data is the main requirement, but being able to get its size directly would be a great addition that’ll save a few keystrokes when I play with the data in the REPL.
OK, I don't see a way to avoid the call to __len__ in list(A()) (when that method is implemented).
