
Today in class, we learned that retrieving an element from a list is O(1) in Python. Why is this the case? Suppose I have a list of four items, for example:

li = ["perry", 1, 23.5, "s"]

These items have different sizes in memory, so it is not possible to take the memory location of li[0] and add three times the size of an element to get the memory location of li[3]. How, then, does the interpreter know where li[3] is without traversing the list to retrieve the element?
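The premise can be checked in CPython, where `sys.getsizeof` reports the size in bytes of an object itself. The exact numbers vary by Python version and platform, so this sketch only prints them rather than hard-coding any:

```python
import sys

li = ["perry", 1, 23.5, "s"]

# Size in bytes of each element object on its own (CPython-specific values)
element_sizes = [sys.getsizeof(item) for item in li]
print(element_sizes)

# The elements do not all share a single size
print(len(set(element_sizes)) > 1)
```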

  • What makes you think that arrays are linearly allocated rather than a list of pointers? - [me confused by your profile description] Commented Oct 7, 2018 at 2:37
  • Don't confuse item access, which is O(1), with item lookup / search, which is O(n). Commented Oct 7, 2018 at 2:39
  • Some relevant reading material: Python list implementation, How is Python's List Implemented?, What is the underlying data structure for Python lists?. Commented Oct 7, 2018 at 2:48
  • It is not good to ask the same question on two different SE sites (when answers will be within the same context). Commented Oct 7, 2018 at 14:51
  • Voting to close as off-topic, since this is language-specific and it's already been answered on Stack Overflow. Commented Oct 7, 2018 at 15:46
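The access-versus-search distinction from the second comment can be made visible by counting equality checks. `Counted` is a hypothetical helper class for this sketch: indexing by position performs no comparisons at all, while `list.index` compares element by element from the front until it finds a match.

```python
class Counted:
    """Wraps a value and counts how many times __eq__ is called."""

    comparisons = 0  # shared counter across all instances

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Counted.comparisons += 1
        return isinstance(other, Counted) and self.value == other.value


items = [Counted(v) for v in range(1000)]

# Access: jump straight to a position, no comparisons needed
Counted.comparisons = 0
item = items[999]
access_comparisons = Counted.comparisons

# Search: scan from the front until an equal element is found
Counted.comparisons = 0
position = items.index(Counted(999))
search_comparisons = Counted.comparisons

print(access_comparisons, position, search_comparisons)
```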

3 Answers


A list in Python is implemented as an array of pointers[1]. So, what's really happening when you create the list:

["perry", 1, 23.5, "s"]

is that you are actually creating an array of pointers like so:

[0xa3d25342, 0x635423fa, 0xff243546, 0x2545fade]

Each pointer "points" to the respective objects in memory, so that the string "perry" will be stored at address 0xa3d25342 and the number 1 will be stored at 0x635423fa, etc.

Since all pointers are the same size, the interpreter can in fact add 3 times the size of a pointer to the address of the array's first slot to get to the pointer stored at li[3], and then follow that pointer to the object itself.
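The arithmetic can be reproduced with `ctypes`, assuming CPython. `(ctypes.py_object * 4)` is a contiguous C array of `PyObject *` slots, similar in spirit to (but not the same object as) the array inside a real list; `slots` and `slot_at` are names made up for this sketch. Reading slot 3 means computing `base + 3 * pointer_size` and dereferencing the pointer stored there:

```python
import ctypes

li = ["perry", 1, 23.5, "s"]

# A contiguous C array of PyObject* slots referring to the same objects as li
slots = (ctypes.py_object * len(li))(*li)

pointer_size = ctypes.sizeof(ctypes.py_object)  # identical for every slot
base = ctypes.addressof(slots)                  # address of slot 0


def slot_at(index):
    """Read the object referenced by slot `index` using only address arithmetic."""
    address = base + index * pointer_size  # one multiply and one add: O(1)
    return ctypes.py_object.from_address(address).value


print(slot_at(3))
```

No element is visited on the way to slot 3; the element objects' own sizes never enter the calculation.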


[1] More details straight from the horse's mouth: the CPython source code on GitHub.


10 Comments

@DmitryVerhoturov That's right but makes no practical difference for this answer. References are reference-counted, docs.python.org/3/c-api/structures.html#c.PyVarObject
References are implemented as pointers in every language I know. Semantics may differ slightly (e.g. differences in memory management, or references in C++ being immutable), but in the end they are still pointers.
@TLW I've never seen those before. Where'd you find them?
@Brian Ahh, that makes sense. If I may expound for those who were as curious as I was: those numbers are useful for firmware designers who are doing combinatorial logic within the chips themselves. Big-O analysis is always done with respect to some abstract machine, and when you're doing firmware, modeling time as 'gate depth' or 'wire distance' is reasonable. For anyone doing software (especially Python and other interpreted languages), it's more useful to do the big-O analysis based on an abstract machine where accessing memory takes some fixed number of cycles, hence the O(1).
@TLW That simplicity is important for developers who will never operate in environments where they are concerned with performance as the working set approaches exabytes, but for which there is a substantial performance difference between an O(n) and O(n log n) algorithm with respect to their simplified computational model. The simplified model does a good job of focusing attention on the most important aspects of the algorithm.

When you say a = [...], a is effectively a pointer to a PyObject containing an array of pointers to PyObjects.

When you ask for a[2], the interpreter first follows the pointer to the list's PyObject, then adds 2 pointer widths to the address of the array inside it, and returns the pointer stored in that slot. The same happens if you ask for a[0] or a[9999].
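A rough way to see that the cost does not depend on the index is to time lookups at different positions. This is a sketch using `timeit`; `best_time` is a hypothetical helper, absolute timings depend on the machine, and the only point is that they come out roughly equal:

```python
import timeit

a = list(range(1_000_000))


def best_time(index, repeats=5, number=100_000):
    """Best-of-N time in seconds for `number` lookups of a[index]."""
    timer = timeit.Timer("a[i]", globals={"a": a, "i": index})
    return min(timer.repeat(repeat=repeats, number=number))


for index in (0, 2, 9_999, 999_999):
    print(f"a[{index}]: {best_time(index):.4f} s")
```

Taking the minimum of several repeats filters out most scheduling noise, which is why it is used here rather than a single measurement.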

Basically, all Python objects are accessed by reference instead of by value, even integer literals like 2. CPython uses a few tricks to keep this efficient; for example, small integers are preallocated once and shared. And because pointers have a known, fixed size, they can be stored conveniently in C-style arrays.
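In CPython this is observable with `sys.getsizeof`, which reports a list's own footprint (header plus pointer slots) but not the objects the slots refer to. A sketch, assuming CPython, since `sys.getsizeof` results are implementation-specific:

```python
import struct
import sys

# Size of a C pointer on this build (typically 8 bytes on 64-bit)
pointer_size = struct.calcsize("P")

# Two lists of equal length: one of small ints, one referring to a large string
small_items = [0] * 1000
large_items = ["x" * 10_000] * 1000

# getsizeof counts the list's slot array, not the objects the slots point to
print(sys.getsizeof(small_items) == sys.getsizeof(large_items))

# Each extra slot costs exactly one pointer
per_slot = sys.getsizeof([None] * 1000) - sys.getsizeof([None] * 999)
print(per_slot == pointer_size)

# Slots hold references: the same inner object appears twice, not two copies
inner = []
outer = [inner, inner]
inner.append(1)
print(outer)
```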

2 Comments

What is "terp"?
@hkBst I infer that it's short for "interpreter".

Short answer: Python lists are arrays.

Long answer: The computer science term list usually means either a singly-linked list (as used in functional programming) or a doubly-linked list (as used in procedural programming). These data structures support O(1) insertion at either the head of the list (functionally) or at any position that does not need to be searched for (procedurally). A Python "list" has none of these characteristics. Instead, it supports (amortized) O(1) appending at the end of the list (like a C++ std::vector or Java ArrayList). Python lists are really resizable arrays in CS terms.
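A minimal sketch of a resizable array, for illustration only. `DynamicArray` is a hypothetical class, its backing store is itself a Python list standing in for a raw block of pointers, and it doubles its capacity when full. CPython's real growth step is smaller (it grows by about one-eighth plus a small constant), but any geometric growth gives amortized O(1) appends, because the O(n) copies become exponentially rarer as the array grows:

```python
class DynamicArray:
    """Toy resizable array: a fixed-capacity backing store that doubles when full."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._slots = [None] * self._capacity  # stand-in for a raw C block of pointers
        self.resizes = 0  # number of times every element had to be copied

    def append(self, item):
        if self._size == self._capacity:
            self._grow(2 * self._capacity)  # geometric growth (doubling in this sketch)
        self._slots[self._size] = item
        self._size += 1

    def _grow(self, new_capacity):
        new_slots = [None] * new_capacity
        for i in range(self._size):  # O(n) copy, but it happens rarely
            new_slots[i] = self._slots[i]
        self._slots = new_slots
        self._capacity = new_capacity
        self.resizes += 1

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")
        return self._slots[index]  # direct slot access: O(1)

    def __len__(self):
        return self._size


arr = DynamicArray()
for value in range(1000):
    arr.append(value)
print(len(arr), arr[999], arr.resizes)
```

Appending 1000 items this way triggers only 10 resizes (to capacities 2, 4, …, 1024), while indexing never copies or scans anything.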

The following passage from the Python documentation explains some of the performance characteristics of Python "lists":

It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one).
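The same part of the Python tutorial recommends `collections.deque` for queues, since it is designed for fast appends and pops at both ends. A short first-in, first-out sketch (the task names are arbitrary):

```python
from collections import deque

queue = deque()
for task in ["first", "second", "third"]:
    queue.append(task)  # enqueue at the right end

done = []
while queue:
    done.append(queue.popleft())  # dequeue from the left end, O(1) for a deque

print(done)
```

Doing the same with a plain list and `pop(0)` would shift every remaining element on each dequeue.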

5 Comments

I’ve never heard of singly-linked lists being specifically associated with functional programming, or doubly-linked lists being specifically associated with procedural programming. Both types of list are valid and have their use-cases for both programming paradigms (and other programming paradigms besides). Can you back up that claim? I find it quite dubious.
@KRyan I'm pretty sure that Lisp, Haskell, and OCaml all generally use singly-linked lists, especially through the more convenient primitives in those languages. Lisp in particular has a bunch of shorthand like car/cdr for getting the various parts of a list cell. Of course they're used everywhere else, but Lisp and its functional company often make much heavier use of them. C++'s list, for example, is a doubly-linked list, and only recently did they get a forward_list (which is singly linked).
This is a good answer, but I agree that the claim about list implementations in functional vs procedural languages seems to be too general. Whether an abstract list data type is implemented as an array or linked list isn't really a part of the language specification in high level languages, is it? I suppose it's possible to make a Lisp runtime where lists are implemented as arrays, like in CPython?
@HåkenLid: performance characteristics are often part of the specification of a data type, especially for languages that are more serious about performance. For example see this Q&A about C++. I am not aware of such an explicit list for Python, but you can get a hint from the interface exposed by the standard "list" type: there is append and extend, but there is no prepend/cons.
@HåkenLid: where the docs are silent the fallback position is that the CPython implementation is the de facto specification of Python, although apparently other list implementations do get discussed.
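For contrast with the discussion above, a minimal singly-linked (cons) list sketch. `Cons`, `cons`, and `nth` are hypothetical names chosen for illustration: prepending is O(1) because it allocates one cell pointing at the existing list, while reaching the i-th element means following i links, which is exactly the traversal an array-backed Python list avoids:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Cons:
    head: Any               # the element stored in this cell
    tail: Optional["Cons"]  # the rest of the list (None marks the end)


def cons(item, lst):
    """Prepend in O(1): one new cell; existing cells are shared, not copied."""
    return Cons(item, lst)


def nth(lst, index):
    """Walk `index` links from the front: O(index) time."""
    node = lst
    for _ in range(index):
        if node is None:
            raise IndexError("index out of range")
        node = node.tail
    if node is None:
        raise IndexError("index out of range")
    return node.head


lst = None
for item in reversed(["perry", 1, 23.5, "s"]):
    lst = cons(item, lst)

print(nth(lst, 3))
```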
