
I have a Python program that works with dictionaries a lot. I have to make copies of dictionaries thousands of times. I need a copy of both the keys and the associated contents. The copy will be edited and must not be linked to the original (e.g. changes in the copy must not affect the original.)

Keys are Strings, Values are Integers (0/1).

I currently use a simple way:

newDict = oldDict.copy()

Profiling my code shows that the copy operation takes most of the time.

Are there faster alternatives to the dict.copy() method? What would be fastest?

  • If the value can be either 0 or 1, would a bool be a better choice than an int? Commented May 2, 2011 at 19:31
  • And if you need thousands of copies of them, would bitmasks work even better? Commented May 2, 2011 at 19:38
  • @Samir Isn't bool in Python named int anyway? Commented May 2, 2011 at 19:42
  • I agree, though, that a bitmask might be more efficient for you (depending on how you use this "dict", really). Commented May 2, 2011 at 19:42
  • To clarify, the bool type is actually a subclass (subtype?) of the int type. Commented May 2, 2011 at 20:04
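The bitmask idea from the comments above can be sketched as follows. This is a hypothetical illustration, not code from the original post: it assumes a fixed, known key set (the `key_bits` mapping here is made up), and packs the 0/1 values into a single int, so "copying" the whole structure is just an assignment.

```python
# Hypothetical sketch: with a fixed key set and 0/1 values, the whole
# "dict" can be packed into one int bitmask. Copying an int is free,
# because ints are immutable.
key_bits = {"a": 0, "b": 1, "c": 2}  # assumed key -> bit position mapping

def set_flag(mask, key):
    """Return a new mask with the bit for `key` set to 1."""
    return mask | (1 << key_bits[key])

def clear_flag(mask, key):
    """Return a new mask with the bit for `key` cleared to 0."""
    return mask & ~(1 << key_bits[key])

def get_flag(mask, key):
    """Read the 0/1 value stored for `key`."""
    return (mask >> key_bits[key]) & 1

mask = 0
mask = set_flag(mask, "a")        # "a" -> 1
copy_of_mask = mask               # the "copy" is a plain assignment
copy_of_mask = set_flag(copy_of_mask, "b")  # edits leave the original untouched
```

Whether this beats a dict depends on how the keys are used; it only works if the key set is known up front.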

7 Answers


Looking at the C source for the Python dict operations, you can see that they do a pretty naive (but efficient) copy. It essentially boils down to a call to PyDict_Merge:

PyDict_Merge(PyObject *a, PyObject *b, int override)

This does quick checks for things like whether they're the same object and whether they contain any items. After that it does a generous one-time resize/alloc of the target dict and then copies the elements one by one. I don't see you getting much faster than the built-in copy().
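One point worth adding for the question as asked: since the values here are immutable ints (0/1), the shallow copy that dict.copy() performs already gives full independence. No deepcopy is needed. A quick sketch to convince yourself (the key names are illustrative):

```python
# Shallow copy of a str -> int dict: rebinding a key in the copy
# cannot affect the original, because int values are immutable.
old = {"key1": 0, "key2": 1}
new = old.copy()      # new dict object, same int values

new["key1"] = 1       # rebinds the key only in the copy
# old["key1"] is still 0; the two dicts are fully independent
```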


1 Comment

Sounds like I better rewrite the code to avoid the use of dicts at all - or use a faster data structure that can do the same job. Thank you very much for the answer!

Apparently dict.copy() is the fastest, as the timings below show.

[utdmr@utdmr-arch ~]$ python -m timeit -s "d={1:1, 2:2, 3:3}" "new = d.copy()"
1000000 loops, best of 3: 0.238 usec per loop
[utdmr@utdmr-arch ~]$ python -m timeit -s "d={1:1, 2:2, 3:3}" "new = dict(d)"
1000000 loops, best of 3: 0.621 usec per loop
[utdmr@utdmr-arch ~]$ python -m timeit -s "from copy import copy; d={1:1, 2:2, 3:3}" "new = copy(d)"
1000000 loops, best of 3: 1.58 usec per loop

8 Comments

Thanks for the comparison! Will try to rewrite the code as to avoid the use of dict copying in most places. Thanks again!
The way to do the last comparison without counting the cost of doing the import every time is with timeit's -s argument: python -m timeit -s "from copy import copy" "new = copy({1:1, 2:2, 3:3})". While you're at it, pull out the dict creation as well (for all examples.)
Maybe repeating the process many times would be better, since a single run can fluctuate.
timeit does that; as the output says, it runs 1000000 loops and reports the best of three runs.
I have conflicting timings. With a = {b: b for b in range(10000)}: %timeit copy(a): 10000 loops, best of 3: 186 µs per loop; %timeit deepcopy(a): 100 loops, best of 3: 14.1 ms per loop; %timeit a.copy(): 1000 loops, best of 3: 180 µs per loop

I realise this is an old thread, but this is a high result in search engines for "dict copy python", and the top result for "dict copy performance", and I believe this is relevant.

From Python 3.7, newDict = oldDict.copy() is up to 5.5x faster than it was previously. Notably, right now, newDict = dict(oldDict) does not seem to have this performance increase.

There is a little more information here.
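You can check the speedup on your own interpreter with something like the sketch below. The dict contents are made up for illustration, and the absolute numbers will differ by machine and Python version, so none are claimed here.

```python
# Compare dict.copy() against the dict(...) constructor on this interpreter.
import timeit

d = {str(i): i & 1 for i in range(1000)}  # str -> 0/1, like the question

t_copy = timeit.timeit(lambda: d.copy(), number=10_000)
t_ctor = timeit.timeit(lambda: dict(d), number=10_000)
print(f"d.copy(): {t_copy:.4f}s   dict(d): {t_ctor:.4f}s")
```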



Can you provide a code sample so I can see how you are using copy() and in what context?

You could use

new = dict(old)

But I don't think it will be faster.



Depending on things you leave to speculation, you may want to wrap the original dictionary and do a sort of copy-on-write.

The "copy" is then a dictionary which looks up stuff in the "parent" dictionary, if it doesn't already contain the key --- but stuffs modifications in itself.

This assumes that you won't be modifying the original and that the extra lookups don't end up costing more.
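The stdlib's collections.ChainMap realizes exactly this pattern: a minimal sketch, assuming the dict contents shown (which are illustrative, not from the question).

```python
# Copy-on-write "copy" via ChainMap: writes land in a fresh front dict,
# reads fall through to the shared original. No per-key copying is done.
from collections import ChainMap

original = {"a": 0, "b": 1}

cow_copy = ChainMap({}, original)  # cheap to create, whatever the dict size
cow_copy["a"] = 1                  # stored only in the front dict
# original["a"] is still 0; cow_copy["b"] falls through to the original
```

Caveats: every miss in the front dict costs an extra lookup, and del on a ChainMap only affects the front map, so deletions of original keys don't behave like they would on a real copy.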



The measurements are dependent on the dictionary size, though. For 10000 entries, copy(d) and d.copy() are almost the same.

a = {b: b for b in range(10000)} 
In [5]: %timeit copy(a)
10000 loops, best of 3: 186 µs per loop
In [6]: %timeit deepcopy(a)
100 loops, best of 3: 14.1 ms per loop
In [7]: %timeit a.copy()
1000 loops, best of 3: 180 µs per loop



I just ran into this problem. I managed to switch to lists by separating the keys and the values, and it helped a lot.

Let's say I have:

d = {
    'a': 5,
    'b': 6,
}

And, I want to make lots of copies with different values but the same keys.

I can create a lookup table:

lookup_table = {
    'a': 0,
    'b': 1,
}

And, then use a list:

counts = [5, 6]

To get a value, I do:

index = lookup_table['a']
print(counts[index])

To do copy on write, I do something like:

index = lookup_table['a']
new_counts = counts.copy()
new_counts[index] -= 1

I tried going even further by using Python's native array module instead of lists, but it actually made it slightly slower for me. It may be appropriate if you're storing HUGE amounts of data, though.
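The snippets above can be put together as one runnable sketch (the key names and counts are the illustrative ones from the answer):

```python
# Keys split out into a one-time lookup table; values live in a plain list,
# which is cheap to copy compared to a dict.
lookup_table = {"a": 0, "b": 1}   # key -> list index, built once
counts = [5, 6]                   # values, stored in index order

def get(counts, key):
    """Read the value for `key` through the shared lookup table."""
    return counts[lookup_table[key]]

def copy_and_decrement(counts, key):
    """Copy the value list and decrement one entry, leaving the original intact."""
    new_counts = counts.copy()
    new_counts[lookup_table[key]] -= 1
    return new_counts

new_counts = copy_and_decrement(counts, "a")
# counts stays [5, 6]; new_counts is [4, 6]
```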

