Efficiently accessing arbitrarily deep dictionaries

Question

Suppose I have a multi-level dictionary like this

mydict = {
    'first': {
        'second': {
            'third': {
                'fourth': 'the end'
             }
         }
     }
}

I'd like to access it like this

test = get_entry(mydict, 'first.second.third.fourth')

What I have so far is

def get_entry(dict, keyspec):
    keys = keyspec.split('.')

    result = dict[keys[0]]
    for key in keys[1:]:
       result = dict[key]

    return result

Are there more efficient ways to do it? According to %timeit the runtime of the function is 1.26us, while accessing the dictionary the standard way like this

foo = mydict['first']['second']['third']['fourth']

takes 541ns. I'm looking for ways to trim it to 800ns range if possible.

Thanks

Are all your intermediary dictionaries of length one? If they are, you can use a tuple key fairly efficiently. — jpp
– jpp, Commented Apr 20, 2018 at 16:29
@theausome - that answer "...doesn't seem to work on nested dicts." — JacobIRR
– JacobIRR, Commented Apr 20, 2018 at 16:37
You have to make a few trade-offs if you want to boost performance. What is more likely to change more often - the dictionary you're traversing or the dot notation string you use to traverse? If both are frequently changing and of the same importance you're not going to get much faster than presented in @tdelaney solution. — zwer
– zwer, Commented Apr 25, 2018 at 15:46

tdelaney · Accepted Answer · 2018-04-20 17:42:02Z

13

I got a 20% performance boost by tightening up the code a bit but a whopping 400% increase by using a cache for split strings. That only makes a difference if you use the same spec multiple times. Here are sample implementations and a profile script to test.

test.py

mydict = {
    'first': {
        'second': {
            'third': {
                'fourth': 'the end'
             }
         }
     }
}

# original
def get_entry(dict, keyspec):
    keys = keyspec.split('.')

    result = dict[keys[0]]
    for key in keys[1:]:
       result = result[key]

    return result

# tighten up code
def get_entry_2(mydict, keyspec):
    for key in keyspec.split('.'):
        mydict = mydict[key]
    return mydict

# use a cache
cache = {}
def get_entry_3(mydict, keyspec):
    global cache
    try:
        spec = cache[keyspec]
    except KeyError:
        spec = tuple(keyspec.split('.'))
        cache[keyspec] = spec

    for key in spec:
        mydict = mydict[key]
    return mydict

if __name__ == "__main__":
    test = get_entry(mydict, 'first.second.third.fourth')
    print(test)

profile.py

from timeit import timeit
print("original get_entry")
print(timeit("get_entry(mydict, 'first.second.third.fourth')",
    setup="from test import get_entry, mydict"))

print("get_entry_2 with tighter code")
print(timeit("get_entry_2(mydict, 'first.second.third.fourth')",
    setup="from test import get_entry_2, mydict"))

print("get_entry_3 with cache of split spec")
print(timeit("get_entry_3(mydict, 'first.second.third.fourth')",
    setup="from test import get_entry_3, mydict"))

print("just splitting a spec")
print(timeit("x.split('.')", setup="x='first.second.third.fourth'"))

The timing on my machine is

original get_entry
4.148535753000033
get_entry_2 with tighter code
3.2986323120003362
get_entry_3 with cache of split spec
1.3073233439990872
just splitting a spec
1.0949148639992927

Notice that splitting the spec is a comparatively expensive operation for this function. That's why caching helps.

answered Apr 20, 2018 at 17:42

tdelaney

78.1k6 gold badges92 silver badges129 bronze badges

Sign up to request clarification or add additional context in comments.

10 Comments

kabanus Over a year ago

Looks like you're the only one who paid attention to performance.

coldspeed95 Over a year ago

@kabanus I don't get what you mean. You can get nano-second level performance with my solution as long as you pre-process your data once. Whether that can be done or not is on OP, not me.

tdelaney Over a year ago

@COLDSPEED I think the choice between yours and mine is whether lots of queries are done on one dataset or a few queries are done on many datasets.

coldspeed95 Over a year ago

Yes, there are tradeoffs :)

kabanus Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ yes :) I was biased against you because it seems like cheating, but looking back, I guess I was just jealous.

|

coldspeed95 · Accepted Answer · 2018-04-21 03:53:17Z

12

There's really only one solution. Rebuild your dictionary. But do it just once.

def recursive_flatten(mydict):
    d = {}
    for k, v in mydict.items():
        if isinstance(v, dict):
            for k2, v2 in recursive_flatten(v).items():
                d[k + '.' + k2] = v2 
        else:
            d[k] = v
    return d

In [786]: new_dict = recursive_flatten(mydict); new_dict
Out[786]: {'first.second.third.fourth': 'the end'}

(Some more tests)

In [788]: recursive_flatten({'x' : {'y' : 1, 'z' : 2}, 'y' : {'a' : 5}, 'z' : 2})
Out[788]: {'x.y': 1, 'x.z': 2, 'y.a': 5, 'z': 2}

In [789]: recursive_flatten({'x' : 1, 'y' : {'x' : 234}})
Out[789]: {'x': 1, 'y.x': 234}

Every access becomes constant time from here on.

Now, just access your value using new_dict['first.second.third.fourth']. Should work for any arbitrarily nested dictionary that does not contain a self-reference.

Note that every solution has its fair share of tradeoffs, this is no exception. Unless you're firing millions of queries at your data such that preprocessing is an acceptable overhead, then this is it. With the other solutions, you are only sidestepping the problem instead of addressing it - which is dealing with the dictionary's structure. OTOH, if you're going to do this once on many such similar data structures, it make no sense to preprocess just for a single query, in which case you may prefer one of the other solutions.

edited Apr 21, 2018 at 3:53

answered Apr 20, 2018 at 16:38

coldspeed95

407k106 gold badges746 silver badges799 bronze badges

8 Comments

user3483203 Over a year ago

Just a note that this seems to only allow access to the final level of nesting, you wouldn't for example be able to access new_dict['first.second']

coldspeed95 Over a year ago

@chrisz If needed, that can be fixed by caching res = recursive_flatten(v), updating d with d.update(res), and then iterating over res in a similar manner.

kabanus Over a year ago

Using a dict directly really is the only fast solution.

kabanus Over a year ago

Though in terms of space, your (extended in comments) solution would not scale nicely (read linearly).

Chris_Rands Over a year ago

I believe this could be a good dupe target, but since you placed the bounty, I thought to ask? stackoverflow.com/questions/14692690/…

|

user3483203 · Accepted Answer · 2018-04-25 15:20:22Z

I updated the answer from How to use a dot "." to access members of dictionary? to use an initial conversion which will then work for nested dictionaries:

You can use the following class to allow dot-indexing of dictionaries:

class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

However, this only supports nesting if all nested dictionaries are also of type dotdict. That's where the following helper function comes in:

def dct_to_dotdct(d):
    if isinstance(d, dict):
        d = dotdict({k: dct_to_dotdct(v) for k, v in d.items()})
    return d

This function has to be run once on your nested dictionary, and the result can then be indexed using dot-indexing.

Here are some examples:

In [13]: mydict
Out[13]: {'first': {'second': {'third': {'fourth': 'the end'}}}}

In [14]: mydict = dct_to_dotdct(mydict)

In [15]: mydict.first.second
Out[15]: {'third': {'fourth': 'the end'}}

In [16]: mydict.first.second.third.fourth
Out[16]: 'the end'

A note about performance: this answer is slow compared to standard dictionary access, I just wanted to present an option that actually used "dot access" to a dictionary.

kabanus · Accepted Answer · 2018-04-20 21:10:49Z

7

Here is a solution similar to chrisz's, but you do not have to anything to your dict a-prior. :

class dictDotter(dict):
    def __getattr__(self,key):
        val = self[key]
        return val if type(val) != dict else dictDotter(val)

and just x=dictDotter(originalDict) will let you have arbitrary dot getting (`x.first.second...). I'll note this is twice as slow as chrisz solution, and his is 9 times as slow as yours (on my machine, approximately).

So, if you insist on making this work @tdelaney seems to have provided the only real performance improvement.

Another option that does better than what you have (in terms of run time):

class dictObjecter:
    def __init__(self,adict):
        for k,v in adict.items():
            self.__dict__[k] = v
            if type(v) == dict: self.__dict__[k] = dictObjecter(v)

which will make an object out of your dict, so dot notation is usual. This will improve run time to 3 times what you have, so not bad, but at the cost of going over your dict, and replacing it with something else.

Here is the total testing code:

from timeit import timeit

class dictObjecter:
    def __init__(self,adict):
        for k,v in adict.items():
            self.__dict__[k] = v
            if type(v) == dict: self.__dict__[k] = dictObjecter(v)

class dictDotter(dict):
    def __getattr__(self,key):
        val = self[key]
        return val if type(val) != dict else dictDotter(val)

def get_entry(dict, keyspec):
    keys = keyspec.split('.')

    result = dict[keys[0]]
    for key in keys[1:]:
        result = result[key]

    return result

class dotdict(dict):
    """dot.notation access to dictionary attributes"""
    __getattr__ = dict.get
    __setattr__ = dict.__setitem__
    __delattr__ = dict.__delitem__

def dct_to_dotdct(d):
    if isinstance(d, dict):
        d = dotdict({k: dct_to_dotdct(v) for k, v in d.items()})
    return d

x = {'a':{'b':{'c':{'d':1}}}}
y = dictDotter(x)
z = dct_to_dotdct(x)
w = dictObjecter(x)
print('{:15} : {}'.format('dict dotter',timeit('y.a.b.c.d',globals=locals(),number=1000)))
print('{:15} : {}'.format('dot dict',timeit('z.a.b.c.d',globals=locals(),number=1000)))
print('{:15} : {}'.format('dict objecter',timeit('w.a.b.c.d',globals=locals(),number=1000)))
print('{:15} : {}'.format('original',timeit("get_entry(x,'a.b.c.d')",globals=locals(),number=1000)))
print('{:15} : {:.20f}'.format('best ref',timeit("x['a']['b']['c']['d']",globals=locals(),number=1000)))

I provided the last regular lookup as a best reference.The results on a Windows Ubuntu subsystem:

dict dotter     : 0.0035500000003594323
dot dict        : 0.0017939999997906853
dict objecter   : 0.00021699999979318818
original        : 0.0006629999998040148
best ref        : 0.00007999999979801942

so the is objectified dict is 3 times as slow as a regular dictionary lookup - so if speed is important, why would you want this?

edited Apr 20, 2018 at 21:10

answered Apr 20, 2018 at 20:37

kabanus

26.3k7 gold badges48 silver badges80 bronze badges

3 Comments

coldspeed95 Over a year ago

No answer here has actually paid attention to performance, including the answer you claimed had. None of these solutions are any good if there are to be millions of accesses - it all adds up.

kabanus Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ Hey, at least give me the "nice effort" consideration. I was trying thing that actually need a .a.b.c.d to access deeper into the maze.

coldspeed95 Over a year ago

Okay, you get a "nice effort" consideration from me (+1). I do like your answer, it, like all the other answers, certainly has its merits over mine.

ramazan polat · Accepted Answer · 2018-05-08 12:34:11Z

I had the same need, so I created the Prodict.

For your case, you can do it in one line:

mydict = {
    'first': {
        'second': {
            'third': {
                'fourth': 'the end'
             }
         }
     }
}
dotdict = Prodict.from_dict(mydict)
print(dotdict.first.second.third.fourth) # "the end"

After that, use dotdict just like a dict, because it is a subclass of dict:

dotdict.first == dotdict['first'] # True

You can also add more keys dynamically with dot notation:

dotdict.new_key = 'hooray'
print(dotdict.new_key) # "hooray"

It works even if the new keys are nested dictionaries:

dotdict.it = {'just': 'works'}
print(dotdict.it.just)  # "works"

Lastly, if you define your keys beforehand, you get auto completion and auto type conversion:

class User(Prodict):
    user_id: int
    name: str

user = User(user_id="1", "name":"Ramazan")
type(user.user_id) # <class 'int'>
# IDE will be able to auto complete 'user_id' and 'name' properties

UPDATE:

This is the test result for the same code written by @kabanus:

x = {'a': {'b': {'c': {'d': 1}}}}
y = dictDotter(x)
z = dct_to_dotdct(x)
w = dictObjecter(x)
p = Prodict.from_dict(x)

print('{:15} : {}'.format('dict dotter', timeit('y.a.b.c.d', globals=locals(), number=10000)))
print('{:15} : {}'.format('prodict', timeit('p.a.b.c.d', globals=locals(), number=10000)))
print('{:15} : {}'.format('dot dict', timeit('z.a.b.c.d', globals=locals(), number=10000)))
print('{:15} : {}'.format('dict objecter', timeit('w.a.b.c.d', globals=locals(), number=10000)))
print('{:15} : {}'.format('original', timeit("get_entry(x,'a.b.c.d')", globals=locals(), number=10000)))
print('{:15} : {:.20f}'.format('prodict getitem', timeit("p['a']['b']['c']['d']", globals=locals(), number=10000)))
print('{:15} : {:.20f}'.format('best ref', timeit("x['a']['b']['c']['d']", globals=locals(), number=10000)))

And results:

dict dotter     : 0.04535976458466595
prodict         : 0.02860781018446784
dot dict        : 0.019078164088831673
dict objecter   : 0.0017378700050722368
original        : 0.006594238310349346
prodict getitem : 0.00510931794975705289
best ref        : 0.00121740293554022105

As you can see, its performance is between "dict dotter" and "dot dict". Any performance enhancement suggestion will be appreciated.

Vishal Khichadiya · Accepted Answer · 2018-04-28 08:02:13Z

1

The code should be less iterative and more dynamic!!

data

mydict = {
    'first': {
        'second': {
            'third': {
                'fourth': 'the end'
             }
         }
     }
}

Function

def get_entry(dict, keyspec):
    for keys in keyspec.split('.'):
        dict = dict[keys]
    return dict

call the function

res = get_entry(mydict, 'first.second.third.fourth')

this will take less time to execute even it's a dynamic code execution!!

answered Apr 28, 2018 at 8:02

Vishal Khichadiya

4303 silver badges16 bronze badges

5 Comments

coldspeed95 Over a year ago

I fail to see how this is remotely different from OP's solution which they didn't want.

Vishal Khichadiya Over a year ago

As you see there's no use of extra variables to store values that leads it to save time to execute and the time difference is in micro seconds so this will be effective when this code will execute a million times by another code. Moreover you can use first, first.second , first.second.third as an arg without changing single line of code.

coldspeed95 Over a year ago

The extra variable makes near 0 difference whatsoever, I would certainly hope for larger performance gains than this on a million records.

Vishal Khichadiya Over a year ago

@cᴏʟᴅsᴘᴇᴇᴅ Can you tell me how much time this code will takes if you measure it really!! Because I'm dmm sure that it's very big difference of time when this code will execute with extra variable and without extra variable.

coldspeed95 Over a year ago

Not nearly as much as the other answers, we'll go with that.

ndpu · Accepted Answer · 2018-04-28 12:28:44Z

You can use reduce (functools.reduce in python3):

import operator
def get_entry(dct, keyspec):
    return reduce(operator.getitem, keyspec.split('.'), dct)

It is more nicely looking but with a little less perfomance.

Your version timeit:

>>> timeit("get_entry_original(mydict, 'first.second.third.fourth')",
           "from __main__ import get_entry_original, mydict", number=1000000)
0.5646841526031494

with reduce:

>>> timeit("get_entry(mydict, 'first.second.third.fourth')",
           "from __main__ import get_entry, mydict")
0.6140949726104736

As tdelaney notice - split consume almost as much cpu power as getting key in dict:

def split_keys(keyspec):
    keys = keyspec.split('.')

timeit("split_keys('first.second.third.fourth')",
       "from __main__ import split_keys")
0.28857898712158203

Just move string splitting away from get_entry function:

def get_entry(dct, keyspec_list):
    return reduce(operator.getitem, keyspec_list, dct)


timeit("get_entry(mydict, ['first', 'second', 'third', 'fourth'])",
       "from __main__ import get_entry, mydict")
0.37825703620910645

Collectives™ on Stack Overflow

Efficiently accessing arbitrarily deep dictionaries

7 Answers 7

10 Comments

8 Comments

Comments

3 Comments

Comments

data

Function

call the function

5 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

7 Answers 7

10 Comments

8 Comments

Comments

3 Comments

Comments

data

Function

call the function

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest