Returning Dictionary-length of words in string [duplicate]

Question

I need to build a function that takes as input a string and returns a dictionary.
The keys are numbers and the values are lists that contain the unique words that have a number of letters equal to the keys.
For example, if the function is called as follows:

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

The function should return:

{2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}

The code that I have written is as follows:

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        sample_dictionary[words]=word
    print(sample_dictionary)
    return sample_dictionary

The function is returning a dictionary as follows:

{2: 'is', 3: 'you', 4: 'they', 5: 'treat', 6: 'become'}

The dictionary does not contain all the words with a given number of letters; for each length it holds only the last such word in the string.

gtlambert · Accepted Answer · 2016-03-14 11:50:56Z

7

Since you only want to store unique values in your lists, it actually makes more sense to use a set. Your code is almost right: you just need to create a new set when words isn't already a key in your dictionary, and add to the existing set when it is. The following code shows this:

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        words=len(word)
        if words in sample_dictionary:
            sample_dictionary[words].add(word)
        else:
            sample_dictionary[words] = {word}
    print(sample_dictionary)
    return sample_dictionary

n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")

Output

{2: set(['is']), 3: set(['and', 'the', 'see', 'you', 'way']), 
 4: set(['them', 'what', 'they']), 5: set(['treat']), 6: set(['become', 'people'])}
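
The question asks for sorted lists rather than sets. As a minimal sketch, assuming the dictionary of sets has already been built, it could be converted afterwards. The helper name to_sorted_lists is hypothetical and not part of the answer above:

```python
def to_sorted_lists(words_by_length):
    """Turn {length: set_of_words} into {length: alphabetically_sorted_list}."""
    # sorted() accepts any iterable, including a set, and returns a new list.
    return {length: sorted(words) for length, words in words_by_length.items()}


example = {2: {'is'}, 3: {'way', 'the', 'you', 'see', 'and'}}
print(to_sorted_lists(example))
# {2: ['is'], 3: ['and', 'see', 'the', 'way', 'you']}
```

Keeping sets while building and sorting once at the end avoids re-sorting a list every time a word is added.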

edited Mar 14, 2016 at 11:50

answered Mar 14, 2016 at 11:33

gtlambert


3 Comments

Ilja Over a year ago

oh, this is better, our other solutions will raise KeyError...

Bhavik Joshi Over a year ago

how to sort the list ['the', 'way', 'you', 'see', 'the', 'way', 'you', 'and', 'the', 'way', 'you']

Ilja Over a year ago

just do some_list.sort() if you want to have it alphabetically

tobias_k · Accepted Answer · 2016-03-14 11:53:37Z

3

The problem with your code is that you just put the latest word into the dictionary. Instead, you have to add that word to some collection of words that have the same length. In your example, that is a list, but a set seems to be more appropriate, assuming order is not important.

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary={}
    for word in my_string:
        if len(word) not in sample_dictionary:
            sample_dictionary[len(word)] = set()
        sample_dictionary[len(word)].add(word)
    return sample_dictionary

You can make this a bit shorter by using a collections.defaultdict(set):

import collections

def n_letter_dictionary(my_string):
    my_string=my_string.lower().split()
    sample_dictionary=collections.defaultdict(set)  # missing keys start as an empty set
    for word in my_string:
        sample_dictionary[len(word)].add(word)
    return dict(sample_dictionary)

Or use itertools.groupby, but for this you have to sort by length, first:

import itertools
def n_letter_dictionary(my_string):
    words_sorted = sorted(my_string.lower().split(), key=len)
    return {k: set(g) for k, g in itertools.groupby(words_sorted, key=len)}

Example (same result for each of the three implementations):

>>> n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become")
{2: {'is'}, 3: {'way', 'the', 'you', 'see', 'and'}, 4: {'what', 'them', 'they'}, 5: {'treat'}, 6: {'become', 'people'}}

edited Mar 14, 2016 at 11:53

answered Mar 14, 2016 at 11:36

tobias_k


1 Comment

Ilja Over a year ago

Quite right, of course it makes more sense to remove duplicates!

Ilja · Accepted Answer · 2016-03-14 11:31:49Z

2

With sample_dictionary[words]=word you overwrite whatever you have stored under that key so far. You need a list that you can append to.

Instead of that line you need:

if words in sample_dictionary.keys():
    sample_dictionary[words].append(word)
else:
    sample_dictionary[words]=[word]

So if the key already has a value, I append to it; otherwise I create a new list.
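
For context, here is a sketch of how that branch might sit inside the asker's complete function. The membership test is written without .keys(), which behaves the same way for a dict. Note that this version keeps duplicates, because it appends every occurrence:

```python
def n_letter_dictionary(my_string):
    sample_dictionary = {}
    for word in my_string.lower().split():
        words = len(word)  # the key: number of letters in this word
        if words in sample_dictionary:
            # Key already present: extend the existing list.
            sample_dictionary[words].append(word)
        else:
            # First word of this length: start a new list.
            sample_dictionary[words] = [word]
    return sample_dictionary


print(n_letter_dictionary("a aa bb ccc a bb"))
# {1: ['a', 'a'], 2: ['aa', 'bb', 'bb'], 3: ['ccc']}
```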

answered Mar 14, 2016 at 11:31

Ilja


3 Comments

gtlambert Over a year ago

Yup, and you don't actually require the .keys()

A.Seec Over a year ago

Hi, Thanks a lot for the help. Still, i am getting repeating values for keys already present in the dictionary. Do you know a way to prevent repeating words without using set()?

Ilja Over a year ago

Why don't you want to use the set()? Well, there is a way, of course. Replace the else: by elif word not in sample_dictionary[words]: -- then it will check this condition

Christian Witts · Accepted Answer · 2016-03-14 13:41:48Z

2

You can use a defaultdict from the collections module. It lets you set a default type for the values of your dictionary, in this case a list, so you can just append to it based on the length of your word.

from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        my_dict[len(word)].append(word)

    return my_dict

You could still do this without defaultdict, but it would be a little longer.

def n_letter_dictionary(my_string):
    my_dict = {}
    for word in my_string.split():
        word_length = len(word)
        if word_length in my_dict:
            my_dict[word_length].append(word)
        else:
            my_dict[word_length] = [word]

    return my_dict

To ensure there are no duplicates in the value lists without using set(), check whether the word is already in the list before appending it. Be warned though: if your value lists are large and your input data is fairly unique, performance will suffer, because the membership check scans the list and only exits early once the word is found.

from collections import defaultdict

def n_letter_dictionary(my_string):
    my_dict = defaultdict(list)
    for word in my_string.split():
        if word not in my_dict[len(word)]:
            my_dict[len(word)].append(word)

    return my_dict

# without defaultdicts
def n_letter_dictionary(my_string):
    my_dict = {}                                  # Init an empty dict
    for word in my_string.split():                # Split the string and iterate over it
        word_length = len(word)                   # Get the length, also the key
        if word_length in my_dict:                # Check if the length is in the dict
            if word not in my_dict[word_length]:  # If the length exists as a key, but the word doesn't exist in the value list
                my_dict[word_length].append(word) # Add the word
        else:
            my_dict[word_length] = [word]         # The length/key doesn't exist, so you can safely add it without checking for its existence

    return my_dict

So if you have a high frequency of duplicates and a short list of words to scan through, this approach is acceptable. If, for example, you had a list of randomly generated words made up of permutations of alphabetic characters, the value lists would bloat and scanning through them would become expensive.
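
If the only requirement is that the values be lists, one possible way around the scanning cost is to keep a separate set purely for membership checks. This is a sketch under that assumption, not part of the answer above; the name seen is hypothetical, and it does rely on a set internally:

```python
def n_letter_dictionary(my_string):
    my_dict = {}
    seen = set()  # hypothetical helper: every word stored so far
    for word in my_string.split():
        if word in seen:   # average O(1) lookup instead of scanning a list
            continue
        seen.add(word)
        # setdefault returns the existing list, or stores and returns a new one.
        my_dict.setdefault(len(word), []).append(word)
    return my_dict


print(n_letter_dictionary("a aa bb ccc a bb"))
# {1: ['a'], 2: ['aa', 'bb'], 3: ['ccc']}
```

The lists keep the order in which words first appear, while the duplicate check no longer depends on list length.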

edited Mar 14, 2016 at 13:41

answered Mar 14, 2016 at 11:34

Christian Witts


4 Comments

A.Seec Over a year ago

Thanks a lot, still i am getting repeating values for keys already present in the dictionary. Is there a way to remove repeating words without using set()?

Christian Witts Over a year ago

I added a section on ensuring no duplicates without using set().

A.Seec Over a year ago

I am trying to do it using your 1st method without the defaultdict, by adding an 'if word not in my_dict' after 'for word in my_string.split():', but I am still getting the same output with repeating words. Could you help me with the method without the defaultdict?

Christian Witts Over a year ago

I have added an example without defaultdict but with unique results in the list without using set(). If you had if word not in my_dict that would always return True as word is in the value and your statement is only checking the keys of my_dict.
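
A small illustration of the point about keys versus values, using a made-up dictionary: the in operator on a dict only looks at its keys, so a word stored inside a value list is not found unless you test against that list.

```python
my_dict = {3: ['the', 'way']}   # illustrative data only

print('the' in my_dict)      # False: 'the' is inside a value, not a key
print(3 in my_dict)          # True: 3 is a key
print('the' in my_dict[3])   # True: this checks the list stored under key 3
```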

Alfe · Accepted Answer · 2016-03-14 11:55:27Z

1

The shortest solution I came up with uses a defaultdict:

from collections import defaultdict

sentence = ("The way you see people is the way you treat them"
            " and the Way you treat them is what they become")

Now the algorithm:

wordsOfLength = defaultdict(list)
for word in sentence.split():
    wordsOfLength[len(word)].append(word)

Now wordsOfLength will hold the desired dictionary.

answered Mar 14, 2016 at 11:55

Alfe


Rockybilly · Accepted Answer · 2016-03-14 12:00:48Z

1

itertools.groupby is the perfect tool for this.

from itertools import groupby
def n_letter_dictionary(string):
    result = {}
    for key, group in groupby(sorted(string.split(), key = lambda x: len(x)), lambda x: len(x)):
        result[key] = list(group)
    return result

print(n_letter_dictionary("The way you see people is the way you treat them and the Way you treat them is what they become"))

# {2: ['is', 'is'], 3: ['The', 'way', 'you', 'see', 'the', 'way', 'you', 'and', 'the', 'Way', 'you'], 4: ['them', 'them', 'what', 'they'], 5: ['treat', 'treat'], 6: ['people', 'become']}
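
The output above still contains duplicates and mixed case. A possible variation on the same groupby idea, sketched here as an assumption rather than as the answerer's code, lowercases and de-duplicates the words first, then sorts by length and alphabetically so each group comes out already ordered:

```python
from itertools import groupby


def n_letter_dictionary(string):
    # Lowercase, drop duplicates, then sort by (length, word): groupby needs
    # equal lengths to be adjacent, and the secondary key orders each group.
    words = sorted(set(string.lower().split()), key=lambda w: (len(w), w))
    return {length: list(group) for length, group in groupby(words, key=len)}


print(n_letter_dictionary("The way you see people is the way you treat them "
                          "and the Way you treat them is what they become"))
# {2: ['is'], 3: ['and', 'see', 'the', 'way', 'you'], 4: ['them', 'they', 'what'], 5: ['treat'], 6: ['become', 'people']}
```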

edited Mar 14, 2016 at 12:00

answered Mar 14, 2016 at 11:35

Rockybilly


4 Comments

Rockybilly Over a year ago

Indeed, let me correct that swiftly.

tobias_k Over a year ago

Also, key = lambda x: len(x) is the same as just key=len ;-)

Rockybilly Over a year ago

Yes, noticed that, Thanks !

Alfe Over a year ago

Sorting the things is unnecessary effort just to please groupby. Reconsider that aspect.

open source guy · Accepted Answer · 2016-03-14 11:31:34Z

0

my_string="a aa bb ccc a bb".lower().split()
sample_dictionary={}
for word in my_string:
    words=len(word)
    if words not in sample_dictionary:
        sample_dictionary[words] = []
    sample_dictionary[words].append(word)
print(sample_dictionary)

answered Mar 14, 2016 at 11:31

open source guy


1 Comment

Alfe Over a year ago

Reconsider the name of the variable words. It's rather a wordLength or similar.
