Skip to content

Conversation

@csabella
Copy link
Contributor

@csabella csabella commented Feb 24, 2018

Copy link
Member

@terryjreedy terryjreedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I exhausted my ideas for this issue. If CI passes, I expect to merge sometime tomorrow.

def get(self, key, default=None):
return self._get(key)
def __getitem__(self, key):
return self.get(key, 'x')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had done a version yesterday that inherited from dict and only implemented:

class Parse(dict):
    def __missing__(self, key):
        return 'x'

It didn't have get or __getitem__ functionality on missing values, but it worked for the translate. But then I read on SO that's it's generally not good to inherit directly from dict and to use Mapping or MutableMapping instead. I guess this doesn't matter here, but I did notice that inheriting from dict allows some attributes to change from read-only to not, such as _tran.update = {1: 2}. So when is it OK to inherit directly from dict?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When it works. We are not going to do anything so nonsensical.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining __missing__ looks the most natural and efficient way. This is the case for which __missing__ was introduced.

class StringTranslatePseudoMapping(Mapping):
r"""Utility class to be used with str.translate()
class ParseMap(dict):
r"""Dict subclass that maps anything not in dict to 'x'.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought of this while I was sleeping. Should be class _ParseMap?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. PEP434 declares nearly all of idlelib 'private', and it became such in practice nearly 3 years later, for in 3.6. There is no point to further privatization.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

@csabella
Copy link
Contributor Author

I added a commit to remove the Mapping import.


class StringTranslatePseudoMapping(Mapping):
r"""Utility class to be used with str.translate()
class ParseMap(dict):
Copy link

@pppery pppery Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this class equivalent to defaultdict(lambda:"x")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class doesn't add the keys to the dictionary when they haven't been found. It will return the default value without growing the dictionary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, defaultdict calls the function to get 'x', instead of just returning it.

def get(self, key, default=None):
return self._get(key)
def __getitem__(self, key):
return self.get(key, 'x')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining __missing__ looks the most natural and efficient way. This is the case for which __missing__ was introduced.

# Map1 maps uninteresting chars to 'x', open brackets to '(',
# close brackets to ')', and preserves quotes, backslashes,
# newlines and hashes. Used for str.translate() in _study1().
map1 = ParseMap()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of renaming _tran to map1?

Copy link
Member

@terryjreedy terryjreedy Feb 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tests pass with def __missing__(self, key): return 'x', so I will switch if at least as fast, which I suspect it is. (Working on timeit test)

map1 reads better to me. The object is a map for _study1. 'tran' is a prefix for numerous words, often verbs, not a word in itself.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a matter of style, but to me, _tran looks more semantic. It refers to the purpose of it (a translation table). map1 refers only to its type. This is like naming the integer variable "int1" instead of "cnt" or "col".

self._get = _get
# Calling this triples access time; see bpo-32940
def __missing__(self, key):
return 120 # ord('x')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Return just 'x'. It would be make sense to use b'x'[0] if integers have advantage, but I tested and didn't see a difference.

It is worth to remove ord() from values for other keys.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tests, on the issue, show that ints slightly win. I see no need to change.

def get(self, key, default=None):
return self._get(key)
# Map all ascii to 120 to avoid __missing__ call, then replace some.
trans = ParseMap((i,120) for i in range(128))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PEP 8 requires a space after comma.

But wouldn't ParseMap.fromkeys(range(128), 'x') look clearer?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed

>>> mapping = StringTranslatePseudoMapping(preserve_dict, ord('x'))
>>> text = "a + b\tc\nd"
>>> text.translate(mapping)
>>> keepwhite = ParseMap((ord(c),ord(c)) for c in ' \t\n\r')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a space after comma. And the second ord() is not needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

space added

@terryjreedy terryjreedy merged commit f0daa88 into python:master Feb 28, 2018
@miss-islington
Copy link
Contributor

Thanks @csabella for the PR, and @terryjreedy for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 28, 2018
…thonGH-5862)

The new code also runs faster.
(cherry picked from commit f0daa88)

Co-authored-by: Cheryl Sabella <[email protected]>
@bedevere-bot
Copy link

GH-5940 is a backport of this pull request to the 3.7 branch.

@bedevere-bot
Copy link

GH-5941 is a backport of this pull request to the 3.6 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 28, 2018
…thonGH-5862)

The new code also runs faster.
(cherry picked from commit f0daa88)

Co-authored-by: Cheryl Sabella <[email protected]>
miss-islington added a commit that referenced this pull request Feb 28, 2018
…-5862)

The new code also runs faster.
(cherry picked from commit f0daa88)

Co-authored-by: Cheryl Sabella <[email protected]>
miss-islington added a commit that referenced this pull request Feb 28, 2018
…-5862)

The new code also runs faster.
(cherry picked from commit f0daa88)

Co-authored-by: Cheryl Sabella <[email protected]>
@csabella csabella deleted the pyparse branch March 1, 2018 00:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants