Skip to content

Conversation

@csabella
Copy link
Contributor

@csabella csabella commented Feb 19, 2018

@terryjreedy
Copy link
Member

I am reviewing now and expect to push at least a few changes. Coverage is currently an excellent 97%. The misses are mostly conditions within Parse methods. If a check is eally needed, and we understand why, then adding a test case that goes the other way should be easy. If I see how, I will add such.

149 def find_good_parse_start:
173 for tries in range(5):  # Never jumps? to 183.  This note on for statement is new and unclear.
193 if m and not is_char_in_string(m.start()): # Never true, next line never executed.

229 def _study1(self): 
272 if level == 0:  # (if not level? always true
282 if level:   # always true
308 if w == 0:  # always true, 313 continue never executed
310 if level == 0:  # always true
339 continue # supposedly not executed, but must be when prior assert passes

533 def compute_backslash_indent(self): 
557 if level:  # always true
562 elif ch == '#':  # never true, hence break never executed

@csabella
Copy link
Contributor Author

For the coverage, I couldn't figure out how to get that last call to is_char_in_string to return false if the previous returns had been true. From the comments, the idea of the extra loop is to allow the colorizer to catch up; meaning, on the first call it might be true, but a few milliseconds later it might be false, even calling on the same string and index. Since is_char_in_string is a function call passed to find_good_parse_start, the value returned would have to be non-deterministic. I thought of having the function flip itself after the first 5 tries (to get through the for loop) and, with a side effect, that could probably be done. I just didn't know if that was the right approach.

@terryjreedy
Copy link
Member

I won't push anything tonight. I spent most of my time understanding the parse attribute initialization 'bug' and opening a follow-up issue. In part, I wanted to know if it could lead to buggy certain types of input.

@terryjreedy
Copy link
Member

OK, reviewing again. I will try to not get sidetracked by possible code changes.

Copy link
Member

@terryjreedy terryjreedy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am mostly ready to merge this and move on to editing pyparse But I would like answer to #nospace puzzle if you have one, even if I merge first.

eq(start(is_char_in_string=lambda index: index >= 44), 33)
# If everything before the 'def' is in a string, then returns None.
# The non-continuation def line returns 44 (see below).
eq(start(is_char_in_string=lambda index: index < 44), None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since 44 < 44 is False, this strikes me as a bug, though I do not yet see why None rather than 44 is returned.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the tries for loop, the first thing it does is rfind for ':\n', so in this example, that's the end of the def statement (pos 109). It then finds the start of the line as rfind('\n'), which does not return the start of the line that actually has the def on it (it returns pos 70 instead of 44). _synchre doesn't find a keyword, so it gets ready to loop again. On the next loop, it again does rfind for ':\n', which is the end of the class statement. But the class statement is in a string (pos 33), so that's no good. This means it enters if pos is None:. Since that invokes _synchre from the beginning of the string, it again finds class and again rejects it.

One goal of this routine seems to be to find a better start point as quickly as possible. It sacrifices 'being perfect' for just trying a few (5) times. So, I don't know if this is a big deal that it doesn't find it. The downstream effect is that _study2 (and to a lesser extent _study1) might have to process more of the string instead of the 'best guess' of the tail of the string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I missed the point that def is only found in the 'while 1' loop after backing up to class and searching forward. See msg312525 of bpo-32880 for more thoughts.


(NONE, BACKSLASH, FIRST, NEXT, BRACKET) = range(5)
TestInfo = namedtuple('TestInfo', ['string', 'goodlines',
'continuation'])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, why do you prefer namedtuples (which are fine) over tuples unpacked in the for loop header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally did it with unpacking, but felt that the _study2 tests were unreadable because there were so many elements in the tuple. Adding the namedtuple name to the creation on each line seemed to do the trick on the readability by at least making each test line distinguishable. Once I converted that one and it didn't slow down the tests, I did the others for consistency. Switching back and forth between styles when reading the code seemed that it might require just a little more brain power and keeping it consistent let the brain know, 'ok, i've seen this before'. Aren't you glad you asked? lol :-)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good enough

setstr = p.set_str
study = p._study1

(NONE, BACKSLASH, FIRST, NEXT, BRACKET) = range(5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like these better than the C_NAMES in the file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:-) Since this is from the early 2000s, maybe using C_ for constant instead of just caps was more of the accepted style? Should it be added to theTODO list?

TestInfo('())\n', [0, 1], NONE), # Extra closer.
TestInfo(')(\n', [0, 1], BRACKET), # Extra closer.
# For the mismatched example, it doesn't look like contination.
TestInfo('{)(]\n', [0, 1], NONE), # Mismatched.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not yet read the _study1 while loop, but these answers are intelligible, given the string, even though understanding the function definition will require looking at the use context.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I can tell _study1 is all about continuation type. Since the mismatch example isn't code that's likely to happen 'in the wild', I don't think it's likely to cause issues. But, if someone was playing around with the editor, they might be surprised that a non-matching bracket might close a '('.

For example, in an IDLE editor,

{'a': [1, 2, 3}

indents 1 space upon Enter. I know there's not a linter to mark it as a syntax error, but I think users might expect for the next line to align with the { because they thought they finished the dictionary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See msg 312526. Except for possible bugs, mis-indents are a mark of syntax error ;-)

tests = (
TestInfo('', 0, 0, '', None, ((0, 0),)),
TestInfo("'''This is a multiline continutation docstring.\n\n",
0, 49, "'", None, ((0, 0), (0, 1), (49, 0))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _study2 docstring says lastch is 'last non-whitespace char' before comment while comment within _study2 says 'last interesting char'. So I expected '.' But it appears in the code that chars within a string, after the initial quote, are uninteresting. I changed 'non-whitespace' to 'interesting', and may clarify that in next issue.

0, 11, '', None, ((0, 0), (0, 1), (11, 0))),
# A comment without a space is a special case
TestInfo(' #Comment\\\n',
0, 0, '', None, ((0, 0),)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This strikes me as a bug. I made both samples start with either no spaces or one space and get same results. For the 2nd, I verified that ch in never '#', so that "if ch == '#':" is never triggered, but I cannot see why or how it gets skipped. I am inclined to disable this case.

Copy link
Contributor Author

@csabella csabella Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was for coverage. This regex doesn't work with a space after the #.

_junkre = re.compile(r"""
    [ \t]*
    (?: \# \S .* )?
    \n
""", re.VERBOSE).match

TestInfo('def function1(self, a):\n pass\n', no),
TestInfo('# A comment:\n', no),
TestInfo('"""A docstring:\n', no),
TestInfo('"""A docstring:\n', no),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are mostly redundant and the only thing the function does beyond calling _study2 is switch on ':'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was just showing that a : inside a comment or docstring doesn't count, in case the code was changed to do the check some way besides :.

tests = (
TestInfo('', ((0, 0),)),
TestInfo('a\n', ((0, 0),)),
TestInfo('()()\n', ((0, 0), (0, 1), (2, 0), (2, 1), (4, 0))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that the code produces 4 rather than 3 but I wonder if the caller really cares.
These really test _study2 more, but with cases where we only care about the bracketing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually intended this test to show the level (second element) increase and decrease on closers. I didn't look at how it was used, but just tried to make tests for what it was returning.

setstr(test.string)
test.assert_(closer())

def test_get_last_open_bracket_pos(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope to delete this with the function later, but until then, will leave it.


def test_tran(self):
self.assertEqual('\t a([{b}])b"c\'d\n'.translate(self.parser._tran),
'xxx(((x)))x"x\'x\n')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering if explicit tests should be added for that. I don't know if compressing a string like this is common so that it can be parsed faster, but I learned a lot from this technique.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Abstractly, mappings implement functions (in the set-theoretic sense). Copying the code that produces the mapping is not terribly useful, but in this case, I discovered a single 'actual = expected' test.

if i < 0:
break
i = str.rfind('\n', 0, i) + 1 # start of colon line
i = str.rfind('\n', 0, i) + 1 # start of colon line (-1+1=0)
Copy link
Contributor Author

@csabella csabella Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought this (-1+1=0) was a clever trick when I debugged it. :-)

@terryjreedy
Copy link
Member

The no-space puzzle I mentioned above is this one (I forgot to push the 'Add comment' button).

        TestInfo(' # Comment\\\n',
                 0, 12, '', None, ((0, 0), (1, 1), (12, 0))),
        # A comment without a space is a special case
        TestInfo(' #Comment\\\n',
                 0, 0, '', None, ((0, 0),)),

Why the difference?

@terryjreedy terryjreedy merged commit c84cf6c into python:master Feb 22, 2018
@miss-islington
Copy link
Contributor

Thanks @csabella for the PR, and @terryjreedy for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 22, 2018
There are no code changes other than comments and docstrings.
(cherry picked from commit c84cf6c)

Co-authored-by: Cheryl Sabella <[email protected]>
@bedevere-bot
Copy link

GH-5803 is a backport of this pull request to the 3.7 branch.

@bedevere-bot
Copy link

GH-5804 is a backport of this pull request to the 3.6 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 22, 2018
There are no code changes other than comments and docstrings.
(cherry picked from commit c84cf6c)

Co-authored-by: Cheryl Sabella <[email protected]>
miss-islington added a commit that referenced this pull request Feb 22, 2018
There are no code changes other than comments and docstrings.
(cherry picked from commit c84cf6c)

Co-authored-by: Cheryl Sabella <[email protected]>
miss-islington added a commit that referenced this pull request Feb 22, 2018
There are no code changes other than comments and docstrings.
(cherry picked from commit c84cf6c)

Co-authored-by: Cheryl Sabella <[email protected]>
@csabella
Copy link
Contributor Author

Not sure if you saw my previous comment about the "no-space issue".

Why the difference?

If you're asking why there is a different result for the tests, it's because this regex doesn't work with a space after the #.

_junkre = re.compile(r"""
    [ \t]*
    (?: \# \S .* )?
    \n
""", re.VERBOSE).match

If your 'why' is why _junkre is written like that, I haven't been able to figure it out.

@csabella csabella deleted the pyparse branch February 22, 2018 13:08
@terryjreedy
Copy link
Member

terryjreedy commented Feb 23, 2018

After looking at the effect on smart indenting, I decided that ignoring '#x' as junk is the wrong case. See bpo-32918

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants