-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
bpo-32874: IDLE: add tests for pyparse #5755
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I am reviewing now and expect to push at least a few changes. Coverage is currently an excellent 97%. The misses are mostly conditions within Parse methods. If a check is eally needed, and we understand why, then adding a test case that goes the other way should be easy. If I see how, I will add such. |
|
For the coverage, I couldn't figure out how to get that last call to |
|
I won't push anything tonight. I spent most of my time understanding the parse attribute initialization 'bug' and opening a follow-up issue. In part, I wanted to know if it could lead to buggy certain types of input. |
|
OK, reviewing again. I will try to not get sidetracked by possible code changes. |
terryjreedy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mostly ready to merge this and move on to editing pyparse But I would like answer to #nospace puzzle if you have one, even if I merge first.
| eq(start(is_char_in_string=lambda index: index >= 44), 33) | ||
| # If everything before the 'def' is in a string, then returns None. | ||
| # The non-continuation def line returns 44 (see below). | ||
| eq(start(is_char_in_string=lambda index: index < 44), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since 44 < 44 is False, this strikes me as a bug, though I do not yet see why None rather than 44 is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the tries for loop, the first thing it does is rfind for ':\n', so in this example, that's the end of the def statement (pos 109). It then finds the start of the line as rfind('\n'), which does not return the start of the line that actually has the def on it (it returns pos 70 instead of 44). _synchre doesn't find a keyword, so it gets ready to loop again. On the next loop, it again does rfind for ':\n', which is the end of the class statement. But the class statement is in a string (pos 33), so that's no good. This means it enters if pos is None:. Since that invokes _synchre from the beginning of the string, it again finds class and again rejects it.
One goal of this routine seems to be to find a better start point as quickly as possible. It sacrifices 'being perfect' for just trying a few (5) times. So, I don't know if this is a big deal that it doesn't find it. The downstream effect is that _study2 (and to a lesser extent _study1) might have to process more of the string instead of the 'best guess' of the tail of the string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I missed the point that def is only found in the 'while 1' loop after backing up to class and searching forward. See msg312525 of bpo-32880 for more thoughts.
|
|
||
| (NONE, BACKSLASH, FIRST, NEXT, BRACKET) = range(5) | ||
| TestInfo = namedtuple('TestInfo', ['string', 'goodlines', | ||
| 'continuation']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, why do you prefer namedtuples (which are fine) over tuples unpacked in the for loop header?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I originally did it with unpacking, but felt that the _study2 tests were unreadable because there were so many elements in the tuple. Adding the namedtuple name to the creation on each line seemed to do the trick on the readability by at least making each test line distinguishable. Once I converted that one and it didn't slow down the tests, I did the others for consistency. Switching back and forth between styles when reading the code seemed that it might require just a little more brain power and keeping it consistent let the brain know, 'ok, i've seen this before'. Aren't you glad you asked? lol :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good enough
| setstr = p.set_str | ||
| study = p._study1 | ||
|
|
||
| (NONE, BACKSLASH, FIRST, NEXT, BRACKET) = range(5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I like these better than the C_NAMES in the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:-) Since this is from the early 2000s, maybe using C_ for constant instead of just caps was more of the accepted style? Should it be added to theTODO list?
| TestInfo('())\n', [0, 1], NONE), # Extra closer. | ||
| TestInfo(')(\n', [0, 1], BRACKET), # Extra closer. | ||
| # For the mismatched example, it doesn't look like contination. | ||
| TestInfo('{)(]\n', [0, 1], NONE), # Mismatched. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not yet read the _study1 while loop, but these answers are intelligible, given the string, even though understanding the function definition will require looking at the use context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I can tell _study1 is all about continuation type. Since the mismatch example isn't code that's likely to happen 'in the wild', I don't think it's likely to cause issues. But, if someone was playing around with the editor, they might be surprised that a non-matching bracket might close a '('.
For example, in an IDLE editor,
{'a': [1, 2, 3}indents 1 space upon Enter. I know there's not a linter to mark it as a syntax error, but I think users might expect for the next line to align with the { because they thought they finished the dictionary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See msg 312526. Except for possible bugs, mis-indents are a mark of syntax error ;-)
| tests = ( | ||
| TestInfo('', 0, 0, '', None, ((0, 0),)), | ||
| TestInfo("'''This is a multiline continutation docstring.\n\n", | ||
| 0, 49, "'", None, ((0, 0), (0, 1), (49, 0))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _study2 docstring says lastch is 'last non-whitespace char' before comment while comment within _study2 says 'last interesting char'. So I expected '.' But it appears in the code that chars within a string, after the initial quote, are uninteresting. I changed 'non-whitespace' to 'interesting', and may clarify that in next issue.
| 0, 11, '', None, ((0, 0), (0, 1), (11, 0))), | ||
| # A comment without a space is a special case | ||
| TestInfo(' #Comment\\\n', | ||
| 0, 0, '', None, ((0, 0),)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This strikes me as a bug. I made both samples start with either no spaces or one space and get same results. For the 2nd, I verified that ch in never '#', so that "if ch == '#':" is never triggered, but I cannot see why or how it gets skipped. I am inclined to disable this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was for coverage. This regex doesn't work with a space after the #.
_junkre = re.compile(r"""
[ \t]*
(?: \# \S .* )?
\n
""", re.VERBOSE).match| TestInfo('def function1(self, a):\n pass\n', no), | ||
| TestInfo('# A comment:\n', no), | ||
| TestInfo('"""A docstring:\n', no), | ||
| TestInfo('"""A docstring:\n', no), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are mostly redundant and the only thing the function does beyond calling _study2 is switch on ':'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was just showing that a : inside a comment or docstring doesn't count, in case the code was changed to do the check some way besides :.
| tests = ( | ||
| TestInfo('', ((0, 0),)), | ||
| TestInfo('a\n', ((0, 0),)), | ||
| TestInfo('()()\n', ((0, 0), (0, 1), (2, 0), (2, 1), (4, 0))), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that the code produces 4 rather than 3 but I wonder if the caller really cares.
These really test _study2 more, but with cases where we only care about the bracketing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually intended this test to show the level (second element) increase and decrease on closers. I didn't look at how it was used, but just tried to make tests for what it was returning.
| setstr(test.string) | ||
| test.assert_(closer()) | ||
|
|
||
| def test_get_last_open_bracket_pos(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope to delete this with the function later, but until then, will leave it.
|
|
||
| def test_tran(self): | ||
| self.assertEqual('\t a([{b}])b"c\'d\n'.translate(self.parser._tran), | ||
| 'xxx(((x)))x"x\'x\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering if explicit tests should be added for that. I don't know if compressing a string like this is common so that it can be parsed faster, but I learned a lot from this technique.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Abstractly, mappings implement functions (in the set-theoretic sense). Copying the code that produces the mapping is not terribly useful, but in this case, I discovered a single 'actual = expected' test.
| if i < 0: | ||
| break | ||
| i = str.rfind('\n', 0, i) + 1 # start of colon line | ||
| i = str.rfind('\n', 0, i) + 1 # start of colon line (-1+1=0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought this (-1+1=0) was a clever trick when I debugged it. :-)
|
The no-space puzzle I mentioned above is this one (I forgot to push the 'Add comment' button). Why the difference? |
|
Thanks @csabella for the PR, and @terryjreedy for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7. |
There are no code changes other than comments and docstrings. (cherry picked from commit c84cf6c) Co-authored-by: Cheryl Sabella <[email protected]>
|
GH-5803 is a backport of this pull request to the 3.7 branch. |
|
GH-5804 is a backport of this pull request to the 3.6 branch. |
There are no code changes other than comments and docstrings. (cherry picked from commit c84cf6c) Co-authored-by: Cheryl Sabella <[email protected]>
There are no code changes other than comments and docstrings. (cherry picked from commit c84cf6c) Co-authored-by: Cheryl Sabella <[email protected]>
There are no code changes other than comments and docstrings. (cherry picked from commit c84cf6c) Co-authored-by: Cheryl Sabella <[email protected]>
|
Not sure if you saw my previous comment about the "no-space issue".
If you're asking why there is a different result for the tests, it's because this regex doesn't work with a space after the _junkre = re.compile(r"""
[ \t]*
(?: \# \S .* )?
\n
""", re.VERBOSE).matchIf your 'why' is why |
|
After looking at the effect on smart indenting, I decided that ignoring '#x' as junk is the wrong case. See bpo-32918 |
https://bugs.python.org/issue32874