Skip to content

Conversation

@eendebakpt
Copy link
Contributor

@eendebakpt eendebakpt commented Apr 26, 2024

difflib.unified_diff (and difflib.context_diff ) are documented to have lists of strings as input. When called with strings as input arguments, e.g. difflib.unified_diff('one', 'two') a diff is generated, instead of returning an error.

In this PR we disallow input of pure strings to prevent users from accidentally passing a string to difflib.unified_diff.

  • Two tests had to be modified, but it looks like these tests were written to test something else and not passing strings as arguments
  • We still allow other sequence types such as a tuple of strings.
  • This is a backwards incompatible change. If we do not want this, then I will modify the PR to update the documentation and add tests for this behavior.

Lib/difflib.py Outdated
if b and not isinstance(b[0], str):
raise TypeError('lines to compare must be str, not %s (%r)' %
(type(b[0]).__name__, b[0]))
if isinstance(a, str) or isinstance(b, str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following the existing code, IMO it would be great add the not %s (%r)' phrase.

self.assertIn('ımplıcıt', output)


class TestInputParsing(unittest.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are three existing tests for types checking: test_mixed_types_content, test_mixed_types_filenames and test_mixed_types_dates. I think that it is better to extract them in a separate class (they are no longer about mixing 8-bit and Unicode strings) and add a new test in this class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I moved the 3 existing tests to a new class and refactored the new test to be in the same style.

Lib/difflib.py Outdated
raise TypeError('lines to compare must be str, not %s (%r)' %
(type(b[0]).__name__, b[0]))
if isinstance(a, str):
raise TypeError('input must be a sequence of strings, not %s (%r)' %
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the user pass the content of the whole file without splitting it on lines, all this (potentially megabites) will be included in the error message. This is not good. The content of the string does not add anything to clarify the bug, the type is enough.

It is better to include only type name instead of the repr also in all other error messages, but this is not related to this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we should not print the entire object. I changed it for the new error messages. Let me know whether you want me to also change the other error messages in this PR.

@serhiy-storchaka serhiy-storchaka merged commit c3b6dbf into python:main Jun 10, 2024
@serhiy-storchaka
Copy link
Member

Thank you for your contribution @eendebakpt. There were some unrelated random test failures, so I forget to merge this at its time. I should use "auto-merge" instead.

mrahtz pushed a commit to mrahtz/cpython that referenced this pull request Jun 30, 2024
noahbkim pushed a commit to hudson-trading/cpython that referenced this pull request Jul 11, 2024
estyxx pushed a commit to estyxx/cpython that referenced this pull request Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants