Skip to content

Conversation

@Pusnow
Copy link
Contributor

@Pusnow Pusnow commented Jun 5, 2017

@corona10
Copy link
Member

@Pusnow
I am not a committer of this library.
But here is a one thing I want to review.
Can you add test codes about your changing?
You can add your test cases in here.

Thank you.

@Pusnow
Copy link
Contributor Author

Pusnow commented Jul 29, 2017

Okay, I added some tests for the issue.

if (i < len &&
TBase <= PyUnicode_READ(kind, data, i) &&
PyUnicode_READ(kind, data, i) <= (TBase+TCount)) {
TBase < PyUnicode_READ(kind, data, i) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this should be < rather than <=?

Copy link
Contributor Author

@Pusnow Pusnow Aug 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.
That code determines PyUnicode_READ(kind, data, i) is a trailing(final) consonant while TBase(0x11A7) is the last Vowel in Hangul (Hangul Jamo).
So < is correct rather than <=.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! And after checking (which I should have done before leaving my comment), I see that this agrees with section 3.12 of (version 10 of ) the standard.

Still, Python eyes are rather used to seeing half-open ranges, so anything other than lower <= value < high looks surprising. Is it worth adding a comment explaining what's going on?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll add some comments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just added some comments. Is it enough?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yes, that's helpful.

Copy link

@ghost ghost Aug 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me give a supplement:

Before Unicode 4.1.0 (draft), here is: TBase <= code <= TBase+TCount
see: http://www.unicode.org/reports/tr15/tr15-24.html#hangul_composition

After Unicode 4.1.0, here is TBase < code < TBase+TCount, which in line with the latest version (Unicode 10.0)
see: http://www.unicode.org/reports/tr15/tr15-25.html#hangul_composition

This change happened in 2005.

@Pusnow
Copy link
Contributor Author

Pusnow commented Aug 7, 2017

I think it can be merged. Is there anything I need to do?

@Pusnow
Copy link
Contributor Author

Pusnow commented Aug 24, 2017

Hello?

@corona10
Copy link
Member

@Pusnow
There should be a Misc/NEWS.d entry for this change using blurb.
See https://devguide.python.org/committing/#what-s-new-and-news-entries

@Pusnow
Copy link
Contributor Author

Pusnow commented Aug 24, 2017

Done, thank you for response.

@vstinner
Copy link
Member

I closed and reopened the PR to force to reschedule a test on AppVeyor: it just started a new job, https://ci.appveyor.com/project/python/cpython/build/3.8build17701

@zhangyangyu zhangyangyu merged commit d134809 into python:master Jun 15, 2018
@miss-islington
Copy link
Contributor

Thanks @Pusnow for the PR, and @zhangyangyu for merging it 🌮🎉.. I'm working now to backport this PR to: 2.7, 3.6, 3.7.
🐍🍒⛏🤖

@bedevere-bot
Copy link

GH-7702 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2018
…ythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <[email protected]>
@bedevere-bot
Copy link

GH-7703 is a backport of this pull request to the 3.6 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jun 15, 2018
…ythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <[email protected]>
@miss-islington
Copy link
Contributor

Sorry, @Pusnow and @zhangyangyu, I could not cleanly backport this to 2.7 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker d134809cd3764c6a634eab7bb8995e3e2eff14d5 2.7

miss-islington added a commit that referenced this pull request Jun 15, 2018
…H-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <[email protected]>
zhangyangyu pushed a commit to zhangyangyu/cpython that referenced this pull request Jun 15, 2018
…u11c3 (pythonGH-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3])..
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <[email protected]>
@bedevere-bot
Copy link

GH-7704 is a backport of this pull request to the 2.7 branch.

miss-islington added a commit that referenced this pull request Jun 15, 2018
…H-1958)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3]).
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <[email protected]>
zhangyangyu added a commit that referenced this pull request Jun 15, 2018
…H-1958) (GH-7704)

Hangul composition check boundaries are wrong for the second character
([0x1161, 0x1176) instead of [0x1161, 0x1176]) and third character ((0x11A7, 0x11C3)
instead of [0x11A7, 0x11C3])..
(cherry picked from commit d134809)

Co-authored-by: Wonsup Yoon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

9 participants