Commit 6bccbe7
The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.
Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.
Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so. The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.
Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
1 parent 077af8c commit 6bccbe7
2 files changed
+19
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1763 | 1763 | | |
1764 | 1764 | | |
1765 | 1765 | | |
1766 | | - | |
1767 | | - | |
1768 | | - | |
| 1766 | + | |
| 1767 | + | |
| 1768 | + | |
| 1769 | + | |
| 1770 | + | |
| 1771 | + | |
| 1772 | + | |
1769 | 1773 | | |
1770 | 1774 | | |
1771 | 1775 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
| 15 | + | |
15 | 16 | | |
16 | 17 | | |
17 | 18 | | |
| |||
617 | 618 | | |
618 | 619 | | |
619 | 620 | | |
620 | | - | |
| 621 | + | |
621 | 622 | | |
622 | 623 | | |
623 | 624 | | |
624 | 625 | | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
| 634 | + | |
| 635 | + | |
625 | 636 | | |
626 | 637 | | |
627 | 638 | | |
| |||
0 commit comments