Commit 8c1c426
The documented definition was much broader than the real one:
there are tons of characters with general category "Other",
and we don't (and shouldn't) treat most of them as whitespace.
Rewrite the definition to agree with the comment on
_PyUnicode_IsWhitespace, and with the logic in makeunicodedata.py,
which is what generates that function and so ultimately governs.
Add suitable breadcrumbs so that a reader who wants to pin down
exactly what this definition means (what's a "bidirectional class"
of "B"?) can do so. The `unicodedata` module documentation is an
appropriate central place for our references to Unicode's own copious
documentation, so point there.
Also add to the isspace() test a thorough check that the
implementation agrees with the intended definition.
1 parent 786a4e1 commit 8c1c426
2 files changed
+19
-4
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1756 | 1756 | | |
1757 | 1757 | | |
1758 | 1758 | | |
1759 | | - | |
1760 | | - | |
1761 | | - | |
| 1759 | + | |
| 1760 | + | |
| 1761 | + | |
| 1762 | + | |
| 1763 | + | |
| 1764 | + | |
| 1765 | + | |
1762 | 1766 | | |
1763 | 1767 | | |
1764 | 1768 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
| 14 | + | |
14 | 15 | | |
15 | 16 | | |
16 | 17 | | |
| |||
615 | 616 | | |
616 | 617 | | |
617 | 618 | | |
618 | | - | |
| 619 | + | |
619 | 620 | | |
620 | 621 | | |
621 | 622 | | |
622 | 623 | | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
| 630 | + | |
| 631 | + | |
| 632 | + | |
| 633 | + | |
623 | 634 | | |
624 | 635 | | |
625 | 636 | | |
| |||
0 commit comments