-
-
Notifications
You must be signed in to change notification settings - Fork 33.9k
bpo-30863: Rewrite PyUnicode_AsWideChar() and PyUnicode_AsWideCharString(). #2599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ing(). They no longer cache the wchar_t* representation of string objects.
Misc/NEWS
Outdated
| C API | ||
| ----- | ||
|
|
||
| - bpo-30863: PyUnicode_AsWideChar() and PyUnicode_AsWideCharString() no longer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should change to use blurb.
Objects/unicodeobject.c
Outdated
| Py_ssize_t res; | ||
|
|
||
| assert(unicode != NULL); | ||
| assert(PyUnicode_IS_READY(unicode)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we get not ready string here? IMHO it should be put after the following if block.
Objects/unicodeobject.c
Outdated
| "embedded null character"); | ||
|
|
||
| buflen = unicode_get_widechar_size(unicode); | ||
| if (buflen < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why could it be < 0? It looks to me assert(buflen >= 0) is right.
Objects/unicodeobject.c
Outdated
| } | ||
|
|
||
| buffer = PyMem_NEW(wchar_t, buflen + 1); | ||
| buffer = (wchar_t *) PyMem_MALLOC(sizeof(wchar_t) * (buflen + 1)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need overflow check here?
Objects/unicodeobject.c
Outdated
| memcpy(w, wstr, size * sizeof(wchar_t)); | ||
| return; | ||
| } | ||
| assert(PyUnicode_KIND(unicode) != SIZEOF_WCHAR_T); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest remove this assertion since we have two more specific assertions below and they are easier to read and understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Only question is this patch aims to remove cache or get rid of dependency of PyUnicode_AsUnicodeAndSize so we can deprecate it? I mean should the NEWS entry reflect this.
|
Both. Deprecating |
|
No. I just mean if that's also a goal, it's better to mention it in the NEWS. Currently the NEWS takes removing the cache as the only goal. |
|
If deprecating |
|
I mean something like
Is it acceptable to you? If not, doesn't matter, just merge it since the code already LGTM. :-) |
|
@serhiy-storchaka and @zhangyangyu: ping |
|
I was going to discuss this change on Python-Dev first. |
https://mail.python.org/pipermail/python-dev/2018-October/155530.html |
|
I started a new Pipelines build at https://dev.azure.com/python/cpython/_build/results?buildId=32792&view=logs to get a first check on whether there are significant performance regressions in the test suite. If it comes out about equal (with, say, this recent build), then I have no concerns. |
|
Looks good to me 👍 |
|
Thanks @serhiy-storchaka and @zooba: I wanted to do this change since Python 3.3 :-) I already attempted to make this change previously, but I failed. Mostly, because the legacy API was still very popular. |
They no longer cache the
wchar_t*representation of string objects.https://bugs.python.org/issue30863