-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C #139123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… in C The C implementation considerably boosts the performance of the key UUID operations: ------------------------------------ Operation Speedup ------------------------------------ uuid4() generation 15.01x uuid7() generation 29.64x UUID from string 6.76x UUID from bytes 5.16x str(uuid) conversion 6.66x ------------------------------------ Summary of changes: * The UUID type is reimplemented in C in its entirety. * The pure-Python is kept around and is used of the C implementation isn't available for some reason. * Both implementations are tested extensively; additional tests are added to ensure that the C implementation of the type follows the pure Python implementation fully. * The Python implementation stores UUID values as int objects. The C implementation stores them as `uint8_t[16]` array. * The C implementation supports unpickling of UUIDs created with Python 2 using protocols starting with 0. That necessitated a small fix to the `copyreg` module (the change is only affecting legacy pickle pathway). * The C implementation has faster hash() implementation but also caches the computed hash value to speedup cases when UUIDs are used as set/dict keys. * The C implementation has a freelist to make new UUID object instantiation as fast as possible. * uuid4() and uuid7() are now implmented in C. The most performance boost (10x) comes from overfetching entropy to decrease the number of _PyOS_URandom() calls. On its own it's a safe optimization with the edge case that Unix fork needs to be explicitly handled. We do that by comparing the current PID to the PID of when the random buffer was populated. * Portions of code are coming from my implementation of faster UUID in gel-python [1]. I did use AI during the development, but basically had to rewrite the code it generated to be more idiomatic and efficient. * The benchmark can be found here [2]. * This PR makes Python UUID operations as fast as they are in NodeJS and Bun runtimes. [1] https://github.com/MagicStack/py-pgproto/blob/b8109fb311a59f30f9947567a13508da9a776564/uuid.pyx [2] https://gist.github.com/1st1/f03e816f34a61e4d46c78ff98baf4818
2d0a381 to
c4363f8
Compare
fa12192 to
4a5c2a2
Compare
|
Note to reviewers:
|
|
|
||
| *counter = (((uint64_t)(high & 0x1FF) << 32) | (low >> 32)) & 0x1FFFFFFFFFF; | ||
| *tail = (uint32_t)low; | ||
| return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't tested thoroughly (mostly looking at GodBolt output), but this ought to generate more efficient code (and there might be more ways to generate the right values directly, I just didn't go that far):
struct {
uint16_t high;
uint32_t low1;
uint32_t low2;
} rand_bytes;
if (gen_random(state, (uint8_t*)&rand_bytes, sizeof(rand_bytes)) < 0) {
return -1;
}
*counter = (((uint64_t)(rand_bytes.high & 0x1FF) << 32) | (rand_bytes.low1)) & 0x1FFFFFFFFFF;
*tail = rand_bytes.low2;
|
I would be happy to have a C implementation of UUID but for reviewing purposes, may I suggest that we first focus on implementing the wrapper in C in one PR and have different PRs for each UUID? it would be much easier to focus on the algorithm to review. |
picnixz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are the first round comments I have. I realy want multiple PRs because the module becomes really huge and there are parts that I don't think we need to reimplement in C.
| raise ValueError('badly formed hexadecimal UUID string') | ||
| int = int_(hex, 16) | ||
| elif bytes_le is not None: | ||
| if not isinstance(bytes_le, bytes_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will make a behavioural change so let's not change this here (previously there would be an assertion error). Let's do it in a separate PR (and remove the assert at the same time)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't view this as a behavioral change. It was an error condition before, it's now handled properly. I wouldn't touch this code if I didn't have to re-implement it in C; re-implementing "assert" statement behavior is just counterproductive. I'd do that, if it was really a behavioral change, but I strongly believe it is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, a similar discussion also took place in #73915 where I suggested that we raise a proper TypeError. However, changing runtime exceptions must be documented both in a What's New and in .rst, so don't forget to add this.
Sorry, I'm not going to do multiple PRs, I've no time for that. But I will address your comments here for sure! |
da233b4 to
ba9217e
Compare
|
I would REALLY appreciate smaller PRs. Usually this is our (current) workflow, namely incrementing changes. I think other core devs would agree with me here (@vstinner, @serhiy-storchaka and @encukou). Note that whatever we do, this is a feature that will only be included in 3.15 so I can also make the small PRs once I am back at home. Changing 3.14 is no more possible unless the RM gives their approval here (cc @hugovk, who could also whether he prefers one or multiple PR) as performance improvements are usually considered as new features (especially if we are adding a C implementation). |
| @@ -0,0 +1,57 @@ | |||
| Reimplement base UUID type, uuid4(), and uuid7() in C | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NEWS entries cannot contain multiple paragraphs. I would suggest adding a What's New entry as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, i had no idea! Let me fix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
The current PR layout:
I'm usually also for making smaller PRs, but objectively there's no point for that right here; I don't see how reviewing this would be fundamentally any different if instead of one PR with 1700 lines in uuidmodule.c, you'd have 3 PRs, one with 1600 lines and another with like 100 lines, and yet another with 30. Reviewing time would be the same, no? What's the point? If you absolutely insist I can do this, but tbqh I really don't understand why you're pushing for this so hard. It would create a lot of work for me and not necessarily make reviewers more happy.
I don't think this is 3.14 material and I wasn't pushing for that. 3.15 is fine (unless other people but me want this to be in 3.14, in which case I won't say no). That said I'd like to see my work through and would love to still merge this myself. |
Modules/_uuidmodule.c
Outdated
| uint64_t random_idx; | ||
| uint64_t random_last_pid; | ||
|
|
||
| // A freelist for uuid objects -- 15-20% performance boost. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a generic freelist implementation for python objects (see https://github.com/python/cpython/blob/7257b24140ac1b39fb8cfd4610134ec79575a396/Include/internal/pycore_freelist_state.h). Unless there are strong reasons not to, the uuid object should use the generic implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sadly I can't use that, as uuidobject.c is compiled as a shared lib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is a pity. I would suggest to leave out the freelist implementation in this PR. It makes the PR simpler (and therefore easier to review and accept). Even without the 10-20% performance gain from the freelist this PR is worthwhile. We can then reconsider the freelist in followup PRs.
|
@picnixz I'm sorry for my snappiness (I blame 4 days spent deep in C coding! :)) I'm adding some basic machinery to test that C implementation of uuid() functions behave identically. Since this is even more extra code it's now very reasonable to split this into two prs: one for the base UUID type, another for uuid4() and uuid7(). I'll do that later. |
| uuidobject *self = NULL; | ||
| uuid_state *state = get_uuid_state_by_cls(type); | ||
|
|
||
| Py_BEGIN_CRITICAL_SECTION(type); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if a subclass of UUID is created? Then both an exact UUID and the subclass can than access the freelist concurrently.
The following minimal example with a subclass segfaults for me
from uuid import UUID
class U(UUID):
pass
a = UUID(int=10)
print(a)
b = U(int=10)
print(b)
print('-')
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, I'll fix tomorrow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it's not that (although dealloc could be causing trouble, so I fixed the freelist to work only with the base type), but rather the code that was getting the module state wasn't assuming there could be subclassing at play. Fixed & pushed.
| // There's a precedent with NodeJS doing exact same thing for | ||
| // improving performance of their UUID implementation. | ||
|
|
||
| // IMPORTANT: callers should have a critical section or a lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would remove the comment and rename the method to gen_random_lock_held (or a variation to indicate which lock is held).
In addition you could add inside the method an assert statement to verify the lock is indeed held. For example for critical sections there is _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED
We're all in agreement this is for 3.15 👍 |
…7() in C The C implementation considerably boosts the performance of the key UUID operations: ------------------------------------ Operation Speedup ------------------------------------ uuid4() generation 15.01x uuid7() generation 29.64x UUID from string 6.76x UUID from bytes 5.16x str(uuid) conversion 6.66x ------------------------------------ Summary of changes: * The UUID type is reimplemented in C in its entirety. * The pure-Python is kept around and is used of the C implementation isn't available for some reason. * Both implementations are tested extensively; additional tests are added to ensure that the C implementation of the type follows the pure Python implementation fully. * The Python implementation stores UUID values as int objects. The C implementation stores them as `uint8_t[16]` array. * The C implementation has faster hash() implementation but also caches the computed hash value to speedup cases when UUIDs are used as set/dict keys. * The C implementation has a freelist to make new UUID object instantiation as fast as possible. * uuid4() and uuid7() are now implmented in C. The most performance boost (10x) comes from overfetching entropy to decrease the number of _PyOS_URandom() calls. On its own it's a safe optimization with the edge case that Unix fork needs to be explicitly handled. We do that by comparing the current PID to the PID of when the random buffer was populated. * Portions of code are coming from my implementation of faster UUID in gel-python [1]. I did use AI during the development, but basically had to rewrite the code it generated to be more idiomatic and efficient. * The benchmark can be found here [2]. * This PR makes Python UUID operations as fast as they are in NodeJS and Bun runtimes. [1] https://github.com/MagicStack/py-pgproto/blob/b8109fb311a59f30f9947567a13508da9a776564/uuid.pyx [2] https://gist.github.com/1st1/f03e816f34a61e4d46c78ff98baf4818
A side effect of some compatility fixes is the new code, with which the new C uuid7() is now 35x faster that pure Python (used to be 30x).
This fixes a segfault; regr test added. While there, make the freelist only work with our base UUID type and no subclasses of it.
picnixz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still haven't reviewed the arithmetic and some of my comments may clash with each others as there are a lot. I'm a bit skeptical about the hook strategy for testing and I would appreciate if we could have another approach but if not, then so be it (I don't know if we have other modules that need this kind of patching).
|
|
||
|
|
||
| _UINT_128_MAX = (1 << 128) - 1 | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove this blank line as well.
| raise ValueError('badly formed hexadecimal UUID string') | ||
| int = int_(hex, 16) | ||
| elif bytes_le is not None: | ||
| if not isinstance(bytes_le, bytes_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FTR, a similar discussion also took place in #73915 where I suggested that we raise a proper TypeError. However, changing runtime exceptions must be documented both in a What's New and in .rst, so don't forget to add this.
| elif bytes_le is not None: | ||
| if not isinstance(bytes_le, bytes_): | ||
| raise TypeError( | ||
| f'a bytes-like object is required, not {type(bytes_le).__name__!r}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want a "bytes-like" object, but a real "bytes" object. Bytes-like objects have a different meaning: https://docs.python.org/3/glossary.html#term-bytes-like-object.
| raise ValueError('fields is not a 6-tuple') | ||
| (time_low, time_mid, time_hi_version, | ||
| clock_seq_hi_variant, clock_seq_low, node) = fields | ||
| clock_seq_hi_variant, clock_seq_low, node) = fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| clock_seq_hi_variant, clock_seq_low, node) = fields | |
| clock_seq_hi_variant, clock_seq_low, node) = fields |
| clock_seq = (clock_seq_hi_variant << 8) | clock_seq_low | ||
| int = ((time_low << 96) | (time_mid << 80) | | ||
| (time_hi_version << 64) | (clock_seq << 48) | node) | ||
| (time_hi_version << 64) | (clock_seq << 48) | node) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| (time_hi_version << 64) | (clock_seq << 48) | node) | |
| (time_hi_version << 64) | (clock_seq << 48) | node) |
| _uuid_UUID___setstate___impl(uuidobject *self, PyObject *state) | ||
| /*[clinic end generated code: output=cdf6bd4a2a680b3f input=b1ec0744788a73a0]*/ | ||
| { | ||
| if (!PyDict_Check(state)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mapping is also allowed.
| // Get and set the 'int' value | ||
| PyObject *int_value = PyDict_GetItem(state, &_Py_ID(int)); | ||
| if (int_value == NULL) { | ||
| PyErr_SetString(PyExc_ValueError, "state must have 'int' key"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the pure Python implementation, this currently raises a KeyError instead of a ValueError so we should also match the exception message.
| } | ||
|
|
||
| // Get and set the 'int' value | ||
| PyObject *int_value = PyDict_GetItem(state, &_Py_ID(int)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GetItem may return NULL if the key is missing or if there is an error. Since we need to support mappings and not just dicts, use PyObject_GetItem (it returns a strong reference and handles KeyError accordingly).
| PyObject *shift = NULL; | ||
| PyObject *shifted = NULL; | ||
|
|
||
| one = PyLong_FromLong(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For 1, you can use _PyLong_GetOne()
| // by older Pythons (using ancient pickle protocol verions) and | ||
| // restore from pickles produced by old Python versions. | ||
| .name = "uuid.UUID", | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm... why? This is the best way to ensure that Python and C produce one to one equivalent results given the same input (time/entropy). IMO this is the best way known to test things, what are the other options? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've thought about the hooks. For the pure Python implementation, I would really appreciate that we don't have hooks. We can test the Python implementation with mocks. In the testing suite, we can have a setup method that sets those mocks or a function that can be used as a decorator. The rationale is that people read Python code and I don't want to expose those kind of hacks.
Other places in the standard library don't expose hooks installation and always prefer using mocks instead.
Now, for the C part, I think the following could work:
- instead of having hooks, import
os.urandomandtime.time_nsand store them in the module state, (os.urandomuses_PyOS_URandom, and up to some negligible cost for creating a bytes object, it's the same interface; I don't think we lose much by having this small overhead but please check it) - in the tests, mock
os.urandomandtime.time_nsbefore importing_uuid(or import a fresh copy of it).
This could make the tests slower as we likely need to re-import _uuid (and possibly uuid) everytime, but (if this works), this should be much cleaner. I think this is the only module in CPython that needs this kind of testing.
| // Validate that fields is a sequence with exactly 6 elements | ||
| if (!PySequence_Check(fields)) { | ||
| PyErr_SetString(PyExc_TypeError, "fields must be a sequence"); | ||
| return -1; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You don't need to check for the type as it will be done by PySequence_Size. Considering the Python code does if len(fields) != 6: ..., you can directly use PySequence_Size and return NULL if len < 0.
| return -1; | ||
| } | ||
|
|
||
| # define EXTRACT_FIELD(field_num, max_value, error_msg, type, name) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we have upper case parameters for macros as specified by PEP-7? TiA. This helps distinguishing between a variable local to the macro or not.
| if (extract_field(fields, field_num, max_value, error_msg, \ | ||
| &(name##_extracted)) < 0) { \ | ||
| return -1; \ | ||
| } \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (extract_field(fields, field_num, max_value, error_msg, \ | |
| &(name##_extracted)) < 0) { \ | |
| return -1; \ | |
| } \ | |
| if (extract_field(fields, field_num, max_value, error_msg, \ | |
| &(name##_extracted)) < 0) \ | |
| { \ | |
| return -1; \ | |
| } \ |
| if (type != state->UuidType) { | ||
| // We only apply the freelist optimization to the known C type, | ||
| // not any subclasses of it. | ||
| goto dealloc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I never used freelists but in other places of the code, we call Py_DECREF(tp) even if there is a freelist. Can you explain why we skip type decref for the exact type?
| static PyObject * | ||
| Uuid_get_hex(PyObject *o, void *closure) | ||
| { | ||
| uuidobject *self = (uuidobject *)o; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can use _Py_strhex() instead of using byte_to_hex.
| if (uuid->cached_hash != -1) { | ||
| // UUIDs are very often used in dicts/sets, so it makes | ||
| // sense to cache the computed hash (like we do for str) | ||
| return uuid->cached_hash; | ||
| } | ||
|
|
||
| Py_hash_t hash = Py_HashBuffer(uuid->bytes, 16); | ||
| uuid->cached_hash = hash; | ||
| return hash; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (uuid->cached_hash != -1) { | |
| // UUIDs are very often used in dicts/sets, so it makes | |
| // sense to cache the computed hash (like we do for str) | |
| return uuid->cached_hash; | |
| } | |
| Py_hash_t hash = Py_HashBuffer(uuid->bytes, 16); | |
| uuid->cached_hash = hash; | |
| return hash; | |
| // UUIDs are very often used in dicts/sets, so it makes | |
| // sense to cache the computed hash (like we do for str) | |
| if (uuid->cached_hash == -1) { | |
| uuid->cached_hash = Py_HashBuffer(uuid->bytes, 16); | |
| } | |
| return uuid->cached_hash; |
|
|
||
| /*[clinic input] | ||
| @critical_section | ||
| _uuid._install_c_hooks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's call it __install_c_hooks with a double underscore to make it even more private. In addition, in the docstring, add "Internal function. Do not use."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that double underscore makes a difference. Also, the docstring should explain that it's used for testing purpose only.
vstinner
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work! Here is a first review.
| def _gen_random(size): | ||
| if _random_hook is None: | ||
| return os.urandom(size) | ||
| else: | ||
| return _random_hook(size) | ||
|
|
||
|
|
||
| def _gen_time(): | ||
| if _time_hook is None: | ||
| return time.time_ns() | ||
| else: | ||
| return _time_hook() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to initialize _random_hook to os.urandom and _time_hook to time.time.ns, to avoid an useless test:
| def _gen_random(size): | |
| if _random_hook is None: | |
| return os.urandom(size) | |
| else: | |
| return _random_hook(size) | |
| def _gen_time(): | |
| if _time_hook is None: | |
| return time.time_ns() | |
| else: | |
| return _time_hook() | |
| def _gen_random(size): | |
| return _random_hook(size) | |
| def _gen_time(): | |
| return _time_hook() |
Update also _install_py_hooks() to do the same if arguments are None.
|
|
||
| #include "pycore_long.h" // _PyLong_FromByteArray, _PyLong_AsByteArray | ||
| #include "pycore_pylifecycle.h" // _PyOS_URandom() | ||
| #include "pycore_time.h" // PyTime_Time |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PyTime_Time() now comes with Python.h, there is no need to include pycore_time.h.
|
|
||
| PyObject *uint128_max; | ||
|
|
||
| PyObject *random_func; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with random_func name.
|
|
||
| // A freelist for uuid objects -- 15-20% performance boost. | ||
| uuidobject *freelist; | ||
| uint64_t freelist_size; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick:
| uint64_t freelist_size; | |
| size_t freelist_size; |
| uuid_state *state = get_uuid_state(module); | ||
| uint8_t bytes[16]; | ||
|
|
||
| if (gen_random(state, bytes, 16) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if (gen_random(state, bytes, 16) < 0) { | |
| if (gen_random(state, bytes, sizeof(bytes)) < 0) { |
| Py_INCREF(random_func); | ||
| Py_XSETREF(state->random_func, random_func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Py_INCREF(random_func); | |
| Py_XSETREF(state->random_func, random_func); | |
| Py_XSETREF(state->random_func, Py_NewRef(random_func)); |
| Py_INCREF(time_func); | ||
| Py_XSETREF(state->time_func, time_func); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Py_INCREF(time_func); | |
| Py_XSETREF(state->time_func, time_func); | |
| Py_XSETREF(state->time_func, Py_NewRef(time_func)); |
| self->bytes[3] = time_low; | ||
|
|
||
| self->bytes[4] = time_mid >> 8; | ||
| self->bytes[5] = time_mid; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| self->bytes[5] = time_mid; | |
| self->bytes[5] = (uint8_t)time_mid; |
| if (self == NULL) { | ||
| return NULL; | ||
| } | ||
| memset(self->bytes, 0, 16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nipitck: would it be possible to add a UUID_SIZE constant (#define UUID_SIZE 16) and use it instead of hardcoded number 16?
| memset(self->bytes, 0, 16); | |
| memset(self->bytes, 0, UUID_SIZE); |
| } | ||
|
|
||
| // Get the class name (sadly can't use tp_name -- we don't need the full name) | ||
| PyObject *cls_name = PyObject_GetAttrString((PyObject *)Py_TYPE(self), "__name__"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you use PyType_GetName() here? Or maybe PyType_GetQualName()?
You can also use %N to get the fully qualified type name in PyUnicode_FromFormat().
| byte_to_hex(uuid->bytes[i], &str[24 + (i - 10) * 2]); | ||
| } | ||
|
|
||
| return PyUnicode_FromStringAndSize(str, 36); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could probably eke out a little more performance by directly allocating a 36-byte-long 1-byte string and writing into it, á la https://github.com/python/cpython/pull/140666/files#diff-f08acf1e3d98794dd2c41177696af35438637baab52ab985951481647bd76caaR102-R107
|
@1st1: Do you want me to apply my suggestions directly to move on on this PR? |
The C implementation considerably boosts the performance of the key UUID operations:
Summary of changes:
The UUID type is reimplemented in C in its entirety.
The pure-Python is kept around and is used of the C implementation isn't available for some reason.
Both implementations are tested extensively; additional tests are added to ensure that the C implementation of the type follows the pure Python implementation fully.
The Python implementation stores UUID values as int objects. The C implementation stores them as
uint8_t[16]array.The C implementation supports unpickling of UUIDs created with Python 2 using protocols starting with 0.
The C implementation has faster hash() implementation but also caches the computed hash value to speedup cases when UUIDs are used as set/dict keys.
The C implementation has a freelist to make new UUID object instantiation as fast as possible.
uuid4() and uuid7() are now implmented in C. The most performance boost (10x) comes from overfetching entropy to decrease the number of _PyOS_URandom() calls. On its own it's a safe optimization with the edge case that Unix fork needs to be explicitly handled. We do that by comparing the current PID to the PID of when the random buffer was populated.
Portions of code are coming from my implementation of faster UUID in gel-python [1]. I did use AI during the development, but basically had to rewrite the code it generated to be more idiomatic and efficient.
The benchmark can be found here [2].
This PR makes Python UUID operations as fast as they are in NodeJS and Bun runtimes.
[1] https://github.com/MagicStack/py-pgproto/blob/b8109fb311a59f30f9947567a13508da9a776564/uuid.pyx
[2] https://gist.github.com/1st1/f03e816f34a61e4d46c78ff98baf4818