gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C #139123

1st1 · 2025-09-18T13:14:06Z

The C implementation considerably boosts the performance of the key UUID operations:

------------------------------------
Operation                    Speedup
------------------------------------
uuid4() generation            15.01x
uuid7() generation            29.64x
UUID from string               6.76x
UUID from bytes                5.16x
str(uuid) conversion           6.66x
------------------------------------

Summary of changes:

The UUID type is reimplemented in C in its entirety.
The pure-Python is kept around and is used of the C implementation isn't available for some reason.
Both implementations are tested extensively; additional tests are added to ensure that the C implementation of the type follows the pure Python implementation fully.
The Python implementation stores UUID values as int objects. The C implementation stores them as uint8_t[16] array.
The C implementation supports unpickling of UUIDs created with Python 2 using protocols starting with 0.
The C implementation has faster hash() implementation but also caches the computed hash value to speedup cases when UUIDs are used as set/dict keys.
The C implementation has a freelist to make new UUID object instantiation as fast as possible.
uuid4() and uuid7() are now implmented in C. The most performance boost (10x) comes from overfetching entropy to decrease the number of _PyOS_URandom() calls. On its own it's a safe optimization with the edge case that Unix fork needs to be explicitly handled. We do that by comparing the current PID to the PID of when the random buffer was populated.
Portions of code are coming from my implementation of faster UUID in gel-python [1]. I did use AI during the development, but basically had to rewrite the code it generated to be more idiomatic and efficient.
The benchmark can be found here [2].
This PR makes Python UUID operations as fast as they are in NodeJS and Bun runtimes.

[1] https://github.com/MagicStack/py-pgproto/blob/b8109fb311a59f30f9947567a13508da9a776564/uuid.pyx

[2] https://gist.github.com/1st1/f03e816f34a61e4d46c78ff98baf4818

Issue: Implement UUID functions and type in C #139122

python-cla-bot · 2025-09-18T13:14:11Z

All commit authors signed the Contributor License Agreement.

… in C The C implementation considerably boosts the performance of the key UUID operations: ------------------------------------ Operation Speedup ------------------------------------ uuid4() generation 15.01x uuid7() generation 29.64x UUID from string 6.76x UUID from bytes 5.16x str(uuid) conversion 6.66x ------------------------------------ Summary of changes: * The UUID type is reimplemented in C in its entirety. * The pure-Python is kept around and is used of the C implementation isn't available for some reason. * Both implementations are tested extensively; additional tests are added to ensure that the C implementation of the type follows the pure Python implementation fully. * The Python implementation stores UUID values as int objects. The C implementation stores them as `uint8_t[16]` array. * The C implementation supports unpickling of UUIDs created with Python 2 using protocols starting with 0. That necessitated a small fix to the `copyreg` module (the change is only affecting legacy pickle pathway). * The C implementation has faster hash() implementation but also caches the computed hash value to speedup cases when UUIDs are used as set/dict keys. * The C implementation has a freelist to make new UUID object instantiation as fast as possible. * uuid4() and uuid7() are now implmented in C. The most performance boost (10x) comes from overfetching entropy to decrease the number of _PyOS_URandom() calls. On its own it's a safe optimization with the edge case that Unix fork needs to be explicitly handled. We do that by comparing the current PID to the PID of when the random buffer was populated. * Portions of code are coming from my implementation of faster UUID in gel-python [1]. I did use AI during the development, but basically had to rewrite the code it generated to be more idiomatic and efficient. * The benchmark can be found here [2]. * This PR makes Python UUID operations as fast as they are in NodeJS and Bun runtimes. [1] https://github.com/MagicStack/py-pgproto/blob/b8109fb311a59f30f9947567a13508da9a776564/uuid.pyx [2] https://gist.github.com/1st1/f03e816f34a61e4d46c78ff98baf4818

1st1 · 2025-09-18T13:28:07Z

Note to reviewers:

Please compare uuid7 C and Python implementations side by side
Please review all of the bitshift/magic code closely (Python implementation uses int, i'm using char[16], so low-level operations are very different but have to be equivalent)

zooba · 2025-09-18T13:45:59Z

Modules/_uuidmodule.c

+
+    *counter = (((uint64_t)(high & 0x1FF) << 32) | (low >> 32)) & 0x1FFFFFFFFFF;
+    *tail = (uint32_t)low;
+    return 0;


I haven't tested thoroughly (mostly looking at GodBolt output), but this ought to generate more efficient code (and there might be more ways to generate the right values directly, I just didn't go that far):

struct { uint16_t high; uint32_t low1; uint32_t low2; } rand_bytes; if (gen_random(state, (uint8_t*)&rand_bytes, sizeof(rand_bytes)) < 0) { return -1; } *counter = (((uint64_t)(rand_bytes.high & 0x1FF) << 32) | (rand_bytes.low1)) & 0x1FFFFFFFFFF; *tail = rand_bytes.low2;

Modules/_uuidmodule.c

picnixz · 2025-09-18T17:02:44Z

I would be happy to have a C implementation of UUID but for reviewing purposes, may I suggest that we first focus on implementing the wrapper in C in one PR and have different PRs for each UUID? it would be much easier to focus on the algorithm to review.

picnixz

These are the first round comments I have. I realy want multiple PRs because the module becomes really huge and there are parts that I don't think we need to reimplement in C.

picnixz · 2025-09-18T17:07:25Z

Lib/uuid.py

                raise ValueError('badly formed hexadecimal UUID string')
            int = int_(hex, 16)
        elif bytes_le is not None:
+            if not isinstance(bytes_le, bytes_):


This will make a behavioural change so let's not change this here (previously there would be an assertion error). Let's do it in a separate PR (and remove the assert at the same time)

I don't view this as a behavioral change. It was an error condition before, it's now handled properly. I wouldn't touch this code if I didn't have to re-implement it in C; re-implementing "assert" statement behavior is just counterproductive. I'd do that, if it was really a behavioral change, but I strongly believe it is not.

FTR, a similar discussion also took place in #73915 where I suggested that we raise a proper TypeError. However, changing runtime exceptions must be documented both in a What's New and in .rst, so don't forget to add this.

Lib/uuid.py

Modules/_uuidmodule.c

1st1 · 2025-09-19T10:48:34Z

I realy want multiple PRs because the module becomes really huge and there are parts that I don't think we need to reimplement in C.

Sorry, I'm not going to do multiple PRs, I've no time for that. But I will address your comments here for sure!

picnixz · 2025-09-19T12:29:04Z

I would REALLY appreciate smaller PRs. Usually this is our (current) workflow, namely incrementing changes. I think other core devs would agree with me here (@vstinner, @serhiy-storchaka and @encukou).

Note that whatever we do, this is a feature that will only be included in 3.15 so I can also make the small PRs once I am back at home. Changing 3.14 is no more possible unless the RM gives their approval here (cc @hugovk, who could also whether he prefers one or multiple PR) as performance improvements are usually considered as new features (especially if we are adding a C implementation).

picnixz · 2025-09-19T12:30:30Z

Misc/NEWS.d/next/Library/2025-09-18-14-13-00.gh-issue-139122.m3lp66.rst

@@ -0,0 +1,57 @@
+Reimplement base UUID type, uuid4(), and uuid7() in C


NEWS entries cannot contain multiple paragraphs. I would suggest adding a What's New entry as well

Oh, i had no idea! Let me fix it.

1st1 · 2025-09-19T12:52:59Z

I would REALLY appreciate smaller PRs. Usually this is our (current) workflow, namely incrementing changes. I think other core devs would agree with me here (@vstinner, @serhiy-storchaka and @encukou).

The current PR layout:

90% is reimplementing the UUID type
10% is uuid4 and uuid7 implementations which are isolated separate functions from (1); uuid4 is very tiny.

I'm usually also for making smaller PRs, but objectively there's no point for that right here; I don't see how reviewing this would be fundamentally any different if instead of one PR with 1700 lines in uuidmodule.c, you'd have 3 PRs, one with 1600 lines and another with like 100 lines, and yet another with 30. Reviewing time would be the same, no? What's the point?

If you absolutely insist I can do this, but tbqh I really don't understand why you're pushing for this so hard. It would create a lot of work for me and not necessarily make reviewers more happy.

Note that whatever we do, this is a feature that will only be included in 3.15 so I can also make the small PRs once I am back at home. Changing 3.14 is no more possible unless the RM gives their approval here (cc @hugovk, who could also whether he prefers one or multiple PR) as performance improvements are usually considered as new features (especially if we are adding a C implementation).

I don't think this is 3.14 material and I wasn't pushing for that. 3.15 is fine (unless other people but me want this to be in 3.14, in which case I won't say no).

That said I'd like to see my work through and would love to still merge this myself.

eendebakpt · 2025-09-19T13:16:31Z

Modules/_uuidmodule.c

+    uint64_t random_idx;
+    uint64_t random_last_pid;
+
+    // A freelist for uuid objects -- 15-20% performance boost.


There is a generic freelist implementation for python objects (see https://github.com/python/cpython/blob/7257b24140ac1b39fb8cfd4610134ec79575a396/Include/internal/pycore_freelist_state.h). Unless there are strong reasons not to, the uuid object should use the generic implementation.

Sadly I can't use that, as uuidobject.c is compiled as a shared lib.

That is a pity. I would suggest to leave out the freelist implementation in this PR. It makes the PR simpler (and therefore easier to review and accept). Even without the 10-20% performance gain from the freelist this PR is worthwhile. We can then reconsider the freelist in followup PRs.

1st1 · 2025-09-21T14:47:05Z

@picnixz I'm sorry for my snappiness (I blame 4 days spent deep in C coding! :)) I'm adding some basic machinery to test that C implementation of uuid() functions behave identically. Since this is even more extra code it's now very reasonable to split this into two prs: one for the base UUID type, another for uuid4() and uuid7(). I'll do that later.

eendebakpt · 2025-09-21T18:31:09Z

Modules/_uuidmodule.c

+    uuidobject *self = NULL;
+    uuid_state *state = get_uuid_state_by_cls(type);
+
+    Py_BEGIN_CRITICAL_SECTION(type);


What if a subclass of UUID is created? Then both an exact UUID and the subclass can than access the freelist concurrently.

The following minimal example with a subclass segfaults for me

from uuid import UUID class U(UUID): pass a = UUID(int=10) print(a) b = U(int=10) print(b) print('-')

good catch, I'll fix tomorrow

Actually, it's not that (although dealloc could be causing trouble, so I fixed the freelist to work only with the base type), but rather the code that was getting the module state wasn't assuming there could be subclassing at play. Fixed & pushed.

eendebakpt · 2025-09-21T18:41:44Z

Modules/_uuidmodule.c

+    // There's a precedent with NodeJS doing exact same thing for
+    // improving performance of their UUID implementation.
+
+    // IMPORTANT: callers should have a critical section or a lock


I would remove the comment and rename the method to gen_random_lock_held (or a variation to indicate which lock is held).

In addition you could add inside the method an assert statement to verify the lock is indeed held. For example for critical sections there is _Py_CRITICAL_SECTION_ASSERT_OBJECT_LOCKED

hugovk · 2025-09-21T19:54:06Z

Note that whatever we do, this is a feature that will only be included in 3.15 so I can also make the small PRs once I am back at home. Changing 3.14 is no more possible unless the RM gives their approval here (cc @hugovk, who could also whether he prefers one or multiple PR) as performance improvements are usually considered as new features (especially if we are adding a C implementation).

I don't think this is 3.14 material and I wasn't pushing for that. 3.15 is fine (unless other people but me want this to be in 3.14, in which case I won't say no).

We're all in agreement this is for 3.15 👍

…7() in C The C implementation considerably boosts the performance of the key UUID operations: ------------------------------------ Operation Speedup ------------------------------------ uuid4() generation 15.01x uuid7() generation 29.64x UUID from string 6.76x UUID from bytes 5.16x str(uuid) conversion 6.66x ------------------------------------ Summary of changes: * The UUID type is reimplemented in C in its entirety. * The pure-Python is kept around and is used of the C implementation isn't available for some reason. * Both implementations are tested extensively; additional tests are added to ensure that the C implementation of the type follows the pure Python implementation fully. * The Python implementation stores UUID values as int objects. The C implementation stores them as `uint8_t[16]` array. * The C implementation has faster hash() implementation but also caches the computed hash value to speedup cases when UUIDs are used as set/dict keys. * The C implementation has a freelist to make new UUID object instantiation as fast as possible. * uuid4() and uuid7() are now implmented in C. The most performance boost (10x) comes from overfetching entropy to decrease the number of _PyOS_URandom() calls. On its own it's a safe optimization with the edge case that Unix fork needs to be explicitly handled. We do that by comparing the current PID to the PID of when the random buffer was populated. * Portions of code are coming from my implementation of faster UUID in gel-python [1]. I did use AI during the development, but basically had to rewrite the code it generated to be more idiomatic and efficient. * The benchmark can be found here [2]. * This PR makes Python UUID operations as fast as they are in NodeJS and Bun runtimes. [1] https://github.com/MagicStack/py-pgproto/blob/b8109fb311a59f30f9947567a13508da9a776564/uuid.pyx [2] https://gist.github.com/1st1/f03e816f34a61e4d46c78ff98baf4818

A side effect of some compatility fixes is the new code, with which the new C uuid7() is now 35x faster that pure Python (used to be 30x).

This fixes a segfault; regr test added. While there, make the freelist only work with our base UUID type and no subclasses of it.

picnixz

I still haven't reviewed the arithmetic and some of my comments may clash with each others as there are a lot. I'm a bit skeptical about the hook strategy for testing and I would appreciate if we could have another approach but if not, then so be it (I don't know if we have other modules that need this kind of patching).

picnixz · 2025-09-22T12:54:59Z

Lib/uuid.py



 _UINT_128_MAX = (1 << 128) - 1
+


Let's remove this blank line as well.

picnixz · 2025-09-22T12:57:47Z

Lib/uuid.py

                raise ValueError('badly formed hexadecimal UUID string')
            int = int_(hex, 16)
        elif bytes_le is not None:
+            if not isinstance(bytes_le, bytes_):


FTR, a similar discussion also took place in #73915 where I suggested that we raise a proper TypeError. However, changing runtime exceptions must be documented both in a What's New and in .rst, so don't forget to add this.

picnixz · 2025-09-22T12:59:38Z

Lib/uuid.py

        elif bytes_le is not None:
+            if not isinstance(bytes_le, bytes_):
+                raise TypeError(
+                    f'a bytes-like object is required, not {type(bytes_le).__name__!r}'


We don't want a "bytes-like" object, but a real "bytes" object. Bytes-like objects have a different meaning: https://docs.python.org/3/glossary.html#term-bytes-like-object.

picnixz · 2025-09-22T12:59:51Z

Lib/uuid.py

                raise ValueError('fields is not a 6-tuple')
            (time_low, time_mid, time_hi_version,
-             clock_seq_hi_variant, clock_seq_low, node) = fields
+            clock_seq_hi_variant, clock_seq_low, node) = fields


Suggested change

clock_seq_hi_variant, clock_seq_low, node) = fields

clock_seq_hi_variant, clock_seq_low, node) = fields

picnixz · 2025-09-22T12:59:58Z

Lib/uuid.py

            clock_seq = (clock_seq_hi_variant << 8) | clock_seq_low
            int = ((time_low << 96) | (time_mid << 80) |
-                   (time_hi_version << 64) | (clock_seq << 48) | node)
+                    (time_hi_version << 64) | (clock_seq << 48) | node)


Suggested change

(time_hi_version << 64) | (clock_seq << 48) | node)

(time_hi_version << 64) | (clock_seq << 48) | node)

picnixz · 2025-09-22T14:37:51Z

Modules/_uuidmodule.c

+_uuid_UUID___setstate___impl(uuidobject *self, PyObject *state)
+/*[clinic end generated code: output=cdf6bd4a2a680b3f input=b1ec0744788a73a0]*/
+{
+    if (!PyDict_Check(state)) {


A mapping is also allowed.

picnixz · 2025-09-22T14:40:09Z

Modules/_uuidmodule.c

+    // Get and set the 'int' value
+    PyObject *int_value = PyDict_GetItem(state, &_Py_ID(int));
+    if (int_value == NULL) {
+        PyErr_SetString(PyExc_ValueError, "state must have 'int' key");


In the pure Python implementation, this currently raises a KeyError instead of a ValueError so we should also match the exception message.

picnixz · 2025-09-22T14:41:09Z

Modules/_uuidmodule.c

+    }
+
+    // Get and set the 'int' value
+    PyObject *int_value = PyDict_GetItem(state, &_Py_ID(int));


GetItem may return NULL if the key is missing or if there is an error. Since we need to support mappings and not just dicts, use PyObject_GetItem (it returns a strong reference and handles KeyError accordingly).

picnixz · 2025-09-22T14:43:49Z

Modules/_uuidmodule.c

+    PyObject *shift = NULL;
+    PyObject *shifted = NULL;
+
+    one = PyLong_FromLong(1);


For 1, you can use _PyLong_GetOne()

picnixz · 2025-09-22T14:44:43Z

Modules/_uuidmodule.c

+    // by older Pythons (using ancient pickle protocol verions) and
+    // restore from pickles produced by old Python versions.
+    .name = "uuid.UUID",
+


Suggested change

1st1 · 2025-09-22T21:02:00Z

I'm a bit skeptical about the hook strategy for testing and I would appreciate if we could have another approach but if not

Hm... why? This is the best way to ensure that Python and C produce one to one equivalent results given the same input (time/entropy). IMO this is the best way known to test things, what are the other options?

picnixz

I've thought about the hooks. For the pure Python implementation, I would really appreciate that we don't have hooks. We can test the Python implementation with mocks. In the testing suite, we can have a setup method that sets those mocks or a function that can be used as a decorator. The rationale is that people read Python code and I don't want to expose those kind of hacks.

Other places in the standard library don't expose hooks installation and always prefer using mocks instead.

Now, for the C part, I think the following could work:

instead of having hooks, import os.urandom and time.time_ns and store them in the module state, (os.urandom uses _PyOS_URandom, and up to some negligible cost for creating a bytes object, it's the same interface; I don't think we lose much by having this small overhead but please check it)
in the tests, mock os.urandom and time.time_ns before importing _uuid (or import a fresh copy of it).

This could make the tests slower as we likely need to re-import _uuid (and possibly uuid) everytime, but (if this works), this should be much cleaner. I think this is the only module in CPython that needs this kind of testing.

picnixz · 2025-09-23T11:28:24Z

Modules/_uuidmodule.c

+    // Validate that fields is a sequence with exactly 6 elements
+    if (!PySequence_Check(fields)) {
+        PyErr_SetString(PyExc_TypeError, "fields must be a sequence");
+        return -1;
+    }


You don't need to check for the type as it will be done by PySequence_Size. Considering the Python code does if len(fields) != 6: ..., you can directly use PySequence_Size and return NULL if len < 0.

picnixz · 2025-09-23T11:28:54Z

Modules/_uuidmodule.c

+        return -1;
+    }
+
+#   define EXTRACT_FIELD(field_num, max_value, error_msg, type, name)           \


Can we have upper case parameters for macros as specified by PEP-7? TiA. This helps distinguishing between a variable local to the macro or not.

picnixz · 2025-09-23T11:30:01Z

Modules/_uuidmodule.c

+            if (extract_field(fields, field_num, max_value, error_msg,          \
+                            &(name##_extracted)) < 0) {                         \
+                return -1;                                                      \
+            }                                                                   \


Suggested change

if (extract_field(fields, field_num, max_value, error_msg, \

&(name##_extracted)) < 0) { \

return -1; \

} \

if (extract_field(fields, field_num, max_value, error_msg, \

&(name##_extracted)) < 0) \

{ \

return -1; \

} \

picnixz · 2025-09-23T11:32:30Z

Modules/_uuidmodule.c

+    if (type != state->UuidType) {
+        // We only apply the freelist optimization to the known C type,
+        // not any subclasses of it.
+        goto dealloc;


I never used freelists but in other places of the code, we call Py_DECREF(tp) even if there is a freelist. Can you explain why we skip type decref for the exact type?

picnixz · 2025-09-23T11:39:46Z

Modules/_uuidmodule.c

+static PyObject *
+Uuid_get_hex(PyObject *o, void *closure)
+{
+    uuidobject *self = (uuidobject *)o;


You can use _Py_strhex() instead of using byte_to_hex.

picnixz · 2025-09-23T11:47:59Z

Modules/_uuidmodule.c

+    if (uuid->cached_hash != -1) {
+        // UUIDs are very often used in dicts/sets, so it makes
+        // sense to cache the computed hash (like we do for str)
+        return uuid->cached_hash;
+    }
+
+    Py_hash_t hash = Py_HashBuffer(uuid->bytes, 16);
+    uuid->cached_hash = hash;
+    return hash;


Suggested change

if (uuid->cached_hash != -1) {

// UUIDs are very often used in dicts/sets, so it makes

// sense to cache the computed hash (like we do for str)

return uuid->cached_hash;

}

Py_hash_t hash = Py_HashBuffer(uuid->bytes, 16);

uuid->cached_hash = hash;

return hash;

// UUIDs are very often used in dicts/sets, so it makes

// sense to cache the computed hash (like we do for str)

if (uuid->cached_hash == -1) {

uuid->cached_hash = Py_HashBuffer(uuid->bytes, 16);

}

return uuid->cached_hash;

picnixz · 2025-09-23T11:50:07Z

Modules/_uuidmodule.c

+
+/*[clinic input]
+@critical_section
+_uuid._install_c_hooks


Let's call it __install_c_hooks with a double underscore to make it even more private. In addition, in the docstring, add "Internal function. Do not use."

I don't think that double underscore makes a difference. Also, the docstring should explain that it's used for testing purpose only.

vstinner

Nice work! Here is a first review.

vstinner · 2025-09-23T13:14:41Z

Lib/uuid.py

+def _gen_random(size):
+    if _random_hook is None:
+        return os.urandom(size)
+    else:
+        return _random_hook(size)
+
+
+def _gen_time():
+    if _time_hook is None:
+        return time.time_ns()
+    else:
+        return _time_hook()


I suggest to initialize _random_hook to os.urandom and _time_hook to time.time.ns, to avoid an useless test:

Suggested change

def _gen_random(size):

if _random_hook is None:

return os.urandom(size)

else:

return _random_hook(size)

def _gen_time():

if _time_hook is None:

return time.time_ns()

else:

return _time_hook()

def _gen_random(size):

return _random_hook(size)

def _gen_time():

return _time_hook()

Update also _install_py_hooks() to do the same if arguments are None.

vstinner · 2025-09-23T13:21:29Z

Modules/_uuidmodule.c

+
+#include "pycore_long.h"          // _PyLong_FromByteArray, _PyLong_AsByteArray
+#include "pycore_pylifecycle.h"   // _PyOS_URandom()
+#include "pycore_time.h"          // PyTime_Time


PyTime_Time() now comes with Python.h, there is no need to include pycore_time.h.

vstinner · 2025-09-23T13:27:41Z

Modules/_uuidmodule.c

+
+    PyObject *uint128_max;
+
+    PyObject *random_func;


I'm fine with random_func name.

vstinner · 2025-09-23T13:28:29Z

Modules/_uuidmodule.c

+
+    // A freelist for uuid objects -- 15-20% performance boost.
+    uuidobject *freelist;
+    uint64_t freelist_size;


Nitpick:

Suggested change

uint64_t freelist_size;

size_t freelist_size;

vstinner · 2025-09-23T13:38:05Z

Modules/_uuidmodule.c

+    uuid_state *state = get_uuid_state(module);
+    uint8_t bytes[16];
+
+    if (gen_random(state, bytes, 16) < 0) {


Suggested change

if (gen_random(state, bytes, 16) < 0) {

if (gen_random(state, bytes, sizeof(bytes)) < 0) {

vstinner · 2025-09-23T13:45:53Z

Modules/_uuidmodule.c

+        Py_INCREF(random_func);
+        Py_XSETREF(state->random_func, random_func);


Suggested change

Py_INCREF(random_func);

Py_XSETREF(state->random_func, random_func);

Py_XSETREF(state->random_func, Py_NewRef(random_func));

vstinner · 2025-09-23T13:46:14Z

Modules/_uuidmodule.c

+        Py_INCREF(time_func);
+        Py_XSETREF(state->time_func, time_func);


Suggested change

Py_INCREF(time_func);

Py_XSETREF(state->time_func, time_func);

Py_XSETREF(state->time_func, Py_NewRef(time_func));

vstinner · 2025-09-23T13:52:41Z

Modules/_uuidmodule.c

+    self->bytes[3] = time_low;
+
+    self->bytes[4] = time_mid >> 8;
+    self->bytes[5] = time_mid;


Suggested change

self->bytes[5] = time_mid;

self->bytes[5] = (uint8_t)time_mid;

vstinner · 2025-09-23T13:56:45Z

Modules/_uuidmodule.c

+    if (self == NULL) {
+        return NULL;
+    }
+    memset(self->bytes, 0, 16);


nipitck: would it be possible to add a UUID_SIZE constant (#define UUID_SIZE 16) and use it instead of hardcoded number 16?

Suggested change

memset(self->bytes, 0, 16);

memset(self->bytes, 0, UUID_SIZE);

vstinner · 2025-09-23T14:01:10Z

Modules/_uuidmodule.c

+    }
+
+    // Get the class name (sadly can't use tp_name -- we don't need the full name)
+    PyObject *cls_name = PyObject_GetAttrString((PyObject *)Py_TYPE(self), "__name__");


Can't you use PyType_GetName() here? Or maybe PyType_GetQualName()?

You can also use %N to get the fully qualified type name in PyUnicode_FromFormat().

akx · 2025-10-27T12:52:42Z

Modules/_uuidmodule.c

+        byte_to_hex(uuid->bytes[i], &str[24 + (i - 10) * 2]);
+    }
+
+    return PyUnicode_FromStringAndSize(str, 36);


You could probably eke out a little more performance by directly allocating a 36-byte-long 1-byte string and writing into it, á la https://github.com/python/cpython/pull/140666/files#diff-f08acf1e3d98794dd2c41177696af35438637baab52ab985951481647bd76caaR102-R107

vstinner · 2025-10-30T18:25:50Z

@1st1: Do you want me to apply my suggestions directly to move on on this PR?

1st1 requested review from ambv, pablogsal and zooba September 18, 2025 13:14

1st1 requested review from ZeroIntensity and ericsnowcurrently as code owners September 18, 2025 13:14

bedevere-app bot added the awaiting core review label Sep 18, 2025

1st1 changed the title ~~The C implementation considerably boosts the performance of the key UUID~~ Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

1st1 force-pushed the uuid_push branch 2 times, most recently from 2d0a381 to c4363f8 Compare September 18, 2025 13:22

1st1 changed the title ~~Reimplement base UUID type, uuid4(), and uuid7() in C~~ #139122: Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

1st1 changed the title ~~#139122: Reimplement base UUID type, uuid4(), and uuid7() in C~~ #gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

bedevere-app bot mentioned this pull request Sep 18, 2025

Implement UUID functions and type in C #139122

Open

1st1 force-pushed the uuid_push branch 2 times, most recently from fa12192 to 4a5c2a2 Compare September 18, 2025 13:24

zooba reviewed Sep 18, 2025

View reviewed changes

StanFromIreland changed the title ~~#gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C~~ gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C Sep 18, 2025

picnixz reviewed Sep 18, 2025

View reviewed changes

1st1 force-pushed the uuid_push branch from 4a5c2a2 to d244f95 Compare September 19, 2025 10:47

1st1 force-pushed the uuid_push branch 2 times, most recently from da233b4 to ba9217e Compare September 19, 2025 10:59

picnixz reviewed Sep 19, 2025

View reviewed changes

eendebakpt reviewed Sep 19, 2025

View reviewed changes

eendebakpt reviewed Sep 21, 2025

View reviewed changes

1st1 force-pushed the uuid_push branch from 1f24c14 to ab26ad9 Compare September 22, 2025 04:42

1st1 added 16 commits September 21, 2025 21:52

Address @picnixz's review

5ff81f3

Regen the clinic files

2b8508e

Clarify the type name

9261f1c

Fix news

7f2b904

Trim the NEWS file down

21ca384

Use PyObject* in getters

8bceabd

Codegen string literals and use them

d25862a

Ensure total compatibility of C/Python implementations of uuid4 / uuid7

4ba14a8

A side effect of some compatility fixes is the new code, with which the new C uuid7() is now 35x faster that pure Python (used to be 30x).

Stop importing unused SafeUUID members

0d4e605

Drop SafeUUID.unknown

b15b85b

Drop more things that can be imported dynamically

37685da

Remove circular import; import 'uuid' from '_uuid' lazily

f83d186

Regen files

baacd10

Get module state assuming there could be subclasses of UUID

bd67350

This fixes a segfault; regr test added. While there, make the freelist only work with our base UUID type and no subclasses of it.

More ergonomic int cast code

b10a1f7

1st1 force-pushed the uuid_push branch from 0d2f27e to b10a1f7 Compare September 22, 2025 04:53

picnixz reviewed Sep 22, 2025

View reviewed changes

bitemyapp mentioned this pull request Sep 22, 2025

C + Python/UUID rewrite in Rust bitemyapp/mentats#2

Open

picnixz reviewed Sep 23, 2025

View reviewed changes

vstinner reviewed Sep 23, 2025

View reviewed changes

yihong0618 mentioned this pull request Oct 27, 2025

gh-131196: Improve uuid.__str__() performance a little more #140663

Open

akx mentioned this pull request Oct 27, 2025

gh-131196: Add C implementation of UUID stringification #140666

Closed

akx reviewed Oct 27, 2025

View reviewed changes

		@@ -0,0 +1,57 @@
		Reimplement base UUID type, uuid4(), and uuid7() in C

	clock_seq_hi_variant, clock_seq_low, node) = fields
	clock_seq_hi_variant, clock_seq_low, node) = fields

	(time_hi_version << 64) \| (clock_seq << 48) \| node)
	(time_hi_version << 64) \| (clock_seq << 48) \| node)

	if (gen_random(state, bytes, 16) < 0) {
	if (gen_random(state, bytes, sizeof(bytes)) < 0) {

		Py_INCREF(random_func);
		Py_XSETREF(state->random_func, random_func);

	Py_INCREF(random_func);
	Py_XSETREF(state->random_func, random_func);
	Py_XSETREF(state->random_func, Py_NewRef(random_func));

		Py_INCREF(time_func);
		Py_XSETREF(state->time_func, time_func);

	Py_INCREF(time_func);
	Py_XSETREF(state->time_func, time_func);
	Py_XSETREF(state->time_func, Py_NewRef(time_func));

	self->bytes[5] = time_mid;
	self->bytes[5] = (uint8_t)time_mid;

	memset(self->bytes, 0, 16);
	memset(self->bytes, 0, UUID_SIZE);

Uh oh!

gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C #139123

Are you sure you want to change the base?

gh-139122: Reimplement base UUID type, uuid4(), and uuid7() in C #139123

Uh oh!

Conversation

1st1 commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

python-cla-bot bot commented Sep 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

1st1 commented Sep 18, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

picnixz commented Sep 18, 2025

Uh oh!

picnixz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

1st1 commented Sep 19, 2025

Uh oh!

picnixz commented Sep 19, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

1st1 commented Sep 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

1st1 commented Sep 21, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

hugovk commented Sep 21, 2025

Uh oh!

picnixz left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

1st1 commented Sep 18, 2025 •

edited

Loading

python-cla-bot bot commented Sep 18, 2025 •

edited

Loading

1st1 commented Sep 19, 2025 •

edited

Loading

picnixz left a comment •

edited

Loading