SCRU64 ID offers compact, time-ordered unique identifiers generated by distributed nodes. SCRU64 has the following features:
- 63-bit non-negative integer storable as signed/unsigned 64-bit integer
- Sortable by generation time (as integer and as text)
- 12-digit case-insensitive textual representation (Base36)
- ~38-bit Unix epoch-based timestamp that ensures useful life until year 4261
- Variable-length node/machine ID and counter fields that share 24 bits
Examples in the 12-digit canonical textual representation:
0u375nxqh5cq
0u375nxqh5cr
0u375nxqh5cs
0u375nxqh5ct
0u375ny0glr0
0u375ny0glr1
0u375ny0glr2
0u375ny0glr3
SCRU64's uniqueness is realm-specific, i.e., dependent on the centralized assignment of node ID to each generator. If you need decentralized, globally unique time-ordered identifiers, consider SCRU128. See the following comparison table for the properties of the two schemes.
| SCRU64 | SCRU128 | |
|---|---|---|
| Long name | Sortable, Clock-based, Realm-specifically Unique identifier | Sortable, Clock and Random number-based Unique identifier |
| Binary size | 63 bits | 128 bits |
| Textual size | 12 digits | 25 digits |
| Textual encoding | Base36 with 0-9a-z (case-insensitive) |
Base36 with 0-9a-z (case-insensitive) |
| Timestamp range | January 1, 1970 - February 27, 4261 (UTC) | January 1, 1970 - August 2, 10889 (UTC) |
| Timestamp resolution | 256 milliseconds | 1 millisecond |
| Number of IDs per timestamp | Up to 8.4 million per 256 milliseconds per node (configurable) | 140.7 trillion per millisecond per node (on average) |
| Number of distributed nodes | Up to 8.4 million (configurable) | No specific limitation |
| Source of uniqueness | Centrally pre-assigned node ID | Independently generated random numbers |
| Choose it when you ... | Prefer 64-bit integer for storage, indexing, and other reasons | Want unique IDs without hassle to coordinate generators |
A SCRU64 ID is a non-negative integer less than 36^12 (approx. 2^62)
consisting of three terms:
timestamp * 2^24 + node_id * 2^(24 - node_id_size) + counterWhere:
timestampis a non-negative integer less than36^12 / 2^24(from zero to282429536480) representing a 256-millisecond-precision Unix timestamp (i.e., the number of 256 milliseconds elapsed since 1970-01-01 00:00:00+00:00, ignoring leap seconds; or a Unix timestamp in milliseconds divided by 256).node_idis anode_id_size-bit unsigned integer uniquely assigned to each SCRU64 generator in a relevant scope, wherenode_id_sizeis an integer between 1 and 23, inclusive.- The method to assign unique
node_ids is implementation-dependent and is out of the scope of this specification. node_id_sizemay be chosen arbitrarily and may vary from node to node as long as the leading bits of a longernode_iddo not collide with shorternode_ids.
- The method to assign unique
counteris a24 - node_id_size-bit unsigned integer incremented by one whenever a generator produces a new ID.counteris reset to a random number whentimestampmoves forward.countershould be reset to a random number of the fullcountersize but may be reset to a smaller-bit (e.g., 15-bit for a 16-bitcounter) random number to reserve the leading bits as an overflow guard to guarantee the room for a certain number of IDs within atimestamptick.
This definition is equivalent to the following bitwise operations using 64-bit integers:
timestamp << 24 | node_id << (24 - node_id_size) | counterFor the following value ranges:
timestamp = unix_timestamp_in_milliseconds >> 8
0 <= timestamp <= 282429536480
1 <= node_id_size <= 23
0 <= node_id < 1 << node_id_size
0 <= counter < 1 << (24 - node_id_size)A SCRU64 ID is usually represented as a 64-bit unsigned integer but may be
expressed as a signed integer, a byte array, or any other form as long as it
encodes integers from zero to the maximum value (36^12 - 1).
A SCRU64 ID is encoded in a string using the Base36 encoding. The Base36
denotes a SCRU64 ID as an unsigned integer in the radix of 36 using the digits
of 0-9a-z (0123456789abcdefghijklmnopqrstuvwxyz), with leading zeros added
to form a 12-digit canonical representation. The following pseudo equation
illustrates the encoding algorithm:
109959589539758421
= 0 * 36^11 + 30 * 36^10 + 2 * 36^9 + ... + 4 * 36^2 + 11 * 36^1 + 9
= '0' * 36^11 + 'u' * 36^10 + '2' * 36^9 + ... + '4' * 36^2 + 'b' * 36^1 + '9'
= "0u2pf62ji4b9"
The maximum value (36^12 - 1) of SCRU64 ID is defined as the maximum value
denotable in a 12-digit Base36 string (i.e., zzzzzzzzzzzz).
For the sake of uniformity, an encoder should use lowercase letters in encoding IDs. A decoder, on the other hand, must always ignore cases when interpreting or lexicographically sorting encoded IDs.
The Base36 encoding shown above is available by default in several languages
(e.g., strconv.FormatUint() and strconv.ParseUint() in Go). Another simple
way to implement it is by using 64-bit integer division and modulo operations.
The following C code illustrates the straightforward algorithm:
uint64_t num = UINT64_C(109959589539758421);
static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
char text[13];
for (int i = 11; i >= 0; i--) {
text[i] = digits[num % 36];
num /= 36;
}
text[12] = '\0';
puts(text); // 0u2pf62ji4b9The IDs with timestamp set at zero or 36^12 / 2^24 - 1 are reserved for
special purposes (e.g., use as dummy, error, or example values) and must not be
used or assigned as an identifier of anything.
The random numbers used need not be of cryptographic quality because small
random numbers are insecure anyway. This specification introduces randomness not
as a source of uniqueness or unguessability but primarily as a thin protection
against unintended duplication of node_ids by accidents and mistakes.
Implementations should not extract the node_id from a SCRU64 ID to identify
the generator because:
node_idis not necessarily a persistent ID of a single node and may change over time.node_id_sizemay also change over time to adjust the trade-off between the number of distributed nodes and the number of IDs per node.
node_id is embedded in an ID solely for the purpose of uniqueness guarantee
across distributed nodes, and thus the use of node_id for any other reason may
harm the extensibility of the variable node_id_size design.
Counter overflow occurs when the counter field does not provide sufficient
space for the IDs generated within a timestamp tick. The counter of SCRU64
IDs tends to overflow when the workload is high because the scheme is only able
to spare a limited number of bits for counter. Therefore, generators must
implement reasonable logic to handle such overflows. The recommended approach is
to increment timestamp and continue in the following way:
- Increment
timestampby one. - Reset
counterto a random number.
This approach is recommended over other options such as the "sleep till next tick" approach because this technique allows the generation of monotonically ordered IDs in a non-blocking manner. Raising an error on a counter overflow is generally not recommended because a counter overflow is not a fault of users of SCRU64.
This approach results in a greater timestamp value than the real-time clock.
Such a gap between timestamp and the wall clock should be handled as a small
clock rollback discussed below.
A SCRU64 generator relies on a real-time clock to ensure the monotonic order of generated IDs; therefore, it cannot guarantee monotonicity when the clock moves back. When a generator detects a clock rollback by comparing the up-to-date timestamp from the system clock and the one embedded in the last generated ID, the recommended treatment is:
- If the rollback is small enough (e.g., a few seconds), treat the
timestampof the last generated ID as the up-to-date one, betting that the wall clock will catch up soon. - Otherwise, stall the generator and wait for the next
timestamptick, reset the generator with another uniquenode_id, or abort and raise an error.
This approach ensures a prompt response when a clock rollback is small, while
the generator behavior will be implementation-dependent if the clock rollback is
significant, or if the demand for IDs is so high that the counter overflow
handling discussed above results in a timestamp significantly advanced from
the current timestamp.
Resetting timestamp without refreshing node_id is highly discouraged because
it results in a very high risk of duplicate IDs.
The reference implementations include a default global generator that reads the
node_id and node_id_size configuration from an environment variable. Pass a
unique node configuration encoded in a node specifier string through the
SCRU64_NODE_SPEC environment variable to invoke a process, and the invoked
application will have access to the global SCRU64 generator configured with the
given node information. For example:
SCRU64_NODE_SPEC=42/8 npx scru64 -n 4A node spec string starts with a decimal node_id, a hexadecimal node_id
prefixed by "0x", or a 12-digit node_prev SCRU64 ID value, followed by a slash
and a decimal node_id_size value ranging from 1 to 23 (e.g., "42/8",
"0xb00/12", "0u2r85hm2pt3/16"). The first and second forms create a fresh new
generator with the given node_id, while the third form constructs one that
generates subsequent SCRU64 IDs to the node_prev.
The global generator raises an error if a proper node spec is not passed, because a SCRU64 generator is not able to generate a unique ID without knowledge of a unique node configuration.
SCRU64 employs the variable node_id_size design to pursue the right balance
between the number of distributed nodes and the number of IDs generated by each
node within a 256-millisecond timestamp tick. If an application consists of
many independent nodes that generate a few IDs per timestamp, a large
node_id_size will fit. If an application deploys a few nodes that generate
many IDs, then a small node_id_size should be used.
With the variable-size design, an application can operate short node_ids and
long node_ids together if the leading bits of longer node_ids are carefully
assigned not to collide with short node_ids. This approach is useful to mix
hot nodes that generate many and cold nodes that do not. For example:
# Hot nodes take 0b0000, 0b0001, ..., 0b0111
SCRU64_NODE_SPEC=0x0/4 launch_hot_node
SCRU64_NODE_SPEC=0x1/4 launch_hot_node
# Cold nodes use 0b1000_0000, 0b1000_0001, ..., 0b1111_1111
SCRU64_NODE_SPEC=0x80/8 launch_cold_node
SCRU64_NODE_SPEC=0x81/8 launch_cold_node
SCRU64_NODE_SPEC=0x82/8 launch_cold_node
SCRU64_NODE_SPEC=0x83/8 launch_cold_nodeNode configurations may vary from time to time. Theoretically, anode_id must
be unique in a single timestamp tick only, so two generators sharing one
node_id will not collide with each other as long as they operate in totally
different timestamp spaces. A typical use case of this property is to expire
all the node_ids at the end of a day and lease new ones that are adaptively
customized based on the stats of yesterday.
This work is licensed under a Creative Commons Attribution 4.0 International (CC BY 4.0) License.