[Feature](func) Support function mmh64_v2#57180
regression-test/suites/query_p0/sql_functions/hash_functions/test_hash_function.groovy
ClickBench: Total hot run time: 28.47 s
ClickBench: Total hot run time: 30.05 s
1. Add alias `xxhash3_64` for `xxhash_64`
2. Add function `murmur_hash3_64_v2`
Before:
```text
mysql> SELECT MURMUR_HASH3_64('1000209601_1756808272');
+------------------------------------------+
| MURMUR_HASH3_64('1000209601_1756808272') |
+------------------------------------------+
| 1680830166030441144 |
+------------------------------------------+
```
This result differs from what `mmh3.hash64` returns in other languages,
because external libraries compute the 128-bit MurmurHash3 (mmh3_128)
and keep only the first 64 bits of the result. The 64-bit version inside
Doris has only the h1 state and no h2, so the final step of the 128-bit
algorithm that mixes h1 and h2 never happens, and the output differs
from the external value.
After:
```text
mysql> SELECT MURMUR_HASH3_64_V2('1000209601_1756808272');
+---------------------------------------------+
| MURMUR_HASH3_64_V2('1000209601_1756808272') |
+---------------------------------------------+
| 4038800892574899471 |
+---------------------------------------------+
```
The result now matches the value returned by the external API call.
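To make the h1/h2 point concrete, here is a minimal pure-Python sketch of the 128-bit MurmurHash3 (x64 variant) that returns the first 64 bits as a signed integer. It is not the Doris implementation: the function names `murmur3_x64_128` and `first_64_bits_signed` are hypothetical, and seed 0 is an assumption. The two final `h1 += h2` steps are the mixing that a single-state 64-bit variant cannot perform.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
C1 = 0x87C37B91114253D5
C2 = 0x4CF5AD432745937F


def _rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit value left by r bits."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def _fmix64(k: int) -> int:
    """Final avalanche step of MurmurHash3."""
    k ^= k >> 33
    k = (k * 0xFF51AFD7ED558CCD) & MASK64
    k ^= k >> 33
    k = (k * 0xC4CEB9FE1A85EC53) & MASK64
    k ^= k >> 33
    return k


def murmur3_x64_128(data: bytes, seed: int = 0) -> tuple[int, int]:
    """Return (h1, h2) of MurmurHash3 x64 128 as unsigned 64-bit ints."""
    h1 = h2 = seed & MASK64  # seed 0 is an assumption, not confirmed by the PR
    n = len(data)
    nblocks = n // 16

    # Body: consume 16-byte blocks, updating both state words.
    for i in range(nblocks):
        k1 = int.from_bytes(data[16 * i:16 * i + 8], "little")
        k2 = int.from_bytes(data[16 * i + 8:16 * i + 16], "little")
        k1 = (_rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64
        h1 ^= k1
        h1 = (_rotl64(h1, 27) + h2) & MASK64
        h1 = (h1 * 5 + 0x52DCE729) & MASK64
        k2 = (_rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
        h2 ^= k2
        h2 = (_rotl64(h2, 31) + h1) & MASK64
        h2 = (h2 * 5 + 0x38495AB5) & MASK64

    # Tail: up to 15 leftover bytes; a zero word mixes to zero, so no branching.
    tail = data[16 * nblocks:]
    k1 = int.from_bytes(tail[:8], "little")
    k2 = int.from_bytes(tail[8:], "little")
    h2 ^= (_rotl64((k2 * C2) & MASK64, 33) * C1) & MASK64
    h1 ^= (_rotl64((k1 * C1) & MASK64, 31) * C2) & MASK64

    # Finalization: this is where h1 and h2 are mixed with each other.
    h1 ^= n
    h2 ^= n
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    h1 = _fmix64(h1)
    h2 = _fmix64(h2)
    h1 = (h1 + h2) & MASK64
    h2 = (h2 + h1) & MASK64
    return h1, h2


def first_64_bits_signed(data: bytes, seed: int = 0) -> int:
    """Keep only h1 and reinterpret it as a signed 64-bit integer."""
    h1, _ = murmur3_x64_128(data, seed)
    return h1 - (1 << 64) if h1 >= (1 << 63) else h1


if __name__ == "__main__":
    print(first_64_bits_signed(b"1000209601_1756808272"))
```

Running the script prints a value that can be compared with the `MURMUR_HASH3_64_V2` output above; whether the two agree depends on the seed assumption holding.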
## Proposed changes
pick: apache#57180
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
1. Add alias `xxhash3_64` for `xxhash_64`
2. Add function `murmur_hash3_64_v2`
Before: `MURMUR_HASH3_64('1000209601_1756808272')` returned `1680830166030441144`.
This result differs from what `mmh3.hash64` returns in other languages, because external libraries compute the 128-bit MurmurHash3 (mmh3_128) and keep only the first 64 bits of the result. The 64-bit version inside Doris has only the h1 state and no h2, so the final step of the 128-bit algorithm that mixes h1 and h2 never happens, and the output differs from the external value.
After: `MURMUR_HASH3_64_V2('1000209601_1756808272')` returns `4038800892574899471`.
The result now matches the value returned by the external API call.
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)