[feature](catalog) support varbinary type mapping in hive/iceberg/paimon table #57821
Merged
zhangstar333 merged 27 commits into apache:master on Nov 26, 2025
The branch was updated from 2ada766 to 94c865d.
Contributor (author): run buildall

CI reports posted on the PR:
- Cloud UT Coverage Report: increment line coverage, increment coverage report
- FE UT Coverage Report: increment line coverage
- TPC-H: total hot run time 34948 ms
- TPC-DS: total hot run time 188642 ms
- ClickBench: total hot run time 28.57 s
- BE UT Coverage Report: increment line coverage, increment coverage report
- ClickBench: total hot run time 28.39 s
- BE UT Coverage Report: increment line coverage, increment coverage report
- BE Regression && UT Coverage Report: increment line coverage, increment coverage report
- FE Regression Coverage Report: increment line coverage
morningman approved these changes on Nov 26, 2025.
Contributor comment: PR approved by at least one committer and no changes requested.
hubgeter approved these changes on Nov 26, 2025.
zclllyybb approved these changes on Nov 26, 2025.
zhangstar333 added a commit that referenced this pull request on Nov 27, 2025:
What problem does this PR solve?
Problem Summary: Support varbinary type mapping in the DB2, MySQL, Oracle, PostgreSQL, and SQLServer JDBC catalogs. You can control this when creating a catalog with the property "enable.mapping.varbinary", whose default value is false. If it is true, binary types are mapped to the Doris varbinary type; if it is false, they are mapped to the Doris string type. Followup #57821
Release note: Support mapping the varbinary type in JDBC catalogs.
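As a minimal sketch of the decision this property controls, the Python below picks the Doris type for a binary-family JDBC column. The set of source type names, the case-insensitive flag parsing, and the helper names `parse_flag` and `map_binary_column` are hypothetical choices for illustration; they are not taken from the Doris code base.

```python
# Hypothetical illustration only: the type names and helpers are not Doris code.
# BYTEA is PostgreSQL's binary type and RAW is Oracle's; the set is illustrative.
BINARY_FAMILY = {"BINARY", "VARBINARY", "BYTEA", "BLOB", "RAW"}


def parse_flag(properties: dict) -> bool:
    """Read "enable.mapping.varbinary"; an absent property means false (the default)."""
    # Case-insensitive parsing of "true" is an assumption of this sketch.
    return properties.get("enable.mapping.varbinary", "false").strip().lower() == "true"


def map_binary_column(source_type: str, properties: dict) -> str:
    """Pick the Doris type for a binary-family source column."""
    if source_type.upper() not in BINARY_FAMILY:
        raise ValueError(f"not a binary-family type in this sketch: {source_type}")
    return "VARBINARY" if parse_flag(properties) else "STRING"


print(map_binary_column("bytea", {}))                                     # STRING
print(map_binary_column("bytea", {"enable.mapping.varbinary": "true"}))  # VARBINARY
```

Because the default is false, existing catalogs keep their string mapping unless the property is set explicitly.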
github-actions bot pushed a commit that referenced this pull request on Nov 27, 2025, with the same commit message as the one above.
zhangstar333 added a commit to zhangstar333/incubator-doris that referenced this pull request on Nov 28, 2025:
…mon table (apache#57821)
Problem Summary: Support the varbinary type in Hive/Iceberg/Paimon tables. Varbinary columns can be mapped into Doris directly instead of being mapped to the string type. This is controlled by the catalog property enable.mapping.varbinary, which defaults to false. Table-valued functions (TVFs) such as HDFS also have a parameter that controls this, which also defaults to false.
1. When a Parquet file column has type tparquet::Type::BYTE_ARRAY and neither logicalType nor converted_type is set, it is read directly into column_varbinary, so the physical and logical conversions are consistent. If a tparquet::Type::BYTE_ARRAY column has a logicalType set (e.g. String), it is read as column_string; if the table column is created as a binary column, VarBinaryConverter then converts column_string to column_varbinary.
2. When an ORC file column has the binary type, it is also mapped to the varbinary type directly and can reuse StringVectorBatch.
3. Add casts between the string and varbinary types.
4. Map the Iceberg UUID type to the binary type instead of string.
5. Change the signature bool safe_cast_string(**const char\* startptr, size_t buffer_size**, xxx) to safe_cast_string(**const StringRef& str_ref**, xxx).
6. Add **const** to the read_date_text_impl function.
7. Add some tests for varbinary with the Paimon catalog; more cases for Hive/Iceberg and a documentation update will follow.

```
mysql> show create table binary_demo3;
+--------------+------------------------------+
| Table        | Create Table                 |
+--------------+------------------------------+
| binary_demo3 | CREATE TABLE `binary_demo3` (
  `id` int NULL,
  `record_name` char(10) NULL,
  `vrecord_name` text NULL,
  `bin` varbinary(10) NULL,
  `varbin` varbinary(2147483647) NULL
) ENGINE=PAIMON_EXTERNAL_TABLE
LOCATION 'file:/mnt/disk2/zhangsida/test_paimon/demo.db/binary_demo3'
PROPERTIES (
  "path" = "file:/mnt/disk2/zhangsida/test_paimon/demo.db/binary_demo3",
  "primary-key" = "id"
); |
+--------------+------------------------------+
1 row in set (0.00 sec)

mysql> select *, length(record_name),length(vrecord_name),length(bin),length(varbin) from binary_demo3;
+------+-------------+--------------+------------------------+----------------+---------------------+----------------------+-------------+----------------+
| id   | record_name | vrecord_name | bin                    | varbin         | length(record_name) | length(vrecord_name) | length(bin) | length(varbin) |
+------+-------------+--------------+------------------------+----------------+---------------------+----------------------+-------------+----------------+
|    1 | AAAA        | AAAA         | 0xAAAA0000000000000000 | 0xAAAA         |                  10 |                    4 |          10 |              2 |
|    2 | 6161        | 6161         | 0x61610000000000000000 | 0x6161         |                  10 |                    4 |          10 |              2 |
|    3 | NULL        | NULL         | NULL                   | NULL           |                NULL |                 NULL |        NULL |           NULL |
+------+-------------+--------------+------------------------+----------------+---------------------+----------------------+-------------+----------------+
```

Release note: support varbinary type mapping in hive/iceberg/paimon table
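The `bin` and `varbin` values in the demo output can be reproduced with a few lines of Python, which may help when checking results by hand. This sketch assumes the zero padding in `bin` comes from its fixed width of 10 bytes (the source Paimon column is presumably a fixed-length binary), while `varbin` keeps only the bytes written; the helper names `render_binary` and `pad_fixed` are hypothetical.

```python
def render_binary(value: bytes) -> str:
    """Format bytes as 0x-prefixed uppercase hex, as in the demo output."""
    return "0x" + value.hex().upper()


def pad_fixed(value: bytes, width: int) -> bytes:
    """Right-pad with zero bytes to a fixed width (assumed origin of the padding in `bin`)."""
    if len(value) > width:
        raise ValueError("value is longer than the column width")
    return value.ljust(width, b"\x00")


for raw in (bytes.fromhex("AAAA"), b"aa"):  # b"aa" is the two bytes 0x61 0x61
    fixed = pad_fixed(raw, 10)
    print(render_binary(fixed), render_binary(raw), len(fixed), len(raw))
# 0xAAAA0000000000000000 0xAAAA 10 2
# 0x61610000000000000000 0x6161 10 2
```

The lengths match the `length(bin)` and `length(varbin)` columns above: 10 for the padded value and 2 for the unpadded one.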
yiguolei pushed a commit that referenced this pull request on Nov 28, 2025:
…/iceberg/paimon table (#57821) (#58482)
What problem does this PR solve? Problem Summary: cherry-pick from #57821.
Release note: None
zhangstar333 added a commit that referenced this pull request on Nov 28, 2025, with the same JDBC catalog commit message as the Nov 27 commit above (Followup #57821).
morningman pushed a commit that referenced this pull request on Dec 1, 2025: What problem does this PR solve? The introduced PR: #57821.
github-actions bot pushed a commit that referenced this pull request on Dec 1, 2025: What problem does this PR solve? The introduced PR: #57821.
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request on Dec 14, 2025: …mon table (apache#57821). Its commit message repeats the PR description and the binary_demo3 example from the Nov 28 commit to zhangstar333/incubator-doris above, under the headings "What problem does this PR solve?" and "Release note".
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request on Dec 14, 2025: …e#58215), with the same JDBC catalog commit message as the Nov 27 commit above (Followup apache#57821).
nagisa-kunhah pushed a commit to nagisa-kunhah/doris that referenced this pull request on Dec 14, 2025: What problem does this PR solve? The introduced PR: apache#57821.
What problem does this PR solve?
Problem Summary:
Support the varbinary type in Hive/Iceberg/Paimon tables: varbinary columns can be mapped into Doris directly instead of being mapped to the string type. This is controlled by the catalog property enable.mapping.varbinary, which defaults to false.
Table-valued functions (TVFs) such as HDFS also have a parameter that controls this, which also defaults to false.
1. When a Parquet file column has type tparquet::Type::BYTE_ARRAY and neither logicalType nor converted_type is set, it is read directly into column_varbinary, so the physical and logical conversions are consistent.
   If a tparquet::Type::BYTE_ARRAY column has a logicalType set (e.g. String), it is read as column_string; if the table column is created as a binary column, VarBinaryConverter then converts column_string to column_varbinary.
2. When an ORC file column has the binary type, it is also mapped to the varbinary type directly and can reuse StringVectorBatch.
3. Add casts between the string and varbinary types.
4. Map the Iceberg UUID type to the binary type instead of string.
5. Change the signature bool safe_cast_string(const char* startptr, size_t buffer_size, xxx) to safe_cast_string(const StringRef& str_ref, xxx).
6. Add const to the read_date_text_impl function.
7. Add some tests for varbinary with the Paimon catalog; more cases for Hive/Iceberg and a documentation update will follow.
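Items 1 and 2 describe how the file readers choose the in-memory column type. A rough Python model of that decision is sketched below. `ParquetColumn`, `parquet_read_column`, `needs_varbinary_converter`, and `orc_read_column` are hypothetical names, the conversion step (VarBinaryConverter in the PR) is reduced to a boolean, and only the BYTE_ARRAY and ORC binary cases described above are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ParquetColumn:
    """Hypothetical stand-in for the Parquet column metadata the reader inspects."""
    physical_type: str                    # e.g. "BYTE_ARRAY"
    logical_type: Optional[str] = None    # e.g. "STRING"; None when unset
    converted_type: Optional[str] = None  # e.g. "UTF8"; None when unset


def parquet_read_column(col: ParquetColumn) -> str:
    """Column type materialized from a BYTE_ARRAY column (item 1)."""
    if col.physical_type != "BYTE_ARRAY":
        raise NotImplementedError("only BYTE_ARRAY is modeled in this sketch")
    if col.logical_type is None and col.converted_type is None:
        # Unannotated BYTE_ARRAY: read straight into a varbinary column.
        return "column_varbinary"
    # Annotated BYTE_ARRAY (e.g. String): read as a string column.
    return "column_string"


def needs_varbinary_converter(read_as: str, table_type: str) -> bool:
    """True when string data read from the file feeds a binary table column."""
    # Other combinations are outside the scope of this sketch and return False.
    return read_as == "column_string" and table_type == "varbinary"


def orc_read_column(orc_kind: str) -> str:
    """ORC binary columns map to varbinary directly (item 2)."""
    # Per the PR, the reader can reuse StringVectorBatch for the raw bytes.
    if orc_kind != "binary":
        raise NotImplementedError("only the ORC binary kind is modeled in this sketch")
    return "column_varbinary"


col = ParquetColumn("BYTE_ARRAY", logical_type="STRING")
read_as = parquet_read_column(col)
print(read_as, needs_varbinary_converter(read_as, "varbinary"))  # column_string True
```

Deciding from the file metadata first and converting afterwards keeps the unannotated case on a single path, which is the consistency item 1 refers to.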
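For item 4, the practical difference is the width of the value: a UUID's canonical text form is 36 characters, while its binary form is 16 bytes. The sketch below uses Python's standard `uuid` module to show both forms. The helper name `uuid_as_binary` is hypothetical, and the sketch assumes the binary mapping carries the standard 16-byte big-endian UUID layout.

```python
import uuid


def uuid_as_binary(text: str) -> bytes:
    """Return the 16-byte big-endian form of a UUID given in canonical text form."""
    return uuid.UUID(text).bytes


u = "123e4567-e89b-12d3-a456-426614174000"
b = uuid_as_binary(u)
print(len(u), len(b))          # 36 16
print("0x" + b.hex().upper())  # 0x123E4567E89B12D3A456426614174000
```

The conversion is lossless: `uuid.UUID(bytes=b)` recovers the original value.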
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)