branch-4.0: [feature](catalog) support varbinary type mapping in hive/iceberg/paimon table (#57821) #58482

Merged

yiguolei merged 3 commits into apache:branch-4.0 from zhangstar333:branch-4.0 on Nov 28, 2025

Conversation

@zhangstar333
Contributor

What problem does this PR solve?

Problem Summary:
Cherry-picked from #57821.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

[feature](catalog) support varbinary type mapping in hive/iceberg/paimon table (apache#57821)

Problem Summary:
Support the varbinary type in Hive/Iceberg/Paimon tables: binary columns in external tables can now be mapped to Doris's varbinary type directly instead of the string type. The mapping is controlled by the catalog property enable.mapping.varbinary, which defaults to false. TVFs such as hdfs() also take a parameter to control it, likewise defaulting to false; a sketch of both switches follows.
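A minimal sketch of both switches, assuming the standard Doris CREATE CATALOG syntax for a Hive metastore catalog; the property name enable.mapping.varbinary comes from this PR, while the TVF parameter name (assumed here to mirror the catalog property), the catalog name, and all URIs are illustrative assumptions:

```
-- Sketch: enable direct varbinary mapping on a Hive catalog.
-- "enable.mapping.varbinary" defaults to "false"; without it, external
-- binary columns keep mapping to the string type.
CREATE CATALOG hive_cat PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",  -- placeholder URI
    "enable.mapping.varbinary" = "true"
);

-- Sketch: the equivalent switch on the hdfs() TVF. The parameter name
-- here is an assumption mirroring the catalog property.
SELECT * FROM hdfs(
    "uri" = "hdfs://127.0.0.1:8020/path/to/file.parquet",  -- placeholder path
    "format" = "parquet",
    "enable.mapping.varbinary" = "true"
);
```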

1. When a Parquet column's physical type is tparquet::Type::BYTE_ARRAY and it has neither a logicalType nor a converted_type, read it into column_varbinary directly, so the physical and logical conversions stay consistent.

If the type is tparquet::Type::BYTE_ARRAY but a logicalType is set (e.g. String), the data is read as column_string; if the table column was created as a binary column, VarBinaryConverter then converts column_string to column_varbinary.

2. When an ORC column is of binary type, map it to the varbinary type directly as well; the existing StringVectorBatch can be reused.

3. Add casts between the string and varbinary types (see the sketch after the query output below).

4. Map the Iceberg UUID type to binary instead of string.

5. Change the signature bool safe_cast_string(**const char\* startptr, size_t buffer_size**, ...) to safe_cast_string(**const StringRef& str_ref**, ...).

6. Add **const** to the read_date_text_impl function.

7. Add some Paimon catalog tests for varbinary; more cases for Hive/Iceberg and a documentation update will follow.
```
mysql> show create table binary_demo3;
+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table        | Create Table                                                                                                                                                                                                                                                                                                                                                                                     |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| binary_demo3 | CREATE TABLE `binary_demo3` (
  `id` int NULL,
  `record_name` char(10) NULL,
  `vrecord_name` text NULL,
  `bin` varbinary(10) NULL,
  `varbin` varbinary(2147483647) NULL
) ENGINE=PAIMON_EXTERNAL_TABLE
LOCATION 'file:/mnt/disk2/zhangsida/test_paimon/demo.db/binary_demo3'
PROPERTIES (
  "path" = "file:/mnt/disk2/zhangsida/test_paimon/demo.db/binary_demo3",
  "primary-key" = "id"
); |
+--------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> select *, length(record_name),length(vrecord_name),length(bin),length(varbin) from binary_demo3;
+------+-------------+--------------+------------------------+----------------+---------------------+----------------------+-------------+----------------+
| id   | record_name | vrecord_name | bin                    | varbin         | length(record_name) | length(vrecord_name) | length(bin) | length(varbin) |
+------+-------------+--------------+------------------------+----------------+---------------------+----------------------+-------------+----------------+
|    1 | AAAA        | AAAA         | 0xAAAA0000000000000000 | 0xAAAA         |                  10 |                    4 |          10 |              2 |
|    2 | 6161        | 6161         | 0x61610000000000000000 | 0x6161         |                  10 |                    4 |          10 |              2 |
|    3 | NULL        | NULL         | NULL                   | NULL           |                NULL |                 NULL |        NULL |           NULL |
+------+-------------+--------------+------------------------+----------------+---------------------+----------------------+-------------+----------------+

```
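For item 3, a minimal sketch of the new cast in both directions; the query and the expected rendering are illustrative, assuming the cast reinterprets the raw bytes and uses the 0x hex output format shown above:

```
-- Sketch: round-trip cast between string and varbinary (item 3).
SELECT CAST('AAAA' AS VARBINARY) AS str_to_bin,
       CAST(CAST('AAAA' AS VARBINARY) AS STRING) AS bin_to_str;
-- Assuming the cast reinterprets the UTF-8 bytes, str_to_bin would
-- render as 0x41414141 and bin_to_str as 'AAAA'.
```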

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what the possible impacts are.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and what the difference is before and after the optimization.

@zhangstar333
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.49% (1573/1884)
Line Coverage 67.62% (28036/41459)
Region Coverage 68.13% (13805/20264)
Branch Coverage 58.41% (7366/12610)

@zhangstar333
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.49% (1573/1884)
Line Coverage 67.66% (28051/41459)
Region Coverage 68.19% (13819/20264)
Branch Coverage 58.39% (7363/12610)

@doris-robot

BE UT Coverage Report

Increment line coverage 79.40% (212/267) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 52.71% (18206/34543)
Line Coverage 38.09% (165752/435153)
Region Coverage 33.13% (128817/388806)
Branch Coverage 33.95% (55477/163425)

@github-actions bot added the approved label (indicates a PR has been approved by one committer) on Nov 28, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yiguolei yiguolei merged commit e11a065 into apache:branch-4.0 Nov 28, 2025
22 of 26 checks passed