[fix](paimon) Support user-defined S3 config prefixes and unify to HDFS S3A protocol #57116
…FS S3A protocol

Previously, for FS-based Paimon catalogs, internal configuration translation was performed to ensure the storage layer used S3FileIO (which internally relied on Hadoop S3).

However, Paimon also allows users to specify S3-related options with various prefixes such as `s3.`, `s3a.`, or `fs.s3a.` in their configuration. The S3FileIO implementation in Paimon would automatically normalize these keys to the standard Hadoop prefix `fs.s3a.`.

With the recent refactor, we have unified all object storage access to use the HDFS S3A protocol directly. Therefore, the system must now handle these legacy user-defined prefixes internally to ensure compatibility.

```
Before this change, users might define custom parameters like:
paimon.s3.list.version=1
paimon.s3.paging.maximum=100
paimon.fs.s3.read.ahead.buffer.size=1
paimon.s3a.replication.factor=3

After normalization, they are automatically converted to Hadoop-compatible S3A keys:
fs.s3a.list.version=1
fs.s3a.paging.maximum=100
fs.s3a.read.ahead.buffer.size=1
fs.s3a.replication.factor=3
```
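The prefix handling described above can be sketched roughly as follows. This is an illustrative sketch only, not the code from `AbstractPaimonProperties.java`: the class name `S3aKeyNormalizer`, the method `normalize`, and the exact prefix list are assumptions inferred from the example keys in the description (which include `paimon.fs.s3.` in addition to the prefixes named in the prose).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; names and prefix list are assumptions, not the Doris implementation.
public class S3aKeyNormalizer {
    // Catalog-level prefix that marks a pass-through Paimon option (assumed from the example keys).
    private static final String CATALOG_PREFIX = "paimon.";
    // User-facing S3 prefixes (assumed). Each ends with a dot, so at most one can match a key.
    private static final String[] USER_PREFIXES = {"s3.", "s3a.", "fs.s3.", "fs.s3a."};
    // Standard Hadoop S3A prefix that every recognized key is rewritten to.
    private static final String HADOOP_PREFIX = "fs.s3a.";

    /** Returns a new map holding only the recognized S3 keys, rewritten to fs.s3a.*. */
    public static Map<String, String> normalize(Map<String, String> props) {
        Map<String, String> normalized = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : props.entrySet()) {
            String key = entry.getKey();
            if (!key.startsWith(CATALOG_PREFIX)) {
                continue; // not a Paimon pass-through option
            }
            String rest = key.substring(CATALOG_PREFIX.length());
            for (String prefix : USER_PREFIXES) {
                if (rest.startsWith(prefix)) {
                    // Swap the user prefix for the Hadoop one and keep the value unchanged.
                    normalized.put(HADOOP_PREFIX + rest.substring(prefix.length()), entry.getValue());
                    break;
                }
            }
        }
        return normalized;
    }

    public static void main(String[] args) {
        Map<String, String> userProps = new LinkedHashMap<>();
        userProps.put("paimon.s3.list.version", "1");
        userProps.put("paimon.s3.paging.maximum", "100");
        userProps.put("paimon.fs.s3.read.ahead.buffer.size", "1");
        userProps.put("paimon.s3a.replication.factor", "3");
        // Prints one key=value pair per line, in insertion order.
        normalize(userProps).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Run against the four example keys, this sketch yields the `fs.s3a.*` keys listed in the description. Keys without the `paimon.` prefix, or with an unrecognized storage prefix, are skipped here; whether the real implementation drops or forwards such keys is not stated in the PR text.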
run buildall
ClickBench: Total hot run time: 30.27 s
FE UT Coverage Report: increment line coverage
FE Regression Coverage Report: increment line coverage
run buildall
TPC-DS: Total hot run time: 190384 ms
ClickBench: Total hot run time: 30.48 s
run buildall
ClickBench: Total hot run time: 31.39 s
FE UT Coverage Report: increment line coverage
run buildall
ClickBench: Total hot run time: 31 s
FE UT Coverage Report: increment line coverage
...e/src/main/java/org/apache/doris/datasource/property/metastore/AbstractPaimonProperties.java
…metastore/AbstractPaimonProperties.java Co-authored-by: Mingyu Chen (Rayner) <yunyou@selectdb.com>
run buildall
TPC-DS: Total hot run time: 190318 ms
ClickBench: Total hot run time: 28.31 s
FE Regression Coverage Report: increment line coverage
…er-paimon-list.version # Conflicts: # fe/fe-core/src/main/java/org/apache/doris/datasource/property/storage/AzureProperties.java
run buildall
TPC-DS: Total hot run time: 190875 ms
ClickBench: Total hot run time: 27.83 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
FE Regression Coverage Report: increment line coverage
…S S3A protocol (apache#57116) … Co-authored-by: Mingyu Chen (Rayner) <yunyou@selectdb.com>
…S S3A protocol (apache#57116) … Co-authored-by: Mingyu Chen (Rayner) <yunyou@selectdb.com> (cherry picked from commit d739136)
Doc: apache/doris-website#3030