[opt](catalog) support nested namespaces of iceberg #56415
morningman merged 13 commits into apache:master
run buildall
ClickBench: Total hot run time: 30.54 s
run buildall
ClickBench: Total hot run time: 30.61 s
run buildall
ClickBench: Total hot run time: 30.29 s
PR approved by at least one committer and no changes requested.
PR approved by anyone and no changes requested.
### What problem does this PR solve?

Iceberg has 3 levels of metadata: catalog, namespace and table, which map to Doris' catalog, database and table.
Iceberg supports nested namespaces, which means the following namespaces are all valid:

```
ns1
ns1.ns2
ns1.ns2.ns3
```

So Doris needs to support mapping a nested namespace to a Doris database.

This PR adds a global variable `enable_nested_namespace` to control this behavior.
The default is `false`, in which case no existing logic is changed.

If it is set to `true`, Doris supports the following statements:

```
mysql> switch iceberg;
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| nested             |
| nested.db1         |
| nested.db2         |
+--------------------+

mysql> use iceberg.nested.db1;
ERROR 1049 (42000): Only one dot can be in the name: iceberg.nested.db1

mysql> use iceberg.`nested.db1`;
ERROR 5086 (42000): errCode = 2, detailMessage = Unknown catalog 'nested'

mysql> set global enable_nested_namespace=true;

mysql> use iceberg.nested.db1;
Database changed

mysql> select k1 from iceberg.`nested.db1`.nested1;
mysql> select nested1.k1 from `nested.db1`.nested1;
mysql> select `nested.db1`.nested1.k1 from iceberg.`nested.db1`.nested1;
mysql> select iceberg.`nested.db1`.nested1.k1 from nested1;
+------+
| k1   |
+------+
|    1 |
+------+

mysql> refresh catalog iceberg;
mysql> refresh database iceberg.`nested.db1`;
mysql> refresh table iceberg.`nested.db1`.nested1;
Query OK, 0 rows affected (0.01 sec)
```

However, a statement like the following still cannot be executed:

```
use iceberg.`nested.db1`;
```

I don't know why, but the MySQL client behaves strangely here: when backquotes are added, the INIT_DB command receives only the `nested.db1` part, whereas `iceberg.nested.db1` is expected.

This PR also supports creating a nested database name in the internal catalog:

```
create database `db1.db2`
```
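The session above suggests how an unquoted `USE` target is interpreted once `enable_nested_namespace` is on: the first segment is the catalog and everything after the first dot is the database name. The following is only a minimal sketch of that reading, not the code from this PR; the class `UseTargetSketch` and the method `parse` are hypothetical names, and the sketch does not consult the list of existing catalogs.

```java
final class UseTargetSketch {
    /**
     * Splits a USE target into {catalog, database}.
     * Returns {null, name} when the name contains no dot.
     * Hypothetical helper: it mirrors the behavior shown in the session, not the PR's code.
     */
    static String[] parse(String name, boolean enableNestedNamespace) {
        int firstDot = name.indexOf('.');
        if (firstDot < 0) {
            // No catalog part: the whole name is a database in the current catalog.
            return new String[] {null, name};
        }
        String catalog = name.substring(0, firstDot);
        String database = name.substring(firstDot + 1);
        if (!enableNestedNamespace && database.contains(".")) {
            // With the variable off, more than one dot is rejected, as in the session.
            throw new IllegalArgumentException("Only one dot can be in the name: " + name);
        }
        return new String[] {catalog, database};
    }
}
```

Under this reading, `parse("iceberg.nested.db1", true)` yields catalog `iceberg` and database `nested.db1`, while a bare `nested.db1` would be taken as catalog `nested`, which matches the `Unknown catalog 'nested'` error shown for the backquoted form.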
### What problem does this PR solve?

Followup #56415

Problem Summary:

1. The previous `getNamespace` logic was wrong; the `dbName` should be split by `.` to create the namespace levels.
2. Allow `oauth.uri` to be omitted for an Iceberg REST catalog, to follow the new spec of IRC.

With these changes, Doris can connect to Snowflake Open Catalog like this:

```
CREATE CATALOG ice PROPERTIES (
    'type' = 'iceberg',
    'warehouse' = 'yy_external_catalog3',
    'iceberg.catalog.type' = 'rest',
    'iceberg.rest.uri' = 'https://xxx.snowflakecomputing.com/polaris/api/catalog',
    'iceberg.rest.security.type' = 'oauth2',
    'iceberg.rest.oauth2.credential' = 'id:secret',
    'iceberg.rest.oauth2.scope' = 'PRINCIPAL_ROLE:yy_sn_principal_role',
    'iceberg.rest.nested-namespace-enabled' = 'true',
    's3.endpoint' = 'https://s3.us-west-2.amazonaws.com',
    's3.region' = 'us-west-2'
);
```
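Item 1 above can be illustrated with a short sketch of splitting a Doris database name into namespace levels. This is an assumption about the shape of the fix, not the actual `getNamespace` code; `NamespaceLevelsSketch` and `toLevels` are hypothetical names. In Iceberg's Java API the resulting levels would typically be passed to `Namespace.of(String...)`, which is left out here so the example needs only the JDK.

```java
import java.util.ArrayList;
import java.util.List;

final class NamespaceLevelsSketch {
    /**
     * Splits a database name such as "ns1.ns2.ns3" into its namespace levels.
     * Hypothetical helper; empty levels (e.g. "ns1..ns2" or "ns1.") are rejected.
     */
    static List<String> toLevels(String dbName) {
        // The -1 limit keeps trailing empty strings so "ns1." is detected as invalid.
        String[] parts = dbName.split("\\.", -1);
        List<String> levels = new ArrayList<>();
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty namespace level in: " + dbName);
            }
            levels.add(part);
        }
        return levels;
    }
}
```

For example, `toLevels("nested.db1")` gives the two levels `nested` and `db1`, whereas treating the whole string as a single level would point at a different, single-level namespace.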
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)