[enhancement](hive) Initial support for Hive org.openx.data.jsonserde.JsonSerDe #49209
morningman merged 4 commits into apache:master
    // in the data to lowercase,and use the last one as the insertion value

    bool _openx_json_ignore_malformed = false;
    // hive : org.openx.data.jsonserde.JsonSerDe, `ignore.malformed.json` prop.
Move the comment before the field.
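The first comment line in the hunk above is cut off, but it appears to describe folding keys in the data to lowercase and keeping the last value when keys collide. The sketch below only illustrates that idea in Java; `LowercaseKeyFolder` and `foldKeys` are hypothetical names, and this is not the BE (C++) code from the PR.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Doris. */
final class LowercaseKeyFolder {
    private LowercaseKeyFolder() {}

    /**
     * Lowercases every key of an already-parsed JSON object.
     * The input is iterated in its own order, so when two keys differ
     * only by case, the one that comes later overwrites the earlier one.
     */
    static Map<String, Object> foldKeys(Map<String, Object> parsedObject) {
        Map<String, Object> folded = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : parsedObject.entrySet()) {
            // put() replaces any value stored earlier under the same lowercase key
            folded.put(entry.getKey().toLowerCase(Locale.ROOT), entry.getValue());
        }
        return folded;
    }
}
```

The input should be an order-preserving map such as `LinkedHashMap`, otherwise "last" is not well defined.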
    }

    public boolean canReadHiveJsonInOneColumn() {
        return ConnectContext.get().getSessionVariable().isReadHiveJsonInOneColumn()
There is already a `sessionVariable` instance in `HiveScanNode`. Use it instead of `ConnectContext.get().getSessionVariable()`.
            || serDeLib.equals(HiveMetaStoreClientHelper.LEGACY_HIVE_JSON_SERDE)) {
        type = TFileFormatType.FORMAT_JSON;
    } else if (serDeLib.equals(HiveMetaStoreClientHelper.OPENX_JSON_SERDE)) {
        if (hmsTable.canReadHiveJsonInOneColumn()) {
Shouldn't we return an error if `READ_HIVE_JSON_IN_ONE_COLUMN` is true but the first column is not a string?
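The check suggested in this comment could look roughly like the sketch below. It is a standalone illustration, not the change that landed in the PR: `OneColumnModeCheck`, its nested `Column` class, and the error text are all hypothetical stand-ins, and the real code would work on the Doris schema types instead.

```java
import java.util.List;

/** Hypothetical validation sketch; none of these names come from Doris. */
final class OneColumnModeCheck {
    private OneColumnModeCheck() {}

    /** Minimal stand-in for a table column (name plus type name). */
    static final class Column {
        final String name;
        final String type;

        Column(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    /**
     * Rejects the scan when one-column mode is on but the first column
     * cannot hold a whole line of JSON text.
     */
    static void validate(boolean readHiveJsonInOneColumn, List<Column> schema) {
        if (!readHiveJsonInOneColumn) {
            return; // normal field-by-field reading, nothing to check here
        }
        if (schema.isEmpty()) {
            throw new IllegalStateException("table has no columns");
        }
        Column first = schema.get(0);
        if (!"string".equalsIgnoreCase(first.type)) {
            throw new IllegalStateException(
                    "read_hive_json_in_one_column is true, but first column '"
                            + first.name + "' has type " + first.type + ", expected string");
        }
    }
}
```

Failing early at planning time, as the reviewer proposes, would surface the misconfiguration before any data is read.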
…JsonSerDe (apache#49209)
…onserde.JsonSerDe" (apache#49928) Reverts apache#49209
What problem does this PR solve?
Problem Summary:
Initial support for Hive `org.openx.data.jsonserde.JsonSerDe` (https://github.com/rcongiu/Hive-JSON-Serde). The read behavior is similar to PR #43469.
Following the feature descriptions at that link, the behavior is:
Supported:
1. Querying Complex Fields
2. Importing Malformed Data (serde prop: `ignore.malformed.json`)
Not supported; setting these has no effect on query results:
1. dots.in.keys
2. Case Sensitivity in mappings
3. Mapping Hive Keywords
Not supported; these report an error:
1. Using Arrays
2. Promoting a Scalar to an Array
error: [DATA_QUALITY_ERROR]JSON data is array-object, `strip_outer_array` must be TRUE.
To let users handle JSON strings that cannot be parsed this way, a session variable is introduced: `read_hive_json_in_one_column` (default is false). When this variable is true, each whole line of JSON is read into the first column, and users can process the line themselves, for example with JSON_PARSE. The first column of the table must be of type string. Currently this is only valid for `org.openx.data.jsonserde.JsonSerDe`.
Release note
None
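As a rough illustration of the per-line behavior described in the problem summary, the sketch below decides what to do with one input line. It assumes that a top-level array can be recognized by its first non-blank character, which is a simplification and not necessarily how Doris detects it; `OpenxJsonLineSketch`, `Action`, and `decide` are hypothetical names.

```java
/** Hypothetical per-line routing sketch; not the Doris implementation. */
final class OpenxJsonLineSketch {
    private OpenxJsonLineSketch() {}

    enum Action {
        /** Store the raw line in the first (string) column, e.g. for JSON_PARSE later. */
        WHOLE_LINE_INTO_FIRST_COLUMN,
        /** Parse the line as a JSON object and map its fields to columns. */
        PARSE_FIELDS
    }

    static Action decide(String line, boolean readHiveJsonInOneColumn) {
        if (readHiveJsonInOneColumn) {
            // One-column mode: no parsing is attempted, so arrays pass through as text.
            return Action.WHOLE_LINE_INTO_FIRST_COLUMN;
        }
        // Assumption: a line whose first non-blank character is '[' is a top-level array.
        if (line.trim().startsWith("[")) {
            throw new IllegalArgumentException(
                    "[DATA_QUALITY_ERROR]JSON data is array-object, "
                            + "`strip_outer_array` must be TRUE.");
        }
        return Action.PARSE_FIELDS;
    }
}
```

With the variable set to true, a line such as `[1,2]` would be kept as text in the first column instead of triggering the error.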