branch-3.1: [improve](hive) refactor csv reader #50101 #50379 #52404
Merged
morrySnow merged 5 commits into apache:branch-3.1 on Jul 1, 2025
run buildall
Cloud UT Coverage Report: increment line coverage, increment coverage report
run buildall
Cloud UT Coverage Report: increment line coverage, increment coverage report
…e#50101)

Problem Summary: According to the standard CSV format definition, a quote character inside a quoted string must be escaped by doubling it (a pair of quote characters). The current implementation does not handle this case correctly, which can produce incorrect parsing results when the input string contains quote characters.
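The doubled-quote rule described above can be illustrated with a minimal sketch. This is not the Doris reader (which is implemented in the backend); `split_csv_line` is a hypothetical helper written only to show how a pair of quote characters inside a quoted field is read as one literal quote.

```python
def split_csv_line(line, sep=",", quote='"'):
    """Split one CSV record into fields (illustrative sketch only).

    Inside a quoted field, two consecutive quote characters stand for
    one literal quote; a single quote character closes the field.
    """
    fields, buf = [], []
    in_quotes = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < len(line) and line[i + 1] == quote:
                    # Escaped quote: emit one quote and skip its partner.
                    buf.append(quote)
                    i += 1
                else:
                    # Lone quote: end of the quoted section.
                    in_quotes = False
            else:
                buf.append(ch)
        elif ch == quote:
            in_quotes = True
        elif ch == sep:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields


fields = split_csv_line('1,"say ""hi""",x')
```

For the input `1,"say ""hi""",x` the second field is `say "hi"`. A parser that treats the first inner quote as the end of the field would split or truncate it instead, which is the kind of incorrect result the problem summary refers to.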
- [x] Implement TextReader for reading Hive text tables.
- [x] Clean up CsvReader.
- [x] Add cases to test csv and text format.

Behavior changes:

1. Don't parse "\\N" or any other string as null for Hive OpenCsv tables. Hive's OpenCsv table format defines no representation for null values. This differs from how Doris handles CSV when importing into internal tables.
2. Fix the bug when reading complex types from Hive OpenCsv tables. The current code incorrectly uses the Hive text format to parse the complex types of Hive OpenCsv tables. This PR fixes that and uses the json format for parsing.
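The two behavior changes can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual Doris code: `read_opencsv_field` and `read_text_field` are hypothetical names, and the text-format helper assumes Hive's default null marker `\N` for comparison.

```python
import json


def read_opencsv_field(raw, is_complex):
    """Hypothetical OpenCsv field conversion (sketch only).

    No string is treated as null, and complex types are parsed as JSON.
    """
    if is_complex:
        return json.loads(raw)
    return raw


def read_text_field(raw, null_format="\\N"):
    """Hypothetical Hive text field conversion, shown for contrast.

    Assumes the default Hive null marker, the two characters backslash and N.
    """
    return None if raw == null_format else raw


# The same two-character input yields different values per format.
opencsv_value = read_opencsv_field("\\N", is_complex=False)
text_value = read_text_field("\\N")
array_value = read_opencsv_field("[1, 2, 3]", is_complex=True)
```

Here `opencsv_value` stays the literal string `\N`, `text_value` becomes `None`, and `array_value` is parsed from JSON into a list.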
Force-pushed from dd12ed0 to e880bcd
run buildall
Cloud UT Coverage Report: increment line coverage, increment coverage report
run buildall
Cloud UT Coverage Report: increment line coverage, increment coverage report
BE UT Coverage Report: increment line coverage, increment coverage report
run p0
morrySnow approved these changes on Jul 1, 2025
bp: #50101, #50379