
branch-3.1: [improve](hive) refactor csv reader #50101 #50379 #52404

Merged
morrySnow merged 5 commits into apache:branch-3.1 from suxiaogang223:refactor_csv_3.1
Jul 1, 2025

Conversation

@suxiaogang223
Contributor

@suxiaogang223 requested a review from morrySnow as a code owner on June 27, 2025, 03:20
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@suxiaogang223
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.43% (1128/1352)
Line Coverage 67.20% (19628/29209)
Region Coverage 66.94% (9686/14470)
Branch Coverage 56.88% (5267/9260)

@suxiaogang223
Contributor Author

run buildall

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 83.46% (1130/1354)
Line Coverage 67.24% (19649/29222)
Region Coverage 67.26% (9978/14835)
Branch Coverage 56.91% (5270/9260)

…e#50101)

Problem Summary:
According to the CSV standard format definition, quote characters inside
a string should be escaped using a pair of quote characters. However,
the current implementation does not handle this case correctly, which
may lead to incorrect parsing results when the input string contains
quote characters.
- [x] Implement TextReader for reading Hive text tables.
- [x] Clean up CsvReader.
- [x] Add test cases for the CSV and text formats.
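
The quoting rule in the problem summary can be illustrated with a minimal Python sketch. It is not the Doris CsvReader code; the function name `split_csv_line` and its parameters are hypothetical, and it handles only a single-line record with single-character separator and quote. It shows how a doubled quote inside a quoted field is read as one literal quote character.

```python
def split_csv_line(line, sep=",", quote='"'):
    """Split one CSV record into fields.

    Hypothetical helper for illustration only: inside a quoted field,
    a pair of quote characters stands for one literal quote.
    """
    fields = []
    buf = []
    in_quotes = False
    i = 0
    n = len(line)
    while i < n:
        ch = line[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and line[i + 1] == quote:
                    # Doubled quote: emit one literal quote, skip both.
                    buf.append(quote)
                    i += 2
                    continue
                # Single quote: the quoted section ends here.
                in_quotes = False
            else:
                # Separators inside quotes are ordinary characters.
                buf.append(ch)
        elif ch == quote:
            in_quotes = True
        elif ch == sep:
            fields.append("".join(buf))
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf))
    return fields
```

For example, `split_csv_line('1,"say ""hi"", ok",3')` yields three fields, the second being `say "hi", ok`. A reader that does not recognize the doubled quote would close the quoted field early and split the record at the wrong place.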

Behavior changes:
1. Don't parse "\\N" or any other string as null for Hive OpenCsv tables.
In Hive's OpenCsv table format, there is no defined representation for
null values. This differs from how Doris treats null markers when
importing CSV data into its internal tables.
2. Fix a bug when reading complex types from Hive OpenCsv tables.
The previous code incorrectly used the Hive text format to parse the
complex types of Hive OpenCsv tables. This PR fixes that and uses the
JSON format for parsing instead.
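
The two behavior changes can be sketched in Python as follows. This is a simplified illustration, not the Doris implementation: the helper names `decode_opencsv_field` and `decode_text_field` and the `is_complex` flag are hypothetical. It assumes that a Hive text table uses `\N` as its null marker (Hive's default), while an OpenCsv field has no null marker and its complex values are written as JSON.

```python
import json


def decode_text_field(raw, null_format="\\N"):
    # Hive text table (assumed default marker): the null marker
    # becomes a real null, anything else stays a string.
    return None if raw == null_format else raw


def decode_opencsv_field(raw, is_complex=False):
    # OpenCsv table: no string is treated as null, so "\N" is kept
    # as a literal two-character string.
    if is_complex:
        # Complex values (arrays, maps, structs) are assumed to be
        # serialized as JSON rather than in the Hive text layout.
        return json.loads(raw)
    return raw
```

With these helpers, `decode_text_field("\\N")` returns `None`, whereas `decode_opencsv_field("\\N")` returns the original string, and `decode_opencsv_field('[1, 2]', is_complex=True)` returns a list.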
@suxiaogang223
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.87% (1142/1378)
Line Coverage 66.98% (20326/30347)
Region Coverage 66.17% (10003/15118)
Branch Coverage 55.92% (5287/9454)

@suxiaogang223
Contributor Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 82.87% (1142/1378)
Line Coverage 66.91% (20306/30347)
Region Coverage 66.17% (10003/15118)
Branch Coverage 55.90% (5285/9454)

@doris-robot

BE UT Coverage Report

Increment line coverage 16.93% (53/313) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.09% (12376/27449)
Line Coverage 36.08% (109537/303579)
Region Coverage 35.15% (56790/161586)
Branch Coverage 32.27% (30804/95452)

@suxiaogang223
Contributor Author

run p0

@morrySnow changed the title from "branch-3.1: Refactor csv reader (#50101 #50379)" to "branch-3.1: [improve](hive) refactor csv reader #50101 #50379" on Jul 1, 2025
@morrySnow merged commit 1cfd7f5 into apache:branch-3.1 on Jul 1, 2025
18 of 21 checks passed
@suxiaogang223 deleted the refactor_csv_3.1 branch on July 10, 2025, 09:01
4 participants