[fix](csv reader) fix csv parse error when use enclose with multi-char column separator #54581
PR approved by at least one committer and no changes requested.
…th multi-char column separator (#54581) (#55052) pick #54581
What problem does this PR solve?
`idx` is the position up to which the buffer has been parsed.
If the buffer does not contain a complete row, as shown in the following figure, `idx` advances to the length of the buffer, and the buffer is then expanded. If a multi-character column separator happens to straddle the end of the buffer, so that only part of it has been read, the parser cannot see the complete separator when it resumes reading after the expansion, and the row is parsed incorrectly.

The fix is to pre-read the column separator's length when parsing the column separator, so that a separator split across the buffer boundary is not consumed piecemeal.
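The failure mode and the look-ahead idea can be illustrated with a small sketch. This is a simplified Python model, not the Doris C++ reader: the function `find_separators`, its `lookahead` flag, and the chunked-input setup are hypothetical names chosen for illustration, and enclose handling is left out entirely.

```python
def find_separators(chunks, sep, lookahead):
    """Return start offsets of `sep` while scanning a buffer that grows chunk by chunk.

    Hypothetical model of the bug, not the actual Doris implementation.
    lookahead=False: idx runs to the end of the buffer before it is expanded.
    lookahead=True:  idx stops while fewer than len(sep) bytes remain unread.
    """
    buf = b""
    idx = 0  # position up to which the buffer has been parsed
    found = []
    for chunk in chunks:
        buf += chunk  # "expand" the buffer with newly read bytes
        if lookahead:
            # Only examine positions where a full separator fits in the buffer.
            limit = len(buf) - len(sep) + 1
        else:
            # Consume every byte currently available, even a partial separator.
            limit = len(buf)
        while idx < limit:
            if buf[idx:idx + len(sep)] == sep:
                found.append(idx)
                idx += len(sep)
            else:
                idx += 1
    return found


if __name__ == "__main__":
    split_input = [b"a|", b"|b"]  # the separator "||" straddles the chunk boundary
    print(find_separators(split_input, b"||", lookahead=False))
    print(find_separators(split_input, b"||", lookahead=True))
```

With the input split as `b"a|"` followed by `b"|b"`, the scan without look-ahead moves `idx` past the first `|` before the second one arrives and never matches `||`, while the look-ahead variant leaves the partial separator unread and matches it at offset 1 after the buffer is expanded.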
Release note
None