[feature](inverted index) add custom analyzer support with pinyin tokenizer and pinyin filter #57097
Possible file(s) that should be tracked in LFS detected: 🚨 The following file(s) exceed the file size limit:
cb1f6a1 to 47f090c
47f090c to d630eaf
run buildall
ClickBench: Total hot run time: 31.89 s
run beut
d630eaf to ad1802b
run buildall
TPC-DS: Total hot run time: 190545 ms
ClickBench: Total hot run time: 28.85 s
run p0
run feut
ad1802b to 04c6c00
run buildall
ClickBench: Total hot run time: 28.9 s
TPC-DS: Total hot run time: 190528 ms
ClickBench: Total hot run time: 27.66 s
run external
PR approved by anyone and no changes requested.
PR approved by at least one committer and no changes requested.
…enzer and pinyin filter (#57097)

### What problem does this PR solve?

Support pinyin tokenizer and filter;

- tokenizer example

```
CREATE INVERTED INDEX TOKENIZER IF NOT EXISTS pinyin_tokenizer
PROPERTIES (
    "type" = "pinyin",
    "keep_separate_first_letter" = "false",
    "keep_full_pinyin" = "true",
    "keep_original" = "true",
    "limit_first_letter_length" = "16",
    "lowercase" = "true",
    "remove_duplicated_term" = "true"
);
```

- filter example

```
CREATE INVERTED INDEX TOKEN_FILTER IF NOT EXISTS pinyin_filter1
PROPERTIES (
    "type" = "pinyin",
    "keep_separate_first_letter" = "false",
    "keep_full_pinyin" = "true",
    "keep_original" = "true",
    "limit_first_letter_length" = "16",
    "lowercase" = "true",
    "remove_duplicated_term" = "true"
);
```
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
apache/doris-website#3067
Support the pinyin tokenizer and the pinyin token filter for custom analyzers on inverted indexes.
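A minimal sketch of how the filter from the commit message might be wired into a custom analyzer. This is not taken from the PR itself: the analyzer name `pinyin_analyzer` is hypothetical, the `CREATE INVERTED INDEX ANALYZER` statement and its `tokenizer` / `token_filter` properties are assumed from the existing Doris custom analyzer feature, and `standard` is assumed to be available as a built-in tokenizer name. Only `pinyin_filter1` comes from the PR's own example.

```sql
-- Assumption: pinyin_filter1 was already created as shown in the commit message.
-- Assumption: "standard" is a built-in tokenizer that can be referenced by name.
-- pinyin_analyzer is a hypothetical name used for illustration only.
CREATE INVERTED INDEX ANALYZER IF NOT EXISTS pinyin_analyzer
PROPERTIES (
    "tokenizer" = "standard",
    "token_filter" = "pinyin_filter1"
);

-- Assumption: TOKENIZE accepts an "analyzer" property for custom analyzers.
-- Useful for inspecting which terms the analyzer emits before building an index.
SELECT TOKENIZE('刘德华', '"analyzer"="pinyin_analyzer"');
```

If these statements behave as assumed, inspecting the `TOKENIZE` output is a quick way to check the effect of options such as `keep_original` and `keep_full_pinyin` before attaching the analyzer to a column.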
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)