[Enhancement](file-cache) Add fine-grained control for compaction file cache #60609

gavinchou · 2026-02-09T05:04:17Z

Support caching only the index files of compaction output in cloud mode, so that data files written by compaction can bypass the file cache.

Changes:

- Add two mBool configs to control index-only caching for base and cumulative compaction:
  - enable_base_compaction_output_write_index_only (default: false)
  - enable_cumu_compaction_output_write_index_only (default: false)
- Extend RowsetWriterContext with a compaction_output_write_index_only field that marks whether only index files should be cached.
- Modify get_file_writer_options() to accept an is_index_file parameter:
  - When compaction_output_write_index_only=true and is_index_file=false, set write_file_cache=false so that data files skip the file cache.
  - Index files continue to be cached normally.
- Update the file writer creation call sites to pass is_index_file:
  - Index file writers pass true.
  - Segment (data) file writers pass false.
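The caching rule in get_file_writer_options() can be sketched as follows. This is a minimal illustration, not the Doris implementation: FileWriterOptions and RowsetWriterContext are simplified stand-ins, the context's write_file_cache field is a hypothetical placeholder for whatever caching decision the writer would make without this change, and segment_writer_options()/index_writer_options() are hypothetical call sites.

```cpp
#include <cassert>

// Simplified stand-in for the file writer options; only the flag
// discussed in this PR is modeled.
struct FileWriterOptions {
    bool write_file_cache = true;  // whether written files also populate the file cache
};

// Simplified stand-in for RowsetWriterContext.
struct RowsetWriterContext {
    // Hypothetical: the caching decision the writer would make without this PR.
    bool write_file_cache = true;
    // Field added by this PR: cache only the index files of compaction output.
    bool compaction_output_write_index_only = false;
};

// Data (segment) files of index-only compaction output skip the file cache;
// index files keep the context's normal caching decision.
FileWriterOptions get_file_writer_options(const RowsetWriterContext& ctx, bool is_index_file) {
    FileWriterOptions opts;
    opts.write_file_cache = ctx.write_file_cache;
    if (ctx.compaction_output_write_index_only && !is_index_file) {
        opts.write_file_cache = false;
    }
    return opts;
}

// Hypothetical call sites mirroring the description.
FileWriterOptions segment_writer_options(const RowsetWriterContext& ctx) {
    return get_file_writer_options(ctx, /*is_index_file=*/false);
}

FileWriterOptions index_writer_options(const RowsetWriterContext& ctx) {
    return get_file_writer_options(ctx, /*is_index_file=*/true);
}
```

With compaction_output_write_index_only left at false, both call sites return the same options as before, which is what keeps the default behavior unchanged.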

Benefits:

- Reduces file cache pressure by not caching the large data files written by compaction
- Preserves index file caching for query performance
- Provides separate controls for base and cumulative compaction
- Maintains backward compatibility: both configs default to false, so behavior is unchanged unless they are enabled
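The separate base/cumulative control can be pictured as a per-compaction-type lookup that yields the value stored in compaction_output_write_index_only. This is a hedged sketch: the config namespace, the CompactionType enum (including OTHER), and output_write_index_only() are hypothetical; real Doris configs are declared through its own config macros, and the cloud-mode gating is omitted. The config names follow this description, although the later "Rename conf" commit may have changed them. Treating other compaction types as always caching everything is also an assumption.

```cpp
#include <cassert>

// Hypothetical stand-ins for the two mBool configs. Both default to false,
// as in the description.
namespace config {
bool enable_base_compaction_output_write_index_only = false;
bool enable_cumu_compaction_output_write_index_only = false;
}  // namespace config

// Hypothetical enum: only base and cumulative compaction are configurable.
enum class CompactionType { BASE, CUMULATIVE, OTHER };

// Returns the value a compaction would store in
// RowsetWriterContext::compaction_output_write_index_only.
bool output_write_index_only(CompactionType type) {
    switch (type) {
        case CompactionType::BASE:
            return config::enable_base_compaction_output_write_index_only;
        case CompactionType::CUMULATIVE:
            return config::enable_cumu_compaction_output_write_index_only;
        default:
            return false;  // assumed: other compaction types keep caching everything
    }
}
```

Because each compaction type reads its own flag, enabling index-only caching for base compaction leaves cumulative compaction output fully cached.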

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

Test
- Regression test
- Unit Test
- Manual test (add detailed scripts or steps below)
- No need to test or manual test. Explain why:
  - This is a refactor/code format and no logic has been changed.
  - Previous test can cover this change.
  - No code files have been changed.
  - Other reason
Behavior changed:
- No.
- Yes.
Does this need documentation?
- No.
- Yes.

Check List (For Reviewer who merge this PR)

- Confirm the release note
- Confirm test cases
- Confirm document
- Add branch pick label

hello-stephen · 2026-02-09T05:04:25Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

- What problem was fixed (it's best to include specific error reporting information), and how it was fixed.
- Which behaviors were modified: what the previous behavior was, what it is now, why it was modified, and what impacts it might have.
- What features were added, and why they were added.
- Which code was refactored, and why that part of the code was refactored.
- Which functions were optimized, and what the difference is before and after the optimization.

gavinchou · 2026-02-09T05:04:53Z

run buildall

freemandealer

Tests are needed.

doris-robot · 2026-02-09T05:38:59Z

TPC-H: Total hot run time: 30229 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d4b0bc63341533d20fbd3dc92633fef103b9e5db, data reload: false

------ Round 1 ----------------------------------
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	17628	4450	4266	4266
q2	2019	384	255	255
q3	10144	1331	716	716
q4	10191	790	306	306
q5	7553	2243	1906	1906
q6	196	181	149	149
q7	896	725	605	605
q8	9270	1398	1157	1157
q9	4659	4657	4612	4612
q10	6777	1967	1555	1555
q11	527	304	280	280
q12	343	387	235	235
q13	17783	4084	3209	3209
q14	237	242	214	214
q15	898	803	813	803
q16	683	679	622	622
q17	702	847	504	504
q18	6434	5974	5721	5721
q19	1241	994	645	645
q20	529	505	380	380
q21	2572	1816	1816	1816
q22	360	317	273	273
Total cold run time: 101642 ms
Total hot run time: 30229 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	4340	4355	4361	4355
q2	276	338	268	268
q3	2138	2710	2198	2198
q4	1363	1753	1318	1318
q5	4330	4273	4281	4273
q6	228	184	140	140
q7	1856	1790	1665	1665
q8	2475	2746	2486	2486
q9	7728	7561	7549	7549
q10	2932	3103	2748	2748
q11	555	455	453	453
q12	698	759	657	657
q13	3931	4391	3502	3502
q14	302	316	284	284
q15	863	860	846	846
q16	678	729	709	709
q17	1184	1387	1413	1387
q18	8057	7953	7866	7866
q19	916	870	876	870
q20	2055	2194	2076	2076
q21	4850	4444	4460	4444
q22	611	580	513	513
Total cold run time: 52366 ms
Total hot run time: 50607 ms

doris-robot · 2026-02-09T05:55:49Z

ClickBench: Total hot run time: 28.33 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d4b0bc63341533d20fbd3dc92633fef103b9e5db, data reload: false

Query	Cold run (s)	Hot run 1 (s)	Hot run 2 (s)
query1	0.05	0.05	0.05
query2	0.10	0.05	0.05
query3	0.26	0.09	0.08
query4	1.61	0.11	0.11
query5	0.27	0.25	0.24
query6	1.16	0.67	0.66
query7	0.03	0.03	0.02
query8	0.04	0.04	0.04
query9	0.57	0.50	0.49
query10	0.55	0.55	0.55
query11	0.14	0.09	0.10
query12	0.14	0.11	0.12
query13	0.63	0.62	0.60
query14	1.05	1.05	1.07
query15	0.87	0.85	0.88
query16	0.39	0.38	0.38
query17	1.15	1.15	1.15
query18	0.22	0.21	0.22
query19	2.08	2.02	2.09
query20	0.01	0.01	0.02
query21	15.40	0.26	0.15
query22	5.13	0.06	0.05
query23	15.97	0.28	0.11
query24	2.40	0.69	0.27
query25	0.11	0.06	0.05
query26	0.15	0.13	0.13
query27	0.07	0.06	0.11
query28	5.10	1.15	0.97
query29	12.54	3.92	3.15
query30	0.28	0.13	0.12
query31	2.82	0.65	0.40
query32	3.23	0.59	0.51
query33	3.19	3.23	3.26
query34	16.06	5.39	4.74
query35	4.80	4.85	4.76
query36	0.65	0.50	0.49
query37	0.11	0.07	0.08
query38	0.08	0.04	0.04
query39	0.05	0.03	0.03
query40	0.20	0.17	0.15
query41	0.08	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 99.83 s
Total hot run time: 28.33 s
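The summary totals in these reports are consistent with a simple rule: the cold total is the sum of each query's first run, and the hot total is the sum of each query's faster hot run. For example, the TPC-H Round 1 cold total of 101642 ms is the sum of the first column, its hot total of 30229 ms is the sum of the last column (which equals the smaller of the two hot runs), and the ClickBench hot total of 28.33 s is the sum of the faster of runs 2 and 3. The sketch below reproduces that arithmetic. The column interpretation is inferred from the numbers rather than documented by the bot, and QueryTimes, Totals, and summarize() are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One benchmark row: the first (cold) run followed by two hot runs.
struct QueryTimes {
    double cold;
    double hot1;
    double hot2;
};

struct Totals {
    double cold = 0;
    double hot = 0;
};

// Cold total sums the first runs; hot total sums each query's best
// (minimum) hot run.
Totals summarize(const std::vector<QueryTimes>& rows) {
    Totals t;
    for (const auto& r : rows) {
        t.cold += r.cold;
        t.hot += std::min(r.hot1, r.hot2);
    }
    return t;
}
```

Applied to the TPC-H Round 1 rows q1, q2, and q3 (ms), this gives a cold subtotal of 29791 and a hot subtotal of 5237.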

hello-stephen · 2026-02-09T06:44:55Z

BE UT Coverage Report

Increment line coverage 59.26% (16/27) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.73% (19445/36880)
Line Coverage	36.21% (181014/499855)
Region Coverage	32.60% (140560/431139)
Branch Coverage	33.62% (60874/181047)

gavinchou · 2026-02-09T10:07:27Z

run buildall

doris-robot · 2026-02-09T10:55:34Z

TPC-H: Total hot run time: 30683 ms

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ab494ffaf60242e17d9294d093322c98371f0bfc, data reload: false

------ Round 1 ----------------------------------
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	17652	4491	4299	4299
q2	2066	352	234	234
q3	10153	1263	731	731
q4	10195	793	308	308
q5	7515	2216	1922	1922
q6	202	177	146	146
q7	900	747	607	607
q8	9271	1388	1166	1166
q9	4677	4575	4653	4575
q10	6768	1930	1559	1559
q11	501	308	277	277
q12	342	378	226	226
q13	17765	4071	3239	3239
q14	235	233	228	228
q15	892	808	813	808
q16	672	707	614	614
q17	694	802	532	532
q18	6524	5874	6186	5874
q19	1248	1107	657	657
q20	609	533	413	413
q21	2786	2015	1974	1974
q22	376	345	294	294
Total cold run time: 102043 ms
Total hot run time: 30683 ms

----- Round 2, with runtime_filter_mode=off -----
Query	Cold run (ms)	Hot run 1 (ms)	Hot run 2 (ms)	Best hot run (ms)
q1	4497	4581	4734	4581
q2	264	329	261	261
q3	2496	2982	2451	2451
q4	1463	1849	1421	1421
q5	4619	4725	4521	4521
q6	227	189	141	141
q7	2045	1989	1819	1819
q8	2553	2414	2428	2414
q9	7555	7616	7359	7359
q10	2975	3055	2557	2557
q11	584	471	459	459
q12	752	795	676	676
q13	3907	4266	3514	3514
q14	302	321	290	290
q15	895	826	887	826
q16	687	731	661	661
q17	1147	1313	1304	1304
q18	7524	7347	7337	7337
q19	818	798	799	798
q20	1958	2018	1843	1843
q21	4463	4243	4182	4182
q22	586	527	498	498
Total cold run time: 52317 ms
Total hot run time: 49913 ms

doris-robot · 2026-02-09T11:12:19Z

ClickBench: Total hot run time: 28.54 s

machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ab494ffaf60242e17d9294d093322c98371f0bfc, data reload: false

Query	Cold run (s)	Hot run 1 (s)	Hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.10	0.05	0.04
query3	0.26	0.08	0.09
query4	1.60	0.11	0.11
query5	0.28	0.24	0.27
query6	1.19	0.69	0.67
query7	0.03	0.02	0.02
query8	0.05	0.04	0.04
query9	0.56	0.49	0.50
query10	0.54	0.57	0.54
query11	0.14	0.10	0.09
query12	0.14	0.10	0.11
query13	0.62	0.61	0.63
query14	1.06	1.08	1.05
query15	0.88	0.88	0.89
query16	0.38	0.39	0.39
query17	1.14	1.18	1.15
query18	0.23	0.21	0.21
query19	2.15	1.93	1.99
query20	0.02	0.02	0.02
query21	15.41	0.26	0.16
query22	5.15	0.06	0.05
query23	16.05	0.30	0.11
query24	1.56	0.47	0.64
query25	0.09	0.05	0.05
query26	0.15	0.13	0.13
query27	0.07	0.05	0.07
query28	3.63	1.17	0.97
query29	12.62	3.95	3.17
query30	0.28	0.13	0.12
query31	2.81	0.65	0.41
query32	3.24	0.60	0.49
query33	3.32	3.22	3.34
query34	16.32	5.43	4.76
query35	4.81	4.81	4.81
query36	0.66	0.50	0.49
query37	0.10	0.07	0.07
query38	0.08	0.05	0.04
query39	0.05	0.03	0.03
query40	0.20	0.17	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.05	0.04	0.04
Total cold run time: 98.19 s
Total hot run time: 28.54 s

doris-robot · 2026-02-09T11:41:41Z

BE UT Coverage Report

Increment line coverage 88.24% (30/34) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	52.72% (19446/36883)
Line Coverage	36.22% (181071/499921)
Region Coverage	32.59% (140520/431187)
Branch Coverage	33.63% (60900/181063)

hello-stephen · 2026-02-09T17:54:54Z

BE Regression && UT Coverage Report

Increment line coverage 100.00% (34/34) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.38% (26519/36139)
Line Coverage	56.43% (281430/498683)
Region Coverage	54.18% (236001/435569)
Branch Coverage	55.72% (101275/181767)

freemandealer

LGTM

github-actions · 2026-02-11T09:12:44Z

PR approved by anyone and no changes requested.

dataroaring

LGTM

github-actions · 2026-02-11T15:11:46Z

PR approved by at least one committer and no changes requested.

gavinchou requested a review from dataroaring as a code owner February 9, 2026 05:04

gavinchou added cloud file_cache labels Feb 9, 2026

freemandealer reviewed Feb 9, 2026

gavinchou added 2 commits February 9, 2026 17:15
- Add UT (23b9f99)
- Rename conf (ab494ff)

gavinchou added dev/4.0.x dev/4.1.x labels Feb 11, 2026

freemandealer approved these changes Feb 11, 2026

github-actions bot added the reviewed label Feb 11, 2026

dataroaring approved these changes Feb 11, 2026

github-actions bot added the approved label Feb 11, 2026

dataroaring merged commit 98ca256 into apache:master Feb 11, 2026
34 of 35 checks passed

github-actions bot added the dev/4.0.x-conflict label Feb 11, 2026

5 participants
