ritaranx/Collab-RAG

Collab-RAG

This is the code repository for our paper Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration.

Dataset and Indexing

The raw datasets we use are stored in datasets.zip, prepared following the instructions at this link. The corpus can be found at this link. Use the following command to generate the embeddings for each dataset (assuming the corpus file is named corpus.jsonl). Run it once per shard, setting --shard_id to each value from 0 to 7.

CUDA_VISIBLE_DEVICES=0 python process_wiki.py \
--shard_id={0,1,..,7} \
--shards=8  \
--sentence_embedding_model facebook/dragon-plus-context-encoder \
--total_data 6000000 \
--sentence_embedding_model_save_name dragon \
--dataset hpqa
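
The `{0,1,..,7}` notation above stands for eight separate runs, one per shard. As a minimal sketch, the per-shard commands could be generated in Python as follows. The helper name `build_shard_commands` is hypothetical and not part of this repo; it only uses the flags shown above and leaves GPU assignment (e.g. `CUDA_VISIBLE_DEVICES`) to you.

```python
import shlex


def build_shard_commands(num_shards=8, dataset="hpqa", total_data=6000000):
    """Return one argv list per shard, mirroring the command shown above.

    Hypothetical helper: only the flags that appear in the README are used.
    """
    commands = []
    for shard_id in range(num_shards):
        commands.append([
            "python", "process_wiki.py",
            f"--shard_id={shard_id}",
            f"--shards={num_shards}",
            "--sentence_embedding_model", "facebook/dragon-plus-context-encoder",
            "--total_data", str(total_data),
            "--sentence_embedding_model_save_name", "dragon",
            "--dataset", dataset,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; pass each list to
    # subprocess.run(cmd, check=True) to actually execute a shard.
    for cmd in build_shard_commands():
        print(shlex.join(cmd))
```

Printing first makes it easy to check the arguments before committing GPU time to a full indexing run.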

White Box SLM Decomposition

Use the following command to decompose questions with the white-box SLM. The --model_path below points to the base model; you can replace it with your own trained model later.

CUDA_VISIBLE_DEVICES=0,1 python slm_decompose.py \
--model_path meta-llama/Llama-3.1-8B-Instruct \
--tokenizer meta-llama/Llama-3.1-8B-Instruct \
--temperature 1.0 \
--tensor_parallel_size 2 \
--expname [Your Experiment Name] \
--save_dir test

After this step, the decomposed questions are saved to test/{dataset}/prompts_decompose_test_t{args.temperature}_{model_name}/generate.jsonl.
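
To locate that file programmatically, the path pattern can be rebuilt as in the sketch below. The function name `decompose_output_path` is hypothetical, and how `model_name` is derived from `--model_path` is not stated here, so it is passed in explicitly; the example values `hpqa` and `my_model` are placeholders.

```python
from pathlib import Path


def decompose_output_path(save_dir, dataset, temperature, model_name):
    """Build the decomposition output path following the pattern quoted above.

    Assumption: model_name is whatever short name slm_decompose.py uses
    internally; check the script for the exact value.
    """
    run_dir = f"prompts_decompose_test_t{temperature}_{model_name}"
    return Path(save_dir) / dataset / run_dir / "generate.jsonl"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(decompose_output_path("test", "hpqa", 1.0, "my_model"))
```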

Black Box LLM Reader

Use the following command to answer the decomposed questions with the black-box LLM reader:

CUDA_VISIBLE_DEVICES=0 python llm_reader.py \
--llm_model gpt-4o-mini \
--expname [Your Experiment Name] \
--temperature 1.0 \
--save_dir test

The final answers will be saved to {args.save_dir}/output/{args.dataset}/prompts_{args.llm_model}-{args.expname}.jsonl.
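
The output is a JSONL file, so it can be inspected with a few lines of Python. This is a minimal sketch: the helper names `reader_output_path` and `read_jsonl` are hypothetical, and no assumption is made about which fields each record contains.

```python
import json
from pathlib import Path


def reader_output_path(save_dir, dataset, llm_model, expname):
    """Build the reader output path following the pattern quoted above."""
    return Path(save_dir) / "output" / dataset / f"prompts_{llm_model}-{expname}.jsonl"


def read_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == "__main__":
    # Placeholder arguments; substitute the values used in your own run.
    path = reader_output_path("test", "hpqa", "gpt-4o-mini", "my_exp")
    if path.exists():
        records = list(read_jsonl(path))
        print(f"{len(records)} records loaded from {path}")
```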

Using Llama Factory for Finetuning

Add the following files directly to your LLaMA-Factory repo:

  • dataset_info.json
  • llama3_{sft,dpo}.yaml

See the configs folder for details. The processed data will be available shortly -- stay tuned!

Citation

If you find this repository helpful, please kindly consider citing the corresponding paper. Thanks!

@article{xu2025collab,
  title={Collab-RAG: Boosting Retrieval-Augmented Generation for Complex Question Answering via White-Box and Black-Box LLM Collaboration},
  author={Xu, Ran and Shi, Wenqi and Zhuang, Yuchen and Yu, Yue and Ho, Joyce C and Wang, Haoyu and Yang, Carl},
  journal={arXiv preprint},
  year={2025}
}
