This document outlines the step-by-step process for handling the Pararel dataset in our pipeline.

First, set up the environment:

```shell
pip install -r requirements.txt
```
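Throughout the pipeline, the `--case blank` flag refers to fill-in-the-blank style prompts built from Pararel's relation templates. For intuition, the templating step can be pictured as turning a subject–relation–object record into a blank-style question. This is a hypothetical sketch, not the actual `template.py` logic; the field names and the `[X]`/`[Y]` placeholders are assumptions:

```python
# Hypothetical illustration of blank-style templating; not the actual template.py.
def to_blank_prompt(record):
    """Turn a record with a relation template into a fill-in-the-blank
    question and its gold answer. The template is assumed to contain
    [X] (subject slot) and [Y] (the slot to blank out)."""
    question = (record["template"]
                .replace("[X]", record["subject"])
                .replace("[Y]", "____"))
    return question, record["object"]

q, a = to_blank_prompt({
    "subject": "Paris",
    "template": "[X] is the capital city of [Y].",
    "object": "France",
})
print(q)  # Paris is the capital city of ____.
print(a)  # France
```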
Before training, the dataset needs to be preprocessed. The following commands process the test and training sets:
```shell
# Process the Pararel test dataset
python template.py --data_path dataset/blank/pararel_test.json --save_path dataset/blank/processed_pararel_test.json --case blank --question_number 3

# Process the Pararel training dataset
python template.py --data_path dataset/blank/pararel_train.json --save_path dataset/blank/processed_pararel_train.json --case blank --question_number 3
```

Once the dataset is processed, we generate model predictions using the following commands:
```shell
# Generate predictions for the Pararel test dataset
python generate_output.py --data_path dataset/blank/processed_pararel_test.json --save_path result/blank/pararel.json --case blank --generate_vllm --question_number 3 --gpu 0

# Generate predictions for the Pararel training dataset
python generate_output.py --data_path dataset/blank/processed_pararel_train.json --save_path result/blank/pararel.json --case blank --generate_vllm --question_number 3 --gpu 0
```

To evaluate the model’s performance, compare its predictions against the ground truth labels:
```shell
python compare.py --data_path result/blank/pararel.json --case blank --question_number 3
```

To improve model robustness, we categorize the dataset into certain and uncertain instances:
```shell
python divide_dataset.py --data_path dataset/blank/processed_pararel_train.json --result result/blank/pararel.json --save_path dataset/blank/pararel_split/pararel --case blank
```

To enhance model performance, fine-tune it using the Pararel dataset:
```shell
# Fine-tune using LLaMA3
python fine_tune.py --data_path dataset/blank/pararel_split/pararel --save_path models/blank/llama3_pararel --case blank --question_number 3 --gpu 0

# Fine-tune using Qwen
python fine_tune_Qwen.py --data_path dataset/blank/pararel_split/pararel --save_path models/blank/qwen_pararel --case blank --question_number 3 --gpu 0
```

After fine-tuning, we generate new predictions using the updated model:
```shell
python generate_output.py --data_path dataset/blank/processed_pararel_test.json --save_path fine_tune_result/blank/pararel.json --lora_model --lora_path models/blank/llama3_pararel --case blank --question_number 3 --gpu 0
```

To assess the improvement, compare the fine-tuned model’s output:
```shell
python compare.py --data_path fine_tune_result/blank/pararel.json --case blank --question_number 3
```

To quantify the model’s reliability, compute the AP (Average Precision) score:
```shell
# AP score for the fine-tuned model
python calculate_ap.py --data_path fine_tune_result/blank/pararel.json --lora_model --lora_path models/blank/llama3_pararel --case blank --gpu 0
```

This pipeline ensures a systematic approach to processing, fine-tuning, and evaluating the Pararel dataset. 🚀
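For intuition, average precision over confidence-ranked predictions can be sketched as follows. This is a generic illustration of the metric, not the exact logic of `calculate_ap.py`:

```python
# Generic sketch of average precision (AP) over confidence-ranked predictions;
# not the exact logic of calculate_ap.py.
def average_precision(scored):
    """scored: list of (confidence, is_correct) pairs."""
    # Rank predictions from most to least confident.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, correct) in enumerate(ranked, start=1):
        if correct:
            hits += 1
            # Precision at each rank where a correct prediction appears.
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Correct answers concentrated at high confidence yield a high AP.
print(average_precision([(0.9, True), (0.8, False), (0.7, True)]))  # 0.8333...
```

A well-calibrated model assigns higher confidence to its correct answers, so its AP approaches 1.0.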
If you find this repository helpful, please consider citing our paper to support the research.
```bibtex
@misc{huang2025mactuningllmmulticompositionalproblem,
      title={MAC-Tuning: LLM Multi-Compositional Problem Reasoning with Enhanced Knowledge Boundary Awareness},
      author={Junsheng Huang and Zhitao He and Sandeep Polisetty and Qingyun Wang and May Fung},
      year={2025},
      eprint={2504.21773},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.21773},
}
```