We evaluate the capabilities of current neural retrievers in understanding complex NL queries over semi-structured data. The queries involve diverse types of filtering conditions on structured objects, including exact and semantic matching, numerical and logical reasoning, and comprehensive understanding of multiple fields. The document structure can be dynamic, with potentially missing fields and flexible structures (nested lists or dictionaries), making such data challenging to query with fixed-schema database indexing. Powerful LLM-based neural retrievers show promise as a unified solution to these challenges.
We present the Semi-Structured Retrieval Benchmark (SSRB), encompassing 6 domains with 99 different data schemas, totaling 14M data objects, along with 8,485 NL queries of varying difficulty levels.
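For intuition, here is a toy sketch of the kind of object and query conditions involved. The schema, field names, and matching logic below are invented for illustration and are not actual SSRB data:

```python
# Hypothetical semi-structured object -- field names are invented,
# not an actual SSRB schema.
product = {
    "name": "TrailRunner 2 Hiking Boots",
    "price": 129.99,
    "specs": {"waterproof": True, "weight_g": 410},
    "reviews": [  # flexible nested list; may be missing entirely
        {"rating": 5, "text": "Great grip on wet rock."},
    ],
}

# NL query: "waterproof hiking boots under $150 with reviews praising grip"

def matches(obj: dict) -> bool:
    """Naive check combining exact, numerical, and nested-field conditions."""
    specs = obj.get("specs", {})  # fields may be missing
    return bool(
        specs.get("waterproof", False)               # exact/boolean match
        and obj.get("price", float("inf")) < 150     # numerical reasoning
        and any("grip" in r.get("text", "").lower()  # nested-list condition
                for r in obj.get("reviews", []))
    )

print(matches(product))  # True for this toy object
```

A retriever must satisfy all such conditions jointly from the raw text of the query, without an explicit filter program like the one above.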
Given the scarcity of public data, we build SSRB with LLMs in a three-stage data synthesis workflow (figure below):
(1) schema generation, creating multiple schema definitions for six manually defined domains;
(2) data triples generation, synthesizing
Based on SSRB, we evaluate two main types of dense retrievers: 1) small-scale encoder-based models like InstructOR and BGE, and 2) LLM-based ones such as E5-mistral. We also include the BM25 lexical retriever for comparison. Our experiments reveal several key findings:
- BM25 struggles with this task,
- encoder-based models, benefiting from BERT-style backbones, provide better performance than BM25,
- LLM-based retrievers achieve notably better performance, highlighting the importance of LLM's powerful semantic understanding and reasoning capabilities in handling complex queries.
However, their absolute performance remains relatively low, indicating the necessity for developing more task-specific retrievers.
```shell
git clone https://github.com/vec-ai/struct-ir.git
cd ./struct-ir
```

Download data (less than 10GB):

```shell
bash ./scripts/hfd.sh --dataset vec-ai/struct-ir
mv struct-ir data
```

Download models (modify the script to select models):

```shell
mkdir models
cd models
bash ../scripts/dl_models.sh
```

Install the dependencies:

```shell
pip install pandas torch transformers sentence_transformers pytrec_eval pyyaml
```

Warning: some models require extra packages to be installed, e.g., gritlm.
- bge
- instructor
- jina3
- nomic2
- drama
- e5mistral
- qwen (gte-qwen2-7b)
- gritlm (requires `gritlm`)
- nvembedv2 (requires `transformers==4.42.4`)
```shell
MODEL_DIR="./models" python evaluate.py --model_name drama --batch_size 32
```

All embeddings, results, and scores will be saved to `results/models/MODEL_NAME` by default.
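Under the hood, evaluation follows the standard dense-retrieval recipe: embed queries and documents, rank documents by cosine similarity, then score the rankings. A minimal self-contained sketch with toy vectors standing in for real model embeddings (the actual `evaluate.py` logic may differ):

```python
import numpy as np

def rank_by_cosine(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the top-k documents by cosine similarity."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q                  # cosine similarity of each doc to the query
    return np.argsort(-scores)[:k]  # highest-scoring docs first

# Toy 4-dim "embeddings" standing in for encoder outputs.
query = np.array([1.0, 0.0, 1.0, 0.0])
docs = np.array([
    [0.9, 0.1, 0.8, 0.0],   # relevant: nearly parallel to the query
    [0.0, 1.0, 0.0, 1.0],   # irrelevant: orthogonal to the query
    [0.5, 0.5, 0.5, 0.5],   # partially relevant
])
print(rank_by_cosine(query, docs))  # [0 2 1]
```

The ranked lists are then scored against relevance judgments (the repo installs `pytrec_eval` for this step).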
```shell
python ./scripts/print_tables.py
```

This prints several rows of scores, with numbers separated by commas (`,`). You can paste them into Google Sheets to generate the table.
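If you prefer to skip the spreadsheet, the comma-separated rows can be parsed directly with the standard library. The header names and score values below are hypothetical, since the exact column layout of `print_tables.py` output is not specified here:

```python
import csv
import io

# Hypothetical comma-separated rows, as produced by print_tables.py;
# real column names and values will differ.
raw = """model,easy,medium,hard
bge,41.2,30.5,18.7
drama,48.9,35.1,22.4
"""

rows = list(csv.DictReader(io.StringIO(raw)))
for r in rows:
    print(f"{r['model']:<8}" + "  ".join(f"{r[c]:>6}" for c in ("easy", "medium", "hard")))

# Average score per model, computed as floats.
avgs = {r["model"]: sum(float(r[c]) for c in ("easy", "medium", "hard")) / 3
        for r in rows}
```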
TODO

