This repository implements Struct-Bench, a novel evaluation framework and benchmark for assessing synthetic data quality relative to a real dataset, where the real dataset features complex inter-field structural relationships and at least some fields contain natural language. (Figure: dataset-level and sample-level views into Struct-Bench.)

Contents:
- Project Setup
- Forking & Pull Requests
- Adding and Evaluating Datasets
- Generating Synthetic Datasets
- Key Files & Directories
- Acknowledgement
## Project Setup

- Clone the repo:

  ```bash
  git clone https://github.com/struct-bench/structpe.git
  cd structpe
  ```

- Install:
  - Recommended: create a fresh virtual environment (`conda` or `venv`).
  - Then install locally:

    ```bash
    pip install .
    ```

  - Or, for editable mode (if you intend to develop and push changes):

    ```bash
    pip install -e .
    ```

## Forking & Pull Requests

Adding new datasets to Struct-Bench is welcome! To propose a change or new feature, follow these steps:
- Fork the Repo on GitHub
  - Visit the structpe GitHub page, click “Fork”, and choose your GitHub account.

- Clone Your Fork

  ```bash
  git clone https://github.com/yourfork/structpe.git
  cd structpe
  ```

- Create a Branch

  ```bash
  git checkout -b my-new-dataset
  ```

- Make Your Changes
  - Add new files, fix bugs, or implement new features.
  - Update or add unit tests in `tests/`.

- Push & Open a Pull Request

  ```bash
  git commit -am "Add new dataset for XYZ"
  git push origin my-new-dataset
  ```

  - Then open a Pull Request on GitHub from your fork.

- Review & Merge
  - The maintainers will review your PR, offer feedback, and merge once approved.
## Adding and Evaluating Datasets

Struct-Bench uses a registry pattern to easily integrate more datasets. Here’s how (a minimal example file is sketched after these steps):

- Create a New File
  - In `structpe/dataset/`, for example: `my_new_dataset.py`.
  - Define your sample class (`MyNewSample`) and a container class (`MyNewDataset`).
  - Use existing atomic types from `_types.py` or define constraints as needed.

- Define the context-free grammar (CFG) of the data structure.

- Register the Dataset
  - At the end of that file:

    ```python
    from structpe.dataset.registry import register_dataset

    register_dataset("my_new_dataset", MyNewDataset)
    ```

  - (Optional) Provide any `dataset_metric(level=...)` functions to compute custom metrics.
  - (Optional) If lines in the grammar have fields that are logically comparable, define `compute_node_similarities = [("fieldA", "fieldB"), ...]`.
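For orientation, here is a minimal sketch of what `structpe/dataset/my_new_dataset.py` might look like. Only the `register_dataset` import and call are taken from the steps above; the sample/container layout, field names, and the `samples` attribute are placeholders, so mirror an existing file in `structpe/dataset/` for the exact interface (including how the CFG and any `dataset_metric` functions are declared).

```python
# structpe/dataset/my_new_dataset.py -- illustrative sketch only.
# The class layout, field names, and the `samples` attribute are assumptions;
# follow an existing file in structpe/dataset/ for the interface the evaluator
# actually expects. Only the register_dataset import/call comes from the docs.

from structpe.dataset.registry import register_dataset


class MyNewSample:
    """One record: a natural-language field plus a small structured field."""

    def __init__(self, text: str, rating: int):
        self.text = text
        self.rating = rating


class MyNewDataset:
    """Container class holding all samples of the dataset."""

    def __init__(self, samples=None):
        self.samples = list(samples or [])

    def add_sample(self, sample: MyNewSample) -> None:
        self.samples.append(sample)


# (Optional, hypothetical usage) mark grammar fields that are logically comparable:
# compute_node_similarities = [("fieldA", "fieldB"), ...]

# Register the dataset so Struct-Bench can look it up by name (per the steps above).
register_dataset("my_new_dataset", MyNewDataset)
```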
Then run:

```bash
structpe evaluate \
    --private-dataset-name=my_dataset \
    --private-dataset-json=data/my_dataset.json \
    --synthetic-data=data/synthetic_dataset.json \
    --savedir results_my_dataset
```

You’ll get a comprehensive JSON summarizing correctness, adjacency, grammar, and KNN-based metrics, plus your custom dataset metrics.
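If you prefer to inspect the results programmatically rather than reading the raw file, a small sketch like the one below works; it assumes only that the `--savedir` directory (`results_my_dataset` above) contains one or more JSON files, since the exact output filenames are not spelled out here.

```python
import glob
import json

# Read whatever JSON file(s) `structpe evaluate` wrote into --savedir.
# Exact filenames are not documented here, so we simply glob the directory.
for path in sorted(glob.glob("results_my_dataset/*.json")):
    with open(path, encoding="utf-8") as f:
        results = json.load(f)
    # Show the top-level metric groups (correctness, adjacency, grammar, KNN, ...).
    print(f"{path}: {list(results.keys())}")
```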
Please refer to this link for more details on the dataset evaluation framework.
## Generating Synthetic Datasets

We adopt DP fine-tuning (Yu et al., 2021) and the Augmented Private Evolution (Aug-PE) algorithm (Xie et al., 2024) to generate synthetic datasets on graph-structured data (ShareGPT, ICLR reviews), tabular data (Water, Arena, Adult), and attribute-controllable data (Reviews, Grounding). We use the external libraries microsoft/dp-transformers for DP fine-tuning and microsoft/DPSDA for Aug-PE.

We also implement the Aug-PE algorithm in `structpe/generator` to generate synthetic attribute-controllable data.
Generate DP synthetic text with Aug-PE:

```python
from structpe.generator.generation import run_generation_pipeline

synthetic_texts = run_generation_pipeline(
    file_path="data/input.tsv",
    file_type="tsv",   # or "csv" or "json"
    dataset_name="my_dataset",
    concurrency=4,
    init_count=10,
    iterations=3,
    endpoint="https://myazureendpoint.openai.azure.com/",  # replace with your Azure OpenAI endpoint
    deployment="gpt-4"
)
```

Parameters:

- `file_path`: Path to your input file (JSON, CSV, or TSV).
- `file_type`: Must be `"json"`, `"csv"`, or `"tsv"`.
- `concurrency`: Number of threads to use for Azure OpenAI calls.
- `init_count`: Initial sample count.
- `iterations`: How many iteration cycles.
- `endpoint`: Your Azure OpenAI endpoint.
- `deployment`: Name of the model deployment (e.g., `"gpt-4"`).

Returns: a list of the final generated strings.
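To feed the generated strings into the evaluation step, one option is to dump them to a JSON file and pass that path to `--synthetic-data`. This is only a sketch: it assumes a plain JSON list of strings is an acceptable synthetic-data format for your dataset, which may not hold if your dataset expects structured records.

```python
import json

# Persist the generated strings for `structpe evaluate --synthetic-data=...`.
# A flat list of strings is an assumption; some datasets may expect
# structured records (one JSON object per sample) instead.
with open("data/synthetic_dataset.json", "w", encoding="utf-8") as f:
    json.dump(synthetic_texts, f, ensure_ascii=False, indent=2)
```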
Please refer to this link for more details on the synthetic data generation.
## Key Files & Directories

- `structpe/_types.py`
  Holds enumerations and atomic range classes (e.g. `AtomicRangeInt`) used by multiple datasets.

- `structpe/dataset/`
  Holds each dataset definition (`search_dataset.py`, `hotel_booking_dataset.py`, etc.) plus `registry.py` for dynamic dataset lookup.

- `structpe/descriptor/descriptor.py`
  Implements reflection-based serialization so that entire dataset objects can be stored as JSON and reconstructed.

- `structpe/evaluator/`
  Contains the `Evaluator` class (with JSON output) and supporting classes (`LLMJudge`, `Verifier`, etc.) for constraint checks, distribution stats, and more.

- `structpe/generator/generation.py`
  Demonstrates how to create synthetic samples from existing dataset descriptions (currently for `search_query`).

- `structpe/run.py`
  Houses the CLI (see the usage example after this list). Subcommands:
  - `list datasets`: Show registered datasets
  - `run --dataset-name=XYZ`: Instantiate and evaluate a dataset

- `tests/`
  Contains unit tests such as `test_dataset.py` (checks correctness of dataset classes), `test_pipeline.py` (verifies pipeline logic), and `test_evaluator.py` (tests evaluation output).
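For quick reference, the `structpe/run.py` subcommands listed above can be invoked like this (the dataset name `search_query` is only an example; use any registered dataset name):

```bash
# Show all datasets registered via register_dataset()
structpe list datasets

# Instantiate and evaluate one registered dataset by name
structpe run --dataset-name=search_query
```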
Disclaimer: Please expect changes in the framework as we improve it further based on feedback from researchers and practitioners.
