Please note: we are currently updating the code and adding scripts!
IDEAlign is a framework for evaluating alignment between LLM-generated and expert annotations on open-ended, interpretive tasks. It consists of three stages:

1. **Benchmarking** expert similarity judgments via *odd-one-out* tasks.
2. **Validating** automated (model) similarity methods (e.g., lexical, embedding-based, topic-based, and LLM-as-a-judge) against the expert benchmark by comparing answer distributions (see the sketch below).
3. **Deploying** the best-validated model to assess the similarity of ideas generated by LLMs and domain experts at scale.
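
The odd-one-out comparison behind stages 1–2 can be pictured with a minimal sketch. This is **not** the IDEAlign implementation or API: the helper names (`odd_one_out`, `token_overlap`), the example triplet, and the expert vote counts are illustrative assumptions only.

```python
# Minimal sketch (not the IDEAlign API): scoring an automated similarity
# method on an odd-one-out triplet and comparing it to expert judgments.
from collections import Counter

def odd_one_out(items, similarity):
    """Return the index of the item least similar to the other items."""
    scores = []
    for i, item in enumerate(items):
        others = [x for j, x in enumerate(items) if j != i]
        scores.append(sum(similarity(item, o) for o in others))
    # The lowest total-similarity item is the model's "odd one out".
    return min(range(len(items)), key=lambda i: scores[i])

def token_overlap(a, b):
    """Toy lexical similarity: Jaccard overlap of lowercased tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical triplet of annotations and hypothetical expert votes.
triplet = [
    "students struggle to transfer skills across contexts",
    "learners have difficulty applying knowledge in new settings",
    "class sizes limit individualized feedback",
]
expert_votes = Counter({2: 9, 0: 1})  # e.g., 9 of 10 experts picked index 2

model_choice = odd_one_out(triplet, token_overlap)
expert_dist = [expert_votes[i] / sum(expert_votes.values()) for i in range(3)]
print(f"model picks index {model_choice}; expert distribution {expert_dist}")
```

In this sketch, any similarity function (lexical, embedding-based, topic-based, or LLM-as-a-judge) can be dropped in for `token_overlap`, and its answer distribution over many triplets would then be compared against the expert distribution to select the best-validated method.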