Adhiraj Ghosh

Adhiraj Ghosh (অধীরাজ ঘোষ)

I am a first-year ELLIS PhD candidate working with Dr. Matthias Bethge at the University of Tübingen. I am also affiliated with the International Max Planck Research School for Intelligent Systems. My interests centre on developing data-centric approaches to improve machine learning models across several modalities (text, image, video, audio and 3D), as well as exposing the failure points of these models by creating better eval sets and benchmarking strategies.

I completed my MSc in Machine Learning at the University of Tübingen in 2024, during which I worked at the Computer Graphics Group, which led to an Outstanding Paper award at EMNLP 2023.

Before starting my master's, I was a Computer Vision Researcher at the Center of Artificial Intelligence, ZHAW, working on domain adaptation in Optical Music Recognition. I have also worked with Dr. Daniel Lin Wen-Yan at SMU on feature correspondence-based object tracking. I completed my BSc in Electrical and Electronics Engineering in Manipal/Singapore.

I am currently working on (A) curating the best pretraining dataset for MLLMs and (B) studying how recycling image-text pairs into more informative samples can improve MLLM training regimes. I am very eager to collaborate on related projects, so please reach out if you are interested!

Email / CV / Google Scholar / Github / Twitter / WeChat(微信) / YouTube

Recent News

Feb 2026 : Work on concept-aware online batch sampling accepted at CVPR!
Jun 2025 : Started my PhD!
May 2025 : ONEBench accepted at ACL 2025 as a poster!
Nov 2024 : Defended my MSc thesis!
Sep 2024 : No Zero-shot was accepted at NeurIPS as a poster! Check out coverage by Computerphile and AI 'N Stuff!

Work Experience

Mar 2023 - Sep 2023: Research Assistant at the Computer Graphics group, Tübingen AI Centre.
May 2021 - Aug 2022: Computer Vision Researcher, Zürich University of Applied Sciences.
Jan 2020 - Dec 2020: Visiting Researcher, Singapore Management University.
Jun 2018 - Aug 2019: Undergraduate Research Intern, Jadavpur University.

Publications

	Concept-Aware Batch Sampling Improves Language-Image Pretraining Adhiraj Ghosh, Vishaal Udandarao^, Thao Nguyen^, Matteo Farina^, Mehdi Cherti, Jenia Jitsev, Sewoong Oh, Elisa Ricci, Ludwig Schmidt, Matthias Bethge. CVPR 2026* Project Page / Paper / DataConcept / Code In this work, we show that concept-aware data curation and online batch sampling improve the downstream performance of contrastive vision-language models. We introduce DataConcept, a collection of 128M image-text pairs annotated with concept-centric information, and Concept-Aware Batch Sampling (CABS), a framework that uses concept information to curate batches online instead of relying on static curation.
	ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities Adhiraj Ghosh^, Sebastian Dziadzio^, Ameya Prabhu, Vishaal Udandarao, Samuel Albanie, Matthias Bethge. ACL 2025 (Main) Project Page / Paper / Code To evaluate the vast capabilities of foundation models, we introduce ONEBench – a benchmark that unifies individual test sets into a vast pool of individual data-measurement samples. We shift the focus from singular test-sets to sample-level evaluations, re-structuring static benchmarks to accommodate an ever-expanding pool of datasets and models.
	No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance Vishaal Udandarao^, Ameya Prabhu^, Adhiraj Ghosh, Yash Sharma, Philip H.S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge. NeurIPS 2024 Paper / Code / Let It Wag! Benchmark The impressive empirical performance of VLMs is attributable to test concepts appearing in their pretraining datasets, so it does not demonstrate "zero-shot" generalization. Instead, these models need exponentially more data on a concept to improve performance linearly.
	ViPE: Visualise Pretty-much Everything Hassan Shahmohammadi, Adhiraj Ghosh, Hendrik Lensch. EMNLP 2023 (Outstanding Paper Award) Paper / Code / Dataset / HuggingFace / Music Videos ViPE is the first automated model for translating any arbitrary piece of text into a visualisable prompt. It helps any text-to-image model in figurative or non-lexical language visualisations.
	Real World Music Object Recognition Adhiraj Ghosh^, Lukas Tuggener^, Raphael Emberger^, Pascal Sager^, et al. TISMIR 2023 Paper / Code We present solutions to improve recognition accuracy in Music Object Recognition on low-quality, real-world music sheet data and provide confidence-rated model outputs to enable efficient human post-processing.
	Relation Preserving Triplet Mining for Stabilising the Triplet Loss in Re-identification Systems Adhiraj Ghosh, Kuruparan Shanmugalingam, Wen-Yan Lin. WACV 2023 Paper / Code / Video / Poster We propose a new, feature-guided triplet mining scheme for understanding intrinsic pose to solve the intra-class variance problem in re-identification datasets.
	Irony Detection in Bengali Tweets: A New Dataset, Experimentation and Results Adhiraj Ghosh, Kamal Sarkar. ICCIDS 2020 Paper / Dataset This paper describes a Bengali irony detection dataset that we developed and reports results obtained on it using SOTA machine learning methodologies.