I am a first-year ELLIS PhD candidate working with Dr. Matthias Bethge at the University of Tübingen. I am also affiliated with the International Max Planck Research School for Intelligent Systems. My research interests centre on developing data-centric approaches to improve machine learning models across several modalities (text, image, video, audio and 3D), and on exposing the failure points of these models through better evaluation sets and benchmarking strategies.
I completed my MSc in Machine Learning at the University of Tübingen in 2024. During my master's, I worked with the Computer Graphics Group, and this work led to an Outstanding Paper Award at EMNLP 2023.
Before starting my master's, I was a Computer Vision Researcher at the Center of Artificial Intelligence, ZHAW, working on domain adaptation in Optical Music Recognition.
I have also worked with Dr. Daniel Lin Wen-Yan at SMU on feature correspondence-based object tracking. I completed my BSc in Electrical and Electronics Engineering in Manipal/Singapore.
I am currently working on (A) curating the best pretraining dataset for multimodal large language models (MLLMs) and (B) studying how recycling image-text pairs into more informative samples can improve MLLM training regimes. I am very eager to collaborate on related projects, so please reach out if you are interested!
Dec 2023: ViPE received an Outstanding Paper Award at EMNLP 2023!
Sep 2023: Our work on Real World Music Object Recognition was published in TISMIR.
Oct 2022: Moved to Germany! Started my MSc at the University of Tübingen.
Aug 2022: RPTM accepted for oral presentation at WACV 2023. Check out the paper and SOTA comparisons!
Work Experience
Mar 2023 - Sep 2023: Research Assistant, Computer Graphics Group, Tübingen AI Centre.
May 2021 - Aug 2022: Computer Vision Researcher, Zürich University of Applied Sciences.
Jan 2020 - Dec 2020: Visiting Researcher, Singapore Management University.
Jun 2018 - Aug 2019: Undergraduate Research Intern, Jadavpur University.
In this work, we show that concept-aware data curation and online batch sampling improve the downstream performance of contrastive vision-language models. We introduce DataConcept, a collection of 128M image-text pairs annotated with concept-centric information, and Concept-Aware Batch Sampling (CABS), a framework that uses this concept information to curate batches online rather than relying on static, offline curation.
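To make "using concept information to curate batches online" concrete, here is a minimal, hypothetical sketch of one simple strategy: greedily filling a batch with candidates whose annotated concepts are least represented so far. This is not the CABS algorithm; the function name `concept_balanced_batch` and the `(sample_id, set_of_concepts)` pool format are illustrative assumptions.

```python
import random
from collections import Counter


def concept_balanced_batch(pool, batch_size, seed=0):
    """Hypothetical greedy sampler (not CABS): favour under-represented concepts.

    pool: list of (sample_id, set_of_concepts) pairs (assumed format).
    Returns a list of up to batch_size sample ids.
    """
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)  # random tie-breaking between equally good candidates
    counts = Counter()       # how often each concept already appears in the batch
    batch = []
    for _ in range(min(batch_size, len(candidates))):
        # Score = total count of this sample's concepts already in the batch;
        # the lowest score adds the least-covered concepts.
        best = min(candidates, key=lambda s: sum(counts[c] for c in s[1]))
        candidates.remove(best)
        batch.append(best[0])
        counts.update(best[1])
    return batch


pool = [("a", {"dog"}), ("b", {"dog"}), ("c", {"cat"}), ("d", {"car"})]
print(concept_balanced_batch(pool, 3))  # one dog sample, plus the cat and car samples
```

Because selection happens per batch from whatever candidates are available, the same idea can adapt to changing concept targets without re-filtering the whole dataset, which is the contrast with static curation drawn above.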
ONEBench to Test Them All: Sample-Level Benchmarking Over Open-Ended Capabilities
Adhiraj Ghosh*, Sebastian Dziadzio*, Ameya Prabhu, Vishaal Udandarao, Samuel Albanie, Matthias Bethge.
ACL 2025 (Main) Project Page / Paper / Code
To evaluate the broad capabilities of foundation models, we introduce ONEBench, a benchmark that unifies individual test sets into a single pool of data-measurement samples. We shift the focus from singular test sets to sample-level evaluation, restructuring static benchmarks to accommodate an ever-expanding pool of datasets and models.
No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
Vishaal Udandarao*, Ameya Prabhu*, Adhiraj Ghosh, Yash Sharma, Philip H.S. Torr, Adel Bibi, Samuel Albanie, Matthias Bethge.
NeurIPS 2024 Paper / Code / Let It Wag! Benchmark
We find that the impressive "zero-shot" performance of VLMs can be attributed to the presence of test concepts in their pretraining datasets, so it does not reflect true zero-shot generalization. Instead, these models need exponentially more data on a concept to improve their performance on it linearly.
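The "exponentially more data for linear gains" relationship can be written schematically as a log-linear trend. In this sketch, $f(c)$ is the number of pretraining samples containing concept $c$, and $a$, $b > 0$ are illustrative fitted constants, not values reported in the paper:

$$
\mathrm{Perf}(c) \approx a + b \log f(c)
$$

Raising performance on $c$ by a fixed amount $\Delta$ then requires a new frequency $f'(c)$ with $a + b \log f'(c) = a + b \log f(c) + \Delta$, i.e.

$$
f'(c) = f(c)\, e^{\Delta / b},
$$

so every equal step in performance multiplies the required amount of concept data by the same constant factor.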
ViPE: Visualise Pretty-much Everything
Hassan Shahmohammadi, Adhiraj Ghosh, Hendrik Lensch.
EMNLP 2023 (Outstanding Paper Award) Paper / Code / Dataset / HuggingFace / Music Videos
ViPE is the first automated model for translating an arbitrary piece of text into a visualisable prompt. It can be used with any text-to-image model to visualise figurative and non-lexical language.
Real World Music Object Recognition
Adhiraj Ghosh*, Lukas Tuggener*, Raphael Emberger*, Pascal Sager*, et al.
TISMIR 2023 Paper / Code
We present methods that improve recognition accuracy in Music Object Recognition on low-quality, real-world sheet music and provide confidence-rated model outputs to enable efficient human post-processing.
Relation Preserving Triplet Mining for Stabilising the Triplet Loss in Re-identification Systems
Adhiraj Ghosh, Kuruparan Shanmugalingam, Wen-Yan Lin
WACV 2023 Paper / Code / Video / Poster
We propose a new, feature-guided triplet mining scheme that accounts for intrinsic pose, addressing the intra-class variance problem in re-identification datasets.
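For context, the loss that triplet mining feeds is the standard triplet margin loss, $\max(0, d(a, p) - d(a, n) + m)$. The minimal sketch below shows that loss together with generic "batch-hard" negative mining (picking the negative closest to the anchor). It illustrates the conventional setup whose instability motivates the work, not RPTM's feature-guided mining; the margin value of 0.3 and the helper names are assumptions for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def hardest_negative(anchor, negatives):
    """Generic batch-hard mining (not RPTM): the negative closest to the anchor."""
    return min(negatives, key=lambda n: euclidean(anchor, n))


anchor, positive = (0.0, 0.0), (0.0, 1.0)
negatives = [(0.0, 3.0), (0.0, 0.5), (2.0, 2.0)]
neg = hardest_negative(anchor, negatives)   # (0.0, 0.5): closer than the positive
loss = triplet_loss(anchor, positive, neg)  # positive, so this triplet contributes gradient
easy = triplet_loss(anchor, positive, (0.0, 3.0))  # 0.0: easy negative, no gradient
```

Which positives and negatives get paired is exactly what a mining scheme controls; pairing a positive that differs from the anchor mainly in pose can pull together embeddings that should not be forced together, which is the intra-class variance problem the paper targets.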
Irony Detection in Bengali Tweets: A New Dataset, Experimentation and Results
Adhiraj Ghosh, Kamal Sarkar
ICCIDS 2020 Paper / Dataset
This paper describes a new Bengali irony detection dataset that we developed and reports the results of state-of-the-art machine learning methods on it.