Siddhartha Gairola

Snowmass Mountain, Snowmass Village, CO, USA (photographed by me in March 2020).

I am an ELLIS PhD student advised by Prof. Bernt Schiele (Max Planck Institute for Informatics, Saarbrücken) and Prof. Francesco Locatello (ISTA). During my PhD, I will be affiliated with both the Max Planck Institute for Informatics and the Institute of Science and Technology Austria (ISTA).

Previously, I was a Research Fellow at Microsoft Research (MSR) India in the Technology for Emerging Markets group, where I applied Computer Vision, Image Processing, and Machine Learning to develop low-cost diagnostic solutions for healthcare. Before that, I was a Research Intern in the Media and Data Science Research group at Adobe Inc., working on image understanding tasks.

I completed my Master's and B.Tech with Honours at IIIT Hyderabad in May 2020, where I was advised by Prof. P. J. Narayanan. My work focused on learning robust unsupervised style representations for image recognition and retrieval tasks.

Research

Current Research: I am broadly interested in Artificial Intelligence (Machine Learning), Computer Vision, Image Processing, Natural Language Processing and their applications to real-world problems. I am particularly interested in building reliable (robust) systems that model visual perception with limited supervision. To this end, for my PhD I will be exploring two major directions:

  1. Interpretability and Robustness of Deep Neural Networks
  2. Learning Powerful (unsupervised) Object-Centric Representations

Published Research: My prior research has used machine learning, image processing, and optical physics to tackle key challenges in computer vision and healthcare, including (1) image compositing, (2) self-supervised representation learning, (3) few-shot image segmentation, (4) explainable AI (xAI), (5) anomaly detection, and (6) AI-based medical diagnosis. This work has resulted in publications at top venues such as ICLR, IJCAI, WACV, SIGIR, EMBC, IMWUT/UbiComp, and MMM.

When not working on my research, I like to play the piano and guitar, listen to music, read non-fiction, ride motorcycles, and go for a run or hike. I am also fascinated by paradoxes; you can find some here. (I wish I had Hermione's Time-Turner so I could fit as much into a day as I'd like.) (See more under Personal.)

Computer Vision, Image Processing and Machine Learning

DAVE: Distribution-aware Attribution via ViT Gradient Decomposition

Adam Wróbel, Siddhartha Gairola, Jacek Tabor, Bernt Schiele, Bartosz Zieliński, Dawid Rymarczyk

Pre-print, Feb. 2026

DAVE is a distribution-aware attribution method for Vision Transformers that produces stable, high-resolution pixel-level explanations while reducing patch/grid artifacts common in ViT saliency maps. It decomposes ViT input gradients to suppress operator-variation noise and enforces local equivariance via averaging over small transforms and perturbations, improving localization and faithfulness across multiple ViT backbones.
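DAVE's actual decomposition of ViT gradients is not reproduced here. As a rough illustration of the second ingredient only (local equivariance via averaging over small transforms and perturbations), the sketch below averages a gradient-times-input attribution over small circular shifts and Gaussian noise, shifting each map back before averaging. The toy model, `averaged_attribution`, and all parameter values are hypothetical stand-ins; the noise averaging is SmoothGrad-style, not DAVE's implementation.

```python
import numpy as np


def toy_model_grad(x, w, v):
    """Analytic input gradient of a hypothetical toy score f(x) = sum(v * tanh(w * x))."""
    return v * w * (1.0 - np.tanh(w * x) ** 2)


def grad_x_input(x, w, v):
    """Plain gradient-times-input attribution for one 2-D image."""
    return toy_model_grad(x, w, v) * x


def averaged_attribution(x, w, v, shifts=(-1, 0, 1), noise_std=0.05, n_noise=8, seed=0):
    """Average attributions over small circular shifts and Gaussian noise.

    Each attribution computed on a shifted input is shifted back before
    averaging, so the result stays aligned with the original pixel grid.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros(x.shape, dtype=float)
    count = 0
    for dy in shifts:
        for dx in shifts:
            x_shifted = np.roll(x, (dy, dx), axis=(0, 1))
            for _ in range(n_noise):
                x_pert = x_shifted + rng.normal(0.0, noise_std, size=x.shape)
                attr = grad_x_input(x_pert, w, v)
                # Undo the shift so this map lines up with the unshifted image.
                total += np.roll(attr, (-dy, -dx), axis=(0, 1))
                count += 1
    return total / count


# Usage on random toy data (per-pixel weights make the toy model shift-sensitive).
rng = np.random.default_rng(1)
img, w, v = rng.random((8, 8)), rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
smoothed = averaged_attribution(img, w, v)  # same shape as img
```

With `shifts=(0,)` and `noise_std=0.0` the average collapses to the plain gradient-times-input map, which is a handy sanity check when experimenting with the transform set.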

How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations

Siddhartha Gairola, Moritz Boehle, Francesco Locatello and Bernt Schiele

International Conference on Learning Representations (ICLR), 2025

Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs) and are inherently based on the assumption that the explanations can be applied independently of how the models were trained. In contrast, in this work we present empirical evidence that challenges this very notion. Surprisingly, we find that explanations depend strongly on training details: the way a pre-trained model's classification layer (<10% of model parameters) is trained plays a crucial role, much more than the pre-training scheme itself. Building on this finding, we also present simple yet effective adjustments to the classification layer that can significantly enhance the quality of model explanations.
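The summary above does not spell out which training details of the classification layer matter. Purely as a hypothetical illustration of what such a detail can look like, the sketch below trains a linear probe on frozen features with a switchable loss: softmax cross-entropy or per-class sigmoid binary cross-entropy. The function name, hyperparameters, and the choice of the loss as the varied detail are assumptions for illustration, not the paper's protocol.

```python
import numpy as np


def train_linear_probe(feats, labels, n_classes, loss="ce", lr=0.1, epochs=300):
    """Fit a linear head (W, b) on frozen features with full-batch gradient descent.

    loss="ce":  softmax cross-entropy (classes compete for probability mass).
    loss="bce": per-class sigmoid binary cross-entropy (one-vs-rest).
    """
    n, d = feats.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[labels]  # one-hot targets, shape (n, n_classes)
    for _ in range(epochs):
        logits = feats @ W + b
        if loss == "ce":
            z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
            p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
        elif loss == "bce":
            p = 1.0 / (1.0 + np.exp(-logits))
        else:
            raise ValueError(f"unknown loss: {loss}")
        # For both mean losses, d(loss)/d(logits) = (p - Y) / n.
        grad_logits = (p - Y) / n
        W -= lr * feats.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b
```

Because only the small head is trained, swapping the loss (or any other head-level detail) is cheap, which is what makes this kind of comparison practical on top of a frozen backbone.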

Applied ML, Vision, HCI for Healthcare

Towards Automating Retinoscopy for Refractive Error Diagnosis

Aditya Aggarwal, Siddhartha Gairola, Nipun Kwatra, Mohit Jain, et al.

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 6, Issue 3, 2022

In this work, we automate retinoscopy by attaching a smartphone to a retinoscope and recording retinoscopic videos while the patient wears a custom pair of paper frames. We develop a video processing pipeline that takes these videos as input and estimates the net refractive error using our proposed extension of the mathematical model of retinoscopy. Our system removes the need for a lens kit and can be operated by an untrained examiner.
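For context, conventional retinoscopy turns the lens power that neutralizes the reflex (the gross reading) into the net refractive error by subtracting a working-distance term of 1/d diopters, where d is the examiner's distance from the eye in metres. The sketch below shows only that standard textbook relation, assuming a spherical error; it is not the extended model used in the paper, and the function name is hypothetical.

```python
def net_refractive_error(gross_diopters: float, working_distance_m: float) -> float:
    """Textbook retinoscopy correction: net = gross - 1 / working distance (m).

    Neutralization happens when the eye's far point sits at the examiner's
    peephole, so the gross reading includes an extra +1/d diopters that must
    be removed to get the patient's refractive error.
    """
    if working_distance_m <= 0:
        raise ValueError("working distance must be positive")
    return gross_diopters - 1.0 / working_distance_m


# Neutralized with +3.00 D at a working distance of 0.5 m:
print(net_refractive_error(3.0, 0.5))  # prints 1.0 (i.e. +1.00 D, hyperopic)
```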

Keratoconus Classifier for Smartphone-based Corneal Topographer

Siddhartha Gairola, Nipun Kwatra, Mohit Jain, et al.

IEEE Engineering in Medicine & Biology Society (EMBC), 2022

In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus from the heatmaps generated by SmartKC. Because SmartKC is a new device with only a small dataset (114 samples), we developed a two-stage transfer learning strategy to train our network effectively, achieving a sensitivity of 91.3% and a specificity of 94.2%.
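Sensitivity and specificity are standard screening metrics computed from a binary confusion matrix, here with keratoconus as the positive class. A minimal sketch follows; the counts in the usage line are made up for illustration and are not the paper's results.

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn)  # fraction of keratoconus eyes that are flagged
    specificity = tn / (tn + fp)  # fraction of healthy eyes correctly cleared
    return sensitivity, specificity


# Hypothetical counts, NOT the paper's confusion matrix:
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=19, fp=1)
print(f"{sens:.1%} / {spec:.1%}")  # prints "90.0% / 95.0%"
```

On a dataset as small as 114 samples, a single misclassified eye moves these percentages by a few points, which is one reason small-data training strategies matter here.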

RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting

Siddhartha Gairola, Francis Tom, Nipun Kwatra and Mohit Jain

IEEE Engineering in Medicine & Biology Society (EMBC), 2021

RespireNet is a simple CNN-based model combined with a suite of novel techniques (device-specific fine-tuning, concatenation-based augmentation, blank region clipping, and smart padding) that make efficient use of small datasets. We perform an extensive evaluation on the ICBHI dataset and improve upon state-of-the-art results for 4-class classification by 2.2%.
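Two of the listed techniques are simple enough to sketch. Under an assumed reading of their names (concatenation-based augmentation joins two recordings of the same class into a new sample; smart padding fills a short recording by repeating its own samples rather than appending silence), the helpers below operate on 1-D NumPy audio arrays. The function names are hypothetical, and details in the paper such as cycle selection, sample rate, and target length may differ.

```python
import numpy as np


def concat_augment(clip_a: np.ndarray, clip_b: np.ndarray) -> np.ndarray:
    """Create a new training sample by joining two clips of the SAME class."""
    return np.concatenate([clip_a, clip_b])


def pad_by_repetition(clip: np.ndarray, target_len: int) -> np.ndarray:
    """Bring a clip to target_len by tiling its own samples instead of zero-padding.

    Clips longer than target_len are truncated.
    """
    if clip.size == 0:
        raise ValueError("cannot pad an empty clip")
    reps = -(-target_len // clip.size)  # ceiling division
    return np.tile(clip, reps)[:target_len]


print(pad_by_repetition(np.array([1.0, 2.0, 3.0]), 7))  # prints [1. 2. 3. 1. 2. 3. 1.]
```

Repeating real signal keeps the padded region acoustically plausible, whereas long runs of zeros give the CNN an artificial pattern it may learn to rely on.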

Multimodal Machine Learning: Vision and NLP

Master's Thesis

For my Master's research, I was supervised by Prof. P. J. Narayanan at CVIT, IIIT Hyderabad. I worked on two tasks: (1) representation learning for image style search and retrieval, and (2) color-consistent background replacement.

Work Experience

Microsoft Research, Research Fellow (Aug. 2020 - Aug. 2022)
Microsoft Research, Research Intern (Jan. 2020 - Jul. 2020)
Adobe Inc., Research Intern (Jun. 2019 - Jan. 2020)

Talks and Presentations

Intriguing Applications and Overlooked Pitfalls of XAI in Visual Models
GMUM Workshop, Jagiellonian University, October 2024
RespireNet: A DNN for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting
EMBC 2021
SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation
IJCAI 2020
Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks
IEEE WACV 2020

Useful Resources and Writings

  1. Notes and Resources on How to Do Research as a Young Researcher: a collection of resources on doing research that I maintain and that has proved immensely useful to me (a gold mine).
  2. Resources on writing academic papers.
  3. Resources on reviewing scientific papers.
  4. Reproducibility checklist by Joelle Pineau.
  5. Writings, resources, and FAQs on the graduate school (PhD) application process, which I maintain and update sporadically.
  6. I occasionally blog on Medium about research, general thoughts, and some personal topics.

Open-Source Contribution

Google Summer of Code

I actively contribute to the open-source organizations Scilab and LibreOffice. I have been working with Scilab for the past 3 years, and my proposals were selected twice (2017 and 2018) as projects for the Google Summer of Code (GSoC) program.

Google Summer of Code 2017
Project Details: Implemented a C/C++ wrapper for the MATLAB MEX API on top of the current Scilab API.

Google Summer of Code 2018
Project Details: Implemented a demo in C/C++ and Scilab as a working example of the MEX library in Scilab.

Teaching Experience

Saarland University

Worked as a Teaching Assistant at Saarland University for the course listed below. The duties involved setting questions for assignments and examinations, and grading papers.

  1. Elements of Data Science and Artificial Intelligence (Winter 2023, 2024)

IIIT Hyderabad

Worked as a Teaching Assistant at IIIT Hyderabad for the courses listed below. The duties involved conducting regular tutorials, grading papers, setting questions for assignments, and conducting evaluations.

  1. Digital Logic and Processors (Monsoon 2016)
  2. Artificial Intelligence (Spring 2017)
  3. Digital Image Processing (Monsoon 2017)
  4. Computer Vision (Spring 2018)
  5. Digital Image Processing (Monsoon 2018)
  6. Computer Graphics (Spring 2019)

Education

Max Planck Institute for Informatics & Saarland University
Ph.D. Student, Computer Science (Sept. 2022 - present)
International Institute of Information Technology - Hyderabad
Master of Science (MS) by Research, Computer Science (2018-2020)
International Institute of Information Technology - Hyderabad
Bachelor of Technology (BTech) with Honours, Computer Science (2014-2018)
St. Joseph's Academy, Dehradun
Senior Secondary, ISC (2012-2013)
St. Joseph's Academy, Dehradun
Secondary, ICSE (2010-2011)