Siddhartha Gairola

I am an ELLIS PhD Student advised by Prof. Bernt Schiele (Max Planck Institute for Informatics, Saarbrücken) and Prof. Francesco Locatello (ISTA). During my PhD, I will be affiliated with both the Max Planck Institute for Informatics and the Institute of Science and Technology Austria.

Previously, I was a Research Fellow at Microsoft Research (MSR), India in the Technology for Emerging Markets group, where I worked on applications of Computer Vision, Image Processing and Machine Learning to developing low-cost diagnostic solutions for healthcare. Before that, I worked as a Research Intern at Adobe Inc. in the Media and Data Science Research Group on image understanding tasks.

I completed my Master's and B.Tech with Honours in May 2020 at IIIT Hyderabad, where I was advised by Prof. P.J. Narayanan. My work focused on learning robust unsupervised style representations for image recognition and retrieval tasks.

Research

Current Research: I am broadly interested in Artificial Intelligence (Machine Learning), Computer Vision, Image Processing, Natural Language Processing and their applications to real-world problems. I am particularly interested in building reliable (robust) systems that model visual perception with limited supervision. To this end, for my PhD I will be exploring two major directions:

Interpretability and Robustness of Deep Neural Networks
Learning Powerful (unsupervised) Object-Centric Representations

Published Research: My prior research has involved using machine learning, image processing, and optical physics to tackle key challenges in computer vision and healthcare, including (1) image compositing, (2) self-supervised representation learning, (3) few-shot image segmentation, (4) explainable AI (xAI), (5) anomaly detection, and (6) AI-based medical diagnosis. These have resulted in publications at top conferences like ICLR, IJCAI, WACV, SIGIR, EMBC, IMWUT/Ubicomp and MMM.

When not working on my research, I like to play the piano and guitar, listen to music, read non-fiction, ride motorcycles, and go for a run or hike. I am also fascinated by paradoxes; you can find some here. (I wish I had Hermione's Time-Turner so that I could do as much in a day as I'd like to.) (see more at Personal)

Computer Vision, Image Processing and Machine Learning

DAVE : Distribution-aware Attribution via ViT Gradient Decomposition

Adam Wróbel, Siddhartha Gairola, Jacek Tabor, Bernt Schiele, Bartosz Zieliński, Dawid Rymarczyk

Pre-print, Feb. 2026

arxiv / code / presentation / bibtex

DAVE is a distribution-aware attribution method for Vision Transformers that produces stable, high-resolution pixel-level explanations while reducing patch/grid artifacts common in ViT saliency maps. It decomposes ViT input gradients to suppress operator-variation noise and enforces local equivariance via averaging over small transforms and perturbations, improving localization and faithfulness across multiple ViT backbones.
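
As a rough illustration of the general idea of averaging gradients over small transforms, here is a toy sketch. It is not DAVE's algorithm: the quadratic score, the `weights` grid, and both function names are assumptions made up for this example, and the gradient decomposition step is not shown. Each gradient is computed on a shifted input and then shifted back before averaging.

```python
import numpy as np

def toy_gradient(x, weights):
    """Gradient of the toy score 0.5 * sum((weights * x) ** 2) w.r.t. x."""
    return (weights ** 2) * x

def shift_averaged_gradient(x, weights, max_shift=1):
    """Average gradients over small circular shifts, undoing each shift."""
    total = np.zeros_like(x, dtype=float)
    count = 0
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            shifted = np.roll(x, (dy, dx), axis=(0, 1))
            grad = toy_gradient(shifted, weights)
            # Map the gradient back to the original pixel coordinates.
            total += np.roll(grad, (-dy, -dx), axis=(0, 1))
            count += 1
    return total / count

# Hypothetical input: a flat image and a checkerboard weight pattern that
# produces a grid-like artifact in the plain gradient.
x = np.ones((4, 4))
weights = (np.indices((4, 4)).sum(axis=0) % 2 == 0).astype(float)

plain = toy_gradient(x, weights)
averaged = shift_averaged_gradient(x, weights)
print("plain std:", plain.std(), "averaged std:", averaged.std())
```

In this toy setting the averaged map varies far less across neighbouring pixels than the plain gradient, which is the kind of grid suppression the description above refers to.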

How to Probe: Simple Yet Effective Techniques for Improving Post-hoc Explanations

Siddhartha Gairola, Moritz Boehle, Francesco Locatello and Bernt Schiele

International Conference on Learning Representations (ICLR), 2025

OpenReview / arxiv / code / bibtex

Post-hoc importance attribution methods are a popular tool for "explaining" Deep Neural Networks (DNNs) and are inherently based on the assumption that the explanations can be applied independently of how the models were trained. In contrast, in this work we bring forward empirical evidence that challenges this very notion. Surprisingly, we discover and demonstrate a strong dependency on the training details of a pre-trained model's classification layer (<10% of model parameters), which play a crucial role, much more so than the pre-training scheme itself. Building on this finding, we also present simple yet effective adjustments to the classification layers that can significantly enhance the quality of model explanations.

SimPropNet: Improved Similarity Propagation for Few Shot Segmentation

Siddhartha Gairola, Ayush Chopra, Mayur Hemani and Balaji K.

International Joint Conference on Artificial Intelligence (IJCAI), 2020

IJCAI_Proceedings / pdf / bibtex

Improved similarity propagation for more accurate one-shot and few-shot image segmentation.

Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks

Siddhartha Gairola, Rajvi Shah, P.J. Narayanan

IEEE Winter Conference on Applications of Computer Vision (WACV '20), 2020

code / project_page / paper / supp_material / bibtex

An unsupervised protocol for learning a neural embedding of the visual style of images. The proposed protocol relies not on categorical labels but on a proxy measure for finding stylistically similar and dissimilar images.

Find Me a Sky : A Data-driven Method for Color-Consistent Sky Search and Replacement

Saumya Rawat*, Siddhartha Gairola*, Rajvi Shah, P.J. Narayanan

International Conference on Multimedia Modelling (MMM '18), 2018

project_page / pdf / bibtex

A data-driven method for color-consistent sky search and replacement. The method does not require complex color transfer techniques.

*Both authors contributed equally towards this work.

Applied ML, Vision, HCI for Healthcare

SmartKC++: Improving Performance of Smartphone-Based Corneal Topographers

Vaibhav Ganatra, Siddhartha Gairola, Nipun Kwatra and Mohit Jain et al.

IEEE Winter Conference on Applications of Computer Vision (WACV '25), 2025

Open Access / code / bibtex

Improving the SmartKC image processing pipeline, making it more robust and accurate.

Towards Automating Retinoscopy for Refractive Error Diagnosis

Aditya Aggarwal, Siddhartha Gairola, Nipun Kwatra and Mohit Jain et al.

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 6, Issue 3, 2022

code / project_page / pdf

In this work, we automate retinoscopy by attaching a smartphone to a retinoscope and recording retinoscopic videos with the patient wearing a custom pair of paper frames. We develop a video processing pipeline that takes retinoscopic videos as input and estimates the net refractive error based on our proposed extension of the retinoscopy mathematical model. Our system alleviates the need for a lens kit and can be performed by an untrained examiner.

Keratoconus Classifier for Smartphone-based Corneal Topographer

Siddhartha Gairola, Nipun Kwatra and Mohit Jain et al.

IEEE Engineering in Medicine & Biology Society (EMBC), 2022

arXiv / pdf

In this work, we propose a dual-head convolutional neural network (CNN) for classifying keratoconus on the heatmaps generated by SmartKC. Since SmartKC is a new device and only had a small dataset (114 samples), we developed a 2-stage transfer learning strategy to satisfactorily train our network, achieving a sensitivity of 91.3% and a specificity of 94.2%.
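
For readers unfamiliar with the two reported metrics, the short sketch below shows how sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and chosen only for illustration; they are not the paper's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of diseased cases correctly flagged
    specificity = tn / (tn + fp)  # fraction of healthy cases correctly cleared
    return sensitivity, specificity

# Hypothetical counts for illustration only (not the paper's data).
sens, spec = sensitivity_specificity(tp=9, fn=1, tn=18, fp=2)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

For a screening tool, high sensitivity means few missed cases, while high specificity means few unnecessary referrals.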

Smartphone based Corneal Topographer

Siddhartha Gairola, Nipun Kwatra and Mohit Jain

Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT), Volume 5, Issue 4, 2021

code / project_page / pdf / demo_video

SmartKC is a low-cost smartphone-based corneal topographer that enables mass screening for keratoconus.

RespireNet: A Deep Neural Network for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting

Siddhartha Gairola, Francis Tom, Nipun Kwatra and Mohit Jain

IEEE Engineering in Medicine & Biology Society (EMBC), 2021

code / arXiv / pdf / bibtex

RespireNet is a simple CNN-based model paired with a suite of novel techniques (device-specific fine-tuning, concatenation-based augmentation, blank region clipping, and smart padding) that make efficient use of small datasets. We perform extensive evaluation on the ICBHI dataset, and improve upon the state-of-the-art results for 4-class classification by 2.2%.
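
Two of these techniques can be sketched in a few lines. The code below is one plausible reading rather than the paper's implementation: it assumes that concatenation-based augmentation joins two samples of the same class in time, and that smart padding extends a short sample by repeating it instead of zero-filling. The function names and the toy arrays are hypothetical.

```python
import numpy as np

def concat_augment(sample_a, sample_b):
    """Create a new training sample by joining two same-class samples in time."""
    return np.concatenate([sample_a, sample_b])

def cyclic_pad(sample, target_len):
    """Extend a short sample to target_len by repeating it, then truncating."""
    if len(sample) == 0:
        raise ValueError("cannot pad an empty sample")
    repeats = -(-target_len // len(sample))  # ceiling division
    return np.tile(sample, repeats)[:target_len]

# Toy 1-D signals standing in for audio clips of the same class.
a = np.array([0.1, 0.2, 0.3])
b = np.array([0.4, 0.5])

augmented = concat_augment(a, b)  # new sample of length 5
padded = cyclic_pad(a, 7)         # repeats a until 7 values are filled
print(augmented, padded)
```

Repeating the signal keeps the padded region statistically similar to real data, which is the usual motivation for preferring it over zero padding.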

Multimodal Machine Learning: Vision and NLP

Identifying Clickbait: A Multi-Strategy Approach Using Neural Networks

Dhruv Khattar, Siddhartha Gairola, Vaibhav Kumar, Yash Kumar Lal, Vasudeva Varma

ACM SIGIR, 2018

code / arxiv / bibtex

Detecting clickbait using deep learning.

Master's Thesis

For my Master's research, I was supervised by Prof. P.J. Narayanan at CVIT, IIIT Hyderabad. My research covered two tasks: (1) representation learning for image style search and retrieval, and (2) color-consistent background replacement.

Image Representations for Style Retrieval, Recognition and Background Replacement Tasks

Siddhartha Gairola

Master's Thesis, IIIT Hyderabad, 2020

abstract / pdf

Work Experience

Microsoft Research, Research Fellow (Aug 2020 - Aug 2022)

Microsoft Research, Research Intern (Jan 2020 - July 2020)

Adobe Inc., Research Intern (Jun 2019 - Jan 2020)

Talks and Presentations

Intriguing Applications and Overlooked Pitfalls of XAI in Visual Models
GMUM Workshop, Jagiellonian University, October 2024

slides

RespireNet: A DNN for Accurately Detecting Abnormal Lung Sounds in Limited Data Setting
EMBC 2021

video

SimPropNet: Improved Similarity Propagation for Few-shot Image Segmentation
IJCAI 2020

video

Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks
IEEE WACV 2020

video

Useful Resources and Writings

Notes and Resources on How to Do Research as a Young Researcher: a collection of resources on how to do research that I maintain and that has proved immensely useful to me, a gold mine.
Resources on writing academic papers. (click here)
Resources on reviewing scientific papers. (click here)
Reproducibility checklist by Joelle Pineau. (click here)
I also maintain some writings, resources and FAQs on the Graduate School (PhD) application process, which I update sporadically. (click here)
I sometimes blog on Medium (click here) about research, general thoughts and some personal things.

Open-Source Contribution

I contribute actively to open-source organizations, namely Scilab and LibreOffice. I have been working with Scilab for the past 3 years. My proposals were selected twice (2017, 2018) as projects for the Google Summer of Code (GSoC) program.

Google Summer of Code 2017
Project Details: Implemented a C/C++ wrapper for the MATLAB MEX API on top of the current Scilab API.

Google Summer of Code 2018
Project Details: Implemented a demo in C/C++ and Scilab as a working example for the MEX Library in Scilab.

Teaching Experience

Worked as a Teaching Assistant at Saarland University for the course listed below. The duties involved setting questions for assignments and examinations, and correcting papers.

1. Elements of Data Science and Artificial Intelligence (Winter 2023, 2024)

Worked as a Teaching Assistant at IIIT Hyderabad for the courses listed below. The duties involved conducting regular tutorials, correcting papers, setting questions for assignments and conducting evaluations.

1. Digital Logic and Processors (Monsoon 2016)
2. Artificial Intelligence (Spring 2017)
3. Digital Image Processing (Monsoon 2017)
4. Computer Vision (Spring 2018)
5. Digital Image Processing (Monsoon 2018)
6. Computer Graphics (Spring 2019)

Education

Max Planck Institute for Informatics & Saarland University
Ph.D. Student, Computer Science (Sept. 2022 - present)

International Institute of Information Technology - Hyderabad
Master of Science (MS) by Research, Computer Science (2018-2020)

International Institute of Information Technology - Hyderabad
Bachelor of Technology (BTech) with Honours, Computer Science (2014-2018)

St. Joseph's Academy, Dehradun
Senior Secondary, ISC (2012-2013)

St. Joseph's Academy, Dehradun
Secondary, ICSE (2010-2011)