Oğuzhan Fatih Kar

I am a Machine Learning Researcher at Apple. My research interests are in building generalist multimodal agents that can perceive, reason, and act in physical and digital worlds.

I received my Ph.D. in Computer Science from EPFL, where I was advised by Amir Zamir. My Ph.D. thesis focused on building scalable multimodal foundation models that can process diverse inputs such as images, text, 3D, semantics, and other sensory data to solve a wide variety of real-world tasks. In 2023/2024, I interned at Google, working on vision-language models with Federico Tombari. I received my M.S. and B.S. in Electrical Engineering from METU, where I was advised by Figen Oktem.

Email / Google Scholar / GitHub / LinkedIn / Twitter

Honors

Chorafas Award for PhD Thesis, 2025
Top Reviewer, NeurIPS 2024
CVPR Best Paper Award Nomination for Cross-Task Consistency, 2020
EPFL EDIC PhD Fellowship, 2019-2020
TUBITAK Full Scholarship during M.S., 2017-2019
METU Bulent Kerim Altay Award, 2015

Recent Work

	How Well Does GPT-4o Understand Vision? Evaluating Multimodal Foundation Models on Standard Computer Vision Tasks R. Ramachandran, A. Garjani, R. Bachmann, A. Atanov, O.F. Kar, A. Zamir* ICLR, 2026 [Website] [Code]
	FlexTok: Resampling Images into 1D Token Sequences of Flexible Length R. Bachmann, J. Allardice, D. Mizrahi, E. Fini, O.F. Kar, E. Amirloo, A. El-Nouby, A. Zamir, A. Dehghan ICML*, 2025 [Website] [Code] [Demo]
	4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities O.F. Kar, R. Bachmann, D. Mizrahi, A. Garjani, M. Gao, D. Griffiths, J. Hu, A. Dehghan, A. Zamir NeurIPS*, 2024 [Website] [Code] [Demo]
	BRAVE: Broadening the visual encoding of vision-language models O.F. Kar, A. Tonioni, P. Poklukar, A. Kulshrestha, A. Zamir, F. Tombari ECCV, 2024 [Oral, Top 2%] [Website]
	Unraveling the Key Components of OOD Generalization via Diversification H. Benoit, L. Jiang, A. Atanov, O.F. Kar, M. Rigotti, A. Zamir ICLR*, 2024 [arXiv]
	4M: Massively Multimodal Masked Modeling D. Mizrahi, R. Bachmann, O.F. Kar, T. Yeo, M. Gao, A. Dehghan, A. Zamir NeurIPS, 2023 [Spotlight, Top 4%] [Website]
	Rapid Network Adaptation: Learning to Adapt Neural Networks Using Test-Time Feedback T. Yeo, O.F. Kar, Z. Sodagar, A. Zamir ICCV, 2023 [Website]
	3D Common Corruptions and Data Augmentation O.F. Kar, T. Yeo, A. Atanov, A. Zamir CVPR, 2022 [Oral, Top 4%] [Website] [Code] [Video] [Live Demo] [TrustML Talk]
	Robustness via Cross-domain Ensembles O.F. Kar, T. Yeo, A. Zamir ICCV, 2021 [Oral, Top 3%] [Website] [Code] [Video] [Slides]
	Robust Learning Through Cross-task Consistency A. Zamir, A. Sax, T. Yeo, O.F. Kar, N. Cheerla, R. Suri, Z. Cao, J. Malik, L. Guibas arXiv, 2020. CVPR, 2020 [Best Paper Award Nominee, Oral] [Live Demo] [Visuals] [Website] [Code] [ECCV 2020 Demo Video]

M.S. Work (2018-2021)

(Complete list on Google Scholar)

	High-resolution Multi-spectral Imaging with Diffractive Lenses and Learned Reconstruction F.S. Oktem, O.F. Kar, C.D. Bezek, F. Kamalabadi IEEE Transactions on Computational Imaging, 2021 [arXiv]
	Compressive Spectral Imaging with Diffractive Lenses O.F. Kar, F.S. Oktem Optics Letters, 2019 [arXiv]
	Real-time Compressive Video Reconstruction for Spatial Multiplexing Cameras O.F. Kar, A. Gungor, H.E. Guven IEEE GlobalSIP, 2019 [Visuals]
	Learning-based Regularization for Spatial Multiplexing Cameras O.F. Kar, A. Gungor, H.E. Guven IEEE GlobalSIP, 2019
	A Transform Learning-based Deconvolution Technique with Super-resolution and Microscanning Applications A. Gungor, O.F. Kar IEEE ICIP, 2019
	A Matrix-free Reconstruction Method for Compressive Focal Plane Array Imaging A. Gungor, O.F. Kar, H.E. Guven IEEE ICIP, 2018

Last Update: March 2026