Ali Vosoughi
阿力
PhD Candidate, advised by Prof. Axel Wismueller and Prof. Chenliang Xu
University of Rochester
🤖 Agentic AI Systems 🎵 Computer Audition 🧠 Multimodal Reasoning 🎬 Multimodal Generation 🥽 Immersive Computing 🔍 Reasoning Verification 🎯 Reinforcement Learning 🚀 Large Action Models 🔊 Audio Generation 📹 Video Generation
📧 ali.vosoughi@rochester.edu
📍 CS Department, Wegmans Hall 3211
🍎 Apple
Machine Learning Intern
Agentic Multimodal AI
present
🎵 Smule AI
Research Scientist Intern
Spatial Audio Generation
Jun–Sep 2025
🏢 Microsoft Research
Research Intern
Audiovisual LLM
May–Aug 2024
🚗 Bosch AI Research
Research Intern
Audio LLM
Apr–Jul 2023
🛡️ DARPA PTG
Graduate Researcher
Autonomous AR Copilot
2022–present
🏆
First counterfactual audio methods
ICASSP’24 + US Patent US20250124292A1 (published Jan 2025)
🤝
Autonomous multimodal copilot
Real-time AR demonstrations (DARPA)
📊
VERIFY benchmark
Reasoning verification framework

Recent News & Updates

10/2024
🎤 Presented at SANE 2024, DeepMind Boston
10/2024
📄 ACM Multimedia 2024 paper accepted
08/2024
💼 Research presentation at Microsoft, Seattle
03/2024
📄 NAACL 2024 paper accepted
02/2024
📄 IEEE Transactions on Multimedia paper
08/2023
🎯 Two ICCV 2023 papers accepted

Publications

PromptReverb: Multimodal Room Impulse Response Generation Through Latent Rectified Flow Matching
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2026
[Paper][Website]

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity
Under Review’26
[Paper][Website][🤗 Hugging Face]

Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
European Signal Processing Conference (EUSIPCO) 2025
[Paper][Website]

EAGLE: Egocentric AGgregated Language-video Engine
ACM International Conference on Multimedia (ACM MM) 2024
[Paper]

PW-VQA: Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA
IEEE Transactions on Multimedia (TMM) 2024
[Paper][Code][Website]

OSCaR: Object State Captioning and State Change Representation
North American Chapter of the Association for Computational Linguistics (NAACL) 2024
[Paper][Code]

Video Understanding with Large Language Models: A Survey
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2025
[Paper][Code]

Learning Audio Concepts from Counterfactual Natural Language
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2024
[Paper][Code][Patent]

AVSA-Sep: Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation
IEEE/CVF International Conference on Computer Vision (ICCV) 2023: ICCV AV4D Workshop
[Paper]

MISAR: A Multimodal Instructional System with Augmented Reality
IEEE/CVF International Conference on Computer Vision (ICCV) 2023: ICCV AV4D Workshop
[Paper][Code][Video]

Relation Discovery in Nonlinearly Related Large-scale Settings
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2022
[Paper][Code]

Leveraging Pre-Images to Discover Nonlinear Relationships in Multivariate Environments
European Signal Processing Conference (EUSIPCO) 2021
[Paper]

Large-scale Nonlinear Granger Causality for Inferring Directed Dependence from Short Multivariate Time-series Data
Scientific Reports, Nature Publishing Group (Nature) 2021
[Paper][Code]

