Teaching AI to see the world and explain what it understands
About
I started my research making sense of what happens in videos: detecting actions, localizing moments, understanding temporal structure. That work took me through collaborations with CMU, KAUST, and teams at DARPA/IARPA, with publications at top-tier venues in computer vision and AI.
Over time I moved from "what is happening" to "does the model actually understand why?", working on multimodal fusion, cross-modal alignment, and learnable attention masks. My most recent work, CRYSTAL, shows that even the best multimodal models can't maintain coherent reasoning for more than a few steps.
I also care about making things work in practice. I built vLLM-MLX to run LLMs and vision-language models efficiently on Apple Silicon, and I've shipped production ML systems for enterprise and government through work with Adobe Research, Samsung Research, Northeastern University, Mount Sinai, Lunenfeld Institute, Universidad del Norte, EAFIT, Universidad CES, and Universidad de Antioquia. I founded Wiqonn to bring serious AI research to Latin America.
CRYSTAL is a 6,372-instance benchmark that evaluates multimodal reasoning by checking every intermediate step, not just the final answer. Testing 20 models revealed that none maintains step accuracy above 60% when intermediate steps are evaluated in their correct order. We propose Causal Process Reward, a training approach that improves step-level consistency by 32% without manual annotations.
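CRYSTAL's exact scoring rule isn't spelled out here, but the idea of order-sensitive step accuracy can be sketched. The function below is my own illustrative assumption, not the benchmark's implementation: it scores the fraction of gold reasoning steps that a model's output reproduces in the correct relative order, via longest common subsequence.

```python
def ordered_step_accuracy(predicted, gold):
    """Fraction of gold reasoning steps reproduced in the correct
    relative order, computed as LCS(predicted, gold) / len(gold).

    Illustrative sketch only; CRYSTAL's actual metric may differ.
    """
    m, n = len(predicted), len(gold)
    # Standard LCS dynamic program over the two step sequences.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if predicted[i] == gold[j]:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[m][n] / n if n else 0.0
```

Under this scoring, a model that produces every step but swaps two of them is penalized, which is the failure mode order-sensitive evaluation is meant to expose.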
OpenAI-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support on M1/M2/M3/M4 chips.
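Because the server speaks the OpenAI API, existing OpenAI clients can target it by overriding the base URL. The port, model id, and endpoint path below are assumptions for illustration, not documented defaults of vLLM-MLX; a minimal sketch of the standard chat-completions payload such a server accepts:

```python
import json

# Assumed local endpoint; adjust host/port to your vLLM-MLX setup.
BASE_URL = "http://localhost:8000/v1"

# An OpenAI-compatible server accepts the standard chat-completions
# request body, so OpenAI SDKs work by pointing base_url at it, e.g.:
#   from openai import OpenAI
#   client = OpenAI(base_url=BASE_URL, api_key="not-needed")
payload = {
    "model": "qwen2-vl",  # hypothetical model id; query /v1/models for real ones
    "messages": [
        {"role": "user", "content": "Describe this image."},
    ],
    "stream": False,
}

# Serialized body as it would be POSTed to {BASE_URL}/chat/completions.
body = json.dumps(payload)
```

The same request shape works for text-only and vision-language models; multimodal inputs are passed through the `content` field per the OpenAI message format.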