Wayner Barrios

Researcher · Builder · Multimodal AI

Teaching AI to see the world and explain what it understands

About

I started my research making sense of what happens in videos: detecting actions, localizing moments, understanding temporal structure. That work took me through collaborations with CMU, KAUST, and teams at DARPA/IARPA, with publications at top-tier venues in computer vision and AI.

Over time I moved from "what is happening" to "does the model actually understand why", working on multimodal fusion, cross-modal alignment, and learnable attention masks. My most recent work, CRYSTAL, shows that even the best multimodal models can't maintain coherent reasoning for more than a few steps.

I also care about making things work in practice. I built vLLM-MLX to run LLMs and vision-language models efficiently on Apple Silicon, and I've shipped production ML systems for enterprise and government through work with Adobe Research, Samsung Research, Northeastern University, Mount Sinai, Lunenfeld Institute, Universidad del Norte, EAFIT, Universidad CES, and Universidad de Antioquia. I founded Wiqonn to bring serious AI research to Latin America.

I'm currently finishing my Ph.D. at Dartmouth College (expected Winter 2026), advised by SouYoung Jin, with Soroush Vosoughi, Nikhil Singh, and Juan Carlos Niebles on my committee. I'm open to research and engineering roles; let's talk.

Research Focus

Multimodal AI · Deep Learning · LLMs · Video Understanding · Post-Training Alignment · Large-Scale Pretraining · Efficient Models · Multimodal Reasoning

Open Source Projects

vLLM-MLX

OpenAI-compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support on M1/M2/M3/M4 chips.

Apple Silicon · MLX · LLM Inference · Multimodal

DGX Spark Fine-tune LLM

LLM fine-tuning with LoRA + NVFP4/MXFP8 quantization on NVIDIA DGX Spark (Blackwell GB10).

Blackwell · LoRA · Quantization

Guidance Video Grounding

Official PyTorch implementation of the ICCV 2023 paper on moment retrieval in long videos.

ICCV 2023 · PyTorch · Video

ActivityNet

Large-scale benchmark for human activity understanding in videos.

Benchmark · Dataset

GeoNode

Open source geospatial platform. Contributed to service virtualization and networking.

Open Source · GIS

Tech Stack

Languages & Frameworks: Python · C++ · PyTorch · JAX · MLX
Acceleration: CUDA · ROCm · FPGA (HLS/Vitis)
Distributed Training: FSDP · DeepSpeed · Dynamo · Mixed Precision
Optimization: Quantization · Pruning · Distillation
Infrastructure: K8s · Docker · AWS · GCP · Vector DBs

Beyond Research

Dog lover: Tala, Luna, Raissa & Naia · Passionate traveler · Audiophile & musician · Water sports enthusiast