I’m a Senior Machine Learning & Distributed Systems Engineer with 10+ years of experience architecting large-scale AI infrastructure, GPU inference platforms, and multi-agent (MCP) systems. I specialize in mission-critical systems that demand tight latency and reliability guarantees and careful orchestration and control-plane design.
I’ve shipped:
- Real-time distributed systems at Microsoft
- Identity & risk-scoring engines at AWS
- Multi-agent (MCP) platforms for automating developer workflows end-to-end
- Designed real-time platforms processing 10M+ GPU telemetry events/day (TP99 <120ms, 99.99% uptime, Azure-scale compute)
- Architected global, multi-region orchestration systems using Kubernetes, Synapse, Spark, and Azure Data Factory (ADF), reducing pipeline latency by 2s
- Built low-level telemetry & diagnostics for Maia 100 AI accelerators (Redfish API integration)
- Developed multiple MCP servers powering:
  - Automated PR generation
  - Repo-wide code intelligence
  - Contextual retrieval from CI/CD + logs
  - Issue tracking & GitHub tool integration
- Implemented deterministic workflows, tool-calling chains, and developer automation pipelines
- Designed Mosaic-style agent frameworks (planning, reasoning, orchestration)
- Created an end-to-end, log-based RAG system for diagnostic and investigation workflows
- Built a full restaurant-recommendation RAG system with LlamaIndex + Elasticsearch (vector search, hybrid retrieval, embeddings, caching)
- Developed semantic search tools for design docs (Azure AI Foundry + Semantic Kernel), improving retrieval efficiency by 60%
- Architected microservice-based ML pipelines and anomaly detection frameworks
- Built scalable ETL pipelines (Spark, DynamoDB, Kafka) and integrated distributed monitoring and alerting
- Designed REST APIs, CI/CD workflows, and containerized services for cloud platforms
Infrastructure & Distributed Systems
- Kubernetes • Docker • Spark/Flink • Synapse • Redfish API
- Control Plane Design • Autoscaling • Routing
AI / ML / LLM Systems
- Inference Pipelines • Vector Search • RAG • Embeddings • Observability
- Feature Engineering • GPU Telemetry
Agents & MCP
- Tool Calling • Multi-agent Orchestration • PR/Repo Automation • Deterministic Workflows • MCP Servers
Cloud Platforms
- Azure (AI Foundry, Functions, Compute, AI Search)
- AWS (SageMaker, DynamoDB, CloudFormation)
- GCP (familiar)
Languages
- Python • Go • Java • C++ • Bash • JavaScript/Node • SQL/NoSQL
A complete LlamaIndex + Elasticsearch-based system using multi-source ingestion, hybrid retrieval, embeddings, and chat-style personalization.
Multi-agent workflow automation for code review, PR generation, CI/CD understanding, and intelligent repo analysis.
Replicated Databricks Agent Bricks patterns: tool orchestration, structured reasoning, vector-based retrieval, agent messaging layers.
- Building distributed inference & scheduling systems
- Designing latency-aware routing, capacity planning, and control-plane components
- Creating MCP-enabled agent ecosystems for automation & reasoning
- Optimizing GPU utilization and system reliability at scale
- Scaling observability, health monitoring, and model versioning
- Architecting backend systems for mission-critical AI workloads
- Advancing agent orchestration with MCP
- Building deterministically reproducible agent workflows
- Improving inference through caching, batching, and routing
- Developing RAG systems grounded in operational logs & telemetry
- Exploring LLM safety, validation, and structured reasoning integrations
If you’re working on high-performance AI infrastructure, next-gen inference, or agentic frameworks, let’s connect!
I’m especially interested in collaborations where safety, reliability, and real-time performance are paramount.

