I build production-grade AI systems that move past the "wrapper" stage. I'm an AI and Software Engineer focused on the intersection of LLMs, high-performance backends, and MLOps. If it involves shipping RAG pipelines, autonomous agents, or scalable FastAPI infrastructure, I'm probably into it.
- LLMs & RAG: Architecting retrieval pipelines with LangChain and LlamaIndex, optimizing vector search (Pinecone, Chroma, Qdrant), and fine-tuning open-source models like Mistral and Llama 3 on high-compute clusters.
- Backend & Infra: Designing concurrent backends with Python (FastAPI/asyncio), managing state with Redis, and deploying via Docker/Kubernetes on AWS or GCP.
- Voice, Agents & Automation: Building autonomous agents (LangGraph/Autogen) and AI voice systems that handle multi-step workflows and telephony integration.
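The concurrent-backend work above mostly comes down to not blocking the event loop while waiting on I/O. A minimal stdlib-only sketch of the pattern (no FastAPI here; `fetch_doc` and its latency are made up for illustration):

```python
import asyncio
import time

async def fetch_doc(doc_id: int, delay: float) -> dict:
    """Stand-in for an I/O-bound call (vector DB, LLM API, Redis)."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking
    return {"id": doc_id, "delay": delay}

async def handle_request(doc_ids: list[int]) -> list[dict]:
    # Fan out all lookups concurrently: total latency is roughly the
    # slowest single call, not the sum of all calls.
    tasks = [fetch_doc(i, 0.05) for i in doc_ids]
    return await asyncio.gather(*tasks)

if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(handle_request([1, 2, 3, 4]))
    print(len(results), round(time.perf_counter() - start, 2))
```

The same shape drops straight into an `async def` FastAPI path operation; the point is that four 50 ms lookups finish in about 50 ms, not 200 ms.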
I prefer shipping real systems over writing research papers. Some recent things I've built:
- RAG over Specialized Domains: A medical quiz generator that transforms complex textbooks into interactive learning tools.
- Intent-driven RecSys: Recommendation engines for media that use user intent and emotion signals rather than metadata alone.
- Production AI Pipelines: Automated story and image generation pipelines for education platforms, focusing on character consistency and multi-lingual validation.
- Conversational Voice Agents: AI telephony agents for sales and hospitality that handle FAQs and bookings naturally.
- Localized LLM Fine-tuning: Adapting open models to specific languages (like Amharic) for localized ad generation and sentiment analysis.
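Under the hood, the retrieval step in these RAG projects reduces to nearest-neighbor search over embeddings. A toy stdlib-only version (the 3-d vectors and document names are made up; a real pipeline uses an embedding model and a store like Qdrant or Chroma):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank document IDs by similarity to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)
    return ranked[:k]

# Toy embeddings; in production these come from an embedding model.
corpus = {
    "cardiology_ch1": [0.9, 0.1, 0.0],
    "neurology_ch3": [0.1, 0.8, 0.2],
    "pharmacology_ch2": [0.2, 0.2, 0.9],
}
print(top_k([0.85, 0.15, 0.05], corpus))  # most similar document first
```

Retrieved chunks then go into the LLM prompt; the engineering work is in chunking, embedding quality, and index speed rather than in this ranking step itself.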
Work with me if:
- You need a FastAPI service that wraps your model and actually scales.
- You want to move your RAG or agent prototype into a production-ready pipeline.
- You need to optimize vector search speed or implement complex knowledge graphs (Neo4j).
- You want to automate a messy, multi-tool workflow using LLM agents.
📬 Let's talk: linkedin.com/in/biniyamodol


