Data Engineer | AI Engineer | ML Engineer | MLOps Engineer | Cloud Engineer
Building intelligent data systems that bridge cloud infrastructure, machine learning, and healthcare innovation
I'm a data engineer and machine learning specialist based in Pretoria, South Africa, with a passion for architecting scalable data pipelines and deploying production-grade machine learning systems. With 3+ years of hands-on experience, I've delivered solutions that process millions of records daily while maintaining 99.5%+ reliability and reducing operational costs by up to 85%.
What drives me: Turning complex data challenges into elegant, automated solutions that create real business value.
- 6+ production pipelines processing 1.2M+ records/hour
- AWS-certified cloud architect specializing in serverless architectures
- ML models in production reaching 90% accuracy
- 30-85% cost reduction across multiple projects
- 99.5% uptime through robust monitoring and error handling
| Certification / Qualification | Provider / Institution | Year |
|---|---|---|
| Diploma in Financial Management | Regent Business School | 2025 |
| Data Engineer Professional | DataCamp | 2025 |
| Associate Data Engineer | DataCamp | 2025 |
| SQL Associate | DataCamp | 2025 |
| Machine Learning Engineer | DataCamp | 2025 |
| AI Engineer For Data Scientists Associate | DataCamp | 2025 |
| IT Automation with Python | | 2025 |
| Azure Solutions Architect | Microsoft | 2025 |
| Azure AI Engineer Associate | Microsoft | 2025 |
| Azure Data Scientist Associate | Microsoft | 2025 |
skills = {
    "languages": ["Python", "SQL", "TypeScript", "JavaScript", "Bash"],
    "cloud": ["AWS Lambda", "S3", "DynamoDB", "API Gateway", "Step Functions", "Kinesis"],
    "data_engineering": ["Airflow", "Pandas", "NumPy", "ETL Pipelines", "Real-time Streaming"],
    "ml_ai": ["TensorFlow", "Scikit-Learn", "XGBoost", "CNNs", "Transfer Learning"],
    "devops": ["Docker", "CI/CD", "GitHub Actions", "Infrastructure as Code"],
    "frontend": ["React", "TailwindCSS", "Vite"],
}
Fintech | Machine Learning | Risk Modeling | XGBoost
End-to-end machine learning system for predicting loan default risk using borrower financial and demographic data.
🎯 Key Features & Results:
- AUC-ROC: 0.945 (excellent predictive performance)
- Comprehensive EDA, outlier handling, and feature engineering (e.g., debt-to-income ratio)
- Model comparison: XGBoost outperformed Logistic Regression and KNN
- Interpretable insights via feature importance and visualizations
- Production-ready pipeline with preprocessing and evaluation metrics
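To make the headline metric and the engineered feature above concrete, here is a minimal, dependency-free sketch. The function names and inputs (`debt_to_income`, `monthly_debt`, `monthly_income`, `auc_roc`) are illustrative assumptions, not code from the project, which presumably relies on Scikit-learn for evaluation.

```python
from typing import Sequence


def debt_to_income(monthly_debt: float, monthly_income: float) -> float:
    """Engineered feature: share of income already committed to debt."""
    if monthly_income <= 0:
        # Assumption: treat missing or zero income as maximal risk.
        return float("inf")
    return monthly_debt / monthly_income


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC: probability that a random defaulter outranks a random non-defaulter."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    print(debt_to_income(1500, 5000))
    print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise definition is quadratic in the number of samples, so it is only meant for illustration; a library routine such as Scikit-learn's `roc_auc_score` is the practical choice on real data.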
📊 Technical Highlights:
Top predictive features identified:
- Loan grade & interest rate (highest risk drivers)
- Debt-to-income ratio (engineered feature)
- Home ownership status (renters higher risk)
- Loan amount, both absolute and as a percentage of income

💡 Why It Matters:
- Demonstrates real-world fintech application of ML for credit scoring
- Handles class imbalance and provides actionable risk insights
- Strong addition to financial domain expertise (complements Diploma in Financial Management)
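The write-up does not say how class imbalance is handled. One common option with XGBoost is its `scale_pos_weight` parameter, typically set to the ratio of negative to positive examples. The helper below is a hedged sketch of that calculation only; the function name and the 0/1 label encoding are assumptions.

```python
from collections import Counter
from typing import Iterable


def scale_pos_weight(labels: Iterable[int]) -> float:
    """Ratio of negatives (0) to positives (1), a usual starting value
    for XGBoost's scale_pos_weight on imbalanced binary targets."""
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive examples in labels")
    return counts[0] / counts[1]


if __name__ == "__main__":
    # Hypothetical portfolio: 8 repaid loans, 2 defaults.
    print(scale_pos_weight([0] * 8 + [1] * 2))
```

The resulting value would then be passed to the classifier as `scale_pos_weight=...` and tuned like any other hyperparameter.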
🔗 Tech Stack: Python • Pandas • Scikit-learn • XGBoost • Matplotlib • Seaborn

Healthcare AI | Computer Vision | Transfer Learning
A production-ready medical imaging classification system leveraging CNNs and transfer learning for diagnostic assistance.
🎯 Key Features:
- Multiple architectures: Custom CNN, VGG16, ResNet50, InceptionV3
- Synthetic medical image generation (X-Ray, Brain MRI)
- Grad-CAM visualization for model interpretability
- Complete training pipeline with data augmentation
- Real-time prediction with confidence scores
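As an illustration of how a confidence score can accompany a prediction, the sketch below applies a numerically stable softmax to raw model outputs and returns the top class with its probability. The class names and logits are made up for the example, and the project itself may compute confidence differently (for instance inside the Keras output layer).

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities; subtracting the max avoids overflow."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_with_confidence(
    logits: Sequence[float], class_names: Sequence[str]
) -> Tuple[str, float]:
    """Return the most likely class and its softmax probability."""
    if len(logits) != len(class_names):
        raise ValueError("logits and class_names must have the same length")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]


if __name__ == "__main__":
    # Hypothetical three-class output for a chest X-ray.
    label, confidence = predict_with_confidence(
        [2.0, 0.5, 0.1], ["normal", "pneumonia", "other"]
    )
    print(f"{label}: {confidence:.1%}")
```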
🔗 Tech Stack: TensorFlow • Keras • OpenCV • Scikit-learn • Matplotlib • Seaborn
Machine Learning | Sports Analytics | Feature Engineering
End-to-end ML system predicting Formula 1 race outcomes with 90% accuracy.
🎯 Results:
- 90% accuracy on race winner predictions
- 95%+ accuracy on podium predictions
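The two metrics above can be defined in more than one way. The sketch below shows one plausible reading: winner accuracy is the share of races where the predicted winner matches the actual one, and podium accuracy is the share of predicted top-three drivers who actually finished in the top three. Both definitions, and the placeholder driver labels, are assumptions rather than the project's documented method.

```python
from typing import Sequence


def winner_accuracy(
    predicted: Sequence[Sequence[str]], actual: Sequence[Sequence[str]]
) -> float:
    """Fraction of races whose predicted P1 equals the actual P1."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need the same non-zero number of races in both inputs")
    hits = sum(1 for p, a in zip(predicted, actual) if p[0] == a[0])
    return hits / len(actual)


def podium_accuracy(
    predicted: Sequence[Sequence[str]], actual: Sequence[Sequence[str]], k: int = 3
) -> float:
    """Fraction of predicted top-k drivers who really finished in the top k."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("need the same non-zero number of races in both inputs")
    overlap = sum(len(set(p[:k]) & set(a[:k])) for p, a in zip(predicted, actual))
    return overlap / (k * len(actual))


if __name__ == "__main__":
    # Two hypothetical races; each list is a finishing order.
    predicted = [["A", "B", "C"], ["D", "A", "B"]]
    actual = [["A", "C", "E"], ["A", "D", "B"]]
    print(winner_accuracy(predicted, actual))
    print(podium_accuracy(predicted, actual))
```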
🔗 Tech Stack: Python • XGBoost • FastF1 API • Pandas • Scikit-learn • Matplotlib
Interactive Dashboard | Modern UI | Real-time Monitoring
A modern, interactive dashboard showcasing Data Engineering & MLOps capabilities.
✨ Features:
- 📊 Interactive data visualizations
- 🌙 Dark mode with responsive design
- ⚡ Automated CI/CD
🔗 Tech Stack: React • TypeScript • TailwindCSS • Recharts • Vite • GitHub Actions
| Project | Tech Stack | Impact |
|---|---|---|
| Weather Analytics Pipeline | AWS Lambda, S3, DynamoDB | 99.5% uptime, <$0.10/month |
| Inventory Optimization Engine | Python, XGBoost, Scikit-Learn | 92% accuracy, 30% cost reduction |
| Cryptocurrency ETL | Python, Airflow, REST APIs | 85% time savings |
| IoT Processing System | Python, SQLite, JavaScript | 1.2M+ records/hour |
| Financial Automation | Airflow, Pandas, PostgreSQL | 3 days → 4 hours |
I design cloud-native, serverless-first, and event-driven architectures for maximum scalability and cost efficiency:
┌─────────────────────┐ ┌──────────────────────┐ ┌─────────────────────┐
│ 📊 Data Sources │───▶│ 🔄 Ingestion Layer │───▶│ 🏗️ Processing │
│ │ │ │ │ │
│ • REST/GraphQL APIs │ │ • AWS API Gateway │ │ • AWS Lambda │
│ • IoT Sensors/MQTT │ │ • Amazon Kinesis │ │ • Step Functions │
│ • Database CDC │ │ • AWS EventBridge │ │ • Glue ETL Jobs │
│ • File Uploads │ │ • SQS/SNS Queues │ │ • Error Handling │
│ • Streaming Data │ │ • Scheduled Triggers │ │ • Dead Letter Queue │
└─────────────────────┘ └──────────────────────┘ └─────────────────────┘
| Metric | Target | Achieved |
|---|---|---|
| Pipeline Uptime | 99.9% | 99.5% ✅ |
| Processing Latency | <100ms | <200ms ⚡ |
| Cost per TB | <$10 | $15 💰 |
| Error Rate | <0.1% | <0.5% 🎯 |
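To put the uptime figures in the table in perspective, a percentage can be converted into a downtime budget. The helper below is a small worked calculation; the 30-day month is an assumption chosen for the example.

```python
def allowed_downtime_hours(uptime_pct: float, period_hours: float = 30 * 24) -> float:
    """Hours of downtime permitted by an uptime percentage over a period."""
    if not 0.0 <= uptime_pct <= 100.0:
        raise ValueError("uptime_pct must be between 0 and 100")
    return period_hours * (100.0 - uptime_pct) / 100.0


if __name__ == "__main__":
    for pct in (99.5, 99.9):
        hours = allowed_downtime_hours(pct)
        print(f"{pct}% uptime allows about {hours:.2f} h ({hours * 60:.0f} min) per 30 days")
```

Over a 30-day month, 99.5% uptime corresponds to roughly 3.6 hours of downtime, while 99.9% allows about 43 minutes.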
- Local meets global: Combining South African insights with cloud-scale infrastructure
- Passion-driven projects: Built an F1 predictor merging hobbies with tech
- Serverless advocate: If it can run without a server, I'm interested
- Healthcare AI: Applying ML to medical imaging
- Fintech & Finance: Leveraging financial management knowledge with ML for credit risk modeling
- Continuous learner: Exploring real-time streaming and advanced MLOps
My goal: Build data systems so reliable, they become invisible
I'm always excited to collaborate on data engineering, ML, fintech, or cloud projects. Let's talk!
📧 Email: [email protected]
🔗 LinkedIn: linkedin.com/in/tiiso-letsapa-664990209
💻 GitHub: github.com/Letsapatiiso07
⭐️ "Building intelligent systems that turn data into decisions" 🚀
Based in Pretoria, South Africa 🇿🇦 | Open to remote opportunities worldwide 🌍