Autonomous Machine Learning System for Fair AI Optimization
Prometheus Lab is an autonomous machine learning system that uses Large Language Model (LLM) agents to optimize model fairness and robustness under distribution shift. The system automatically proposes, tests, and refines training strategies to improve worst-group accuracy while maintaining overall model performance.
Machine learning models often perform poorly on out-of-distribution (OOD) data, particularly for minority subgroups. This creates fairness issues in sensitive applications like recidivism prediction, healthcare, and lending.
Prometheus Lab uses an autonomous LLM-driven agent loop to:
- Auto-detect bias and fairness issues in any uploaded dataset
- Propose novel training strategies using a multi-agent AI system
- Execute experiments with advanced ML techniques
- Optimize for worst-group accuracy (WGA) while maintaining overall performance
- Deliver production-ready fair AI models
# Clone the repository
git clone https://github.com/SACHokstack/Autonomous_AI_Research_Lab.git
cd Autonomous_AI_Research_Lab
# Create and activate virtual environment
python3.12 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Create a .env file with your API keys:
# Required: Choose one LLM provider
GROQ_API_KEY=your_groq_api_key_here # Recommended: Fast & cheap
# Alternative
Get API Keys:
- Groq (Recommended): console.groq.com - Free tier with Llama 3.3-70B
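This README does not show how the app reads these keys. As a rough illustration, here is a stdlib-only sketch, assuming the .env file holds plain KEY=value lines like the example above. It parses the file, lets real environment variables take precedence, and picks the first provider that has a key set. The helper names and the OPENAI_API_KEY / GOOGLE_API_KEY variable names are hypothetical; only GROQ_API_KEY comes from this README.

```python
import os
from pathlib import Path

# Hypothetical provider -> env var mapping, in preference order.
# Only GROQ_API_KEY is documented in this README; the other names are assumptions.
PROVIDER_ENV_VARS = {
    "groq": "GROQ_API_KEY",
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def parse_env_text(text: str) -> dict:
    """Parse KEY=value lines, skipping blanks, full-line comments and trailing ' #' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.split(" #", 1)[0].strip().strip("\"'")
        values[key.strip()] = value
    return values


def load_env(path: str = ".env") -> dict:
    """Merge .env values with the process environment; the process environment wins."""
    env_path = Path(path)
    file_values = parse_env_text(env_path.read_text()) if env_path.exists() else {}
    return {**file_values, **os.environ}


def pick_provider(env: dict) -> str:
    """Return the first provider whose API key is non-empty."""
    for provider, var in PROVIDER_ENV_VARS.items():
        if env.get(var):
            return provider
    raise RuntimeError("No LLM API key found; set GROQ_API_KEY in .env")
```

Calling pick_provider(load_env()) once at startup returns "groq" whenever GROQ_API_KEY is set, because Groq comes first in the preference order.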
Development Mode:
python run.py
Production Mode:
gunicorn src.app:app --bind 0.0.0.0:8000
Open your browser to http://localhost:5000 (dev) or http://localhost:8000 (prod).
- Upload CSV: Drop any binary classification dataset
- Auto-Analysis: System detects target variable and protected attributes
- Agent Optimization: Multi-agent AI system optimizes for fairness
- Download Results: Get optimized model and fairness analysis
# Run experiment on built-in datasets
python src/run_experiment.py --dataset compas --strategy balanced_strong_reg
# Auto-analyze any CSV file
python src/auto_analyze.py --file your_data.csv
# Launch autonomous agent loop
python src/auto_lab.py --dataset your_dataset
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Strategy Agent  │    │ Research Agent  │    │ Critic Agent    │
│                 │    │                 │    │                 │
│ Proposes new    │────│ Analyzes current│────│ Evaluates       │
│ training configs│    │ results & issues│    │ feasibility     │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                      │                      │
         └──────────────────────┼──────────────────────┘
                                │
                       ┌─────────────────┐
                       │   Judge Agent   │
                       │                 │
                       │ Scores & ranks  │
                       │ experiments     │
                       └─────────────────┘
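The diagram shows each agent's role but not the control flow, which lives in src/agent_graph.py (LangGraph). Below is a framework-free sketch of one round. It assumes the Research Agent's notes feed the Strategy Agent, the Critic filters proposals, and the Judge ranks by worst-group accuracy with overall OOD accuracy as the tie-breaker, matching the primary and secondary metrics this README lists. Every function and field name is hypothetical, and the agents are passed in as plain callables.

```python
from dataclasses import dataclass


@dataclass
class ExperimentResult:
    """Hypothetical record of one finished experiment."""
    strategy: str
    wga: float      # worst-group accuracy (primary metric)
    ood_acc: float  # overall out-of-distribution accuracy (secondary metric)


def judge_rank(results):
    """Judge Agent stand-in: rank by WGA, breaking ties with overall OOD accuracy."""
    return sorted(results, key=lambda r: (r.wga, r.ood_acc), reverse=True)


def run_round(analyze, propose, critic_ok, run_experiment, history):
    """One analyze -> propose -> critique -> run -> judge round."""
    notes = analyze(history)                           # Research Agent: inspect current results
    proposals = propose(notes)                         # Strategy Agent: new training configs
    feasible = [p for p in proposals if critic_ok(p)]  # Critic Agent: drop infeasible configs
    new_results = [run_experiment(p) for p in feasible]
    return judge_rank(history + new_results)           # Judge Agent: score & rank everything
```

Repeating run_round with the ranked list as the next history gives the propose/test/refine loop described at the top of this README.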
- Reweighting/Resampling: Class balancing, importance sampling, undersampling
- Robust Training: Group DRO, focal loss, distributionally robust optimization
- Regularization: L1/L2 penalties, early stopping, strong regularization
- Domain Generalization: Domain-invariant features, mixup, adversarial training
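Reweighting is the simplest of these families to show concretely. The minimal sketch below assumes each training row has a group label. It applies the formula scikit-learn uses for class_weight="balanced", n_samples / (n_groups * count_g), to demographic groups instead of classes, so every group contributes the same total weight. The function name is hypothetical, and this is not necessarily how src/strategies.py implements reweighting.

```python
import numpy as np


def group_balanced_weights(groups):
    """Per-sample weights giving every group equal total weight.

    Same formula as scikit-learn's class_weight="balanced",
    n_samples / (n_groups * count_g), applied to group labels.
    """
    groups = np.asarray(groups)
    labels, inverse, counts = np.unique(groups, return_inverse=True, return_counts=True)
    weight_per_group = len(groups) / (len(labels) * counts)
    return weight_per_group[inverse]  # map each row back to its group's weight
```

The resulting array can be passed as sample_weight to scikit-learn's LogisticRegression.fit.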
- Primary: Worst-Group Accuracy (WGA), the accuracy on the worst-performing demographic group
- Secondary: Overall OOD accuracy, ID-OOD gap, group-specific accuracies
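Here is a minimal NumPy sketch of these metrics, assuming per-row group labels and binary predictions. The ID-OOD gap is taken as in-distribution accuracy minus OOD accuracy. That sign convention, like the function names, is an assumption.

```python
import numpy as np


def group_accuracies(y_true, y_pred, groups):
    """Accuracy computed separately within each group label."""
    y_true, y_pred, groups = np.asarray(y_true), np.asarray(y_pred), np.asarray(groups)
    return {
        g.item(): float(np.mean(y_true[groups == g] == y_pred[groups == g]))
        for g in np.unique(groups)
    }


def worst_group_accuracy(y_true, y_pred, groups):
    """WGA: the lowest of the per-group accuracies."""
    return min(group_accuracies(y_true, y_pred, groups).values())


def id_ood_gap(id_accuracy, ood_accuracy):
    """Accuracy lost moving from in-distribution to OOD data (assumed sign convention)."""
    return id_accuracy - ood_accuracy
```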
- COMPAS: Recidivism prediction with racial/gender bias analysis
- Diabetes: Hospital readmission with demographic fairness
Upload any CSV with:
- Binary classification target (0/1, True/False, Yes/No)
- Demographic/protected attributes (race, gender, age, etc.)
- Multiple features for prediction
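This README describes the auto-analysis step only at a high level. Below is a rough stdlib-only sketch of one possible heuristic, assuming rows come from csv.DictReader. Columns whose names contain a protected-attribute keyword are flagged. The last remaining column whose values form one of the binary pairs above is then taken as the target. The keyword list and the "last binary column" rule are hypothetical, not the logic in src/auto_analyze.py.

```python
import re

# Label spellings accepted as binary (lower-cased), taken from the examples above.
BINARY_PAIRS = [frozenset({"0", "1"}), frozenset({"true", "false"}), frozenset({"yes", "no"})]

# Hypothetical keyword list for protected attributes; extend it for your data.
PROTECTED_KEYWORDS = {"race", "gender", "sex", "age", "ethnicity"}


def is_binary(values):
    """True if the non-empty values form exactly one of the accepted binary pairs."""
    distinct = {str(v).strip().lower() for v in values if str(v).strip()}
    return distinct in BINARY_PAIRS


def is_protected(column):
    """Match whole name tokens, so "age_cat" matches but "percentage" does not."""
    tokens = set(re.split(r"[^a-z0-9]+", column.lower()))
    return bool(tokens & PROTECTED_KEYWORDS)


def detect_columns(rows):
    """Guess (target, protected_attributes) from a list of CSV row dicts."""
    columns = list(rows[0].keys())
    protected = [c for c in columns if is_protected(c)]
    binary = [c for c in columns if c not in protected and is_binary(r[c] for r in rows)]
    # Heuristic: the last binary, non-protected column is treated as the target.
    target = binary[-1] if binary else None
    return target, protected
```

For example, detect_columns(list(csv.DictReader(open("your_data.csv")))) returns a (target, protected_attributes) tuple, with target set to None when no binary column is found.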
# src/llm_client.py - Configure your preferred provider
PROVIDERS = {
'groq': {'model': 'llama-3.3-70b-versatile', 'fast': True, 'cheap': True},
'openai': {'model': 'gpt-4', 'quality': 'high', 'expensive': True},
'google': {'model': 'gemini-pro', 'multimodal': True}
}

# src/strategies.py - Customize training strategies
from typing import Optional

class StrategyConfig:
    name: str                    # Strategy identifier
    class_weight: Optional[str]  # 'balanced' or None
    l2_C: float                  # Regularization strength (0.1-10.0)
    sample_frac: float           # Data sampling fraction (0.3-1.0)
    undersample_majority: bool   # Class balancing
    reg_strength: str            # 'weak'/'normal'/'strong'
    use_group_dro: bool          # Group Distributionally Robust Optimization

prometheus-lab/
├── src/                    # Core application (1,625+ lines Python)
│   ├── app.py              # Flask web interface
│   ├── agent_graph.py      # LangGraph multi-agent orchestration
│   ├── run_experiment.py   # Experiment execution engine
│   ├── llm_client.py       # LLM integration & API management
│   ├── auto_lab.py         # Autonomous pipeline orchestration
│   ├── auto_analyze.py     # Dataset analysis & auto-detection
│   ├── strategies.py       # Training strategy configurations
│   ├── models/baseline.py  # Logistic regression model builder
│   └── ...                 # Additional ML utilities
├── data/                   # Benchmark datasets
├── experiments/            # Experiment results & logs
├── uploads/                # User-uploaded CSV files
├── requirements.txt        # Python dependencies
├── run.py                  # Development server entry point
├── Procfile                # Production deployment (Render/Heroku)
├── render.yaml             # Render.com deployment config
└── vercel.json             # Vercel deployment config
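The StrategyConfig fields in src/strategies.py come with documented ranges, but no validation is shown. The hedged sketch below turns the same fields into a dataclass and enforces those ranges in __post_init__. The defaults and the dataclass form are assumptions. The balanced_strong_reg example guesses its field values from the strategy name. If l2_C is passed to scikit-learn as C, smaller values mean stronger regularization.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StrategyConfig:
    """Dataclass variant of the documented fields, with the documented ranges enforced."""
    name: str                           # strategy identifier
    class_weight: Optional[str] = None  # 'balanced' or None
    l2_C: float = 1.0                   # documented range 0.1-10.0 (default assumed)
    sample_frac: float = 1.0            # documented range 0.3-1.0 (default assumed)
    undersample_majority: bool = False  # class balancing
    reg_strength: str = "normal"        # 'weak' / 'normal' / 'strong'
    use_group_dro: bool = False         # Group DRO on/off

    def __post_init__(self):
        if self.class_weight not in (None, "balanced"):
            raise ValueError(f"class_weight must be 'balanced' or None, got {self.class_weight!r}")
        if not 0.1 <= self.l2_C <= 10.0:
            raise ValueError(f"l2_C out of range 0.1-10.0: {self.l2_C}")
        if not 0.3 <= self.sample_frac <= 1.0:
            raise ValueError(f"sample_frac out of range 0.3-1.0: {self.sample_frac}")
        if self.reg_strength not in ("weak", "normal", "strong"):
            raise ValueError(f"unknown reg_strength: {self.reg_strength!r}")


# Hypothetical reading of the built-in "balanced_strong_reg" strategy.
cfg = StrategyConfig(name="balanced_strong_reg", class_weight="balanced",
                     l2_C=0.1, reg_strength="strong")
```

Validating at construction time means an out-of-range proposal from the Strategy Agent fails immediately, before any training time is spent on it.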
Render:
# Already configured - just connect your GitHub repo
# render.yaml handles the deployment automatically
Vercel:
npm install -g vercel
vercel --prod
Docker:
# Create Dockerfile (not included - customize as needed)
docker build -t prometheus-lab .
docker run -p 8000:8000 prometheus-lab
Heroku:
heroku create your-app-name
git push heroku main
heroku config:set GROQ_API_KEY=your_key_here
- Fairness in ML: Benchmark new fairness techniques
- Distribution Shift: Study robustness under domain shift
- AutoML: Automated machine learning for fairness
- Healthcare: Fair patient risk assessment
- Finance: Fairer lending and credit scoring
- Criminal Justice: Fair recidivism prediction
- Hiring: Less biased recruitment algorithms
- LLM Inference: ~2-5 seconds per strategy (Groq Llama 3.3-70B)
- Model Training: ~10-30 seconds per experiment (scikit-learn)
- Full Optimization: ~5-15 minutes for complete fairness analysis
- COMPAS Dataset: 15-25% improvement in worst-group accuracy
- Diabetes Dataset: 10-20% improvement in cross-domain performance
- Custom Datasets: Varies by data quality and bias severity
- Fork the repository
- Create a feature branch:
  git checkout -b feature-name
- Commit your changes:
  git commit -am 'Add feature'
- Push to the branch:
  git push origin feature-name
- Submit a pull request
# Install development dependencies
pip install -r requirements-dev.txt # If available
# Run tests
python -m pytest tests/ # If test suite exists
# Code formatting
black src/
isort src/
This project is licensed under the MIT License - see the LICENSE file for details.
- LangGraph: Multi-agent orchestration framework
- TableShift: Benchmark datasets for distribution shift
- Groq: Fast and affordable LLM inference
- scikit-learn: Machine learning algorithms and metrics
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: See LLM_SETUP.md for detailed LLM configuration
Built with ❤️ for Fair AI Research
Making machine learning more equitable, one model at a time.