- Current F1 Score: 0.23 (epoch 60, still training)
- Baseline: Random = 0.067, so 3.43x improvement
- Architecture: EfficientNet-B0 with 2-layer classifier
- Dataset: NIH Chest X-ray 14, ~10k samples
# EfficientNet-B0 backbone
# Custom classifier: 1280 -> 512 -> 15 outputs
# No sigmoid (BCEWithLogitsLoss handles it)
# Dropout: 0.3, 0.2

# ResNet18 backbone
# Direct FC: 512 -> 15 outputs
# Used for balanced training experiments

15 Classes:
Atelectasis, Cardiomegaly, Consolidation, Edema, Effusion,
Emphysema, Fibrosis, Hernia, Infiltration, Mass, Nodule,
Pleural Thickening, Pneumonia, Pneumothorax, No Finding
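A minimal sketch of the EfficientNet-B0 configuration described above, producing one raw logit per class listed; the `LungClassifier` name and exact layer composition are assumptions, not the project's actual code:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 15  # one logit per label listed above

class LungClassifier(nn.Module):
    """EfficientNet-B0 backbone with a 1280 -> 512 -> 15 head (illustrative sketch)."""

    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        backbone = models.efficientnet_b0(weights="IMAGENET1K_V1")
        self.features = backbone.features            # convolutional feature extractor
        self.pool = nn.AdaptiveAvgPool2d(1)          # -> (batch, 1280, 1, 1)
        self.classifier = nn.Sequential(
            nn.Dropout(0.3),
            nn.Linear(1280, 512),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            nn.Linear(512, num_classes),             # raw logits; no sigmoid here
        )

    def forward(self, x):
        x = self.pool(self.features(x)).flatten(1)   # (batch, 1280)
        return self.classifier(x)
```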
Problem: Severe class imbalance
- "No Finding": ~60% of samples
- "Hernia": ~0.2% of samples
Solution: Balanced sampling in BalancedNIHDataset
- Limits "No Finding" to 25% of training data
- Ensures minimum samples per pathology class
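A rough sketch of the balanced-sampling idea; the function and parameter names are hypothetical, not the actual BalancedNIHDataset internals, and the per-pathology minimum enforced by the real dataset is omitted here:

```python
import random

def build_balanced_indices(label_sets, no_finding_cap=0.25, seed=42):
    """Keep every pathology sample, then add only enough 'No Finding' samples
    so that they make up at most `no_finding_cap` of the final training set.

    label_sets: one set of label names per image, e.g. [{'No Finding'}, {'Mass', 'Nodule'}, ...]
    """
    rng = random.Random(seed)
    normals = [i for i, labels in enumerate(label_sets) if labels == {'No Finding'}]
    pathological = [i for i, labels in enumerate(label_sets) if labels != {'No Finding'}]

    # n_normal / (n_normal + n_pathological) <= cap  <=>  n_normal <= cap / (1 - cap) * n_pathological
    max_normals = int(no_finding_cap / (1 - no_finding_cap) * len(pathological))
    rng.shuffle(normals)

    indices = pathological + normals[:max_normals]
    rng.shuffle(indices)
    return indices
```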
# Input: 224x224 RGB (grayscale X-rays converted)
# Augmentation: RandomCrop, HorizontalFlip, Rotation(5°), ColorJitter
# Normalization: ImageNet stats [0.485, 0.456, 0.406], [0.229, 0.224, 0.225]
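A hypothetical torchvision pipeline matching the description above; the resize-then-crop sizes and the jitter strengths are assumptions:

```python
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize(256),                     # assumed pre-crop size
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(5),
    transforms.ColorJitter(brightness=0.1, contrast=0.1),
    transforms.ToTensor(),                      # grayscale X-rays already converted to 3-channel RGB
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```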
# Loss: BCEWithLogitsLoss (multi-label)
# Optimizer: AdamW, lr=0.001, weight_decay=1e-4
# Scheduler: ReduceLROnPlateau(patience=3, factor=0.5)
# Batch size: 16
# Gradient clipping: max_norm=1.0
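A sketch of the corresponding training setup; the stand-in `model` and the `train_step` helper are placeholders for the project's actual training loop:

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 15)  # stand-in; replace with the actual classifier

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=3)

def train_step(images: torch.Tensor, targets: torch.Tensor) -> float:
    """One optimization step with gradient clipping, mirroring the settings above."""
    optimizer.zero_grad()
    logits = model(images)                      # raw logits, one per class
    loss = criterion(logits, targets.float())   # multi-label targets as 0/1 floats
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()

# After each validation epoch: scheduler.step(val_loss)
```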
Per-condition optimized thresholds (not standard 0.5):

'Pneumothorax': 0.25,  # Critical condition
'Mass': 0.20, # Cancer screening
'Pneumonia': 0.22,
'Atelectasis': 0.18,
'Hernia': 0.50, # Rare condition
'No Finding': 0.60  # High threshold to reduce false normals
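One way to apply these cut-offs to the probability dict returned by predict(); `apply_thresholds` is a hypothetical helper, not part of the codebase:

```python
THRESHOLDS = {
    'Pneumothorax': 0.25, 'Mass': 0.20, 'Pneumonia': 0.22,
    'Atelectasis': 0.18, 'Hernia': 0.50, 'No Finding': 0.60,
}

def apply_thresholds(probs: dict, default: float = 0.5) -> dict:
    """Binarize predicted probabilities using per-condition cut-offs (sketch)."""
    return {name: p >= THRESHOLDS.get(name, default) for name, p in probs.items()}
```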
def predict(self, image_path: str) -> Dict[str, float]:
    # Load image -> RGB conversion -> resize(224,224)
    # Forward pass -> sigmoid(logits) -> numpy
    # Return dict of condition:probability

class BalancedNIHDataset:
# Separate "No Finding" from pathology samples
# Limit pathology samples per condition
# Shuffle and create balanced final dataset

backend/utils/lung_classifier.py # Main model definition and inference
balanced_lung_trainer.py # Balanced training pipeline
train_optimized.py # Full training with metrics tracking
Macro F1: 0.23
- Training trend: Consistent improvement over 60 epochs
- Better performing classes: Cardiomegaly (~0.35), Pneumonia (~0.28)
- Challenging classes: Hernia, Fibrosis (limited training data)
- Training stability: loss decreasing, F1 improving consistently
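For reference, the macro F1 reported above is the unweighted mean of per-class F1 scores; with scikit-learn it can be computed from the thresholded predictions (the arrays below are placeholders for the real validation outputs):

```python
import numpy as np
from sklearn.metrics import f1_score

# Placeholder multi-label indicator arrays of shape (num_samples, 15);
# in the real pipeline these come from the validation loop plus thresholding.
y_true = np.random.randint(0, 2, size=(100, 15))
y_pred = np.random.randint(0, 2, size=(100, 15))

# zero_division=0 avoids warnings for classes with no positive predictions (e.g. Hernia)
macro_f1 = f1_score(y_true, y_pred, average='macro', zero_division=0)
```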
- Double sigmoid issue: BCEWithLogitsLoss applies a sigmoid internally, so an extra sigmoid in the model compressed the logits; the model now outputs raw logits (see the snippet after this list)
- Class imbalance: Implemented the balanced sampling strategy described above
- 14 vs 15 class mismatch: Unified the architecture to output all 15 classes in LABELS
- Dataset inconsistency: Standardized on NIH Chest X-ray 14
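A minimal illustration of the double-sigmoid point; since BCEWithLogitsLoss already applies a sigmoid internally, the loss must receive raw logits:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.randn(4, 15)                       # raw model outputs
targets = torch.randint(0, 2, (4, 15)).float()    # multi-label ground truth

loss_correct = criterion(logits, targets)                   # loss sees raw logits
loss_buggy = criterion(torch.sigmoid(logits), targets)      # sigmoid applied twice -> distorted gradients
```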
NIH Chest X-ray 14 Literature:
- Basic CNN: F1 = 0.15-0.20
- ResNet/DenseNet: F1 = 0.20-0.30 ← Current range
- Ensemble methods: F1 = 0.30-0.40
- SOTA research: F1 = 0.40+
torch==2.7.1
torchvision==0.22.1
datasets==3.6.0
scikit-learn==1.7.0
Pillow==11.2.1
from backend.utils.lung_classifier import LungClassifierTrainer  # import path per the file layout above

# Load model
classifier = LungClassifierTrainer('path/to/model.pth')
# Inference
predictions = classifier.predict('xray.jpg')
# Returns: {'Pneumonia': 0.78, 'Effusion': 0.23, ...}
# Apply thresholds
significance = classifier.get_statistical_significance(predictions)

# Balanced training (handles class imbalance)
python balanced_lung_trainer.py
# Full training pipeline
python train_optimized.py --epochs 50 --batch-size 16

- Best model: models/lung_classifier_BEST.pth
- Training history: models/training_history_optimized.json
- Balanced model: balanced_lung_model.pth