When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values
Abstract
Predicting with missing inputs is challenging even for parametric models, as parameter estimation alone is insufficient for prediction on incomplete data. While several works study prediction in linear models, we focus on logistic models, where optimal predictors lack closed-form expressions. We prove that a Pattern-by-Pattern (PbP) strategy, which learns one logistic model per missingness pattern, accurately approximates the Bayes probabilities under a Gaussian Pattern Mixture Model (GPMM). Crucially, this result holds not only in the standard Missing Completely At Random (MCAR) and Missing At Random (MAR) scenarios, but also in Missing Not At Random (MNAR) settings, where standard methods often fail. Empirically, we compare PbP against imputation and EM-based methods on classification, probability estimation, calibration, and inference tasks. Our analysis provides a comprehensive view of logistic regression with missing values. It shows that mean imputation is a reasonable baseline for small sample sizes and PbP for large sample sizes, as both methods are fast to train and perform well in several settings. The best performance is achieved by non-linear multiple iterative imputation techniques that include the response label (Random Forest MICE with response), at a higher computational cost.
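As a rough illustration of the PbP idea, the sketch below fits one scikit-learn `LogisticRegression` per missingness pattern, using only the features observed under that pattern. The class name `PatternByPatternLogistic` and the fallback to the empirical class frequency for patterns unseen at training time are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a Pattern-by-Pattern (PbP) strategy: one logistic model
# per missingness pattern. Assumes binary labels in {0, 1}; the fallback rule
# for unseen or degenerate patterns is a simplifying assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression


class PatternByPatternLogistic:
    """One logistic regression per missingness pattern (illustrative)."""

    def fit(self, X, y):
        self.models_ = {}
        patterns = np.isnan(X)
        for pattern in np.unique(patterns, axis=0):
            rows = np.all(patterns == pattern, axis=1)
            observed = ~pattern  # features available under this pattern
            # Fit only if some feature is observed and both classes occur.
            if observed.any() and len(np.unique(y[rows])) > 1:
                model = LogisticRegression()
                model.fit(X[np.ix_(rows, observed)], y[rows])
                self.models_[tuple(pattern)] = (model, observed)
        # Fallback probability: empirical frequency of the positive class.
        self.prior_ = y.mean()
        return self

    def predict_proba(self, X):
        proba = np.full(len(X), self.prior_)
        patterns = np.isnan(X)
        for i, pattern in enumerate(patterns):
            entry = self.models_.get(tuple(pattern))
            if entry is not None:
                model, observed = entry
                proba[i] = model.predict_proba(X[i : i + 1, observed])[0, 1]
        return proba


# Toy usage with values injected completely at random (MCAR).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
X[rng.random(X.shape) < 0.2] = np.nan
print(PatternByPatternLogistic().fit(X, y).predict_proba(X)[:5])
```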