
New Neural Architecture Search method for multi-image data fusion, including code and two image classification datasets for training and validation.
Every day, we collect more than 150TB of publicly accessible Earth Observation (EO) data. Researchers and practitioners develop and deploy Machine Learning (ML) systems to make sense of this wealth of data, for example, to monitor crops or to find rare phenomena like floods or large emissions. Increasingly, researchers adopt Automated Machine Learning (AutoML) techniques to further improve the performance of their machine learning pipelines and to overcome challenges like the sheer diversity of GeoAI tasks and datasets. AutoML research focuses on automatically designing and configuring high-performance ML systems. But how do we maintain the trustworthiness of these ML-based systems under increased automation?
In this blog, I’ll share how we developed an AutoML approach for detecting large emissions of greenhouse gases in satellite images. We’ll explore our method’s components, key findings, and the main challenges in designing AutoML systems for EO applications.
Need a recap on AutoML for Earth Observation? Read our previous blog series: Part 1, Part 2, Part 3.
Why do we want to detect plumes?
When we think of satellite data, many of us think of Google Maps’ high-resolution images of the Earth’s surface. Optical images help us track human activity like deforestation or the expansion of cities.
However, not all human activity is visible. Methane leaking from gas pipelines is colourless and odourless, but it still impacts global warming. Similarly, carbon monoxide, another colourless and odourless gas, is co-emitted with the better-known carbon dioxide in incomplete combustion. Satellites carrying specialised instruments such as TROPOMI help us estimate and eventually monitor anthropogenic emissions of these gases.
Large emissions from a point source, such as a leaking gas pipeline, show up as plume-like shapes in the data. These plumes are relatively simple visual features that could be detected using computer vision approaches.
However, there’s a catch. ML-based atmospheric plume detection pipelines are susceptible to false positives due to challenges arising from the data, such as missing pixels due to cloud coverage. If not sufficiently addressed, these false positives put the trust in operational ML-based plume detection systems at risk.
Schuit et al. have already addressed this problem in methane plume detection. But what about other atmospheric plumes, like carbon monoxide? We designed AutoMergeNet, a more generally applicable AutoML system for multi-image data fusion, to create deployment-ready atmospheric plume detection pipelines for gases beyond methane.

AutoML-based multi-image data fusion
The main contribution of our method is the design of AutoMergeNet’s search space, which has three components:
- Multi-branch networks: We include additional data layers from the satellite data product to help the model rule out potential false positives. For example, cloud masks tell us which pixels are missing due to clouds. Extracting meaningful features from this data is challenging, as the different fields have different value ranges and distributions. To address this, AutoMergeNet creates multi-branch neural networks with identical but independent input branches to extract features from each data layer.
- Trade-off feature extraction and fusion: Data fusion networks work in two stages: feature extraction from the input data sources, and fusion of the multi-source features. AutoMergeNet optimises the balance between these two stages by trading off the depth of each stage.
- Optimise fusion strategy: Finally, AutoMergeNet automatically selects the best-performing strategy for fusing the features from the input sources.
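The three search-space components above can be sketched in a few lines. This is a minimal, illustrative NumPy toy, not AutoMergeNet’s actual implementation: the names (`branch_depth`, `fusion_depth`, the fusion ops) and the use of dense layers in place of convolutional blocks are my assumptions for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def make_mlp(rng, in_dim, out_dim, depth):
    """A stack of `depth` dense layers, standing in for a conv branch or head."""
    dims = [in_dim] + [out_dim] * depth
    return [rng.standard_normal((dims[i], dims[i + 1])) * 0.1
            for i in range(depth)]

def run_mlp(weights, x):
    for w in weights:
        x = relu(x @ w)
    return x

# Component 3: the fusion strategy is itself a searchable choice.
FUSION_OPS = {
    "concat": lambda feats: np.concatenate(feats, axis=-1),
    "sum":    lambda feats: np.sum(feats, axis=0),
    "max":    lambda feats: np.max(feats, axis=0),
}

def multi_branch_forward(rng, layers, branch_depth, fusion_depth, fusion):
    """One candidate architecture: identical but independent branches
    (component 1), trading off branch depth against fusion-head depth
    (component 2), joined by a chosen fusion op (component 3)."""
    # Each data layer gets its own branch with its own weights.
    feats = [run_mlp(make_mlp(rng, layer.shape[-1], 16, branch_depth), layer)
             for layer in layers]
    fused = FUSION_OPS[fusion](feats)
    head = make_mlp(rng, fused.shape[-1], 16, fusion_depth)
    return run_mlp(head, fused)

rng = np.random.default_rng(0)
# e.g. a gas-concentration field and a cloud mask, flattened to vectors
gas = rng.standard_normal((4, 32))
cloud_mask = rng.standard_normal((4, 32))
out = multi_branch_forward(rng, [gas, cloud_mask],
                           branch_depth=2, fusion_depth=1, fusion="concat")
print(out.shape)  # one 16-dim feature vector per sample
```

A NAS system would search over `branch_depth`, `fusion_depth`, and the fusion op; here they are simply fixed by hand to show what one point in the search space looks like.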
With this search space, we automatically create plume detection pipelines for methane detection and carbon monoxide detection.
Multi-branch fusion outperforms more naive approaches
Our results show that our multi-branch fusion approach is significantly more effective than a naive data fusion approach, where different data layers are simply concatenated and fed to the model as a data cube. AutoMergeNet achieves an average accuracy of 94% in methane plume detection and 91% in carbon monoxide plume detection.
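To make the naive baseline concrete, here is a short sketch of what “concatenating layers into a data cube” looks like, and why it is problematic: one set of filters must then cope with layers on wildly different scales. The value ranges below are invented for illustration, not taken from the TROPOMI product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two hypothetical data layers on very different scales:
gas = rng.uniform(1800.0, 1900.0, (32, 32))          # e.g. a concentration field
cloud_mask = rng.integers(0, 2, (32, 32)).astype(float)  # binary mask in {0, 1}

# Naive fusion: stack everything into one cube of shape (channels, H, W)
# and feed it to a single network.
cube = np.stack([gas, cloud_mask], axis=0)
print(cube.shape)                       # (2, 32, 32)
print(gas.mean(), cloud_mask.mean())    # mismatched scales in one input
```

The multi-branch approach avoids this mismatch by giving each layer its own feature extractor before fusing.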
We applied the best methane plume detection model to a realistic use case to evaluate our approach’s potential for operational use. Our model detected a similar number of plumes to the model developed by Schuit et al. (which is currently in operational use). But something surprising happened: despite the 94% accuracy on the methane plume detection test set, almost 50% of our model’s detections in the operational scenario were false positives. In short, results on the test set are not fully representative of how a model may perform in real life. We are working on a solution for this problem and have already published follow-up work addressing some of the issues (see here: Mitigating representation bias caused by missing pixels in methane plume detection). We will share more about that project in a future blog.

A personal perspective
This research, carried out in collaboration with Earth Scientists from the SRON Space Research Organisation Netherlands, has taught me two main lessons:
- Collaboration between machine learners and domain experts is crucial. As machine learners, we have an internal database of different modelling techniques and experience with evaluating them. But without domain knowledge, the results can be hard to interpret. Domain experts can instantly spot geographic and other patterns, explain sources of noise, and flag many other details crucial to dataset construction and evaluation.
- Applications are messy projects compared to more fundamental machine learning projects with nice and clean benchmark datasets. First, you’re not only designing a machine learning model, but often also a dataset. Second, I had to learn about and apply many different ML techniques to address different aspects of the problem. However, it’s very rewarding to work towards solving a concrete, real-world problem.
Implications
While our models are not ready for real-world monitoring—we still have work to do in closing the generalisation gap—our work opens up new possibilities:
- Datasets: We collected and labelled a new dataset for carbon monoxide plume detection and made it available on Zenodo. Together with the methane plume data collected and labelled by SRON, we now have two datasets for further development of atmospheric plume detection models.
- Expansion of applicability domain: AutoMergeNet’s automated approach, though only evaluated in atmospheric plume detection, could also be applied to other multi-image problems. For example, oil spill detection faces similar challenges with false positives.
Automatically creating deployable pipelines?
Researchers and practitioners across Machine Learning for Earth Observation face the problem of the generalisation gap. It’s difficult to address because there are so many potential causes for performance gaps between our fully labelled test sets and data encountered in the wild. Data can suffer from biases, inherent (such as spatio-temporal autocorrelation) or self-inflicted (representation or sampling bias, e.g. having more data from the Global North than from the Global South).
For AutoML, the problem of the generalisation gap is even more pressing. When we manually develop models, we have many windows into a model’s performance: learning curves, summary statistics, XAI techniques such as attention maps, etc. Today’s AutoML systems have a much narrower view of a pipeline’s performance: often only a single summary statistic like the loss.
Our results have shown that such metrics are not always enough to predict how well a model will work in real life. Moving forward, we want to continue collaborating with domain experts to design better evaluation procedures and datasets that will help us close this generalisation gap.
Read the full study in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing or visit our project pages to access our code and datasets.
About me
Julia Wąsala is a PhD candidate studying Automated Machine Learning for Earth Observation at the ADA Research group at Leiden University and working with atmospheric scientists at the SRON Space Research Organisation Netherlands.

























