Monthly Archives: March 2026

“Nothing about us without us”: reflections from the CODATA Task Group on Citizen-Generated Data for the SDGs

At IDW2025, speakers from around the world addressed ‘Bridging Data Gaps with Citizen Science for People and Policy’. Although communities have long been studied in research and had data collected about them, there is growing recognition that they should have a voice in the data produced about them and in the policies made downstream. Carolynne Hultquist, Co-Chair of the CODATA Task Group, set the stage by outlining the challenges and opportunities of incorporating citizen science into the United Nations Sustainable Development Goals (SDGs), and the global movement towards meaningful engagement in citizen data. Countries around the world are recognizing the value of involving communities in the official statistical process, and are learning how to address concerns about data quality, standards, and ethics in this new paradigm.

If you are interested in finding out more, come and join the session on Citizen Science, SDGs and FAIR data at the RDA Virtual Plenary on Thursday 19th March 2026 (07:00 – 08:30 UTC)!

Citizen-generated data for progress on the SDGs

The Copenhagen Framework on Citizen Data and its implementation – Haoyi Chen, United Nations Statistics Division (UNSD)

“Nothing about us without us.” The UNSD Collaborative on Citizen Data aims to empower citizens and turn them into agents of change. Empowering communities opens dialogue with public institutions, respects marginalized voices, and extends the power of data production to citizens. Citizen contributions to data are increasingly recognized as critical to societal wellbeing and as supporting the ‘leave no one behind’ principle of the 2030 Agenda for Sustainable Development.

The Copenhagen Framework on Citizen Data has been developed to address challenges in defining citizen data, the roles that citizens and national statistical offices can play in data processes, and action points for the sustainable production and use of citizen data. Key responsibilities of National Statistical Offices include supporting quality standards and methodologies, building capacity and fostering partnerships, raising awareness of the potential of citizen data, and promoting its integration into official statistics. Ensuring that the voices of communities are heard can help to address intersectional marginalization, hold institutions accountable, and ensure that data remain relevant and impactful.

Citizen science/citizen-generated data towards inclusive impact at local and global level – Maryam Rabbie, Sustainable Development Solutions Network (SDSN)

SDG 4 aims to ensure inclusive, equitable, quality education for all, yet access remains a major challenge. UNESCO identifies distance as a major access barrier for many primary and secondary learners, with the IIEP Education Policy Toolkit noting that schools should ideally be within 3 kilometers of children’s homes. Through My School Today, SDGs Today demonstrates the transformative role of citizen science in shaping policy. The initiative engages students, communities, governments, and other stakeholders in geo-referencing schools and education facilities across Africa, contributing to a living, up-to-date map of school locations. The initiative collaborates with education ministries and national statistical offices to complement and strengthen official data sources, demonstrating how citizen-generated data can help bridge information gaps and create more responsive, evidence-based education policies.
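
To illustrate how geo-referenced school locations relate to the 3-kilometre guideline, here is a minimal sketch that computes the straight-line (great-circle) distance from a home to the nearest mapped school using the haversine formula. It is not the method used by My School Today: the function names, the coordinates, and the spherical Earth with a mean radius of 6,371 km are illustrative assumptions, and straight-line distance will understate actual travel distance along roads and paths.

```python
import math

# Assumption: a spherical Earth with a mean radius of 6,371 km.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_school_km(home, schools):
    """Distance from a home (lat, lon) to the closest school in a list of (lat, lon)."""
    lat, lon = home
    return min(haversine_km(lat, lon, s_lat, s_lon) for s_lat, s_lon in schools)


def within_threshold(home, schools, threshold_km: float = 3.0) -> bool:
    """True if the nearest mapped school lies within the distance threshold."""
    return nearest_school_km(home, schools) <= threshold_km


# Hypothetical school coordinates, for illustration only.
schools = [(0.0, 0.02), (0.05, 0.05)]
print(round(nearest_school_km((0.0, 0.0), schools), 2))  # about 2.22 km
print(within_threshold((0.0, 0.0), schools))             # True
print(within_threshold((1.0, 1.0), schools))             # False
```

A fuller analysis would more likely use road-network travel distances and gridded population data rather than individual home locations.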

Regional landscapes of citizen-generated data

The session was made up of short talks from Africa, Asia, Oceania and Latin America on the use of citizen data for monitoring the SDGs. Some countries are leading efforts to prioritize inclusive community participation in monitoring, through intentional engagement that leads to civic outcomes and action in support of progress.

Latin America & the Caribbean – Amanda Mayte Vilchez (Cornell University, USA) & Karen Soacha-Godoy (EMBIMOS Research Group, ICM-CSIC, Spain; Iberoamerican Participatory Science Network, RICAP)

In the Latin America and Caribbean region, data generated through participatory science projects have informed, directly or indirectly, indicators related to twelve of the seventeen SDGs, highlighting both the diversity of the data produced and the strong potential of these initiatives to help address critical data gaps. Among these, SDG 15 (Life on Land) was the most frequently informed, reflecting the strong presence of biodiversity-focused participatory studies. SDG 6 (Clean Water and Sanitation), SDG 14 (Life Below Water), SDG 3 (Good Health and Well-being), and SDG 5 (Gender Equality) are all well represented across multiple indicators.

A central critique of the data used to inform SDG indicators is the underrepresentation of minority groups, who often remain invisible in national-level statistics. In the Latin America and Caribbean region, approximately 75% of the mapped initiatives involved vulnerable and historically marginalized populations. The communities most frequently involved were rural (27%), Indigenous (16%), farming (12%), and fishing (5%) communities, as well as women, youth, Afro-descendant populations, and older adults. The particular attention that participatory initiatives give to marginalized communities reveals the capacity of citizen-generated data to address this gap in data production for SDG monitoring in the region.

Finally, it is important to highlight initiatives designed to address community information needs. Among the initiatives analyzed, 42% generate action-oriented data: they are conceived not only to produce information but also to ensure that this data is relevant, responsive, and capable of supporting tangible, real-world change within communities in the region, producing information with communities and for them.

Asia – Yaqian Wu (University College London, UK)

Priorities in Asia span several SDGs: SDG 6, on water quality monitoring and access to water resources; SDG 3, with a focus on air pollution and health impact assessment; SDG 11, for urban planning and informal settlements; SDG 13, on climate action and disaster response; and SDG 16, on governance accountability.

There are challenges with data standardisation, representativeness and inclusiveness, paths to institutional absorption, and financial sustainability. Regional efforts could help Asian countries formally embed citizen data in official SDG indicator reporting.

Oceania – Carolynne Hultquist (University of Canterbury, NZ)

Projects in the region have a strong environmental focus, especially on capturing large-scale negative changes alongside human impact. Priority areas include SDG 11 (Sustainable Cities and Communities), targets 11.5 (reduce the adverse effects of disasters) and 11.6 (reduce the environmental impact of cities); SDG 13 (Climate Action), target 13.3, with an emphasis on building knowledge and capacity; SDG 14 (Life Below Water), targets 14.1 (reduce marine pollution, including marine litter) and 14.8 (increase scientific knowledge, research, and technology for ocean health), together with marine wildlife monitoring; and SDG 15 (Life on Land), targets 15.2 (end deforestation and restore degraded forests) and 15.8 (prevent invasive alien species from affecting land and water ecosystems).

Communities in the region grapple with managing data ethically and with appropriate cultural considerations. One of these considerations is data sovereignty, particularly for Indigenous groups: the principle of maintaining control over the ownership and use of data. Many communities are concerned about potential misuse.

Africa – Kehinde Baruwa & Peter Elias (Co-Chair CODATA Task Group; University of Lagos, Nigeria) & Oluwatimilehin Adenike Shonowo (University of Glasgow)

Across Africa, communities are increasingly generating their own data to fill gaps in official statistics and support local decision-making. An earlier study mapped 53 citizen science initiatives, showing the growing role of participatory data in sustainable development. A more recent study examines additional initiatives advancing SDGs 5, 6, 11, 13, and 15 across Kenya, Nigeria, Cameroon, South Africa, and Tanzania.

These initiatives engage youth, women, and residents of informal settlements and rural areas to monitor issues such as urban environments, water quality, gender and health barriers, climate resilience, and ecosystem restoration. While they generate valuable data for advocacy and planning, challenges remain around validation, institutional collaboration, and long-term sustainability. Strengthening partnerships with National Statistical Offices (NSOs) and academic institutions is therefore key to integrating citizen-generated data into national decision-making and SDG reporting.

How Citizen Science is Shaping Progress for SDGs – Oluwatimilehin Adenike Shonowo, presenting on behalf of Dilek Fraisl (Senior Research Scholar, IIASA & Managing Director, CSGP)

Ghana has become the first country to integrate existing citizen science data on marine plastic litter into its official statistics, as well as into SDG monitoring and reporting. The results have been used in Ghana’s Voluntary National Review of the SDGs and reported on the UN SDG Global Database for indicator 14.1.1 on marine litter. The results are also informing Ghana’s integrated coastal and marine management policy, currently under development. By leveraging the SDG framework, the initiative has helped to bridge local data collection efforts with global monitoring processes and policy agendas.

“Nothing about us without us”

A motivation for organisations implementing the Copenhagen Framework on Citizen Data is to amplify the voices of communities that are often left out or left behind. There is recognition of the importance of representation in reporting, especially of marginalized and vulnerable populations. Some countries are leading efforts to prioritize inclusive community participation in monitoring, through intentional engagement that leads to civic outcomes and action in support of progress on the SDGs.


Our CODATA Task Group supports this global movement toward meaningful citizen engagement in data. Some local and regional needs must be addressed differently, but there is also much in common, and we have a lot to learn from each other.

In the 2025–2027 iteration of our Task Group, we are providing guidance on working across data and policy frameworks, furthering WorldFAIR+ by applying the Cross-Domain Interoperability Framework (CDIF) approach to citizen data, in alignment with the CARE principles and the Copenhagen Framework. We aim to make progress towards interdisciplinary standards for citizen data and metadata across scales, so that actionable, globally comparable data are available for the SDGs. The Task Group is partnering with the Citizen Science Global Partnership (CSGP) Air Quality Community of Practice (CoP) on a case study for SDG indicator 11.6.2: Annual mean levels of fine particulate matter (e.g., PM2.5 and PM10) in cities (population weighted). We are committed to using our networks to continue highlighting and promoting the use of citizen data to make progress on the SDGs.
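
Indicator 11.6.2 is a population-weighted mean, which can be written as the sum of each area's population times its annual mean concentration, divided by the total population. As a rough illustration of what that weighting does, here is a minimal sketch that averages each area's readings into an annual mean and then weights each area by its population, so that more populous areas count for more. It is not the official UN/WHO methodology or the Task Group's case-study method; the function names, readings, and populations are hypothetical, and real citizen-generated sensor data would also need calibration and quality control, which this sketch omits.

```python
from statistics import fmean


def annual_mean(readings):
    """Arithmetic mean of one area's PM2.5 readings (µg/m³) over a year."""
    if not readings:
        raise ValueError("an area needs at least one reading")
    return fmean(readings)


def population_weighted_mean(areas):
    """Population-weighted mean of (population, annual_mean) pairs:
    sum(P_i * C_i) / sum(P_i)."""
    total_pop = sum(pop for pop, _ in areas)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    return sum(pop * conc for pop, conc in areas) / total_pop


# Hypothetical areas with illustrative values only.
city_a = annual_mean([12.0, 18.0, 15.0])  # 15.0
city_b = annual_mean([30.0, 26.0])        # 28.0
result = population_weighted_mean([(1_000_000, city_a), (250_000, city_b)])
print(round(result, 2))  # 17.6
```

An unweighted average of the two areas would give 21.5; weighting pulls the result towards the larger population's exposure.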

Bridging the Glacier Finance Gap in the Decade of Cryospheric Sciences

Bapon Fakhruddin, CODATA TG-FAIR DRR and Shaily Gandhi, IT:U Interdisciplinary Transformation University Austria

On March 3, 2026, the FAIR Data for Disaster Risk Research working group of CODATA convened a webinar titled “Glacier Adaptation and Financing,” bringing together leading experts to address the accelerating retreat of glaciers, the implications for water security and disaster risk, and the persistent gap in climate finance. Moderated by Dr. Shaily Gandhi, the panel featured Dr. Anil Mishra (UNESCO), Dr. Miriam Jackson (Norwegian Water Resources and Energy Directorate), Dr. Dhiraj Pradhananga (Tribhuvan University, Nepal), and Dr. Bapon Fakhruddin (Green Climate Fund). The discussion emphasized the urgency of translating scientific knowledge into institutional action and financial investment.

Follow these links to consult the slides presented and the webinar recording.

Accelerating glacier loss and its implications

Recent assessments underscore the rapid pace of glacier melt. The Glacier Mass Balance Intercomparison Exercise (GlaMBIE) team (2025) reported that between 2000 and 2023, glaciers globally lost an average of 273 ± 16 gigatonnes of ice annually, with a 36% acceleration in the latter half of the period. This cumulative loss of 6,542 gigatonnes contributed approximately 18 millimeters to global sea-level rise. Projections by Rounce et al. (2023) suggest that even under a 1.5°C warming scenario, global glacier mass could decline by 26 ± 6% by 2100, increasing to 41 ± 11% under a 4°C scenario.
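
These figures can be cross-checked with simple arithmetic. The sketch below assumes a global ocean area of about 3.618 × 10⁸ km² and a meltwater density of 1 tonne per cubic metre, which gives the commonly used approximation of roughly 362 gigatonnes of ice per millimetre of global mean sea-level rise. It treats all mass loss as reaching the ocean and ignores complications such as ice already below sea level, so it is a back-of-the-envelope check, not the GlaMBIE method.

```python
# Back-of-the-envelope check of glacier mass loss vs. sea-level rise.
# Assumptions (not taken from the GlaMBIE study): ocean area of ~3.618e8 km^2
# and meltwater density of 1 tonne per m^3.
OCEAN_AREA_KM2 = 3.618e8


def gt_per_mm_sea_level(ocean_area_km2: float = OCEAN_AREA_KM2) -> float:
    """Gigatonnes of water needed to raise global mean sea level by 1 mm."""
    area_m2 = ocean_area_km2 * 1e6  # km^2 -> m^2
    volume_m3 = area_m2 * 1e-3      # a 1 mm (0.001 m) layer of water
    return volume_m3 / 1e9          # 1 m^3 ~ 1 t; 1 Gt = 1e9 t


def sea_level_mm(mass_loss_gt: float) -> float:
    """Sea-level contribution (mm) of a given ice mass loss (Gt)."""
    return mass_loss_gt / gt_per_mm_sea_level()


# 2000-2023 inclusive is 24 years. Mean annual loss x 24 lands close to the
# reported cumulative total, consistent with the mean being rounded to 273.
years = 2023 - 2000 + 1
print(273 * years)                   # 6552
print(round(sea_level_mm(6542), 1))  # 18.1
```

Under these assumptions, the reported 6,542 gigatonnes corresponds to about 18 mm of sea-level rise, matching the figure cited.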

In the Hindu Kush Himalaya (HKH) region, these global trends are mirrored by local observations. Dr. Pradhananga highlighted that snowpacks are thinning, springs are drying, and rainfall is increasingly replacing snowfall. These changes threaten the freshwater supply for billions of people who depend on glacier-fed river systems.

Glacial lake outburst floods: a growing hazard

Glacier retreat contributes to the formation of unstable glacial lakes, increasing the risk of glacial lake outburst floods (GLOFs). A study published in Nature Communications estimated that 15 million people globally live under threat from GLOFs, with more than half residing in India, Pakistan, Peru, and China (Carrivick et al., 2023). The 2023 South Lhonak Lake disaster in Sikkim, India, exemplified this risk. A cloudburst triggered a GLOF that destroyed infrastructure and resulted in at least 14 fatalities and over 100 missing persons (NDTV, 2023). Although the lake had been previously identified as high-risk (Sattar et al., 2021), early warning systems were not fully operational at the time of the event. This incident illustrates the critical need for governance systems that can act on scientific data.

The climate finance gap

Despite the clear risks, mountain regions receive only 3% of global climate finance, and less than 1% of adaptation funding is allocated to glacier protection (Fakhruddin, 2025). The United Nations Environment Programme (UNEP, 2023) estimated the global adaptation finance gap at $194–366 billion annually. The lack of private capital in glacier adaptation is attributed to the absence of direct financial returns, despite the essential role glaciers play in water security, energy production, and food systems.

Financing models and the role of the Green Climate Fund

The Green Climate Fund (GCF) has pioneered innovative financing models to address these challenges. Its approach includes de-risking mechanisms, capital mobilization, bankable project structures, and tailored financing models such as pay-for-success and impact investing strategies (Fakhruddin, 2025). The diagram below illustrates the GCF’s financing framework.

[Diagram: the GCF’s financing framework]

Opportunities for Crowd-Sourced and Innovative Financing

Crowd-sourced financing presents a complementary avenue for glacier adaptation. Fakhruddin (2025) proposed models where private capital is tied to successful glacier restoration outcomes. These could include glacier bonds, community investment funds, and climate crowdfunding platforms. While such mechanisms cannot replace large-scale public and multilateral funding, they can raise awareness and engage broader constituencies.

Recommendations for the Decade of Cryospheric Sciences

The panel concluded with a set of actionable recommendations:

  • Expand interoperable glacier and hydrological monitoring networks, including community-based systems.
  • Integrate cryosphere data into national water security and disaster risk management frameworks.
  • Establish legally binding early warning systems for GLOFs and related hazards.
  • Increase adaptation finance for glacier and mountain regions through dedicated funding windows and blended finance.
  • Promote public-private partnerships and innovative financial instruments such as resilience bonds and glacier adaptation funds.
  • Build local capacity through training, community engagement, and integration of traditional knowledge.
  • Use the Decade of Cryospheric Sciences (2025–2034) to set measurable targets and track progress.

Conclusion

The science of glacier change is unequivocal. The challenge lies in aligning institutional action and financial flows with this knowledge. The Decade of Cryospheric Sciences offers a critical window to bridge this gap. Every degree of warming and every dollar invested will shape the future of the world’s glaciers and the communities that depend on them.

 

References

Carrivick, J. L., Tweed, F. S., et al. (2023). Fifteen million people at risk of glacial lake outburst floods. Nature Communications, 14, 487. https://doi.org/10.1038/s41467-023-36033-x

Fakhruddin, B. (2025, April 18). Saving the cryosphere requires innovative financing. Green Climate Fund. https://www.greenclimate.fund/insights/saving-cryosphere-requires-innovative-financing

GlaMBIE Team. (2025). Community estimate of global glacier mass changes from 2000 to 2023. Nature, 639, 382–388. https://doi.org/10.1038/s41586-024-08545-z

NDTV. (2023, October 5). 14 dead, 102 missing in Sikkim flash flood. https://www.ndtv.com/india-news/10-dead-82-missing-14-bridges-collapsed-in-sikkim-flash-flood-4450410

Rounce, D. R., Hock, R., Maussion, F., Hugonnet, R., Kochtitzky, W., Huss, M., Berthier, E., Brinkerhoff, D., Compagno, L., Copland, L., Farinotti, D., Menounos, B., & McNabb, R. W. (2023). Global glacier change in the 21st century: Every increase in temperature matters. Science, 379(6627), 78–83. https://doi.org/10.1126/science.abo1324

Sattar, A., Allen, S., Frey, H., Huggel, C., & Mergili, M. (2021). Modeling glacial lake outburst flood process chains in Sikkim Himalaya: Hazard assessment of two potentially dangerous lakes. EGU General Assembly 2021. https://doi.org/10.5194/egusphere-egu21-10838

United Nations Environment Programme. (2023). Adaptation Gap Report 2023: Underfinanced. Underprepared. https://www.unep.org/resources/adaptation-gap-report-2023

 

Understanding contemporary digital preservation practice: the EOSC EDEN project reports survey findings

By Laura Molloy, CODATA Research Lead

With rising threats to the existence of essential data resources, and mendacious contesting of the historical record, the current moment clearly demonstrates the critical role of high-quality digital preservation practitioners, skills and services. Digital preservation is a complex and diverse profession, often underfunded and sometimes misunderstood. It is important that we understand the current digital preservation landscape as well as possible, in order to support those working around the world in the preservation professions and to provide project outputs that will be of relevance and value to them. Accordingly, CODATA is delighted to be a participant in the European Open Science Cloud (EOSC) project, ‘Enhancing Digital preservation strategies at European and National level’ (EDEN).

EDEN has published the results of a survey which was recently conducted to gather information from the digital preservation community worldwide. Our survey was specifically about the guidance to which preservation practitioners refer, and practices they use, when identifying, selecting, and appraising digital data objects for ‘long-term’ preservation.

This blog post provides an informal overview of how we went about the survey, and what we discovered. We hope this will be of interest to those working in contemporary digital preservation, including managers, practitioners, and those responsible for policy making and training within memory organisations. If you would like further detail on any aspect of this work, the full report can be downloaded at https://doi.org/10.5281/zenodo.17984753.

About the EOSC EDEN project

The EOSC EDEN project, funded by the European Commission [1], seeks to enhance digital preservation strategies at European and national levels. The project is creating a framework to identify what data are candidates for digital preservation. This involves setting standards and protocols for long-term data preservation, which will be determined through an assessment of data usage, quality, and the data’s benefits to science and society.

In addition to the framework, the EOSC EDEN project aims to develop a model for re-appraisal of data throughout its lifecycle. The model for re-appraisal will support the framework for digital preservation by ensuring that preservation efforts remain relevant over time.

The survey activity was led by Laura Molloy, CODATA research lead, who is leading EDEN Task 1.1, ‘Landscape analysis of existing frameworks, guidelines and practices for identification, selection and appraisal of data for long-term preservation’. This task contributes the majority of the landscaping activity in the project. Laura is a qualitative social science researcher by training, with experience in a number of digital preservation projects and initiatives, and a track record in research and consultancy relating to digital decision-making and information behaviours in varied professional settings. Analytical power was added by other members of the EDEN task team, including work package leader and digital preservation expert Micky Lindlar, and quantitative analyst Maria Benauer, both of Technische Informationsbibliothek (TIB).

Survey design

Understanding contemporary digital preservation includes direct contact with as many current practitioners as possible to understand their real practices—and the reference materials that inform those practices. We also need to build communication with those working in preservation across different types of organisation and in different countries. Accordingly, the EOSC EDEN 2025 survey was carefully designed to be simple to interact with, and to make sense to digital preservation professionals across organisation types, staff levels, geographical locations, and any or no discipline focus [2].

Survey questions were arranged into four main sections:

  1. About your organisation and role;

  2. About frameworks and guidelines for identification, appraisal and selection of data for long-term digital preservation;

  3. About current practices in identification, appraisal and selection of data for long-term digital preservation;

  4. Discipline-specific requirements for long-term preservation of digital objects.

The survey ended with one further short section gathering voluntary contact details, to enable the identification of candidates for any follow-up inquiry.

The questions were a mixture of closed and open questions: closed questions can be answered by choosing from fixed options, such as yes or no, while open questions require the respondent to write a more discursive, free-text answer. Accordingly, the task team performed a mixture of quantitative and qualitative analysis.

Survey respondents

We received 250 valid responses from 31 states/nations [3]. The majority of responses were from Western Europe, followed by North America, despite focused activities undertaken by the task team to solicit a more evenly-distributed global response.

The size of respondents’ organisations was approximately evenly distributed across micro/small, medium, and large sized organisations [4], each with around a third of the responses. In terms of staff level within the organisation, around two-thirds of respondents were practitioners; just under a third of respondents identified as middle management and a few identified as senior management. We received responses from eighteen organisation types, which we coded into nine wider groupings called ‘organisation classes’, as follows: Academic publisher, Archive, Digital preservation service, Library, Multifunctional, Museum or gallery, Repository, Research performing organisation, Research infrastructure, plus Other/unassigned. The most populous class was ‘archive’ with 67 responses; the least populous was ‘academic publisher’ with one.

Selected findings

There are a few selected findings that were of interest to the task team, and offer some food for thought. These are briefly set out here.

‘Long-term’ preservation

Firstly, the project itself—as well as its subsidiary work packages and tasks—frequently uses the phrase, “long-term preservation”. We were interested to note that this emerges from the data as an unstable concept. One of the most striking findings was the high proportion of respondents who are working at an organisation where there is no agreed or working definition of ‘long-term’ in the digital preservation context. Even those respondents who did have an agreed or working definition of ‘long-term’ offered a wide range of numerical definitions of what that means for them and their preservation work.

Quality checking behaviours

We were interested in investigating two sets of quality checks: quality checks upon ingest, and subsequent quality checks throughout the data preservation period. We asked various questions about whether and how exactly these checks are carried out. We found that a majority of respondents carry out quality checks of various kinds upon ingest, but that this proportion drops dramatically for subsequent quality checks throughout the preservation period. This is a complex area for analysis, and we would like to investigate it further through follow-up interviews during 2026.

Commonalities with FAIR data

We note with interest the existing connections, indicated by respondents, between the digital preservation realm and the set of ideas currently designated ‘FAIR data’. These connections appeared in two different places in the survey responses.

First, respondents were asked about their usual preservation period; that is to say, the length of time that the organisation usually initially commits to holding and maintaining a preservation copy of a given data object. Here, respondents introduced a recurring—and pretty passionate—discussion about the importance of maintaining findability and access, whatever the agreed preservation period; and we noted that maintenance of findability and access was a much more important issue for many respondents than the existence of any shared agreement about the length of the preservation period.

Second, we provided respondents with a list of frameworks, standards and guidelines that we had gathered from desk research and professional experience. These were presented as likely reference resources for practitioners when working on identification, selection and appraisal of digital data objects in their day-to-day work. We asked respondents to indicate whether they were aware of each document and/or used it in their preservation work. The FAIR Guiding Principles was one of these documents. Respondents reported a high level of awareness and use of the FAIR principles (ranking 4th of 15 options). This reminds us that some of the ideas now encapsulated in the FAIR principles have been, to some extent, bedrocks of preservation practice for years, and suggests that digital preservation practitioners are aware of recent developments in the FAIR data movement. (It is worth noting, however, that the TRUST and CARE principles have no similar visibility at this time in the responses from our participants.)

Needs of designated communities/threats to FAIRness of data over time

We asked a question about the extent to which the respondent understands any unmet needs of their organisation’s designated community [5]. Elsewhere in the survey, we also asked a question on the respondent’s view of threats to the ‘FAIRness’ of their preserved data over time. Some common themes emerged from the responses to these two questions. This suggests that these common themes may be issues of cross-cutting importance for the digital preservation practitioner community.

The most frequently highlighted issues here were: issues around sensitive / protected data; the challenges of data volume; and issues around access provision. Two of the top three designated community needs (data volume and access issues) recur in the top answers around threats to FAIR over time. Sensitive data issues were flagged in three responses, and the other designated community needs (long-term provision of service; lack of useful policy/directive; software preservation; provenance issues; and various format problems) also all recur at low rates in the threats to FAIR over time. This is not particularly surprising, as these are clearly frequently experienced challenges in the practice of preservation. But it is interesting to see that respondents consider them both from the perspective of directly meeting the needs of the community (i.e. user-centred approaches) and from the arguably more theoretical perspective of keeping digital data objects FAIR. Ultimately, though, FAIR data are data that meet user needs, so it is a useful piece of validation that these themes recur in the responses to both questions.

To conclude…

The EDEN task team is delighted by the response to the survey and thanks all participants.

Next steps within the task include some follow-up interviewing with consenting respondents to further explore the relationship between different information behaviours: for example, how quality checking is monitored; whether designated community needs are monitored and if so whether this impacts preservation activity; the role of data policy; and the role of organisational acquisition strategy. This work will be reported upon by the end of 2026.

In addition, certain findings from this enquiry are potentially useful for future work by CODATA, specifically the upcoming EU-funded project, ‘Developing and Implementing the Cross-Domain Interoperability Framework for EOSC’ (CDIF4EOSC), and the CODATA Task Group on Research Data Quality Management.

A full breakdown of data analysis and the findings we have heretofore identified is beyond the scope of this blog post, and can be found in the full report which is freely available online at https://doi.org/10.5281/zenodo.17984753. Any questions or feedback can be directed to the task leader at laura @ codata.org. For more information about the EOSC EDEN project please visit the project website, https://eden-fidelis.eu/.

 

[1] EDEN has received funding from the EU’s Horizon Europe research and innovation programme under Grant Agreement no. 101188015.

[2] We note, however, that the use of English as the primary language of the survey may have been a limiting factor for some potential respondents.

[3] As defined by the United Nations member states available at the time of survey publication (May 2025).

[4] As defined by the European Commission.

[5] “Designated community” is defined in the EDEN Milestone 1.1 report (https://doi.org/10.5281/zenodo.16992452), based upon the OAIS definition (http://www.oais.info/), as: “A group of users, now or in the future, who can understand and use the Objects preserved. The designated community is whom the Objects are preserved for. It can be made of several user communities and the definition can change over time.”

Sustaining Research Data Capacity: Reflections from a CODATA Journey (2017–2025)

Felix Emeka Anyiam (Initial Co-Lead CODATA Connect 2019-2024)

In this post, Felix Emeka Anyiam, who was Initial Co-Lead of CODATA Connect, our Early Career Researcher initiative, from 2019 to 2024, reflects on his experiences over eight years of participating in CODATA activities. In particular, he emphasizes the benefits of sustained collaborations and connections (“long-term, networked training matters more than one-off workshops”) and praises the CODATA Connect and CODATA Data Schools model, which allowed students to return in more responsible leadership roles. Felix’s story shows how CODATA Connect provided an environment and collaborations that benefited him on this journey. But it also shows how Felix’s open and generous character, and his enthusiasm to participate, brought rewards. Please enjoy this uplifting story! Simon HODSON, Executive Director, CODATA.

Welcome to Trieste: how it started, 2017

In August 2017, at the International Centre for Theoretical Physics (ICTP) in Trieste, Italy, I encountered research data science not merely as a set of analytical tools, but as a global public good. I arrived as a public health researcher from Nigeria, trained in epidemiology and biostatistics, seeking stronger quantitative approaches to interrogate health systems data. I left with something more enduring: an entry point into a global ecosystem shaped by CODATA’s commitment to open science, equity, and long-term capacity building.

The CODATA-RDA Research Data Science Summer School in Trieste offered more than technical instruction. It introduced a way of thinking about data: FAIR by design, ethically governed, and shared across disciplines and borders. Participants from low- and middle-income countries (LMICs) were not positioned as beneficiaries, but as peers and future contributors. CODATA functioned not as a sponsor, but as a convenor of people, ideas, and responsibility. That distinction would shape my professional trajectory in the years that followed.

Continuity as capacity: returning, deepening, expanding (2017–2018)

One year later, in August 2018, I returned to ICTP for the Climate Data Science Advanced Workshop, again under the CODATA-ICTP collaboration, with Clement Onime and Simon Hodson among the local organisers. This second invitation proved pivotal. It reinforced the idea that capacity building is most effective when it is iterative and cumulative, allowing participants to deepen expertise, cross disciplinary boundaries, and apply learning to new problem domains.

The 2018 programme expanded my analytical perspective beyond health to climate systems, environmental data, and computational modelling, skills that later proved essential for interdisciplinary work at the intersection of climate, urban systems, and public health. More importantly, it signalled something fundamental about the CODATA model: participation was not episodic. There was an intentional pathway for return, growth, and contribution.

From Participant to Contributor: Teaching, Networks, and Leadership (2018–2025)

Following my initial training through the CODATA-RDA Research Data Science programmes in Trieste, the relationships established during those early years began to translate into sustained international collaboration. These engagements laid the foundations for the first Urban Data Science Summer School in 2018, hosted by the Summer–Winter School at CEPT University, Ahmedabad, India, in collaboration with Dr Shaily Gandhi, then a member of CEPT faculty. This marked an important expansion of CODATA-enabled capacity building beyond the initial training context. My role as a co-instructor extended this work to undergraduate and postgraduate cohorts.

Building on this momentum, the programme evolved into a more structured and geographically diverse initiative. The second edition of the Urban Data Science Summer School took place from 13 to 23 May 2019 (https://shailygandhi.github.io/UrbanDataScience2019/). These successive schools reflected not only the maturation of an academic programme, but also the strength of the collaborative networks that had emerged from CODATA’s training ecosystem: networks sustained through shared curriculum development, co-teaching, and long-term professional exchange since those early connections in Trieste.

That same year, I was appointed inaugural co-lead of CODATA Connect, the organisation’s Early Career and Alumni Network, a role I held from 2019 to 2024 (https://codata.org/initiatives/data-skills/codata-connect/members/). CODATA Connect was established to address a persistent gap in global training initiatives: what happens after the workshop concludes. Rather than allowing capacity gains to dissipate, the network was designed as a continuity mechanism, enabling early-career researchers to remain engaged, visible, and supported within the wider CODATA ecosystem.

Working collaboratively with co-leads and core members from across Africa, Asia, Australia, Europe, and Latin America, including India and Costa Rica, CODATA Connect evolved into a distributed, peer-led platform for sustained skills development and exchange. Together, we coordinated a series of research skills webinars, thematic workshops, and podcast series that translated FAIR data principles, reproducibility, and ethical data stewardship into applied, domain-specific contexts. These activities included structured webinar series on research skills and reproducibility, smart and resilient cities, and open data practices, as well as hands-on technical workshops, such as training on distributed computing using Spark with R, explicitly targeted at early-career researchers in resource-constrained settings.

In parallel, CODATA Connect supported the development of cross-institutional podcast series, including Data for Resilient Cities, Data–Knowledge–Action for Urban Systems, Data for Disaster Risk Reduction, and Open GeoAI, which brought together researchers, practitioners, and policy actors to explore how open data, geospatial analytics, and AI can inform urban resilience, disaster risk reduction, health, and sustainable development. These initiatives not only expanded the reach of CODATA’s data-skills agenda but also created durable knowledge artefacts that continue to serve as learning resources beyond the immediate training context.

Throughout this period, my own contributions were embedded within this collective effort alongside colleagues such as Shaily R. Gandhi (Initial Lead-India), Mariana Cubero-Corella (Costa Rica), Anup Kumar Das (India), Neema Sumari (Tanzania), Kishore Sivakumar (Netherlands), Adenike Shonowo (Nigeria), Jacqueline Stephens (Australia), Jaime Rugeles (Colombia), Zhifang Tu (China), and others. We worked to ensure that CODATA Connect remained inclusive, interdisciplinary, and globally representative. The emphasis was consistently on peer mentorship, leadership development, and translation of open science principles into local research practice, particularly within low- and middle-income country contexts.

This trajectory came full circle in August 2025, when I returned once again to ICTP, Trieste, this time not as a participant, but as a tutor and co-lead for the CODATA-RDA Advanced Workshop on Urban Data Science (https://indico.ictp.it/event/10990).

Having first attended the CODATA-RDA programmes as a student in 2017 and 2018, returning as a facilitator underscored the iterative nature of CODATA’s capacity-building model. Alongside colleagues Dr Shaily Gandhi (ITU Linz, Austria) and Dr Neema Sumari (Sokoine University of Agriculture, Tanzania), I contributed to hands-on sessions on geospatial analytics for urban planning and policy, predictive modelling for population dynamics, infrastructure, and health-risk assessment, and decision-support systems for resilient and sustainable cities.

The 2025 workshop brought together researchers from multiple regions to deepen expertise in big-data analytics, computational infrastructure, urban and environmental data science, and ocean-science data, all grounded in FAIR principles and ethical data stewardship. Contributing to the same platform that had shaped my own formation in research data science reinforced a central lesson of this journey: effective capacity building is not a single intervention, but a networked process sustained through collaboration, continuity, and shared responsibility, where today’s participants become tomorrow’s instructors, mentors, and stewards of the global data ecosystem.

Broadening horizons: global exposure through CODATA-enabled opportunities

Alongside teaching and network leadership, CODATA-enabled pathways opened doors to broader global engagement. I was selected to participate in the International Training Workshop on Open Science and the SDGs hosted by the Chinese Academy of Sciences in Beijing in 2023, contributing to discussions on ethical data reuse and sustainable development. These collaborations produced the peer-reviewed article “Statements on Open Science for Sustainable Development Goals”, published in the Data Science Journal, of which I am a co-author (https://doi.org/10.5334/dsj-2024-049). Earlier, I had been selected for Topics in Digital and Computational Demography at the Max Planck Institute for Demographic Research (Germany) and for the ALPSP Virtual Conference and Awards in the United Kingdom, as one of only 20 global recipients.

Travel grants from CODATA supported participation in the ICTP Trieste programme (2018) and the Science for Development Workshop in South Africa (2020), underscoring CODATA’s practical commitment to inclusion. These experiences reinforced a consistent message: global capacity building is strongest when financial, intellectual, and institutional barriers are addressed together.

This period of sustained engagement and international collaboration was also marked by formal recognition from the wider scientific community. In 2025, I was inducted into Sigma Xi, The Scientific Research Honor Society, in recognition of my research contributions and commitment to advancing science in the public interest. While this honour is conferred independently, it reflects the cumulative impact of long-term investment in research training, open science practice, and global collaboration. The skills, networks, and values cultivated through CODATA’s capacity-building ecosystem were central to developing the kind of research profile and scholarly orientation that such recognition acknowledges.

SAIL 2025 as a milestone, not the destination

In 2025, I was invited to present at the Symposium on Artificial Intelligence for Learning Health Systems (SAIL 2025), co-hosted by Harvard Medical School and convened around a shared commitment to equity-driven, ethically grounded applications of artificial intelligence in healthcare. My presentation drew on doctoral research that applied machine-learning methods to examine inequities in HIV self-testing uptake across sub-Saharan Africa, using large-scale demographic health survey data from 24 countries (https://sail.health/event/sail-2025/program/).

The study employed Classification and Regression Tree (CART) and Random Forest models to identify socio-demographic predictors of willingness to self-test for HIV. Beyond methodological performance, the analysis foregrounded a persistent equity concern: rural populations, individuals with lower levels of education, and those in lower-income groups remain systematically underserved. The work demonstrated how predictive analytics, when designed transparently and interpreted responsibly, can inform targeted, community-embedded public health interventions rather than reinforce existing disparities.

What made participation in SAIL 2025 particularly significant, however, was not the event itself but the lineage that made meaningful engagement possible. The ability to work confidently across disciplinary boundaries, to interrogate data quality and representativeness, to foreground ethics and FAIR principles, and to communicate complex analytical approaches to diverse audiences was not acquired in isolation. These capacities were cultivated incrementally through long-term engagement with CODATA-led training programmes, teaching roles, and international peer networks.

Across plenary sessions, panels, and technical discussions at SAIL, a consistent message emerged: AI should not be framed as a luxury innovation for high-resource health systems, but as a practical, scalable tool for strengthening learning health systems where access, quality, and data infrastructure remain uneven. Conversations around AI-enabled clinical decision support in low- and middle-income countries, data governance for learning health systems, and patient-centred innovation resonated strongly with principles long emphasised within CODATA’s capacity-building ecosystem.

Several themes from the symposium were especially aligned with this trajectory. First, the centrality of context: AI systems must be designed to work within real-world constraints rather than idealised data environments. Second, the discussions highlighted that data quality and equity cannot be treated separately: AI systems trained on incomplete, biased, or poorly governed datasets are likely to reinforce existing health disparities rather than mitigate them. Third, the importance of trust, transparency, and explainability, particularly when deploying models in sensitive or high-stakes health domains. Finally, there was a strong emphasis on collaboration over competition, underscoring the need for interdisciplinary and cross-sector partnerships to advance AI for health responsibly.

Seen through this lens, SAIL 2025 was not a destination, but a convergence point, where years of sustained capacity building translated into frontier research engagement. It affirmed that long-term investment in data skills, ethical reasoning, and global research networks enables researchers, particularly those working in LMIC contexts, to contribute meaningfully to shaping emerging conversations at the intersection of AI and health.

Rather than standing apart from earlier stages of training and collaboration, SAIL 2025 illustrated the cumulative effect of CODATA’s model: a pathway in which early exposure evolves into leadership, stewardship, and the application of advanced methods to questions of equity and public value.

From skills to stewardship: governance and responsibility

More recently, my engagement with CODATA has extended beyond training and programme delivery into data governance, interoperability, and infrastructure stewardship. I currently serve as a member of the Cross-Domain Interoperability Framework (CDIF) Working Group and Advisory Group, where I contribute to the development and review of interoperability standards, emerging CDIF profiles, and strategic oversight for globally connected data ecosystems. This work involves close collaboration with an international body of senior experts, as well as ongoing technical discussions focused on enabling responsible data reuse across domains.

In parallel, I serve as a reviewer for the Data Science Journal and have contributed to CODATA’s Smart Cities Task Group and the Resilient and Healthy Cities Working Group, with a particular focus on data-driven approaches to urban health, climate resilience, and risk reduction. These roles reflect an increasing emphasis on stewardship, helping to shape not only how data are analysed, but how they are governed, shared, and translated into public value within complex socio-technical systems.

This evolution from skills acquisition to systems-level responsibility has been further strengthened through formal engagement with public-sector digital governance. In December 2025, I completed the AI and Digital Transformation in Government programme delivered by Saïd Business School, University of Oxford, in collaboration with UNESCO. The programme offered a rigorous, practice-oriented exploration of how governments can responsibly harness artificial intelligence and data-driven technologies to deliver inclusive, ethical, and effective public services.

Key areas of focus included AI ethics and governance, human-centred service design, digital leadership, cyber resilience, and the management of systemic change within public institutions. Importantly, the programme foregrounded the role of evidence, accountability, and institutional capacity in ensuring that digital transformation serves citizens rather than exacerbating existing inequalities.

Taken together, these governance, editorial, and policy-oriented engagements reflect a central lesson of sustained capacity building: technical competence must ultimately be matched by institutional responsibility. The transition from learning how to use data to helping shape the frameworks that govern its use represents a critical step in ensuring that data science and AI contribute to equitable, trustworthy, and socially grounded outcomes at scale.

What this journey tells us about sustaining capacity

Several lessons emerge from this journey. First, long-term, networked training matters more than one-off workshops. Skills persist when they are reinforced through return, teaching, and community. Second, effective capacity building produces leaders and stewards, not just analysts. Third, continuity, supported by mentorship, alumni networks, and governance roles, is essential for translating training into durable impact, particularly in LMIC contexts.

Looking ahead

As data science and artificial intelligence increasingly shape global responses to health, climate, and development challenges, CODATA’s model offers a compelling blueprint. Capacity building is not an event; it is a commitment sustained over time. For early-career researchers, particularly those working in resource-constrained settings, CODATA continues to demonstrate what is possible when openness, equity, and continuity are placed at the centre of scientific practice.

Short Biography of the Author

Felix Emeka Anyiam is a public health researcher and data scientist based at the University of Port Harcourt, Nigeria. His work focuses on the ethical and equitable application of data science and artificial intelligence to health systems, urban resilience, and development challenges in low- and middle-income countries. An alumnus of and long-term contributor to CODATA-led Research Data Science programmes, he has served as co-instructor in CODATA-RDA Advanced Workshops, inaugural co-lead of CODATA Connect (the Early Career and Alumni Network), and a member of multiple CODATA task and working groups. His research and teaching emphasise FAIR data principles, reproducibility, and responsible data governance within global and local research ecosystems.

February 2026: Publications in the Data Science Journal

Title: FIP Check: A Rubric-Based Tool for Assessing FAIR Implementation Profiles and Enabling Resources
Author: Sungha Kang, John Graybeal, Barbara Magagna, Erik Schultes, Nancy Hoebelheinrich, Chris Erdmann, Ismael Kherroubi Garcia, Julianne Christopher, Christine R. Kirkpatrick
URL: http://doi.org/10.5334/dsj-2026-008
Title: Development of Technology Convergence Assessment Framework for Poly crisis
Author: Rania Elsayed Ibrahim, Tshiamo Motshegwa, Abdelaziz Elfadaly, Alaa A. Elbiomy, Mai Ramadan Ibraheem
URL: http://doi.org/10.5334/dsj-2026-007
Title: Bridging the Data Discovery Gap: User-Centric Recommendations for Research Data Repositories
Author: Mingfang Wu, Felicitas Löffler, Brigitte Mathiak, Fotis Psomopoulos, Uwe Schindler, Amir Aryani, Jordi Bodera Sempere, Antica Culina, Andreas Czerniak, Chris Erdmann, Kathleen Gregory, Nick Juty, Allyson Lister, Ying-Hsang Liu, Samantha Pearman-Kanza
URL: http://doi.org/10.5334/dsj-2026-006
Title: Essential Aspects of Tools for Developing Scientific Data Management Plans
Author: Fabiano Couto Corrêa Silva, Sandra de Albuquerque Siebra, Laura Vilela Rodrigues Rezende, Alexandre Faria de Oliveira, Denise Oliveira de Araújo
URL: http://doi.org/10.5334/dsj-2026-005
Title: Data Management in a Community-Based Birth Cohort: What the SEMILLA Study Teaches Us
Author: Nataly Cadena, Fadya Orozco, Stephanie Montenegro, Fabián Muñoz, Alexis J. Handal
URL: http://doi.org/10.5334/dsj-2026-004
Title: Implementing the FAIR and CARE Principles Simultaneously: Emerging Insights from IPBES
Author: Renske M. Gudde, Rainer M. Krug, Yanina V. Sica, Howard P. Nelson, Félicie Françoise, Manuela Gómez-Suárez, Aidin Niamir
URL: http://doi.org/10.5334/dsj-2026-003