AI-Assisted Metadata Enrichment
"AI for metadata extraction in sequence data"
Machine-readable, standards-compliant metadata are the foundation for the automated reuse of scientific data. In recent years, petabytes of sequencing data have been generated worldwide and made available in repositories. For biodiversity research, these data hold immense potential—yet only minimal, often unstructured metadata are typically available. Important contextual information is frequently included only in publications and is therefore not machine-accessible. At the same time, AI models are opening up new possibilities for systematically analyzing such unstructured texts and extracting metadata to enrich existing databases. However, appropriate tools have so far been developed largely in isolation. Within this Topic Table project, we aim to foster exchange on this topic, identify synergies, advance shared standards, and address challenges such as quality assurance, interoperability, and sustainability.
The project “AI Tools for Metadata Extraction in Sequence Databases” is part of the NFDI4Biodiversity Topic Tables, a collaborative format that connects stakeholders across the biodiversity community to jointly develop solutions in research data management. The first call for Topic Tables was issued in 2025. Learn more about the Topic Tables here.
Objectives
- Exchange and networking of activities related to AI-assisted metadata extraction
- Publication of a white paper on current developments and future challenges
- Development of a concept for the sustainable enrichment of metadata in sequence data repositories
Planned Activities
- Workshop 1 (hybrid): Overview of the current state and planned developments (What AI tools are already available? Which projects are currently underway? What will be needed in the future?), as well as formation of the author team for the white paper
- Workshop 2 (online): How have the projects from Workshop 1 progressed? What new developments have emerged? (preliminary topics)
- Workshop 3 (hybrid): What challenges arise in practical application and implementation? What comes next after the Topic Table? (preliminary topics)
- Regular online meetings of the core team to work on the white paper
Project Duration
- Status: Active
- Start: 03/2026
- Expected End: 05/2027
Team
Project Leads/Contact Persons
- Dr. Christiane Hassenrück, Leibniz Institute for Baltic Sea Research Warnemünde (christiane.hassenrueck@io-warnemuende.de)
- Prof. Dr. Birgit Gemeinholzer, University of Kassel (Birgit.Gemeinholzer@uni-kassel.de)
- Dr. Stephanie Jurburg, Helmholtz Centre for Environmental Research (stephanie.jurburg@ufz.de)
Additional Contributors
The Topic Table team also includes staff from the following institutions:
- European Bioinformatics Institute (EBI) at the European Molecular Biology Laboratory
- Global Biodiversity Information Facility (GBIF)
- Wismar University of Applied Sciences
Get Involved
The planned workshops and the white paper are open to participation from the community. Workshop invitations will be distributed via the open community mailing list. The first workshop will take place on 25–26 June 2026 at the Leibniz Institute for Baltic Sea Research Warnemünde. If you have any questions, please contact the project leads listed above.
About the Topic Tables
The NFDI4Biodiversity Topic Tables provide a collaborative space for the biodiversity community to advance key topics in research data management. Following an open call, four topics were selected for 2026; they are integrated into the consortium’s work program and receive organizational support. NFDI4Biodiversity provides the structural framework, facilitates professional networking, and ensures that results are made visible and can be developed further in a sustainable way.
The aim is to bring together expertise, align existing approaches, and produce tangible outputs—such as white papers, guidelines, or roadmaps. The resulting contributions support practical work with biodiversity data, promote shared standards, and strengthen reliable, interoperable data use within the community.
Topic Table projects are generally designed to run for one year. To stay informed about upcoming calls, feel free to subscribe to our community mailing list and follow us on LinkedIn.
An overview of all current Topic Tables can be found here.