2026-01-01 - 2026-12-31
Millions of microbe-related omics datasets are submitted to public repositories such as the European Nucleotide Archive (ENA), where high-quality metadata is essential to enable reuse and accelerate microbiological research and innovation. However, a major bottleneck is the attrition of metadata between sample collection, sequencing, and final submission. Complex submission workflows and the perceived burden of proper curation often discourage researchers, resulting in records with incomplete, inconsistent, or missing biological metadata.
While metadata is frequently treated as an administrative afterthought, this use case promotes that high-quality, standardized metadata is captured and curated from the very start of the data generation process. By connecting existing tools developed within the NFDI4Microbiota consortium, we will conceptualize a robust, generalized framework that ensures metadata is collected, FAIR-compliant, actively curated (with the support of data stewards), and securely linked to sequencing data right upon requesting data from sequencing providers.
To allow for this end-to-end solution, a web platform that enables the initiation of sequencing requests with cooperating sequencing centers, will also capture and validate biological metadata upfront, assigning unique identifiers to each sample. To minimize data protection concerns and administrative overhead, only these identifiers and minimal technical instructions are passed to sequencing centers (e.g., EMBL GeneCore) for the initiation of data generation. In parallel, NFDI4Microbiota data stewards curate and enrich the biological metadata. Once sequencing is complete, raw data and technical metadata are returned and automatically re-linked to the curated biological profiles via the shared identifiers. The result is a fully annotated dataset with high-quality metadata, ready for downstream analysis (e.g., through CloWM) and submission to public repositories.
To validate this framework, the EMBL GeneCore sequencing facility will serve as a real-world testing ground, demonstrating the feasibility of our cohesive data brokerage pipeline in a high-throughput environment.

You are a …
phenotypic traits
integration
metagenomics
microbiome