Introduction
The European Nucleotide Archive (ENA) is a fully open repository dedicated to storing raw sequencing data, assemblies, and annotation data. The ENA Data Submission Toolbox simplifies the submission of sequence data, including raw reads and assembled sequences, by offering a single-step submission process, a graphical user interface, tabular metadata and client-side validation for every sample checklist supported by ENA.
Its components
ENA upload CLI
A command-line tool for submitting raw read data and the associated metadata to ENA using tabular files or an Excel spreadsheet. It lets you programmatically submit study, sample, run and experiment objects without logging in to the ENA Webin interface.
Key features:
- Submit raw sequencing data and metadata
- High volume submissions
- Support for all sample types
- Use tabular files or an Excel spreadsheet to easily capture the metadata
- Add, modify, cancel and release ENA objects (study, experiment, run and sample) without needing to log in to ENA Webin
- Safe credential management using a credentials file
- Client-side validation using ENA checklists (samples) and official ENA XSD files (run, experiment and study)
- Compatible with the provided tsv/xlsx templates to fill in the metadata (see below)
Documentation + source | Install using pip
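As a rough illustration of what a programmatic submission could look like, the sketch below assembles an `ena-upload-cli` command line in Python without running it. The helper `build_upload_command` and all file names are hypothetical. The flag names (`--action`, `--center`, `--study`, `--sample`, `--experiment`, `--run`, `--data`, `--dev`, `--secret`) are assumed from the tool's README, so check them against `ena-upload-cli --help` for your installed version.

```python
import shlex

# The four actions named in the feature list above.
ALLOWED_ACTIONS = {"add", "modify", "cancel", "release"}


def build_upload_command(action, secret_file, center, tables, data_files, dev=True):
    """Assemble (but do not run) an argument list for an ena-upload-cli call.

    Hypothetical helper; flag names are assumptions taken from the README.
    """
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    command = ["ena-upload-cli", "--action", action, "--center", center]
    # One tabular file per ENA object type, added only when provided.
    for object_type in ("study", "sample", "experiment", "run"):
        if object_type in tables:
            command += [f"--{object_type}", tables[object_type]]
    if data_files:
        command += ["--data", *data_files]
    if dev:
        # Assumed to target ENA's test service rather than production.
        command.append("--dev")
    command += ["--secret", secret_file]
    return command


command = build_upload_command(
    action="add",
    secret_file=".secret.yml",        # hypothetical credentials file name
    center="my_center",               # hypothetical center name
    tables={
        "study": "study.tsv",
        "sample": "sample.tsv",
        "experiment": "experiment.tsv",
        "run": "run.tsv",
    },
    data_files=["reads_1.fastq.gz", "reads_2.fastq.gz"],
)
print(shlex.join(command))
```

Building the argument list separately makes it easy to inspect the command before passing it to `subprocess.run`, and keeping `dev=True` as the default is a cautious choice for a first trial submission.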
Galaxy
Galaxy is an open-source platform for FAIR data analysis that enables users to use tools from various domains through its graphical web interface. The two tool wrappers listed below make Galaxy a one-stop shop for data preprocessing, analysis and submission. Both tools can be installed through the Galaxy toolshed.
Galaxy ENA upload tool
This is the Galaxy tool wrapper of the ENA-upload-cli mentioned above. The integration with Galaxy gives the command-line tool a graphical user interface and adds support for interactive submissions.
Key features:
- Graphical user interface making it easy to use
- Raw read submissions
- ENA-upload-cli at its core
- Add and modify ENA objects
- User-based credential management
- Possibility to set a system-wide brokering account
- Easy data upload/management
- Available at useGalaxy Europe, useGalaxy Belgium and useGalaxy Australia
Use at useGalaxy.be | Source | Tutorial on GTN | Galaxy toolshed
Galaxy assembly submission tool
A Galaxy wrapper for interactively submitting consensus sequences to ENA. The tool is built around ENA's Webin-CLI and supports all sample checklists.
Key features:
- Interactive submission of the metadata
- Possibility to set a brokering account
- Easy data upload/management
- Available at useGalaxy Belgium
Use at useGalaxy.be | Source | Tutorial on GTN | Galaxy toolshed
Metadata templates
Tabular (tsv) and xlsx spreadsheet metadata templates required to submit data to ENA using the ENA-upload-cli or the Galaxy ENA upload tool. A GitHub Action automatically keeps these templates up to date with the ENA sample checklists.
Key features:
- GitHub repo hosting tsv and xlsx templates for every checklist
- GitHub Actions keep the templates up to date with the ENA XSD/checklist files to guarantee compatibility
- Versions kept in sync with those of ENA-upload-cli
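To give a feel for the kind of check a filled-in tabular template can undergo before submission, here is a minimal sketch that flags empty mandatory fields in a tsv file. It is not the validation logic of ENA-upload-cli. The function `find_missing_values` and the column names in `REQUIRED_COLUMNS` are hypothetical stand-ins, since each real ENA checklist defines its own mandatory fields.

```python
import csv
import io

# Hypothetical mandatory columns; real checklists define their own.
REQUIRED_COLUMNS = ("alias", "title", "taxon_id")


def find_missing_values(tsv_text, required=REQUIRED_COLUMNS):
    """Return (row_number, column) pairs where a required value is absent.

    Row numbers count data rows, starting at 1 (the header is not counted).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    for row_number, row in enumerate(reader, start=1):
        for column in required:
            value = row.get(column)
            # None means the column is missing or the row is too short.
            if value is None or not value.strip():
                problems.append((row_number, column))
    return problems


sample_tsv = (
    "alias\ttitle\ttaxon_id\n"
    "sample_1\tFirst sample\t2697049\n"
    "sample_2\t\t2697049\n"
)
problems = find_missing_values(sample_tsv)
print(problems)  # [(2, 'title')]
```

Reporting every problem as a row and column pair, instead of stopping at the first one, lets a submitter fix the whole sheet in a single pass.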
Docker deployment
If you cannot use the Galaxy instances at useGalaxy.eu, .be and .au, for example for GDPR reasons, we also offer a ready-to-use Docker container. It ships with the Galaxy tools described above and deploys a fully usable Galaxy instance locally.
Key features:
- Easy deployment of a local Galaxy instance
- Data stays on-premise until submission
- Contains:
  - ENA-upload tool
  - ENA Assembly submission tool
  - Tool to easily remove human reads from raw reads
Documentation + source | Download image from Quay.io
Using DataHub to manage your metadata
DataHub is a free and open platform for easier research metadata management in the life sciences. Based on FAIRDOM-SEEK, DataHub lets users create sample metadata templates derived from ENA-specific templates, ensuring compatibility with the ENA repository.
Key features:
- Powered by FAIRDOM-SEEK open-source software, DataHub facilitates effective research metadata management
- Easily craft metadata templates tailored to various repositories
- Promoting and supporting ENA standards and checklists ensures data consistency and compliance
- Streamline data exchange with structured metadata export adhering to the ISA-JSON standard
- Promote ISA-JSON as a machine-actionable metadata carrier, enhancing interoperability
Publication
Roncoroni, M., Droesbeke, B., Eguinoa, I., De Ruyck, K., D’Anna, F., Yusuf, D., Grüning, B., Backofen, R., & Coppens, F. (2021). A SARS-CoV-2 sequence submission tool for the European Nucleotide Archive. Bioinformatics, 37(21), 3983–3985. https://doi.org/10.1093/bioinformatics/btab421