Data Management for Omics Data

Genomics

Genome

The full complement of genetic material in an organism is called its genome. Genomics is accordingly the study of organisms at the level of the whole genome rather than of individual genes (Scitable by Nature Education).

Recommended repositories

European Nucleotide Archive (ENA). ENA provides a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation.
European Genome-phenome Archive (EGA). The EGA provides a service for the permanent archiving and distribution of personally identifiable genetic and phenotypic data resulting from biomedical research projects. Data held at the EGA were collected from individuals whose consent agreements authorise data release only for specific research use by bona fide researchers. Strict protocols govern how information is managed, stored and distributed by the EGA project (Nature Genetics 47, 692–695 (2015), https://doi.org/10.1038/ng.3312).

Metabolomics

Metabolome

The metabolome is the complete collection of low-molecular-weight metabolites produced by cells during metabolism. It provides a direct functional readout of cellular activity and physiological status (Advances in Genetics, 2016, ScienceDirect).

Metabolomics is the large-scale study of small molecules, commonly known as metabolites, within cells, biofluids, tissues or organisms (EMBL-EBI training).

Recommended repositories

MetaboLights. MetaboLights is the first general-purpose, open-access repository for metabolomics studies, their raw experimental data and associated metadata (EMBL-EBI training).

Proteomics

Proteome

A proteome is the complete set of proteins expressed by an organism (Scitable by Nature Education). Proteomics is the large-scale study of all the proteins present in a cell, tissue or organism at any one time.

Recommended repositories

PRoteomics IDEntifications (PRIDE) database. PRIDE is a repository for mass spectrometry (MS) proteomics data. It was set up to enable public deposition of MS proteomics data and to provide access to the experimental data described in scientific publications. Its main focus is to support the deposition of shotgun MS/MS proteomics datasets.
UniProt. UniProt is a curated and annotated protein sequence knowledge base. The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information.

Transcriptomics

Transcriptome

A transcriptome is the full range of messenger RNA, or mRNA, molecules expressed by an organism. In contrast with the genome, which is characterized by its stability, the transcriptome actively changes. In fact, an organism’s transcriptome varies depending on many factors, including stage of development and environmental conditions (Scitable by Nature Education).

Two main techniques are used to study the transcriptome: microarrays and RNA-seq. RNA-seq is based on next-generation sequencing (NGS) platforms.

Recommended repositories

ArrayExpress. The ArrayExpress Archive of Functional Genomics Data stores data from high-throughput functional genomics experiments and makes these data available to the research community for reuse.
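
The recommendations in this guide can be summarised as a simple lookup from omics field to repository, which may be handy in a data management plan template or a submission checklist script. The following is only a minimal sketch: the names RECOMMENDED_REPOSITORIES and recommended_repositories are hypothetical and chosen for illustration, and the mapping contains nothing beyond the repositories listed above. It does not contact any repository service.

```python
# Illustrative lookup table: omics field -> repositories recommended in this guide.
# The variable and function names are assumptions made for this sketch.
RECOMMENDED_REPOSITORIES = {
    "genomics": [
        "European Nucleotide Archive (ENA)",
        # EGA is intended for personally identifiable data under controlled access.
        "European Genome-phenome Archive (EGA)",
    ],
    "metabolomics": ["MetaboLights"],
    "proteomics": ["PRIDE", "UniProt"],
    "transcriptomics": ["ArrayExpress"],
}


def recommended_repositories(field):
    """Return the repositories this guide recommends for an omics field.

    The lookup ignores case and surrounding whitespace. A KeyError is raised
    for fields that the guide does not cover.
    """
    key = field.strip().lower()
    if key not in RECOMMENDED_REPOSITORIES:
        known = ", ".join(sorted(RECOMMENDED_REPOSITORIES))
        raise KeyError(f"No recommendation for {field!r}; known fields: {known}")
    # Return a copy so that callers cannot modify the shared table.
    return list(RECOMMENDED_REPOSITORIES[key])


if __name__ == "__main__":
    for name in sorted(RECOMMENDED_REPOSITORIES):
        print(name + ": " + ", ".join(recommended_repositories(name)))
```

For genomics, the choice between the two entries still depends on the data: personally identifiable genetic and phenotypic data belong in the EGA, as described in the Genomics section.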