
Existing data

Before investing time, effort and resources in generating new data, find out what data already exist in your field. Reusing data increases its value and avoids redundant work.

Sources of existing data

Bibliographic research

Start with bibliographic research. If you find a relevant publication whose data are not available, you can contact the authors and request access to their data. If the data remain unavailable or you do not find any relevant publication, you can search for existing data in repositories. Existing data can also be described in data papers. Data papers provide peer-reviewed descriptions of publicly available datasets or databases and link to the data source in repositories. Data papers can be published in dedicated journals, such as Scientific Data, or as a specific article type in conventional journals.

Data repositories

Repositories and databases can also contain data that are not linked to any manuscript, article or paper. Repositories can be general-purpose, data-type-specific or discipline-specific.

ELIXIR Deposition Databases for Biomolecular Data

ELIXIR recommends the following deposition databases for specific data types:

  • Functional genomics: ArrayExpress
  • Computational models of biological processes: BioModels
  • Descriptions and metadata about biological samples used in research: BioSamples
  • Descriptions of biological studies: BioStudies
  • Personally identifiable genetic and phenotypic data resulting from biomedical research projects: EGA
  • Electron microscopy density maps of macromolecular complexes and subcellular structures: EMDB
  • Nucleotide sequence information: ENA
  • Genetic variation data from all species: EVA
  • Molecular interaction data: IntAct
  • Metabolomics experiments and derived information: MetaboLights
  • Biological macromolecular structures: PDBe
  • Proteomics experiments and derived information: PRIDE
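
If you regularly look for existing data of several types, you can keep the list above as a small lookup table so you know which database to search first. The Python sketch below is purely illustrative. The shortened data-type keys and the `suggest_database` helper are hypothetical and are not part of any ELIXIR service.

```python
from typing import Dict, Optional

# Hypothetical lookup table built from the ELIXIR list above.
# The shortened data-type keys are illustrative, not official ELIXIR terms.
ELIXIR_DEPOSITION_DATABASES: Dict[str, str] = {
    "functional genomics": "ArrayExpress",
    "computational models": "BioModels",
    "biological samples": "BioSamples",
    "biological studies": "BioStudies",
    "personally identifiable genetic and phenotypic data": "EGA",
    "electron microscopy density maps": "EMDB",
    "nucleotide sequences": "ENA",
    "genetic variation": "EVA",
    "molecular interactions": "IntAct",
    "metabolomics": "MetaboLights",
    "macromolecular structures": "PDBe",
    "proteomics": "PRIDE",
}


def suggest_database(data_type: str) -> Optional[str]:
    """Return the recommended database for a data type, or None if it is not listed."""
    # Normalise whitespace and case so "  Proteomics " matches "proteomics".
    return ELIXIR_DEPOSITION_DATABASES.get(data_type.strip().lower())
```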

Scientific journals and communities have compiled a number of lists and registries of recommended repositories, searchable by discipline and other characteristics.

Before reusing existing data

  • Check if a licence is attached and if it allows you to reuse the data for your intended purpose.
  • Make sure that the dataset is well described with high quality metadata and documentation.
  • Verify the quality of the data. Look for documented quality checks or run your own quality tests before using the data.
  • If the dataset has more than one version, decide which version you will use.
    • You can decide to always use the version that is available at the start of the project. In this case, make sure that you, and others who want to reproduce your results, can still access that specific version later on.
    • You can update to the latest version whenever a new one is released during your project. In this case, you may need to redo all calculations based on the new version of the dataset, and you should make sure that everything stays consistent.
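
Some of the quality and version checks above can be automated. The minimal Python sketch below assumes that the repository publishes a SHA-256 checksum for each file. Some repositories publish MD5 checksums instead, and in that case you can substitute `hashlib.md5`. The sketch confirms that your downloaded copy is identical to the published one. It then records the identifier, version and checksum in a JSON manifest so the exact version can be retrieved again later. The function names and manifest layout are hypothetical. A matching checksum says nothing about the scientific quality of the data. It only rules out corrupted or altered downloads.

```python
import hashlib
import json
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_record(path: Path, expected_sha256: str, identifier: str,
                      version: str, manifest: Path) -> dict:
    """Hypothetical helper: check a download against a published checksum, then pin its version.

    Raises ValueError on a mismatch. On success, writes a JSON manifest
    recording which identifier, version and file were used.
    """
    actual = sha256sum(path)
    if actual != expected_sha256.strip().lower():
        raise ValueError(f"Checksum mismatch for {path.name}: got {actual}")
    entry = {"identifier": identifier, "version": version,
             "file": path.name, "sha256": actual}
    manifest.write_text(json.dumps(entry, indent=2))
    return entry
```

If you keep the manifest under version control alongside your analysis code, others can reproduce your results with the same dataset version.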

How to cite an existing dataset

Complete citation

Author(s), Year, Dataset Title, Identifier, Repository, Version.

Short citation

Identifier, Version (if applicable).
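
You can generate both citation formats from the same record. In the Python sketch below, the `DatasetCitation` class is hypothetical. Separating multiple authors with semicolons is an assumption, because the format above does not specify an author separator. Follow the style required by your journal.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetCitation:
    """Hypothetical record holding the fields of the complete citation format."""
    authors: List[str]
    year: int
    title: str
    identifier: str
    repository: str
    version: Optional[str] = None

    def complete(self) -> str:
        """Author(s), Year, Dataset Title, Identifier, Repository, Version."""
        # Assumption: authors are separated by "; " so they stay distinct from the other fields.
        parts = ["; ".join(self.authors), str(self.year), self.title,
                 self.identifier, self.repository]
        if self.version:
            parts.append(self.version)
        return ", ".join(parts) + "."

    def short(self) -> str:
        """Identifier, Version (if applicable)."""
        return f"{self.identifier}, {self.version}" if self.version else self.identifier
```

The following example uses only placeholder values: authors `["Doe J", "Roe R"]`, year 2020, title "Example dataset", identifier E-MTAB-NNNN, repository ArrayExpress and version v2. This record produces `Doe J; Roe R, 2020, Example dataset, E-MTAB-NNNN, ArrayExpress, v2.` as the complete citation and `E-MTAB-NNNN, v2` as the short one.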

Identifiers are machine-readable alphanumeric strings provided by repositories. Identifiers can be:

  • Accession number
    example: E-MTAB-NNNN
  • DOIs (Digital Object Identifiers)
    example: doi: 10.1038/d41586-018-03071-1
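
Because identifiers are machine-readable, you can check their form before citing them. The Python sketch below strips common DOI prefixes, such as `doi:` in the example above, and builds a resolvable `https://doi.org/` link. The DOI pattern follows a regular expression suggested by Crossref, which matches most modern DOIs but not every DOI ever issued. The ArrayExpress pattern is an assumption based on accessions of the form E-MTAB-NNNN: `E-` followed by four capital letters, a hyphen and digits. It does not cover other repositories' accession formats. All function names are hypothetical.

```python
import re
from typing import Optional

# Pattern suggested by Crossref for modern DOIs; some older DOIs will not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)
# Assumed ArrayExpress-style accession: "E-", four capital letters, "-", digits.
ARRAYEXPRESS_PATTERN = re.compile(r"^E-[A-Z]{4}-\d+$")

DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def strip_doi_prefix(raw: str) -> str:
    """Remove a common DOI prefix such as 'doi:' or 'https://doi.org/' if present."""
    value = raw.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):].strip()
    return value


def classify_identifier(raw: str) -> Optional[str]:
    """Return 'doi' or 'accession' for a recognised identifier, else None."""
    if DOI_PATTERN.match(strip_doi_prefix(raw)):
        return "doi"
    if ARRAYEXPRESS_PATTERN.match(raw.strip()):
        return "accession"
    return None


def doi_url(raw: str) -> str:
    """Turn a DOI written in any common form into a resolvable https://doi.org/ link."""
    value = strip_doi_prefix(raw)
    if not DOI_PATTERN.match(value):
        raise ValueError(f"Not a DOI: {raw!r}")
    return "https://doi.org/" + value
```

For the DOI above, `doi_url("doi: 10.1038/d41586-018-03071-1")` returns `https://doi.org/10.1038/d41586-018-03071-1`.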