Start with the end in mind

It is good practice to share research data at the end of a project, especially if the data underpin a publication.

A recommended way to share data is to deposit it in an online repository that stores your data and makes it accessible to others. Repositories can be general or discipline-specific; we suggest using disciplinary repositories to increase the impact of your data among your peers.

Before generating data, consider which repository you want to use to share it at the end of the project. This lets you structure the dataset and the metadata according to the repository’s guidelines from the beginning, which makes data submission easy at the end.

By starting with the end in mind, you will avoid spending time restructuring the dataset and rewriting metadata at the end of the project, when collecting the necessary information can be difficult.

Choose the appropriate repository for your dataset

You can find the right repository for your dataset by following the steps described below.

  1. Consider the possible ethical and legal implications of your dataset (contact the ethics committee and the legal team in your institution, if necessary).

    • Funders’ requirements: check what your research funder requires regarding the results of the research and the underlying data.
    • Data access restrictions: check if there are legal or ethical reasons to restrict access to your data.
    • Data reusability: check if there is any reason to limit reusability of your data by others or if you need to put a specific licence on the data.
  2. When these aspects of your dataset are clear to you, you can start narrowing down possible choices by using some of the following tools:

    • gathers information about existing repositories and allows you to filter them by access and licence type. Specifically, you can select repositories by how users can access the repository itself (database access) and by how they can access each dataset stored in it (data access). Filtering by database licence and data licence is also possible.
    • and FAIRsharing websites gather details for repositories, which you can filter by discipline, data type, taxonomy and many other features.
    • The Nature journal Scientific Data has compiled a list of recommended repositories grouped by discipline.
    • For biomolecular data in Life Sciences, we recommend using repositories listed in ELIXIR Deposition Databases for Biomolecular Data. Moreover, the EMBL-EBI data submission wizard will help you find the right repository for your data type in a few simple steps.
  3. At this stage you should have a clearer idea of the appropriate repository for your data. You can now visit the websites of the repositories that you selected and read in more detail about their access policy, licence and submission procedure.
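
The narrowing-down in step 2 can also be done on your own notes once you have collected a few candidates. The following is a minimal Python sketch, assuming you have recorded each candidate's discipline, data access type and data licence by hand. The `Repository` class, the `shortlist` function and the three example entries are hypothetical and do not correspond to any registry's API or to real repositories.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """Hand-collected notes about one candidate repository (hypothetical fields)."""
    name: str
    discipline: str
    data_access: str   # e.g. "open", "restricted", "closed"
    data_licence: str  # e.g. "CC0", "CC-BY"


def shortlist(repositories, discipline, allowed_access, allowed_licences):
    """Keep repositories that match the discipline, access type and licence."""
    return [
        repo
        for repo in repositories
        if repo.discipline == discipline
        and repo.data_access in allowed_access
        and repo.data_licence in allowed_licences
    ]


# Made-up candidates, for illustration only.
candidates = [
    Repository("Example Repo A", "life sciences", "open", "CC0"),
    Repository("Example Repo B", "life sciences", "restricted", "CC-BY"),
    Repository("Example Repo C", "social sciences", "open", "CC-BY"),
]

selected = shortlist(
    candidates,
    discipline="life sciences",
    allowed_access={"open"},
    allowed_licences={"CC0", "CC-BY"},
)
print([repo.name for repo in selected])
```

The allowed access types and licences should come from step 1, that is, from your funder's requirements and any legal or ethical restrictions on the data.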

What you need to know about your favourite repositories

Before you start generating or collecting data, there are three things that you need to learn about the selected repositories:

  1. Metadata schema and ontology
    Information about a specific metadata schema and ontology required by a data repository should be found on the repository’s website, usually under the “Help” or “Submit” section. Take this information and use it to describe your dataset when you start generating or collecting data. Read more about metadata schema in practice and ontology.

  2. File formats
    Information about the file format(s) accepted by a data repository should be found on the repository’s website, usually under the “Help” or “Submit” section. Knowing at the beginning of the project which file formats the repository of your choice accepts allows you to use the appropriate format for your data files immediately. Read more about recommended file formats.

  3. Costs
    Costs for data sharing and storage can be significant, so it is wise to take them into account when applying for research funding and in your data management plan (DMP). Usually, these costs depend on the volume of the data and the required duration of the service. ELIXIR Deposition Databases for Biomolecular Data and other online repositories offer sharing and storage services free of charge; however, some repositories have a different fee policy. Therefore, it can be useful to read the “Terms and Conditions” of the repository of your choice before generating data.
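
The three points above can be turned into a small check that you run while the project is still in progress. The following is a minimal Python sketch under stated assumptions: the required metadata fields, the accepted file extensions and the linear price model (volume times duration times a price per GB per year) are all hypothetical placeholders. Replace them with the values published by the repository you selected.

```python
from pathlib import Path

# Hypothetical values: take the real ones from the repository's
# "Help" or "Submit" pages and from its "Terms and Conditions".
REQUIRED_METADATA = {"title", "creator", "description", "licence"}
ACCEPTED_EXTENSIONS = {".csv", ".tsv", ".fastq"}


def missing_metadata(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return sorted(field for field in REQUIRED_METADATA if not record.get(field))


def unsupported_files(filenames):
    """Return the files whose extension is not in the accepted set."""
    return [
        name
        for name in filenames
        if Path(name).suffix.lower() not in ACCEPTED_EXTENSIONS
    ]


def estimate_cost(volume_gb, years, price_per_gb_year):
    """Estimate the storage cost, assuming a simple linear fee model."""
    return volume_gb * years * price_per_gb_year


record = {"title": "Example dataset", "creator": "A. Researcher"}
print(missing_metadata(record))
print(unsupported_files(["counts.csv", "notes.docx", "reads.FASTQ"]))
print(estimate_cost(volume_gb=200, years=5, price_per_gb_year=0.5))
```

Real fee policies may be tiered or charged as a one-off deposit fee, so treat the estimate only as a starting figure for the DMP budget.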

Write your DMP and start your research

Hopefully, these steps help you start writing your DMP and describing your datasets according to the standards used by online repositories, so that the data submission process will be smooth.

General information about metadata, ontology, file formats and costs can be found in the section Make your DMP of this DM Hub. Specific information about repositories listed in the ELIXIR Deposition Databases for Biomolecular Data can be found in the “Data Management for Omics Data” section.