Data Management in Simple Steps
Start with the end in mind
It is good practice to publish research data at the end of a project, especially when the data underpin a published article.
A recommended way to publish data is to deposit it in an online repository that will store your data and make it accessible to others. Repositories can be general or discipline-specific; we suggest using disciplinary repositories to increase the impact of your data among your peers.
Before generating data, consider which repository you want to publish it in at the end of the project. This lets you structure the data and the metadata according to the repository’s guidelines from the beginning, which makes data submission easy at the end.
By starting with the end in mind, you will avoid spending time restructuring the data and rewriting metadata at the end of the project, when collecting the necessary information can be difficult.
Choose the appropriate repository for your data
You can find the right repository for your data by following the steps described below.
- Consider possible ethical and legal implications of your data (contact the ethics committee and the legal team in your institution, if necessary).
  - Data availability requirements: check what your research funder, your institution or the scientific journal requires regarding the availability of research data.
  - Data access restrictions: check whether there are legal or ethical reasons to restrict access to your data.
  - Data reusability: check whether there is any reason to limit reuse of your data by others, or whether you need to apply a specific licence to the data.
- When these aspects of your data are clear to you, you can start narrowing down the possible choices by using some of the following tools:
  - For biomolecular data in the life sciences, we recommend using the repositories listed in ELIXIR Deposition Databases for Biomolecular Data. In addition, the EMBL-EBI data submission wizard will help you find the right repository for your data type in a few simple steps.
  - Scientific Data, a Nature journal, has compiled a list of recommended repositories grouped by discipline.
  - re3data.org gathers information about existing repositories and allows you to filter them by access and licence type. Specifically, you can select repositories by the access type of the repository itself, which defines whether and how users can access the database in general (database access), and by the access type of each dataset stored in the repository (data access). Filtering by database licence and data licence is also possible.
  - The re3data.org and FAIRsharing websites gather details about repositories, which you can filter by discipline, data type, taxonomy and many other features.
- At this stage you should have a clearer idea of the appropriate repository for your data. You can now visit the websites of the repositories that you selected and read in more detail about their access policy, licence and submission procedure.
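The narrowing-down described above can be sketched as a simple filter over a hand-filled shortlist. The Python sketch below is only an illustration: the repository names, field names and values are hypothetical placeholders, not entries taken from re3data.org or FAIRsharing, and in practice you would fill in the records yourself from what those websites report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """Hand-filled record for one candidate repository (all fields hypothetical)."""
    name: str
    discipline: str
    data_access: str   # e.g. "open", "restricted", "closed"
    data_licence: str  # e.g. "CC0", "CC-BY"


def shortlist(candidates, discipline, allowed_access, allowed_licences):
    """Return the candidates that match discipline, data access and licence."""
    return [
        repo for repo in candidates
        if repo.discipline == discipline
        and repo.data_access in allowed_access
        and repo.data_licence in allowed_licences
    ]


# Placeholder records, not real repositories.
candidates = [
    Repository("Example Repository A", "genomics", "open", "CC0"),
    Repository("Example Repository B", "genomics", "restricted", "CC-BY"),
    Repository("Example Repository C", "proteomics", "open", "CC0"),
]

# Keep only genomics repositories with open data access and an acceptable licence.
selected = shortlist(candidates, "genomics", {"open"}, {"CC0", "CC-BY"})
print([repo.name for repo in selected])  # prints ['Example Repository A']
```

The criteria passed to `shortlist` correspond to the answers you gave in the first step: access restrictions decide the allowed access types, and reusability decides the allowed licences.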
What you need to know about your favourite repositories
Before you start generating or collecting data, there are three things that you need to learn about the selected repositories:
Metadata schema and ontology
Information about the specific metadata schema and ontology required by a data repository can usually be found on the repository’s website, under the “Help” or “Submit” section. Use this information to describe your data from the moment you start generating or collecting it. Read more about metadata schema in practice and ontology.
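One way to apply a repository's metadata requirements from day one is to check each record against the list of required fields. The following is a minimal sketch under stated assumptions: the field names in `REQUIRED_FIELDS` are hypothetical and would have to be replaced with the ones listed in the submission guidelines of your chosen repository.

```python
def missing_fields(record, required_fields):
    """Return the required fields that are absent or empty in a metadata record."""
    missing = []
    for field in required_fields:
        value = record.get(field)
        # Treat None, empty strings and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return sorted(missing)


# Hypothetical required fields; take the real ones from the repository's guidelines.
REQUIRED_FIELDS = {"title", "organism", "sample_type", "collection_date"}

record = {
    "title": "Pilot experiment",
    "organism": "Homo sapiens",
    "sample_type": "",  # present but left empty
}

print(missing_fields(record, REQUIRED_FIELDS))  # prints ['collection_date', 'sample_type']
```

Running such a check whenever new samples are registered makes gaps visible while the information is still easy to collect.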
File formats
Information about the file formats accepted by a data repository can usually be found on the repository’s website, under the “Help” or “Submit” section. Knowing at the beginning of the project which file formats your chosen repository accepts allows you to use the appropriate format for your data files from the start. Read more about recommended file formats.
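A quick check of file names against the accepted formats can flag problems early. This sketch assumes a hypothetical set of accepted extensions and hypothetical file names; it looks only at the last extension of each name, so compound extensions such as ".fastq.gz" would need extra handling.

```python
from pathlib import Path


def unaccepted_files(filenames, accepted_extensions):
    """Return the file names whose (last) extension is not in the accepted set."""
    accepted = {ext.lower() for ext in accepted_extensions}
    return [
        name for name in filenames
        if Path(name).suffix.lower() not in accepted
    ]


# Hypothetical values; take the real list from the repository's submission guide.
ACCEPTED_EXTENSIONS = {".csv", ".tsv", ".fastq"}
files = ["samples.csv", "notes.docx", "results.TSV"]

print(unaccepted_files(files, ACCEPTED_EXTENSIONS))  # prints ['notes.docx']
```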
Costs
Costs for data sharing and storage can be significant, so it is wise to take them into account when applying for research funding and when writing your data management plan (DMP). Usually, these costs depend on the volume of the data and the required duration of the service. ELIXIR Deposition Databases for Biomolecular Data and other online repositories offer sharing and storage services free of charge; however, some repositories have a different fee policy. It is therefore useful to read the “Terms and Conditions” of the repository of your choice before generating data.
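Since these costs usually depend on data volume and service duration, a rough budget figure for a DMP can be sketched as follows. This assumes a simple linear pricing model, and the rate used here is a made-up placeholder; real repositories may charge flat fees, tiered prices or nothing at all, so always check their terms.

```python
def estimate_storage_cost(volume_tb, years, rate_per_tb_year):
    """Estimate storage cost as volume x duration x rate (assumed linear pricing)."""
    if volume_tb < 0 or years < 0 or rate_per_tb_year < 0:
        raise ValueError("volume, duration and rate must be non-negative")
    return volume_tb * years * rate_per_tb_year


# Hypothetical example: 2.5 TB kept for 10 years at 40 currency units per TB per year.
cost = estimate_storage_cost(volume_tb=2.5, years=10, rate_per_tb_year=40.0)
print(f"{cost:.2f}")  # prints 1000.00
```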
Write your DMP and start your research
Hopefully, these steps help you start writing your DMP and describing your data according to the standards used by online repositories, so that the data submission process will be smooth.
General information about metadata, ontology, file formats and costs can be found in the “Make your DMP” section of this DM Hub. Specific information about repositories listed in the ELIXIR Deposition Databases for Biomolecular Data can be found in the “Data Management for Omics Data” section.