
Datasets information

Give information about sets of numbers, words, images, video/audio, scripts, algorithms and software used or generated in the project to describe or analyse research subjects and materials (samples, specimens, biomolecules and other materials).

Research materials and step-by-step experimental protocols used or generated during the project should be shared through centralized repositories and platforms. This information can be specified in the DMP.

1. Will you generate/collect new data and/or make use of existing data? State the reasons and specify how you will use existing data.

Asked by

[FWO, ERC, H2020, BELSPO, HorizonEurope]

Meaning

Please state the following in your answer:

  • Whether existing data on your topic are available.
  • Whether there are restrictions or costs on the reuse and sharing of third-party data (specify the details in the next questions).
  • Whether re-use of any existing data has been considered but discarded, and why (e.g. incomplete data, data not reusable for legal reasons, etc.).
  • How existing data will be used: will you always use a specific version or update to the latest versions?

Example answers

  • I will re-use existing open access datasets on [add your topic] by combining them with my new data; no limits for sharing and re-use.
  • I will only generate new data; existing data on [this topic] are sparse and inadequate because [give reasons].
  • I will re-use only open access metadata of [Dataset Name] with closed access data.
  • I will pay [X]€ to access [Dataset Name] from [Company Name]: data will not be shared, only metadata will be open access.

Mapping among funders’ DMP templates

Funder | DMP section | DMP question
FWO | Data description | Will you generate/collect new data and/or make use of existing data?
BELSPO | Data description | Will you generate/collect new data and/or make use of existing data?
ERC | Summary | collected/generated dataset
H2020 | Data Summary | Specify if existing data is being re-used (if any) and how
HorizonEurope | Data Summary | Will you re-use any existing data and what will you re-use it for? State the reasons if re-use of any existing data has been considered but discarded.
ScienceEurope | 1a. How will new data be collected or produced and/or how will existing data be re-used? | • Briefly state the reasons if the re-use of any existing data sources has been considered but discarded. • State any constraints on re-use of existing data if there are any.

2. Name and list here all datasets that will be used and/or generated in this project. Add a reference for existing datasets.

Asked by

[ERC]

Meaning

The name of a dataset should be short and unique in this DMP; you could also assign a unique number or ID to each dataset in this DMP. What counts as a dataset is highly project-specific. Try to define each dataset based on:

  • the technique used (RNA-seq, imaging, LC/MS, etc.)
  • and/or the repository that could be used for data publication
  • and/or sample origin (organisms, literature, etc…)
  • and/or collection methods (experiment, simulation, survey, questionnaire…)

Reference for existing dataset: any identifier or accession number for keeping track of data provenance.
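
One way to keep such a dataset list consistent is a small inventory that is checked for unique IDs and names and for missing references. The following Python sketch is only an illustration: the `DatasetEntry` fields, the `DS01`-style ID scheme and the `find_problems` helper are hypothetical, not part of any funder template.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    dataset_id: str      # unique within this DMP (hypothetical scheme: DS01, DS02, ...)
    name: str            # short, unique dataset name
    status: str          # "new" or "existing"
    reference: str = ""  # accession number, DOI or other identifier (existing data)


def find_problems(entries):
    """Return a list of human-readable problems found in the inventory."""
    problems = []
    seen_ids, seen_names = set(), set()
    for entry in entries:
        if entry.dataset_id in seen_ids:
            problems.append(f"duplicate ID: {entry.dataset_id}")
        if entry.name in seen_names:
            problems.append(f"duplicate name: {entry.name}")
        if entry.status == "existing" and not entry.reference:
            problems.append(f"missing reference: {entry.name}")
        seen_ids.add(entry.dataset_id)
        seen_names.add(entry.name)
    return problems


# Placeholder entries based on the example answers; E-MTAB-XXXX is left unfilled on purpose.
inventory = [
    DatasetEntry("DS01", "RNA-seq on Arabidopsis", "new"),
    DatasetEntry("DS02", "Phenotype analysis by imaging", "new"),
    DatasetEntry("DS03", "Arabidopsis dataset", "existing", "E-MTAB-XXXX"),
]

for problem in find_problems(inventory):
    print(problem)
```

An empty result means every dataset has a unique ID and name, and every existing dataset carries a reference for provenance tracking.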

Example answers

  • Existing datasets: [name and accession number or reference or DOI]. New datasets: [your analysis type] by [your technique] on [your organism]. Software [Name] for [your application].
  • New: RNA-seq on Arabidopsis. Phenotype analysis by imaging. Existing: Arabidopsis dataset E-MTAB-XXXX.
  • New: Methylation by LC/MS.
  • New: Biomodel of X protein complex.

Mapping among funders’ DMP templates

Funder | DMP section | DMP question
FWO | na | na
BELSPO | na | na
ERC | Summary | dataset reference and name
H2020 | na | na
HorizonEurope | na | na
ScienceEurope | 1a. How will new data be collected or produced and/or how will existing data be re-used? | Explain how data provenance will be documented.

3. Per dataset, state its purpose, explain the relation to the objectives of the project, specify to whom it will be useful.

Asked by

[H2020, HorizonEurope]

Meaning

A description of the purpose of the data to be collected/generated will help reviewers understand the impact of your research on the academic community, industry and society.

Example answers

  • Datasets [Name] and [Name] are needed to evaluate the role of [your factor] on [your variable], as described in the objective number [X] in this project. Other researchers and industries involved in [list topics] will be interested in these data.

Mapping among funders’ DMP templates

Funder | DMP section | DMP question
FWO | na | na
BELSPO | na | na
ERC | na | na
H2020 | Data Summary | State the purpose of the data collection/generation. Explain the relation to the objectives of the project. Outline the data utility: to whom will it be useful
HorizonEurope | Data Summary | What is the purpose of the data generation or re-use and its relation to the objectives of the project? To whom might your data be useful (‘data utility’), outside your project?
ScienceEurope | na | na

4. Per dataset, state its origin/source.

Asked by

[FWO, ERC, H2020, BELSPO, HorizonEurope]

Meaning

Non-exhaustive list of possible attributes for data source/origin or collection mode:

  • Observational, Experimental
  • Quantitative, Qualitative
  • Simulation
  • Derived/compiled from other sources
  • Digital (born-digital or digitized) or non-digital nature (e.g. paper surveys, questionnaires…)
  • Surveys, questionnaires or interviews
  • Primary (generated by the researcher for a particular research purpose or project)
  • Secondary (originally created by someone else for another purpose)
  • Raw, Processed
  • etc

Specifying where the data come from, or when and by whom data will be generated/collected helps to identify implications for privacy (GDPR), IP and other legal or ethical aspects.

Example answers

  • Datasets [Name] and [Name] are [laboratory/field/preclinical experiments] on [organism].
  • Dataset [Name] consists of measurements performed in the lab by a partner in a different (part of the) country.
  • Dataset [Name] is a combination of existing data and new experimental data of [X] and [Y] platform/consortium.
  • The source of [dataset name] is a collection of existing [studies/books/publications].
  • The sources of datasets [Name] and [Name] are quantitative [surveys/interviews/questionnaires] collected by a team of survey takers we hire.
  • Dataset [Name] is a [qualitative or quantitative observational] study on [population/topic/subject].

Mapping among funders’ DMP templates

Funder | DMP section | DMP question
FWO | Data description | Describe the origin, type and format of the data (per dataset) and its (estimated) volume
BELSPO | Data description | Describe the origin, type and format of the data (per dataset) and its (estimated) volume
ERC | Summary | dataset origin
H2020 | Data Summary | Specify the origin
HorizonEurope | Data Summary | What is the origin/provenance of the data, either generated or re-used?
ScienceEurope | 1a. How will new data be collected or produced and/or how will existing data be re-used? | Explain which methodologies or software will be used if new data are collected or produced.

5. Per dataset, state digital format(s) of raw and processed data files, distinguishing proprietary from open format(s).

Asked by

[FWO, ERC, H2020, BELSPO, HorizonEurope]

Meaning

  • Raw and processed data file formats can be instrument/software-dependent, so check the formats generated by the instrument/software you will use.
  • Try to convert proprietary formats into open formats to ensure that your data will remain usable in the future.
  • The required digital formats of raw and processed data could vary depending on the data repository you will use to share the data, so check what formats are accepted by the chosen repository on its website.
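
As a first check, the file formats of a dataset can be tallied by filename extension and compared against a list of open formats. The Python sketch below assumes a project-defined `OPEN_EXTENSIONS` set; that set, the `summarize_formats` helper and the file names (including the made-up `.rawx` extension) are illustrative placeholders, and the authoritative list of accepted formats remains the one published by the chosen repository.

```python
from collections import Counter
from pathlib import PurePath

# Assumption: extensions this project treats as open formats (adapt to your repository).
OPEN_EXTENSIONS = {".csv", ".txt", ".jpg", ".jpeg"}


def summarize_formats(filenames):
    """Count files per extension and flag whether each extension is in the open list."""
    counts = Counter(PurePath(name).suffix.lower() for name in filenames)
    return {
        ext: {"files": n, "open": ext in OPEN_EXTENSIONS}
        for ext, n in counts.items()
    }


# Hypothetical file names; ".rawx" stands in for an instrument-specific format.
files = ["plate01.jpeg", "plate02.JPEG", "quantification.csv", "run01.rawx"]

for ext, info in summarize_formats(files).items():
    label = "open" if info["open"] else "check for an open alternative"
    print(f"{ext}: {info['files']} file(s), {label}")
```

Extensions flagged for checking are candidates for conversion, or for documenting the software needed to open them.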

Example answers

  • Raw and/or processed [numeric/video/audio/text…] data files of datasets [Name] and [Name] are in the following [proprietary/open] formats: [formats].
  • Raw images of the dataset “Phenotype analysis by imaging” are in the open JPEG format; spreadsheets of the processed quantification data are in the open .csv format.

Mapping among funders’ DMP templates

Funder | DMP section | DMP question
FWO | Data description | Describe the origin, type and format of the data (per dataset) and its (estimated) volume.
BELSPO | Data description | Describe the origin, type and format of the data (per dataset) and its (estimated) volume.
ERC | Summary | data type and format
H2020 | Data Summary | Specify the types and formats of data generated/collected
HorizonEurope | Data Summary | What types and formats of data will the project generate or re-use?
ScienceEurope | 1b. What data (for example the kind, formats, and volumes), will be collected or produced? | • Give details on the kind of data: for example numeric (databases, spreadsheets), textual (documents), image, audio, video, and/or mixed media. • Give details on the data format: the way in which the data is encoded for storage, often reflected by the filename extension (for example pdf, xls, doc, txt, or rdf). • Justify the use of certain formats. For example, decisions may be based on staff expertise within the host organisation, a preference for open formats, standards accepted by data repositories, widespread usage within the research community, or on the software or equipment that will be used. • Give preference to open and standard formats as they facilitate sharing and long-term re-use of data (several repositories provide lists of such ‘preferred formats’).

6. What methods or software tools are needed to access data files in proprietary format? Is documentation about the software needed to open the data file provided in the metadata? Is it possible to provide the relevant software (e.g. in open source code)?

Asked by

[ERC, H2020, HorizonEurope]

Example answers

  • Data files in proprietary formats [x,y,z] can be accessed by the software [X and Y], which are open; software info will be described in the documentation associated with the data files.
  • Format [x] can only be opened with the proprietary software [Y]; neither an open format nor open software exists for this data type; software info will be described in the documentation associated with the data files.
  • Software to access data files will be provided as open source.

Mapping among funders’ DMP templates

Funder | DMP section | DMP question
FWO | na | na
BELSPO | na | na
ERC | Making data openly accessible | How the data can be accessed
H2020 | Making data openly accessible | Specify what methods or software tools are needed to access the data? Is documentation about the software needed to access the data included? Is it possible to include the relevant software (e.g. in open source code)?
HorizonEurope | Making data accessible | Will documentation or reference about any software needed to access or read the data be included? Will it be possible to include the relevant software (e.g. in open source code)?
ScienceEurope | 5c. What methods or software tools are needed to access and use data? | • Indicate whether potential users need specific tools to access and (re-)use the data. Consider the sustainability of software needed for accessing the data.

7. Per dataset, state its expected volume at the end of the project.

Asked by

[FWO, ERC, H2020, BELSPO, HorizonEurope]

Meaning

Data volume doesn’t have to be precise; a realistic range of the data volume is sufficient.

How to estimate dataset volume:

  • Consider at least all raw data files. Check if processed data are also required by repositories or journals.
  • Estimate file size per sample or experiment, based on files previously generated using similar settings.
  • Multiply the estimated file size by the number of samples or experiments you are going to generate during the project.
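
The steps above amount to a simple multiplication, which can be sketched in Python as follows. The per-file size ranges and file counts are hypothetical placeholders, and `estimate_volume_gb` is an illustrative helper; replace the inputs with values measured on files from comparable experiments.

```python
def estimate_volume_gb(size_per_file_mb, n_files):
    """Return a (low, high) volume range in GB for one dataset.

    size_per_file_mb is a (low, high) range in MB, based on files
    previously generated with similar settings.
    """
    low_mb, high_mb = size_per_file_mb
    return (low_mb * n_files / 1000, high_mb * n_files / 1000)


# Hypothetical inputs: (size range per raw file in MB, expected number of files).
datasets = {
    "RNA-seq on Arabidopsis": ((800, 1200), 200),
    "Phenotype analysis by imaging": ((4, 6), 5000),
}

for name, (size_range, n_files) in datasets.items():
    low, high = estimate_volume_gb(size_range, n_files)
    print(f"{name}: about {low:.0f} to {high:.0f} GB")
```

Reporting a range rather than a single figure matches the advice that the volume does not have to be precise. If repositories or journals also require processed data, add those files as separate entries.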

Example answers

  • Phenotype analysis: X images in Y format are about XXX GB; RNA-seq on Arabidopsis: 200 files are about 200 GB.

Mapping among funders’ DMP templates

Funder | DMP section | DMP question
FWO | Data description | Describe the origin, type and format of the data (per dataset) and its (estimated) volume.
BELSPO | Data description | Describe the origin, type and format of the data (per dataset) and its (estimated) volume.
ERC | Summary | dataset expected size
H2020 | Data Summary | State the expected size of the data (if known)
HorizonEurope | Data Summary | What is the expected size of the data that you intend to generate or re-use?
ScienceEurope | 1b. What data (for example the kind, formats, and volumes), will be collected or produced? | Give details on the volumes (they can be expressed in storage space required (bytes), and/or in numbers of objects, files, rows, and columns).