Definition of metadata
Metadata is information about data. In other words, metadata provides a basic description of the data, which makes the data easier to find and understand for both humans and machines.
Many distinct types and categories of metadata have been defined (see also NISO 2004, How to FAIR). One important distinction is between metadata that describes the overall study or project (such as authors, aims, and dates) and metadata that applies at the level of individual data points or observations (such as variable names and the relations between files). These two types of metadata usually need to be provided in different types of documentation.
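As a minimal sketch of this distinction, the two levels could be kept as two separate structures. Every field name and value below is an illustrative assumption and is not taken from any particular standard.

```python
# Study-level metadata: describes the project as a whole.
# All names and values here are hypothetical placeholders.
study_metadata = {
    "title": "Example growth study",
    "authors": ["A. Researcher", "B. Scientist"],
    "aims": "Illustrate study-level metadata",
    "date": "2024-01-15",
}

# Data-level metadata: describes individual variables and how files relate.
data_metadata = {
    "variables": {
        "plant_id": {"description": "Unique plant identifier", "type": "string"},
        "height_cm": {"description": "Plant height", "type": "float", "unit": "cm"},
    },
    "file_relations": {
        "measurements.csv": {"links_to": "plants.csv", "key": "plant_id"},
    },
}


def describe_variable(name: str) -> str:
    """Return a human-readable description of one variable."""
    info = data_metadata["variables"][name]
    unit = f" [{info['unit']}]" if "unit" in info else ""
    return f"{name}: {info['description']}{unit}"


print(describe_variable("height_cm"))
```

In practice the study-level part often ends up in a README or repository submission form, while the data-level part goes into a data dictionary or codebook.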
A few simple and intuitive examples of “metadata vs data” can be found here.
What metadata are for
Metadata is as important as the data itself, because it provides the information anyone needs to find, understand, and reuse the data. For example, if you want to find a book in a library, you can usually search the library catalogue using the following metadata:
- Publication year
Without these metadata it would be impossible to find the book you are looking for.
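The catalogue search can be sketched in a few lines of code. This is only an illustration: the records are made up, and the `title` and `author` fields are assumed here alongside the publication year.

```python
# A tiny in-memory "catalogue". Every record is metadata about a book,
# not the book's content. Titles, authors, and years are made up.
catalogue = [
    {"title": "Data Stewardship Basics", "author": "Doe", "year": 2019},
    {"title": "Metadata in Practice", "author": "Smith", "year": 2021},
    {"title": "Open Science Handbook", "author": "Doe", "year": 2021},
]


def search(records, **criteria):
    """Return the records whose metadata match every given field exactly."""
    return [
        record
        for record in records
        if all(record.get(field) == value for field, value in criteria.items())
    ]


matches = search(catalogue, author="Doe", year=2021)
print([book["title"] for book in matches])
```

If the records carried no metadata fields at all, there would be nothing for `search` to match against, which is the point of the library example.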
Metadata schemas and where to find them
A metadata schema is a fixed set of attributes (or metadata fields) about the data that need to be provided. Some attributes are mandatory; others are only recommended or optional. When creating metadata, it is good practice not to invent your own schema but to use existing metadata schemas accepted as standards by several communities. There are many standard metadata schemas: some are generic, while others are discipline-specific.
- Generic metadata schemas, such as Dublin Core, tend to be easy to use and widely adopted, but often need to be expanded in order to cover more specific information.
- Discipline-specific schemas, such as MIAPPE, have a much richer vocabulary and structure, but tend to be highly specialized and only understandable by researchers in that area. The European Nucleotide Archive (ENA) developed sample checklists to meet the needs of different research communities to describe biological samples.
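To make the idea of mandatory, recommended, and optional attributes concrete, here is a minimal sketch of checking a metadata record against a schema. The field names are loosely inspired by generic schemas such as Dublin Core, but the requirement levels and the `check_record` helper are assumptions made for illustration, not part of any real standard or tool.

```python
# Hypothetical schema: maps each field to its requirement level.
# The levels are invented for this example.
SCHEMA = {
    "title": "mandatory",
    "creator": "mandatory",
    "date": "mandatory",
    "description": "recommended",
    "subject": "optional",
}


def check_record(record: dict, schema: dict = SCHEMA) -> dict:
    """Report missing mandatory/recommended fields and fields outside the schema."""

    def missing(level: str) -> list:
        # A field counts as missing if it is absent or has an empty value.
        return [
            field
            for field, required in schema.items()
            if required == level and not record.get(field)
        ]

    return {
        "missing_mandatory": missing("mandatory"),
        "missing_recommended": missing("recommended"),
        "unknown_fields": [field for field in record if field not in schema],
    }


record = {
    "title": "Example dataset",
    "creator": "A. Researcher",
    "license": "CC-BY-4.0",  # not part of the hypothetical schema
}
print(check_record(record))
```

Fields reported as unknown, such as `license` here, are the kind of information for which a generic schema may need to be expanded.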
Metadata schemas and data repositories
Data repositories can use:
- Standard metadata schemas
The RDA maintains an open knowledge base of research metadata standards, along with the repositories that use them. Repositories can be selected by metadata standard using re3data.org.
- Repository-developed metadata schemas
Some repositories have decided that current standards do not fit their metadata needs, and so they have created their own requirements. Information about the specific metadata schema required by a data repository can usually be found on the repository’s website, typically under the “Help” or “Submit” section. Lists of repository-developed metadata schemas can also be found on the DCC and RDA websites.
Since data repositories can require submitted datasets to be described according to a specific metadata schema, it is recommended to consider at the very beginning of your project which repository your data could be published in. Knowing what metadata will be required makes it easier to keep track of that information while collecting the data or performing the experiments.