Data Citation Standards and Practices

Data Citation Overview

As with journal articles, book chapters, and other publications, datasets must be cited in order to acknowledge work that has contributed to your own research. Unlike citation formats for print resources, however, standard formats for data citation have not yet been widely established or formalized and are still under active development.

This guide offers general guidelines for creating data citations that contain enough information to properly attribute the original creators of the dataset, facilitate impact tracking, and support the reusability and reproducibility of your own data and findings.

Data Citation Basics

While the specific format and content of accepted data citations vary between disciplines and publishers, the minimum recommended elements of a data citation are:

  • Author(s): The name of each individual and/or organization that contributed to the creation of the dataset.
  • Date: Minimally, the year in which the dataset was published.
  • Title: This may include version information, in addition to a base title for the dataset or study.
  • Publisher: The organization, such as a data archive or repository, that distributes the dataset.
  • Identifier: Ideally, a unique, global, persistent identifier such as a Handle or DOI.

Additional elements may be used as needed, in particular to document which version of a study or dataset has been accessed.
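
As an illustration, the elements listed above can be gathered into a small record and rendered as a citation string. The following Python sketch is only one possible arrangement: the class name `DataCitation`, the ordering and punctuation of the output, and all example values are assumptions made for illustration, not a format prescribed by any discipline or publisher.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataCitation:
    """Holds the minimum recommended elements of a data citation."""
    authors: List[str]             # individuals and/or organizations
    year: int                      # year in which the dataset was published
    title: str                     # base title of the dataset or study
    publisher: str                 # assumed here: the archive or repository
    identifier: str                # persistent identifier, e.g. a DOI or Handle
    version: Optional[str] = None  # optional: which version was accessed

    def format(self) -> str:
        """Render the elements in one possible order (an assumption, not a standard)."""
        title = self.title
        if self.version is not None:
            title = f"{title} (Version {self.version})"
        authors = "; ".join(self.authors)
        return f"{authors} ({self.year}). {title}. {self.publisher}. {self.identifier}"


# Hypothetical example values, for illustration only.
citation = DataCitation(
    authors=["Doe, Jane", "Example Research Group"],
    year=2020,
    title="Example Survey Data",
    publisher="Example Data Archive",
    identifier="https://doi.org/10.1234/example",
    version="2",
)
print(citation.format())
```

When preparing a manuscript, check the style guide of your discipline or publisher and adjust the order and punctuation accordingly.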

The need for well-structured and complete data citations, and the definition of their components, are set out in the broadly endorsed Joint Declaration of Data Citation Principles [1]:

Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications[1].

Credit and Attribution
Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data[2].

In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited[3].

Unique Identification
A data citation should include a persistent method for identification that is machine actionable, globally unique, and widely used by a community[4].
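
To make "machine actionable" concrete for one common identifier type: a DOI can be turned into a resolvable link by prefixing it with the https://doi.org/ resolver address. The Python sketch below normalizes a few common ways of writing a DOI. The function name `doi_to_url` and the regular expression are assumptions made for illustration; the pattern is a loose plausibility check, not a full validation of DOI syntax.

```python
import re

# Loose pattern: "10.", a registrant code of 4-9 digits, "/", then a suffix.
# Illustrative only; it does not cover every valid DOI.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes commonly found in front of a bare DOI.
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(identifier: str) -> str:
    """Return an https://doi.org/ URL for a DOI written in a common form."""
    doi = identifier.strip()
    # Strip one known prefix so that only the bare DOI remains.
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {identifier!r}")
    return f"https://doi.org/{doi}"


# Hypothetical DOI, for illustration only.
print(doi_to_url("doi:10.1234/example"))
```

Writing the identifier as a full URL in a citation lets both readers and software follow it directly to the dataset's landing page.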

Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data[5].

Unique identifiers, and metadata describing the data, and its disposition, should persist -- even beyond the lifespan of the data they describe[6].

Specificity and Verifiability 
Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited[7].
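
One common way to record fixity is a cryptographic checksum of the data file, noted at the time of citation and recomputed whenever the file is retrieved again. The Python sketch below uses SHA-256 from the standard library. The choice of SHA-256 and the function names `sha256_of_file` and `matches_recorded_checksum` are assumptions made for illustration; a repository may publish a different checksum type.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large data files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded_checksum(path: Path, recorded: str) -> bool:
    """Check a retrieved file against the checksum recorded when it was cited."""
    return sha256_of_file(path) == recorded.strip().lower()
```

If the recomputed checksum differs from the recorded one, the retrieved file is not byte-for-byte the file that was originally cited, for example because a newer version has replaced it.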

Interoperability and Flexibility
Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities[8].


1. Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014