Digital Data Management, Curation, and Archiving

National Institutes of Health

Per the NIH Genomic Data Sharing Policy published in 2014, researchers submitting proposals for projects that will generate large-scale human or non-human genomic data must include a genomic data sharing plan. The NIH provides the following guidance on which of the sections below are recommended or mandatory:

[A]ll applicants proposing to generate human or non-human data, elements 1 and 2, a description of the data type and the data repository, should be provided at the time of the application. Applicants proposing to generate human data should also provide information addressing elements 3-5 (data submission and release timeline, IRB assurance of the genomic data sharing plan, and appropriate use of the data, respectively) and, if applicable, element 6 (request for an exemption to the submission) prior to award. Applicants proposing to generate non-human data need also to address element 3 (data submission and release timeline) prior to award.

Alfred P. Sloan Foundation

The grant proposal guidelines for projects funded by the Sloan Foundation are available from http://www.sloan.org/apply-for-grants/grant-proposals/, and may vary depending on the type of project and amount of funds requested.

Each set of guidelines, however, requires the submission of an Information Products Appendix which specifically includes research data and requires PIs to address the following:

Description

  • What information products will be created in the course of this project?
  • What format(s) will those information products take? Please list as appropriate.
  • Does the project involve organization or analysis of pre-existing materials, and if so, what are the relevant licensing or sharing arrangements?
  • Is the work subject to any superseding policies (such as a university open access mandate or other organization-wide policies)?

Management

  • What tools, platforms, and processes will be used to manage project assets as they are created and used through the grant period?
  • Who will be responsible for managing project assets during the grant period?

Dissemination

  • What channels will be used to disseminate grant products to target audiences?
  • Under what license(s), in what timeframe, and (if applicable) at what cost will grant products be available?
  • If the project relies on project assets that are proprietary or otherwise not available for wide dissemination, how will the final grant products be reproducible (in the case of research findings) or otherwise accessible for future use?

Archiving and Stewardship

  • How will you ensure the long-term durability of grant products after the funding period ends?
  • How long beyond the grant term will grant products be maintained and by whom?

Department of Defense

Per the DoD Public Access Plan released in February 2015, supplementary data management plans are integral to all contract or grant proposal packages. The supplement will describe how data management will adhere to DoD policy on the dissemination and sharing of research products. Note that the DoD's Public Access Plan specifically references a Department-implemented data management and discovery network, which utilizes a common core metadata schema. Links are provided below.

In broad terms, the DMP will describe:

  • The types of data, software, curriculum materials, and other materials to be produced in the course of the project that are publicly releasable
  • The standards to be used for data and metadata format and content
  • Conditions for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
  • Conditions and provisions for re-use, re-distribution, and the creation of derivative works
  • Plans for archiving data sets, or data samples, and other digitally formatted scientific data, and for preservation of access thereto
  • If, for legitimate reasons, the data cannot be preserved and made available for public access, the plan will include a justification citing such reasons.

From the DoD Public Access Plan: http://www.dtic.mil/dtic/pdf/dod_public_access_plan_feb2015.pdf

Department of Education - Institute of Education Sciences

From the Institute of Education Sciences (IES) Policy Statement on Data Sharing in IES Research Centers:

Data sharing provides opportunities for other researchers to review, confirm or challenge study findings, which is an important aspect of the scientific process. In addition, data sharing can enhance scientific inquiry through a variety of other analytic activities, including the use of shared data to: test alternative theories or hypotheses; explore different sets of research questions than those targeted by the original researchers; combine data from multiple sources to provide potential new insights and areas of inquiry; and/or conduct methodological studies to advance education research methods and statistical analyses.

Type(s) and Format(s) of Data to Be Shared

Provide a description of the data to be collected or used. Explain what the contents of each dataset will be, including if known the number and types of files as well as file sizes. Additionally, information about data collection and quality assurance methods should be included here. Consider the following questions:

  • What data will be generated in the research?
  • What data types will you be creating or capturing?
  • How will you capture or create the data?
  • If you will be using existing data, state that fact and include where you got it. What is the relationship between the data you are collecting and the existing data?
  • What data will be preserved and shared?

Procedures for Maintaining Confidential Data

Per the IES, data sharing may not compromise the rights and privacy of human subjects. Because investigators will be expected to share data, consideration should be given to study design and procedures which will facilitate access while protecting the rights of participants and confidentiality of the data. In particular, data use and sharing information should be provided as part of the informed consent process, while data to be shared should be free of identifiers that would allow direct or deductive disclosure of study participants.

Refer to applicable IRB protocols, and consider the following:

  • Have participants consented to data sharing?
  • What are the risks of disclosure? Are there other ethical or privacy issues associated with the data?
  • What steps will be taken to de-identify the data?
  • How may data be aggregated to reduce the risk of disclosure?
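The de-identification and aggregation questions above can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration rather than an IES requirement: the column names (`name`, `email`, `student_id`), the keyed-hash pseudonym scheme, and the minimum cell size of 5. Removing direct identifiers does not by itself rule out deductive disclosure, so a real plan should follow the applicable IRB protocol.

```python
import hashlib
import hmac
from collections import Counter

# Hypothetical column names; adjust to the actual dataset.
DIRECT_IDENTIFIERS = {"name", "email", "student_id"}


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Return a keyed hash so the same participant always maps to the same code."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def deidentify(records, secret_key, id_field="student_id"):
    """Drop direct identifiers and add a pseudonymous participant code."""
    cleaned = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        row["participant_code"] = pseudonymize(str(record[id_field]), secret_key)
        cleaned.append(row)
    return cleaned


def aggregate_counts(records, group_field, min_cell_size=5):
    """Count records per group; groups below min_cell_size are suppressed (None)."""
    counts = Counter(record[group_field] for record in records)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }
```

The secret key would be stored separately from the shared files, so that recipients of the data cannot regenerate the mapping from identifiers to participant codes.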

Roles and Responsibilities

Explain how the responsibilities regarding the management of your data will be delegated. This should include time allocations, project management of technical aspects, training requirements, and contributions of non-project staff. Individuals should be named where possible. Remember that those responsible for long-term decisions about your data will likely be the custodians of the repository/archive you choose to store your data. While the costs associated with your research (and the results of your research) must be specified in the Budget Justification portion of the proposal, you may want to reiterate who will be responsible for funding the management of your data. Consider these questions:

  • Outline the staff/organizational roles and responsibilities for implementing this data management plan.
  • Who will be responsible for data management and for monitoring the data management plan?
  • How will adherence to this data management plan be checked or demonstrated?
  • What process is in place for transferring responsibility for the data?
  • Who will have responsibility over time for decisions about the data once the original personnel are no longer available?

Expected Schedule for Data Sharing

Per the IES:

Timely data sharing is important to the scientific process. IES thus expects that data will be shared no later than when the main findings from the final study dataset are published in a peer-reviewed scholarly publication.

The IES acknowledges that there may be issues associated with providing access to data when the data collected are proprietary (e.g., when a published curriculum is being evaluated). Any restrictions on data sharing, such as a delay of disclosing proprietary data, should be presented in the DMP.

Documentation to Be Provided

Describe the format of your data, and think about what details (metadata) someone else would need to be able to use these files. Describe the structural standards that you will apply in making data and metadata available.

  • Which file formats will you use for your data and why?
  • What naming conventions will you use?
  • How will you organize your directories?
  • What form will the metadata describing/documenting your data take?
  • How will you create or capture these details?
  • Which metadata standards will you use and why have you chosen them? (e.g. accepted domain-local standards, widespread usage)
  • What contextual details (metadata) are needed to make the data you capture or collect meaningful?
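As one way to make the naming, directory, and metadata questions above concrete, the following Python sketch encodes a file naming convention, a staged directory layout, and a small metadata record. The pattern `<project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>`, the `data/<stage>/` layout, and the metadata field names are hypothetical choices for illustration, not an IES or DMPTool standard.

```python
import json
import re
from datetime import date
from pathlib import PurePosixPath

# Hypothetical convention: <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<datatype>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, datatype, collected, version, ext):
    """Compose a file name that follows the convention above."""
    return f"{project}_{datatype}_{collected:%Y%m%d}_v{version:02d}.{ext}"


def build_path(stage, filename):
    """Place a file under a stage directory such as raw/ or processed/."""
    if NAME_PATTERN.match(filename) is None:
        raise ValueError(f"File name does not follow the convention: {filename}")
    return PurePosixPath("data") / stage / filename


def sidecar_metadata(filename, description, creator):
    """Build a JSON metadata record from the file name plus a few free-text fields."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"File name does not follow the convention: {filename}")
    fields = match.groupdict()
    record = {
        "file": filename,
        "project": fields["project"],
        "data_type": fields["datatype"],
        "collected": fields["date"],
        "version": int(fields["version"]),
        "format": fields["ext"],
        "description": description,
        "creator": creator,
    }
    return json.dumps(record, indent=2)
```

For example, `build_name("readstudy", "survey", date(2015, 3, 9), 1, "csv")` yields `readstudy_survey_20150309_v01.csv`, which `build_path("raw", ...)` places at `data/raw/readstudy_survey_20150309_v01.csv`. Where a discipline has an accepted metadata standard, its fields would replace the ad hoc ones used here.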

Method of Data Sharing

The template provided by the online DMPTool offers the following guidance:

IES acknowledges that there are several methods to share data. These include:

  • The investigator taking on the responsibility for data sharing, which may involve making data available to the requestor through a variety of means, including their institutional or personal website.
  • Use of a data archive or data enclave. Archives can be particularly attractive for investigators concerned about a large volume of requests, vetting requests, or providing technical assistance for users seeking help with analyses. Researchers can use a data archive or enclave when datasets cannot be distributed to the general public, for example, because of participant confidentiality concerns, third-party licensing, or use agreements that prohibit redistribution.
  • Use of some combination of these methods. A mixed method for data sharing is allowed that allows for more than one version of the dataset and provides different levels of access depending on the version.

Consider the following:

  • Will you share data via a repository, handle requests directly or use another mechanism?
  • If your method of sharing is with an archive, which archive/repository/database have you identified as a place to deposit data?
  • What procedures does your intended long-term data storage facility have in place for preservation and backup?
  • What is the long-term strategy for maintaining, curating and archiving the data?
  • What metadata/documentation will be submitted alongside the data or created on deposit/transformation in order to make the data reusable?
  • What related information will be deposited?
  • What costs, if any, will your selected sharing method charge? (In the budget and budget justification sections of the application, include and describe the costs of data sharing.)

https://dmptool.org/guidance

Generally, describe your long-term plans for storing your data and making it available. You should include information about:

  • Where your data will be stored.
  • How your data will be saved.
  • What additional metadata will be saved with the data beyond that mentioned above.
  • If known, what backup and preservation measures are in place.

Sharing and Reuse Requirements

Specifically, the IES requires researchers to specify whether or not interested parties will be subject to the conditions of a formal data sharing agreement. Keeping in mind the need to balance sharing requirements with legal and ethical duties to protect the privacy of study participants as noted above, consider the following:

  • With whom will you share the data?
  • Conditions for access, or other limitations on use.

Any Circumstances that Prevent Data Sharing

Describe any circumstances that prevent data sharing, including statutory confidentiality requirements (HIPAA, FERPA, etc.).

Department of Energy

Per the Department of Energy's Statement on Digital Data Management, proposals submitted after October 1, 2014 will be required to include a Data Management Plan. The DOE's suggested elements of a data management plan with links to resources are provided below.

Department of Transportation

NOTE: At the request of DOT lawyers, the DMP template provided by the online DMPTool is accompanied by the following disclaimer:

“This tool serves to provide guidance for how to prepare a Data Management Plan (DMP). The output of this tool does not constitute an approved government form. Those preparing DMPs for submission to the U.S. Department of Transportation (USDOT) should use their best judgment in determining what information to include. USDOT has identified five (5) broad areas that should be addressed in a DMP, but is not requiring any specific information to be included in any submitted DMP. USDOT may, at its discretion, establish an Office of Management and Budget-approved information collection. Once approved, the information collection will become a form with a control number, and certain DMP elements may become mandatory.”

With those caveats in mind, researchers are encouraged to refer to the detailed template provided by the online DMPTool.

Gordon and Betty Moore Foundation

The recommended sections and guiding questions provided in the linked template below are taken from the foundation website's DMP guide. Not every question will apply to a given project, but researchers should answer the relevant ones as completely as possible.

Institute of Museum and Library Services

The Institute of Museum and Library Services (IMLS) requires a DMP for projects that develop digital content. The requirement includes specific recommendations and questions depending on whether a project involves the creation of digital datasets, software tools or electronic systems, and/or collections or databases of new content or metadata. All researchers are required to complete the section covering Copyright and Intellectual Property Rights, along with whichever other additional sections apply.

The Digital Stewardship Supplementary Information and the Digital Product forms below include more information. Please contact Research Data Services with any questions about addressing the requirements.

Joint Fire Science Program

The Joint Fire Science Program requires submission of a DMP of no more than two pages with all proposals. From the JFSP application requirements:

It is the intent of the Joint Fire Science Program (JFSP) that all data collected or generated through JFSP funds be of high quality and be made freely available to others within a reasonable time period.

NASA

NASA's Data Management Plan requirements are described in the administration's 2014 Public Access Plan. With limited exceptions, including human subjects research, proprietary data, and sensitive or export-controlled data, the requirements apply to all NASA employees and recipients of NASA research funds. Requirements as broadly described on the NASA-Funded Research Results webpage include:

  • All proposals or project plans submitted to NASA for scientific research funding will be required to include a DMP. The DMP should describe whether and how data generated through the course of the proposed research will be shared and preserved (including timeframe), or explain why data sharing and/or preservation are not possible or scientifically appropriate. At a minimum, DMPs must describe how data sharing and preservation will enable validation of published results or how such results could be validated if data are not shared or preserved.
  • DMPs must provide a plan for making research data that underlie the results and findings in peer-reviewed publications digitally accessible at the time of publication or within a reasonable time period after publication. This includes data (or how to access data) that are displayed in charts and figures. This does not include preliminary data; laboratory notebooks; drafts of scientific papers, plans for research; peer-review reports; communications with colleagues; or physical objects, such as laboratory specimens. This requirement could be met by including the data as supplementary information to the published article, through NASA archives, or other means. The published article should indicate how these data can be accessed. (http://www.nasa.gov/open/researchaccess/data-mgmt)

The online DMPTool provides a template with additional guidance.

NEH Office of Digital Humanities

Similar to the NSF, the NEH Office of Digital Humanities requires a short DMP, not to exceed two pages, to be submitted as a supplementary document. Current documentation, accessible from the links provided below, notes that DMPs are considered during the peer review of proposals, and that post-award reports are expected to include discussion of compliance with the plan.

National Institute of Justice

Since 2014, the NIJ has required funding applicants to submit a 1-2 page Data Archiving Plan with all proposals. In the plan, researchers are asked to demonstrate their recognition that data sets resulting from NIJ-funded research must be submitted for archiving (typically to the National Archive of Criminal Justice Data) and to describe how the data will be curated or managed to facilitate replication of results. Per the NIJ's Data Archiving Plans for NIJ Funding Applicants website, the plan must briefly describe:

  • Anticipated manipulations of original, intermediate and final data sets
  • Methods of documentation of such manipulations
  • Preparation of the submission for archiving 

More information is available on the website. A Data Archiving Plan template is also available from the online DMPTool.

NOAA

From NOAA's Data Management Procedural Directive:

NOAA Administrative Order (NAO) 212‐15, Management of Environmental Data and Information, states that environmental data is to be managed based upon a lifecycle that includes developing and following a data management plan...The goal of the Data Management plan is to ensure that data are properly collected, documented, made accessible, and preserved for future use in a NOAA Data Center or other longterm archive facility.

USDA-NIFA

Key concepts as defined in the Data Management Plan for NIFA-Funded Research Projects documentation released April 2015:

  • Data (digital and non-digital) are the ultimate outputs of most research investments from NIFA.
  • Appropriate data management is critical to all research enterprises and helps preserve outcomes and outputs of public and private investment.
  • Access and sharing of digital and non-digital data helps increase the scope and outcomes of scientific discoveries, sometimes beyond the initial boundaries of the research.
  • A DMP should be a core component of a research planning process and should contain adequate information for successful implementation.
  • The type of data that needs to be stored, preserved, and shared depends on the type of research, the scientific discipline, and financial implications.
  • Adequate resources must be available to implement the DMP.

Essential elements of a USDA NIFA Data Management Plan are described in the sections below.

US Geological Survey

The United States Geological Survey provides comprehensive data management planning guidance covering the full spectrum of the research data lifecycle. Accordingly, data management plans which fully address the concepts and issues defined by the USGS will be more comprehensive than the two-page, high level overviews requested by other agencies and sponsors. More detailed information and sample plans are available from USGS Data Management.

The template and suggestions provided below are taken from the USGS DMP template provided by the online DMPTool (https://dmptool.org/).

Project and Contact Information

The USGS DMP template generally follows the USGS Science Data Lifecycle Model, a high level view of how data relates to project workflows from data planning to preservation and publishing. This template is not prescriptive but meant as guidance for individuals and Centers/Programs who want to create their own Data Management Plans.

Consider these topics and questions describing very basic information about the project and the appropriate contacts:

  • What is the name of the project? Include any identifiers related to the project (e.g. Project ID, Funding ID etc).
  • What is the name of the Center/Program and Branch that oversees the project?
  • Summary description of the project and reason why the data is being collected.
  • What are the start and expected end dates for the project?
  • Include any web links with more information related to the project, if applicable.
  • Who is the main point of contact for the project and its data? List an alternate point of contact, if any. (Include name, title, e-mail address, agency/organization, phone, and mailing address, as appropriate.)
  • If there are collaborating/funding agencies and organizations, who are they and who are the main points of contact?

Plan and Acquire

Plan and Acquire elements of the USGS Science Data Lifecycle: Plan refers to planning considerations before the handling of the project's data assets. Acquire describes the activities related to new or existing data that are collected or generated.

Consider these topics and questions:

  • How will the data be acquired (newly collected or using existing datasets)?
  • If acquiring existing datasets (e.g. NLCD) include the name, format, a persistent identifier, and source citation, if any.
  • Are there any restrictions or agreements such as Memoranda of Understanding (MOUs) for use and storage?
  • If collecting new data, are there special processes or procedures for collecting the data (e.g. licenses, permissions, equipment, software)?
  • What is the estimated volume of the data collected, transformed, and/or generated? e.g. megabyte (MB), GB, TB, or PB.
  • Will the data be static or is there a possibility that new data will continue to be added?
  • Are the appropriate hardware, software, and staff resources part of the budget for data management activities?
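The volume question above usually comes down to simple arithmetic. The sketch below, in Python, shows one way to produce a rough estimate and express it in the units the template mentions. The inputs (files per day, bytes per file, number of days, number of copies) are hypothetical planning figures, and decimal units (1 GB = 1000 MB) are an assumption.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def estimate_volume(files_per_day, bytes_per_file, days, copies=1):
    """Rough total size in bytes for a steady collection rate."""
    return files_per_day * bytes_per_file * days * copies


def format_volume(num_bytes, base=1000):
    """Express a byte count in the largest unit that keeps the value below base."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < base or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= base
```

For instance, 24 files per day at 50 MB each over 365 days comes to `format_volume(estimate_volume(24, 50_000_000, 365))`, or `438.0 GB`; keeping three copies raises that to about `1.3 TB`, which is the figure that matters when budgeting storage.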

Describe and Manage Quality

Describe and Manage Quality elements of the USGS Science Data Lifecycle: Describe emphasizes documentation of every stage of the lifecycle to ensure the data assets and methods can be understood, evaluated for validity, and potentially reused. Manage Quality includes considerations for quality assurance and quality control (QA/QC) measures.

Consider these topics and questions:

  • How many new datasets will be created? List the anticipated title of each dataset.
  • What are the data types and formats in which the data will be maintained?
  • Briefly describe the data processing steps or provide the scientific workflow and identify any software or technology needs where applicable.
  • How will the metadata for each dataset be created? Who will be responsible for the metadata creation and update?
  • Which metadata standard will be used to describe each dataset (e.g. FGDC-CSDGM, ISO 19115 series, or other as appropriate)?
  • What procedures will be used for ensuring data quality (QA/QC)? If using a known standard or protocol, include the citation source.

Backup, Secure and Preserve

Backup/Secure and Preserve elements of the USGS Science Data Lifecycle: Backup/Secure involves managing risks and accessibility to the data throughout the lifecycle. Preserve highlights important activities that should be taken to ensure long-term preservation of data, metadata, ancillary products, and additional documentation.

Consider these topics and questions:

  • Where will the data be stored in the short-term? Is it properly secured and environmentally controlled?
  • What will be the approach for routine backup of the data (frequency, duration, software, media)? Will the data be stored in multiple places and on different media types (recommended minimum of 3 copies with 1 stored in an offsite location)?
  • Describe any potential access restrictions (for example, if the data contain Personally Identifiable Information (PII)) and any practices to ensure access will be restricted.
  • What will be the final format of the data product and will there be any software needs? Will the data format be appropriate for long-term preservation?
  • Where will the data and metadata be preserved in the long-term and by which sponsoring Program (if in collaboration)? Who will be the point of contact?
  • If costs are associated with long-term storage, how will they be provided for?
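The routine backup question above (multiple copies, with one offsite) can be sketched in Python as a copy step followed by a checksum comparison. This is a minimal illustration under stated assumptions, not a USGS procedure: the destination directories are placeholders, and a real offsite copy would normally go to a separate system or service rather than a local path.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination directory and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for destination in destinations:
        destination = Path(destination)
        destination.mkdir(parents=True, exist_ok=True)
        target = destination / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return expected, copies
```

Recording the returned checksum alongside the data makes it possible to re-verify each copy later, for example before depositing the files in a long-term repository.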

Publish and Share

Publish and Share elements of the USGS Science Data Lifecycle: Publish and Share highlight important considerations related to traditional peer-reviewed publications and dissemination of the data through Web sites, data catalogs, social media and other outlets.

Consider these topics and questions:

  • How will the data itself be shared and made available to the public (e.g. web page, tool or application, data portal, repository, USGS Data Series)? Are there data release policies that need to be followed?
  • Will there be access or use restrictions on the data (e.g. sensitive data, restricted data, privacy, software with license restrictions, etc.)? Provide justification for the restriction citing any policies or legal reasons.
  • How can someone overcome these restrictions (e.g. fees, non-disclosure statements, special authorization, data embargo or hold, MOUs/MOAs)?
  • Identify any anticipated publications or electronic outlets (e.g. peer-reviewed articles, information/fact sheets, web pages) resulting from the data. If a USGS publication, indicate type (e.g. Open File Report, Provisional Release etc).
  • Where will your metadata be stored to provide an access point for discovery by users and harvest by catalogs such as the USGS Science Data Catalog?
  • How and where will you obtain a persistent identifier for the data (e.g. digital object identifier)?

DMP Tool Online

A good resource for developing any data management plan is the DMP Tool Online.

By selecting University of New Mexico from the drop-down list on the sign-in page, you will be able to access templates created by our Data Librarians, as well as links to UNM-specific recommendations and resources.

Sample Data Management Plans