Digital Data Management, Curation, and Archiving

NSF General Purpose DMP

Roles and Responsibilities

Explain how the responsibilities regarding the management of your data will be delegated. This should include time allocations, project management of technical aspects, training requirements, and contributions of non-project staff - individuals should be named where possible. Remember that those responsible for long-term decisions about your data will likely be the custodians of the repository/archive you choose to store your data. While the costs associated with your research (and the results of your research) must be specified in the Budget Justification portion of the proposal, you may want to reiterate who will be responsible for funding the management of your data. Consider these questions:

  • Outline the staff/organizational roles and responsibilities for implementing this data management plan.
  • Who will be responsible for data management and for monitoring the data management plan?
  • How will adherence to this data management plan be checked or demonstrated?
  • What process is in place for transferring responsibility for the data?
  • Who will have responsibility over time for decisions about the data once the original personnel are no longer available?

Types of Data Produced

Give a short description of what 'data' will mean in your research - explain what the contents of each dataset will be, including size and amount if known. It would also help if you can identify your methods for collecting data. Consider these questions:

  • What data will be generated in the research?
  • What data types will you be creating or capturing?
  • How will you capture or create the data?
  • If you will be using existing data, state that fact and include where you got it. What is the relationship between the data you are collecting and the existing data?
  • What data will be preserved and shared?

Data and Metadata Standards

This portion of the DMP asks you to combine an explanation of the format of your data and how that format will allow for fast and easy access to the data. One of the main thrusts of the DMP requirement is the NSF's intention to encourage data sharing among researchers - consider this when answering the questions below. Think about how you can not only make your data available to researchers "on-demand," but also how you can more proactively make your data accessible without a specific request. In this section you are also asked to account for issues of privacy, confidentiality and ownership that may arise from the dissemination of your data. Think about what you have done to comply with your obligations in your Institutional Review Board Protocol. Consider these questions:

  • Which file formats will you use for your data, and why?
  • What transformations (to more shareable formats) will be necessary to prepare data for preservation / data sharing?
  • What form will the metadata describing/documenting your data take?
  • How will you create or capture these details?
  • Which metadata standards will you use and why have you chosen them? (e.g. accepted domain-local standards, widespread usage).
  • What contextual details (metadata) are needed to make the data you capture or collect meaningful?
  • What other types of information should be shared regarding the data, e.g. the way it was generated, and analytical and procedural information?
  • What metadata/documentation will be submitted alongside the data or created on deposit/transformation in order to make the data reusable?
  • How and when will you make the data available? (Include resources needed to make the data available: equipment, systems, expertise, etc.)
  • What is the process for gaining access to the data?
  • Will any permission restrictions need to be placed on the data?
  • Are there ethical and privacy issues? If so, how will these be resolved?
  • What have you done to comply with your obligations in your IRB Protocol?
  • Who will hold the intellectual property rights to the data and how might this affect data access?
  • What and who are the intended or foreseeable uses/users of the data?
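
As one illustration of the form descriptive metadata can take, the sketch below writes a minimal Dublin Core record as XML. Dublin Core is used only as an example of a widely adopted standard; the function name, the wrapper element, and the field values are hypothetical, and your discipline or repository may require a different schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build a simple XML record from a dict of Dublin Core element names.

    The <metadata> wrapper element is an assumption made for this sketch;
    a real repository will define its own container format.
    """
    root = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values, for illustration only.
example_record = dublin_core_record({
    "title": "Example dataset title",
    "creator": "Example Researcher",
    "date": "2020-01-01",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
```

Printing example_record shows one dc: element per field. Capturing these details when the data are created is usually easier than reconstructing them at deposit time.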

Policies for Access and Sharing

The main reason a Data Management Plan is required is to make you think about how you will prepare (manage) your data for sharing, and to have you describe how you will actively share your data with people outside your research group after the project is completed. Since publicly funded research is paid for with tax dollars, most funding agencies expect your data to be made easily available to others. If you are going to embargo or withhold all or part of your data, it is important that you explain why. Reasons might include publishing agreements, commercial or patent interests, and private or sensitive data, among others.

NOTE: You should be as open as possible with the products of your research. NSF and most other funding agencies use this as part of the evaluation of your proposal. Grant proposals and DMPs which seem to limit access to the products resulting from the grant do get reviewed poorly in this area.

You should describe:

  • When your data will be available

  • How you will make your data available

    • Email request (avoid this, as research has shown that only 20% of data described as available this way is actually retrievable)

    • Domain-Specific Repository (e.g., GenBank)

    • Institutional Repository

    • Journal

  • Embargo Periods

    • A reasonable delay to publish is acceptable

    • Delay for patents, although provisional patents can be obtained through the STC at UNM

    • Who will hold the intellectual property rights on the data

    • Any issues with private, sensitive or secret data

  • Describe how data will be anonymized if need be.

  • Describe how locations of ecologically or culturally sensitive sites will be removed or obscured.
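
One common way to obscure a sensitive site location is to reduce the precision of its coordinates before sharing. The sketch below is a minimal illustration under that assumption; the function name and the default precision are hypothetical choices, and coordinate rounding alone may not be sufficient protection for every site.

```python
def generalize_coordinates(latitude, longitude, decimals=1):
    """Round a latitude/longitude pair (decimal degrees) to a coarser precision.

    One decimal place corresponds to roughly 11 km of latitude, so the
    shared point no longer identifies the exact site.
    """
    if not -90 <= latitude <= 90:
        raise ValueError("latitude must be between -90 and 90")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude must be between -180 and 180")
    return round(latitude, decimals), round(longitude, decimals)
```

For example, generalize_coordinates(35.0844, -106.6504) returns a pair rounded to one decimal place. If even a generalized location is too revealing, removing the coordinates from the shared dataset entirely is the safer option, and the DMP should state which approach is used.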

Policies for Re-use and Redistribution

Who will be allowed to use your data, how will they be allowed to use your data and will they be allowed to disseminate your data? If you are planning on restricting access, use or dissemination of the data, you must explain in this section how you will codify and communicate these restrictions.

A good strategy for doing this is to apply a license to your data. A license tells others the conditions under which they may use your data. Several licensing schemes and online tools can help you create a license for your data, such as Creative Commons, which is used by websites such as Flickr.

Data Storage and Preservation of Access

This portion of the Data Management Plan asks the researcher to provide a long-term strategy for archiving and preserving the data from the research described in the proposal. Consider these questions:

  • What is the long-term strategy for maintaining, curating and archiving the data?
  • Which archive/repository/database have you identified as a place to deposit data?
  • What procedures does your intended long-term data storage facility have in place for preservation and backup?
  • How long will/should data be kept beyond the life of the project?

UNM's Suggested Answer Text:

Data will be archived for a minimum of 10 years at the UNM Libraries’ Digital Repository after the grant ends. After this time, the data will be appraised per established collection and archival management policies for transfer to an external repository, longer-term archiving, or alternative disposition. The UNM Digital Repository is an Open Archives Initiative (OAI) compliant repository, which enables Dublin Core metadata and dataset objects to be shared and harvested by other archival and discovery systems through the OAI-PMH protocol.

The UNM Digital Repository is maintained by the UNM Libraries. Archive staff will also provide daily file integrity and format verification and will create and maintain technical and administrative metadata using the widely adopted Metadata Encoding and Transmission Standard (METS) and Preservation Metadata: Implementation Strategies (PREMIS) standards. These additional metadata include digital file signatures and checksums for bitwise integrity validation and chain of custody documentation. Primary responsibility for curating and preparing the data for archiving will rest with the Libraries’ Data Curation Librarian.
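
As an illustration only (not part of the suggested answer text), checksum-based integrity validation of the kind described above can be sketched as follows. This is not the repository's actual tooling; SHA-256 and all function names are assumptions made for the example. Researchers can use the same idea to confirm that files were not altered between collection and deposit.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its checksum."""
    root = Path(data_dir)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir, manifest):
    """List manifest entries that are missing or whose checksum differs."""
    root = Path(data_dir)
    changed = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(name)
    return changed
```

A manifest built with build_manifest at deposit time can be stored alongside the data; rerunning changed_files later returns an empty list if every file is still bit-for-bit identical.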

Additional Possible Data Management Requirements

Explain in this section how you plan to satisfy any additional, program-specific data management requirements. If none exist, you may leave this section blank.

NSF AGS

The Atmospheric & Geospace Sciences division of the NSF Geosciences directorate provides a data management plan template in PDF format. The form and additional guidance are provided on the division's website: https://www.nsf.gov/geo/geo-data-policies/ags/index.jsp

NSF BIO

NSF's Directorate for Biological Sciences periodically updates its DMP guidance. The current version is linked below.

NSF CHE

The NSF Division of Chemistry's updated advice to PIs on data management plans includes chemistry-specific comments on the essential components of a DMP elaborated elsewhere. Reference is made to multiple resources, with a note that DMPs "should not be generic."

NSF CISE

The NSF CISE provides DMP guidance, linked below. Among other details of note, CISE provides guidance on selecting or evaluating repositories for data sharing and preservation and describes how progress in data management may be addressed in annual and final project reports.

NSF EAR

The NSF Earth Sciences division of the Geosciences directorate provides some elaboration, linked below, on the general NSF data sharing policy. Specifically, EAR guidance affirms that "preservation of all data, samples, physical collections and other supporting materials needed for long-term earth science research and education is required of all EAR-supported researchers."

Because the EAR data sharing policy was released in 2010, PIs are encouraged to contact us for any updated information. Notably, the EAR policy defines a two-year maximum embargo period for exclusive data use by researchers. Though still potentially in effect, this policy contrasts with NSF's more recent Public Access Plan, which specifies a one-year maximum for embargoes.

NSF EHR

In its DMP guidance, the NSF EHR directorate acknowledges issues of privacy and other constraints relevant to human subjects research. PIs are encouraged, among other things, to consider "the lowest level of aggregated data that PIs might share with others in the scientific community, given that community's norms on data."

NSF ENG

DMP guidance provided by the NSF ENG directorate notes that engineering proposals may involve proprietary or other data relevant to eventual commercialization. PIs are encouraged to distinguish between released and restricted data, but to describe how all data will be managed. 

NSF OCE

The Division of Ocean Sciences (OCE) of the NSF Directorate for Geosciences (GEO) published an update to the OCE sample and data policy in December 2016. The linked template below, provided by the online DMP Tool, includes recommendations for addressing DMP requirements. PIs submitting proposals to OCE are encouraged to take note of the Special Digital Data Guidance detailed in the sample and data policy regarding designated Federal National Data Centers where certain types of ocean digital data must be deposited.

NSF SBE

Similar to the EHR directorate, in its DMP guidance, the NSF SBE directorate acknowledges issues of privacy and other constraints relevant to social sciences and economics research. PIs are encouraged, among other things, to consider "the lowest level of aggregated data that PIs might share with others in the scientific community, given that community's norms on data."

What If You Are Not Generating Data?

The Data Management Plan is required for all proposals submitted to the NSF. Proposals submitted without a DMP will be returned for resubmission or rejected unread.

For investigators submitting proposals which will not generate or acquire data (as defined by the sponsoring directorate), the DMP is still required. Within the DMP, PIs may simply state that the project is not anticipated to generate data or samples that require management and/or sharing.

Post-Award Monitoring

  • Annual reports must include information about progress made in data management and sharing of research products.
  • Final project reports should document:
    •   Data produced during the award period.
    •   Data that will be retained after the award expires.
    •   Dissemination plans and verification that the data will be available for sharing.
    •   Data and metadata formats available to others.
    •   Where available data has been deposited for public access.