Digital Data Management, Curation, and Archiving

Data Sharing and DMP Requirements

As federal requirements for sharing and public access to the products of sponsored research evolve, researchers can expect funding agencies to release data management planning and sharing policies. Within the past year, in response to the White House Office of Science and Technology Policy (OSTP) memo of February 2013, multiple agencies, including NASA and the Department of Energy, have published Public Access Plans which, once in effect, will formalize the expectations and requirements for funded researchers to manage and share their data efficiently.

While sponsor requirements for data management plans (DMPs) vary somewhat, the general topics to address are listed below. This guide also provides more specific information and resources for meeting different DMP requirements.

More general information about the OSTP memo and agency Public Access Plans is available here.

 

Elements of a Data Management Plan

While many sponsors provide specific guidance on the types of information to include in a Data Management Plan (DMP), not all do. If the funder to whom you are submitting a proposal does not have published guidelines, or if you are simply interested in developing a high-level overview of data management practices for currently unfunded research, the sections described here provide a general outline for capturing the necessary information.

Roles and Responsibilities

Explain how responsibility for managing your data will be delegated. This should include time allocations, project management of technical aspects, training requirements, and contributions of non-project staff - individuals should be named where possible. Remember that those responsible for long-term decisions about your data will likely be the custodians of the repository/archive where you choose to store your data. While the costs associated with your research (and the results of your research) must be specified in the Budget Justification portion of the proposal, you may want to reiterate who will be responsible for funding the management of your data. Consider these questions:

  • What are the staff/organizational roles and responsibilities for implementing this data management plan?
  • Who will be responsible for data management and for monitoring the data management plan?
  • How will adherence to this data management plan be checked or demonstrated?
  • What process is in place for transferring responsibility for the data?
  • Who will have responsibility over time for decisions about the data once the original personnel are no longer available?

Expected Data

Give a short description of what 'data' will mean in your research - explain what the contents of each dataset will be, including size and amount if known. It would also help if you can identify your methods for collecting data. Consider these questions:

  • What data will be generated in the research?
  • What data types will you be creating or capturing?
  • How will you capture or create the data?
  • If you will be using existing data, state that fact and identify its source. What is the relationship between the data you are collecting and the existing data?
  • What data will be preserved and shared?

Period of Data Retention

This section will allow you to account for any delay in the accessibility of your data after your research is done. Consider any reasons why you would not make the data immediately available - for instance, maybe you have political, commercial or patent concerns that will require you to postpone access to the data you produce. Consider these questions:

  • How long will the original data collector/creator/principal investigator retain the right to use the data before opening it up to wider use?
  • What are the details of any embargo periods for political, commercial, patent, or publisher reasons?

Data Format and Dissemination

This portion of the DMP asks you to explain the format of your data and how that format will allow for fast and easy access to the data. One of the main thrusts of the DMP requirement is the NSF's intention to encourage data sharing among researchers - consider this when answering the questions below. Think about not only how you can make your data available to researchers on demand, but also how you can proactively make your data accessible without a specific request. In this section you are also asked to account for issues of privacy, confidentiality and ownership that may arise from the dissemination of your data. Think about what you have done to comply with your obligations in your Institutional Review Board (IRB) protocol. Consider these questions:

  • Which file formats will you use for your data, and why?
  • What transformations (to more shareable formats) will be necessary to prepare data for preservation / data sharing?
  • What form will the metadata describing/documenting your data take?
  • How will you create or capture these details?
  • Which metadata standards will you use, and why have you chosen them (e.g. accepted domain-local standards, widespread usage)?
  • What contextual details (metadata) are needed to make the data you capture or collect meaningful?
  • What other types of information should be shared regarding the data, e.g. the way it was generated, or analytical and procedural information?
  • What metadata/documentation will be submitted alongside the data or created on deposit/transformation in order to make the data reusable?
  • How and when will you make the data available? (Include resources needed to make the data available: equipment, systems, expertise, etc.)
  • What is the process for gaining access to the data?
  • Will any permission restrictions need to be placed on the data?
  • Are there ethical and privacy issues? If so, how will these be resolved?
  • What have you done to comply with your obligations in your IRB Protocol?
  • Who will hold the intellectual property rights to the data and how might this affect data access?
  • What are the intended or foreseeable uses of the data, and who are its intended or foreseeable users?
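
The metadata questions above can be made concrete with a small example. The following is a minimal sketch, in Python, of one possible form for descriptive metadata: a simple Dublin Core record serialized as XML. The function name `build_dc_record`, the `metadata` wrapper element, and all field values are hypothetical and chosen only for illustration; your sponsor or repository may require a different standard or layout.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dc_record(fields):
    """Return a simple Dublin Core record as an XML string.

    `fields` maps Dublin Core element names (e.g. "title") to a list
    of values, since Dublin Core elements are repeatable.
    """
    # The un-namespaced "metadata" wrapper is an assumption for this
    # sketch, not a requirement of any particular repository.
    root = ET.Element("metadata")
    for name, values in fields.items():
        for value in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values, for illustration only.
record = build_dc_record({
    "title": ["Stream temperature measurements, example watershed"],
    "creator": ["Doe, Jane"],
    "date": ["2014-06-30"],
    "format": ["text/csv"],
    "rights": ["CC0 1.0"],
})
print(record)
```

Recording the file format and rights statement alongside the title and creator addresses several of the questions above in one place: what format the data is in, who holds rights to it, and what context a later user needs to interpret it.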

Data Storage and Preservation of Access

This portion of the Data Management Plan asks the researcher to provide a long-term strategy for archiving and preserving the data from the research described in the proposal. Consider these questions:

  • What is the long-term strategy for maintaining, curating and archiving the data?
  • Which archive/repository/database have you identified as a place to deposit data?
  • What procedures does your intended long-term data storage facility have in place for preservation and backup?
  • How long will/should data be kept beyond the life of the project?

UNM's Suggested Answer Text:

The data will be archived for a minimum of 10 years at the University of New Mexico (UNM) Libraries' LoboVault institutional and data repository. After this time, the data will be appraised per established collection and archival management policies for transfer to an external repository, long term archiving in LoboVault, or alternative disposition. LoboVault is an Open Archives Initiative (OAI) compliant repository built using the DSpace repository application, which enables Dublin Core metadata and data set objects to be shared and harvested by other archival systems through the OAI-PMH protocol.
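
As an illustration of what harvesting through OAI-PMH involves, the sketch below builds a `ListRecords` request URL asking for Dublin Core (`oai_dc`) records. The verb and parameter names come from the OAI-PMH specification; the base URL and the helper name `oai_list_records_url` are hypothetical placeholders, not LoboVault's actual endpoint. No network request is made.

```python
from urllib.parse import urlencode


def oai_list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    `metadata_prefix` selects the metadata format ("oai_dc" is the
    Dublin Core format every OAI-PMH repository must support).
    `set_spec` optionally restricts the harvest to one set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint, for illustration only.
BASE_URL = "https://repository.example.edu/oai/request"
url = oai_list_records_url(BASE_URL)
print(url)
```

A harvester would fetch this URL and parse the XML response; consult the repository's own documentation for its real base URL and available sets.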

LoboVault is a designated long-term digital archives resource maintained by the University of New Mexico Libraries. In addition to the use of Dublin Core for descriptive metadata, the archive provides daily file integrity and format verification and will additionally create and maintain technical and administrative metadata using the widely adopted Metadata Encoding and Transmission Standard (METS) and Preservation Metadata Implementation Strategies (PREMIS) metadata standards. These additional metadata include digital file signatures and checksums for bitwise integrity validation and chain of custody documentation. Primary responsibility for curating and preparing the data for archiving rests with the Data Librarians at the University of New Mexico Libraries.
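
To show what checksum-based integrity validation means in practice, here is a minimal sketch in Python. It assumes SHA-256 as the checksum algorithm, which is a choice made for this example; the text above does not say which algorithm the archive uses. The helper names `sha256_of_file` and `verify_file` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA-256 checksum of a file, reading it in chunks
    so that large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_file(path, expected_hex):
    """Return True if the file's current checksum matches the one
    recorded earlier (for example, at the time of deposit)."""
    return sha256_of_file(path) == expected_hex.lower()


# Demonstration with a throwaway file; the contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "observations.csv"
    data_file.write_bytes(b"site,temp_c\nA,12.5\n")
    recorded = sha256_of_file(data_file)       # stored with the metadata
    unchanged = verify_file(data_file, recorded)

    data_file.write_bytes(b"site,temp_c\nA,12.6\n")  # simulate a change
    changed = verify_file(data_file, recorded)

print(unchanged, changed)
```

Recording a checksum when a file is deposited and recomputing it later is what allows a change of even a single bit to be detected.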

Additional Possible Data Management Requirements

In this section you are asked to explain how you plan to satisfy any additional, program-specific data management requirements. If none exist, you may leave this section blank.

DMP Tool Online

A good resource for developing any data management plan is the DMP Tool Online.

By selecting University of New Mexico from the drop-down list on the sign-in page, you will be able to access templates created by our Data Librarians, as well as links to UNM-specific recommendations and resources.

Sample Data Management Plans

ICPSR Guidelines for Effective DMPs

The ICPSR provides guidance and sample language for developing data management plans.