Digital Data Management, Curation, and Archiving

National Institutes of Health

Per the NIH Genomic Data Sharing Policy published in 2014, researchers submitting proposals for projects that will generate large-scale human or non-human genomic data must include a genomic data sharing plan. Regarding which of the sections below are recommended or mandatory, the following guidance is provided by the NIH:

[A]ll applicants proposing to generate human or non-human data, elements 1 and 2, a description of the data type and the data repository, should be provided at the time of the application. Applicants proposing to generate human data should also provide information addressing elements 3-5 (data submission and release timeline, IRB assurance of the genomic data sharing plan, and appropriate use of the data, respectively) and, if applicable, element 6 (request for an exemption to the submission) prior to award. Applicants proposing to generate non-human data need also to address element 3 (data submission and release timeline) prior to award.

1. Data Type

Explain whether the research being considered for funding involves human data, non-human data, or both. Denote the type of genomic data that will be shared (e.g., sequence, transcriptomic, epigenomic, and/or gene expression data) and whether it is individual-level data, aggregate-level data, or both. Also list any other information that is anticipated to be shared such as relevant associated data (e.g., phenotype or exposure data) and information necessary to interpret the data (e.g., study protocols, data collection instruments, survey tools).

2. Data Repository

Identify the data repositories to which the data will be submitted, and for human data, whether the data will be available through unrestricted or controlled access. For human genomic data, investigators are expected to register all studies in the database of Genotypes and Phenotypes (dbGaP) by the time data cleaning and quality control measures begin in addition to submitting the data to the relevant NIH-designated data repository (e.g., dbGaP, Gene Expression Omnibus (GEO), Sequence Read Archive (SRA), the Cancer Genomics Hub) after registration.

Non-human data may be made available through any widely used data repository, whether NIH-funded or not, such as GEO, SRA, Trace Archive, Array Express, Mouse Genome Informatics, WormBase, the Zebrafish Model Organism Database, GenBank, European Nucleotide Archive, or DNA Data Bank of Japan.

3. Data Submission and Release Timeline

Provide a timeline for sharing data in a timely manner. The Supplemental Information to the GDS Policy provides expectations for the timelines of data submission and release based on the level of data processing. In general, NIH will release human genomic data no later than six months after the data have been submitted to NIH-designated data repositories and cleaned, or at the time of acceptance of the first publication, whichever occurs first, without restrictions on publication or other dissemination of research findings.

Investigators should make non-human genomic data publicly available no later than the date of initial publication. However, availability before publication may be expected for certain data or projects (e.g., data from projects with broad utility as a resource for the scientific community such as microbial population-based genomic studies), or may be expected by the funding NIH IC.

4. IRB Assurance of the Genomic Data Sharing Plan

State whether an Institutional Review Board (IRB) or analogous review body has reviewed the genomic data sharing aspects of your project, or provide a timeline for such review. IRB review of the investigator's proposal for data submission is an element of the Institutional Certification which assures that the proposal for data submission and sharing is appropriate. Please keep in mind that an Institutional Certification is generally required for extramural investigators prior to NIH grant award along with other Just-in-Time information or finalization of a contract. For NIH intramural investigators, an Institutional Certification memorandum should be completed and sent from the SD, or delegate, to the IC Genomic Program Administrator (GPA) before research is begun, whenever possible.

Regarding Institutional Certification, see the NIH Points to Consider for IRBs and Institutions in their Review of Data Submission Plans for Institutional Certifications Under NIH's Policy for Genomic Data Sharing: https://gds.nih.gov/pdf/PTC_for_IRBs_and_Institutions.pdf

5. Appropriate Uses of the Data

The appropriate use of the data should be described. Under the GDS Policy, data is expected to be shared for broad research purposes. If such use of the data is not appropriate, as expressed in informed consent documents of the research participants whose data are included in the dataset, any limitations on the data use should be described in the Institutional Certification. NIH provides standard language to guide the development of data use limitations.

6. Request for an Exception to Submission

If submission of human data generated in the study would not be appropriate because the Institutional Certification criteria cannot be met, the investigator should explain why in the genomic data sharing plan and describe an alternative mechanism for data sharing. If the funding IC grants an exception to submission, the research will be registered in dbGaP and the reason for the exception and the alternative sharing plan will be described. For NIH intramural studies, the NIH Deputy Director for Intramural Research will make the final decision on the exception request, after the IC has made its determination.

Alfred P. Sloan Foundation

The grant proposal guidelines for projects funded by the Sloan Foundation are available from http://www.sloan.org/apply-for-grants/grant-proposals/, and may vary depending on the type of project and amount of funds requested.

Each set of guidelines, however, requires the submission of an Information Products Appendix which specifically includes research data and requires PIs to address the following:

Description

  • What information products will be created in the course of this project?
  • What format(s) will those information products take? Please list as appropriate.
  • Does the project involve organization or analysis of pre-existing materials, and if so, what are the relevant licensing or sharing arrangements?
  • Is the work subject to any superseding policies (such as a university open access mandate or other organization-wide policies)?

Management

  • What tools, platforms, and processes will be used to manage project assets as they are created and used through the grant period?
  • Who will be responsible for managing project assets during the grant period?

Dissemination

  • What channels will be used to disseminate grant products to target audiences?
  • Under what license(s), in what timeframe, and (if applicable) at what cost will grant products be available?
  • If the project relies on project assets that are proprietary or otherwise not available for wide dissemination, how will the final grant products be reproducible (in the case of research findings) or otherwise accessible for future use?

Archiving and Stewardship

  • How will you ensure the long-term durability of grant products after the funding period ends?
  • How long beyond the grant term will grant products be maintained and by whom?

Department of Education - Institute of Education Sciences

From the Institute of Education Sciences (IES) Policy Statement on Data Sharing in IES Research Centers:

Data sharing provides opportunities for other researchers to review, confirm or challenge study findings, which is an important aspect of the scientific process. In addition, data sharing can enhance scientific inquiry through a variety of other analytic activities, including the use of shared data to: test alternative theories or hypotheses; explore different sets of research questions than those targeted by the original researchers; combine data from multiple sources to provide potential new insights and areas of inquiry; and/or conduct methodological studies to advance education research methods and statistical analyses.

Type(s) and Format(s) of Data to Be Shared

Provide a description of the data to be collected or used. Explain what the contents of each dataset will be, including if known the number and types of files as well as file sizes. Additionally, information about data collection and quality assurance methods should be included here. Consider the following questions:

  • What data will be generated in the research?
  • What data types will you be creating or capturing?
  • How will you capture or create the data?
  • If you will be using existing data, state that fact and include where you got it. What is the relationship between the data you are collecting and the existing data?
  • What data will be preserved and shared?

Procedures for Maintaining Confidential Data

Per the IES, data sharing may not compromise the rights and privacy of human subjects. Because investigators will be expected to share data, consideration should be given to study design and procedures which will facilitate access while protecting the rights of participants and confidentiality of the data. In particular, data use and sharing information should be provided as part of the informed consent process, while data to be shared should be free of identifiers that would allow direct or deductive disclosure of study participants.

Refer to applicable IRB protocols, and consider the following:

  • Have participants consented to data sharing?
  • What are the risks of disclosure? Are there other ethical or privacy issues associated with the data?
  • What steps will be taken to de-identify the data?
  • How may data be aggregated to reduce the risk of disclosure?
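The de-identification and aggregation questions above can be illustrated with a minimal sketch. It assumes records held as Python dictionaries; the column names in `DIRECT_IDENTIFIERS` and the minimum cell size of 5 are hypothetical choices for illustration, not IES requirements. Removing direct identifiers does not by itself rule out deductive disclosure, so the procedures in your IRB protocol take precedence.

```python
from collections import Counter

# Hypothetical column names treated as direct identifiers (an assumption).
DIRECT_IDENTIFIERS = {"name", "student_id", "email"}

# Assumed threshold: counts below this are suppressed in shared tables.
MIN_CELL_SIZE = 5


def drop_direct_identifiers(records):
    """Return copies of the records without the direct-identifier columns."""
    return [
        {key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS}
        for record in records
    ]


def aggregate_with_suppression(records, group_key, min_cell_size=MIN_CELL_SIZE):
    """Count records per group, replacing small counts with None."""
    counts = Counter(record[group_key] for record in records)
    return {
        group: (count if count >= min_cell_size else None)
        for group, count in counts.items()
    }
```

For example, a group with six participants would be reported with its count, while a group with two participants would be reported as suppressed (`None`).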

Roles and Responsibilities

Explain how the responsibilities regarding the management of your data will be delegated. This should include time allocations, project management of technical aspects, training requirements, and contributions of non-project staff - individuals should be named where possible. Remember that those responsible for long-term decisions about your data will likely be the custodians of the repository/archive you choose to store your data. While the costs associated with your research (and the results of your research) must be specified in the Budget Justification portion of the proposal, you may want to reiterate who will be responsible for funding the management of your data. Consider these questions:

  • Outline the staff/organizational roles and responsibilities for implementing this data management plan.
  • Who will be responsible for data management and for monitoring the data management plan?
  • How will adherence to this data management plan be checked or demonstrated?
  • What process is in place for transferring responsibility for the data?
  • Who will have responsibility over time for decisions about the data once the original personnel are no longer available?

Expected Schedule for Data Sharing

Per the IES:

Timely data sharing is important to the scientific process. IES thus expects that data will be shared no later than when the main findings from the final study dataset are published in a peer-reviewed scholarly publication.

The IES acknowledges that there may be issues associated with providing access to data when the data collected are proprietary (e.g., when a published curriculum is being evaluated). Any restrictions on data sharing, such as a delay of disclosing proprietary data, should be presented in the DMP.

Documentation to Be Provided

Describe the format of your data, and think about what details (metadata) someone else would need to be able to use these files. Describe the structural standards that you will apply in making data and metadata available.

  • Which file formats will you use for your data and why?
  • What naming conventions will you use?
  • How will you organize your directories?
  • What form will the metadata describing/documenting your data take?
  • How will you create or capture these details?
  • Which metadata standards will you use and why have you chosen them? (e.g. accepted domain-local standards, widespread usage)
  • What contextual details (metadata) are needed to make the data you capture or collect meaningful?
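As one illustration of the naming-convention and directory questions above, the sketch below builds predictable file names and paths. The pattern (project, description, ISO date, zero-padded version) and the `raw`/`processed`/`documentation` folders are assumptions chosen for the example; IES does not prescribe a particular scheme.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Hypothetical top-level folders for a project data directory (an assumption).
ALLOWED_STAGES = {"raw", "processed", "documentation"}


def slug(text):
    """Lowercase the text and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project, description, collected, version, ext):
    """Return a name of the form project_description_YYYY-MM-DD_vNN.ext."""
    return (
        f"{slug(project)}_{slug(description)}_"
        f"{collected.isoformat()}_v{version:02d}.{ext.lstrip('.').lower()}"
    )


def build_path(stage, filename):
    """Place a file under data/<stage>/, rejecting unknown stages."""
    if stage not in ALLOWED_STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return str(PurePosixPath("data") / stage / filename)
```

A call such as `build_filename("ReadStudy", "Teacher Survey", date(2016, 3, 1), 2, ".CSV")` yields `readstudy_teacher-survey_2016-03-01_v02.csv`. Names built this way sort by date and version and avoid spaces and mixed case.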

Method of Data Sharing

The template provided by the online DMPTool offers the following guidance:

IES acknowledges that there are several methods to share data. These include:

  • The investigator taking on the responsibility for data sharing, which may involve making data available to the requestor through a variety of means, including their institutional or personal website.
  • Use of a data archive or data enclave. Archives can be particularly attractive for investigators concerned about a large volume of requests, vetting requests, or providing technical assistance for users seeking help with analyses. Researchers can use a data archive or enclave when datasets cannot be distributed to the general public, for example, because of participant confidentiality concerns, third-party licensing, or use agreements that prohibit redistribution.
  • Use of some combination of these methods. A mixed method for data sharing is allowed, one that permits more than one version of the dataset and provides different levels of access depending on the version.

Consider the following:

  • Will you share data via a repository, handle requests directly, or use another mechanism?
  • If your method of sharing is with an archive, which archive/repository/database have you identified as a place to deposit data?
  • What procedures does your intended long-term data storage facility have in place for preservation and backup?
  • What is the long-term strategy for maintaining, curating and archiving the data?
  • What metadata/documentation will be submitted alongside the data or created on deposit/transformation in order to make the data reusable?
  • What related information will be deposited?
  • What costs, if any, will your selected sharing method incur? (In the budget and budget justification sections of the application, include and describe the costs of data sharing.)

https://dmptool.org/guidance

Generally, describe your long term plans for storing your data and making it available. You should include information about:

  • Where your data will be stored.
  • How your data will be saved.
  • What additional metadata will be saved with the data beyond that mentioned above.
  • If known, what backup and preservation measures are in place.

Sharing and Reuse Requirements

Specifically, the IES requires researchers to specify whether or not interested parties will be subject to the conditions of a formal data sharing agreement. Keeping in mind the need to balance sharing requirements with legal and ethical duties to protect the privacy of study participants as noted above, consider the following:

  • With whom will you share the data?
  • What conditions for access, or other limitations on use, will apply?

Any Circumstances that Prevent Data Sharing

Describe any circumstances that prevent data sharing, including statutory confidentiality requirements (HIPAA, FERPA, etc.).

Department of Transportation

NOTE: At the request of DOT lawyers, the DMP template provided by the online DMPTool is accompanied by the following disclaimer:

“This tool serves to provide guidance for how to prepare a Data Management Plan (DMP). The output of this tool does not constitute an approved government form. Those preparing DMPs for submission to the U.S. Department of Transportation (USDOT) should use their best judgment in determining what information to include. USDOT has identified five (5) broad areas that should be addressed in a DMP, but is not requiring any specific information to be included in any submitted DMP. USDOT may, at its discretion, establish an Office of Management and Budget-approved information collection. Once approved, the information collection will become a form with a control number, and certain DMP elements may become mandatory.”

With those caveats in mind, researchers are encouraged to refer to the detailed template provided by the online DMPTool.

Gordon and Betty Moore Foundation

The recommended sections and guiding questions are taken from the foundation website's DMP guide. Not every question will apply to a given project, but researchers should answer those that are relevant as completely as possible.

Data Description

  • What data will be collected during this project?
  • How many different data formats are anticipated? Please list formats.
  • When will the data be collected, when will they be entered into electronic databases, and what databases will harbor the data?
  • Does this project involve organization or analysis of pre-existing data, and what are the data sharing arrangements for these data?
  • What are the anticipated data products (e.g., databases, analyses, tools)?
  • What kinds of metadata will be associated with the data?
  • Who is the owner of the data?

Data Management

  • Where (physically) will the data be stored?
  • What type of data access or data distribution mechanism and software will be used?
  • Will the location or software for initial data entry differ from the data archive?
  • How will metadata be stored, and what provisions will be made to enable metadata searching capability?
  • Who will be responsible for entering and maintaining data archives, and over what period of time will archives be maintained?
  • What data quality controls and assurances will be provided?
  • Who will contribute to the database?
  • Will proprietary data be used? If so, describe the permissions obtained to use the data.

Data Sharing

  • Who are the potential data users?
  • What is the appropriate timing for release of data to the public or relevant users, and why?
  • When will archived data be openly available to other users?
  • If data from non-GBMF-supported or previous projects are integral to the successful completion of the Grant Purposes, will the non-GBMF-supported and/or pre-existing data also be made freely available?
  • How will other users (i.e., beyond the grantee and GBMF) access data and metadata?
  • Are the publicly available data in raw form? If not, what treatments have been applied to the data prior to their being released to the public?
  • How long beyond the grant term will the data be maintained and by whom?
  • Does the proposed grant include provisions for future hardware upgrades in the event that data is to be stored and maintained well beyond the project period of the Grant?
  • If data analysis tools are to be created as a consequence of the Grant, will a tutorial be available for training of future users of the data, and if so, how can it be accessed?
  • Will a data sharing agreement be required between outside vendors? If so, a brief description of the agreement needs to be provided in the grant proposal.
  • Is a Creative Commons type license appropriate for sharing the data? Why or why not?
  • How will appropriate attribution to the data provider be provided?
  • Do you anticipate publishing a "Data Release Paper" for referencing and sharing the data?

Institute of Museum and Library Services

The Institute of Museum and Library Services (IMLS) requires a DMP for projects that develop digital content. The requirement was recently revised to include specific recommendations and questions depending on whether a project involves the creation of digital datasets, software tools or electronic systems, and/or collections or databases of new content or metadata. All researchers are required to complete the section covering Copyright and Intellectual Property Rights, along with whichever other additional sections apply.

Copyright and Intellectual Property Rights

This section is mandatory for all projects, and should address the following questions:

  1. What will be the copyright or intellectual property status of the content you intend to create? Will you assign a Creative Commons license to the content? If so, which license will it be? For information about Creative Commons licenses, visit the website at http://us.creativecommons.org/.

  2. What ownership rights will your organization assert over the new digital content, and what conditions will you impose on access and use? Explain any terms of access and conditions of use, why they are justifiable, and how you will notify potential users of the digital resources.

  3. Will you create any content or products which may involve privacy concerns, require obtaining permissions or rights, or raise any cultural sensitivities? If so, please describe the issues and how you plan to address them.

Digital Content

In addition to providing the required information about copyright and intellectual property rights, projects creating new digital content should provide the following information:

Creating New Digital Content

  1. Describe the digital content you will create and the quantities of each type and format you will use.

  2. List the equipment and software that you will use to create the content or the name of the service provider who will perform the work.

  3. List all the digital file formats (e.g., XML, TIFF, MPEG) you plan to create, along with the relevant information on the appropriate quality standards (e.g., resolution, sampling rate, pixel dimensions).

Digital Workflow and Asset Maintenance/Preservation

  1. Describe your quality control plan (i.e., how you will monitor and evaluate your workflow and products).

  2. Describe your plan for preserving and maintaining digital assets during and after the grant period (e.g., storage systems, shared repositories, technical documentation, migration planning, commitment of organizational funding for these purposes). Please note: Storage and publication after the end of the grant period may be an allowable cost.

Metadata

  1. Describe how you will produce metadata (e.g., technical, descriptive, administrative, preservation). Specify which standards you will use for the metadata structure (e.g., MARC, Dublin Core, Encoded Archival Description, PBCore, PREMIS) and metadata content (e.g., thesauri).

  2. Explain your strategy for preserving and maintaining metadata created and/or collected during your project and after the grant period.

  3. Explain what metadata sharing and/or other strategies you will use to facilitate widespread discovery and use of the digital content created during your project (e.g., an Application Programming Interface, contributions to the DPLA or other support to allow batch queries and retrieval of metadata).

Access and Use

  1. Describe how you will make the digital content available to the public. Include details such as the delivery strategy (e.g., openly available online, available to specified audiences) and underlying hardware/software platforms and infrastructure (e.g., specific digital repository software or leased services, accessibility via standard web browsers, requirements for special software tools in order to use the content).

  2. Provide URL(s) for any examples of previous digital collections or content your organization has created.

New Software Tools or Applications

In addition to providing the required information about copyright and intellectual property rights, projects creating new software tools or applications should provide the following information:

General Information

  1. Describe the software tool or electronic system you intend to create, including a summary of the major functions it will perform and the intended primary audience(s) the system or tool will serve.

  2. List other existing digital tools that wholly or partially perform the same functions, and explain how the tool or system you will create is different.

Technical Information

  1. List the programming languages, platforms, software, or other applications you will use to create your new digital content.

  2. Describe how the intended software or system will extend or interoperate with other existing software applications or systems.

  3. Describe any underlying additional software or system dependencies necessary to run the new software or system you will create.

  4. Describe the processes you will use for development documentation and for maintaining and updating technical documentation for users of the software or system.

  5. Provide URL(s) for examples of any previous software tools or systems your organization has created.

Access and Use

  1. We expect applicants seeking federal funds for software or system development to develop and release these products as open source software. What ownership rights will your organization assert over the new software or system, and what conditions will you impose on the access and use of this product? Explain any terms of access and conditions of use, why these terms or conditions are justifiable, and how you will notify potential users of the software or system.

  2. Describe how you will make the software or system available to the public and/or its intended users.

Research Data

In addition to providing the required information about copyright and intellectual property rights, projects creating research data should provide the following information:

  1. Summarize the intended purpose of the research, the type of data to be collected or generated, the method for collection or generation, the approximate dates or frequency when the data will be generated or collected, and the intended use of the data collected.

  2. Does the proposed research activity require approval by any internal review panel or institutional review board (IRB)? If so, has the proposed research activity already been approved? If not, what is your plan for securing approval?

  3. Will you collect any personally identifiable information (PII) about individuals or proprietary information about organizations? If so, detail the specific steps you will take to protect such information while you prepare the research data files for public release (e.g. data anonymization, suppression of personally identifiable information, synthetic data).

  4. If you will collect additional documentation such as consent agreements along with the data, describe plans for preserving the documentation and ensuring that its relationship to the collected data is maintained.

  5. What will you use to collect or generate the data? Provide details about any technical requirements or dependencies that would be necessary for understanding, retrieving, displaying, or processing the dataset(s).

  6. What documentation will you capture or create along with the dataset(s)? What standards or schema will you use? Where will the documentation be stored, and in what format(s)? How will you permanently associate and manage the documentation with the dataset(s) it describes?

  7. What is the plan for archiving, managing, and disseminating data after the completion of research activity?

  8. Identify where you will be publicly depositing dataset(s): Please list the name of the repository and its URL.

  9. When and how frequently will you review this data management plan? How will the implementation be monitored?

NASA

NASA's Data Management Plan requirements are described in the administration's 2014 Public Access Plan. With limited exceptions including human subjects research, proprietary data, and sensitive or export controlled data, the requirements apply to all NASA employees and recipients of NASA research funds. Requirements as broadly described on the NASA-Funded Research Results webpage include:

  • All proposals or project plans submitted to NASA for scientific research funding will be required to include a DMP. The DMP should describe whether and how data generated through the course of the proposed research will be shared and preserved (including timeframe), or explain why data sharing and/or preservation are not possible or scientifically appropriate. At a minimum, DMPs must describe how data sharing and preservation will enable validation of published results or how such results could be validated if data are not shared or preserved.
  • DMPs must provide a plan for making research data that underlie the results and findings in peer-reviewed publications digitally accessible at the time of publication or within a reasonable time period after publication. This includes data (or how to access data) that are displayed in charts and figures. This does not include preliminary data; laboratory notebooks; drafts of scientific papers, plans for research; peer-review reports; communications with colleagues; or physical objects, such as laboratory specimens. This requirement could be met by including the data as supplementary information to the published article, through NASA archives, or other means. The published article should indicate how these data can be accessed. (http://www.nasa.gov/open/researchaccess/data-mgmt)

The online DMPTool provides a template with additional guidance.

NEH Office of Digital Humanities

Similar to the NSF, the NEH Office of Digital Humanities requires a short DMP, not to exceed two pages, to be submitted as a supplementary document. Current documentation, accessible from the links provided below, notes that DMPs are considered during the peer review of proposals, and that post-award reports are expected to include discussion of compliance with the plan.

Roles and Responsibilities

Describe who will own the data. This person should be the one who bears final responsibility for the data and adhering to the data management plan. Also, describe the personnel who will collect and manage the data.

Expected Data

Describe all the types of data you will be collecting. Think broadly and consider including anything you create for this project, such as spreadsheets, interview transcripts with the associated video, computer code, etc. There are a variety of ways to describe data, so try to be brief but descriptive.

Describe where your data will be stored during the project and any backup systems or version control you have in place. Also if you can, describe the amount of data in terms of file numbers and disk space.

If you are using existing data, describe where it came from and any terms or licensing conditions associated with the data. If you are using any private or sensitive data, describe your protocols for handling that here, along with your IRB approval if known.

Period of Retention

Describe how long the data will be preserved. If you are depositing it in the UNM Repository, the data will be kept for a minimum of 10 years.

Data Formats and Dissemination

Data Formats

Describe the format of your data and how it will be "documented". Think about what details (metadata) someone else would need to be able to use these files. For example, you may need a "readme file" to explain variables, structure of the files, etc.

  • What form will the metadata describing/documenting your data take?
  • How will you create or capture these details?
  • Which metadata standards will you use and why have you chosen them? (e.g. accepted domain-local standards, widespread usage)
  • Some common types of information to collect (a far from exhaustive list):
    • Instrument settings
    • Variable meanings (data dictionary)
    • Protocols
    • Edit and ownership history of data (Provenance)
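A data dictionary of the kind listed above can be started automatically from a file's header row. The following is a minimal sketch, assuming the data are in CSV form; the output columns (`variable`, `description`, `units`, `allowed_values`) are a hypothetical layout, not a formal metadata standard, and the descriptions still have to be written by someone who knows the data.

```python
import csv
import io

# Hypothetical layout for the data dictionary (an assumption, not a standard).
DICTIONARY_COLUMNS = ["variable", "description", "units", "allowed_values"]


def data_dictionary_skeleton(csv_text):
    """Return CSV text with one blank dictionary row per column in the input header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(DICTIONARY_COLUMNS)
    for column in header:
        # Only the variable name is known; the other fields are left to fill in.
        writer.writerow([column.strip(), "", "", ""])
    return output.getvalue()
```

Saving the result next to the data file (for example alongside a readme file) keeps the variable definitions with the data they describe.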

There are a large number of metadata standards. You may already know the best standards in your field; however, there may be additional or new standards you are not aware of. It is our job as data librarians to keep up to date and help you choose the best standards.

Dissemination

Describe when and how you will make your data available. How will you license your data, and what restrictions will be placed on its use? If your data raise ethical, privacy or sensitivity issues, describe how you will resolve them. Describe how you will conform to your IRB protocol if you have one.

Archiving and Preservation

Use this section to describe both short-term and long-term strategies for storing, archiving and preserving your data. During the study, how will you store your data and prevent corruption and loss? Describe any backup systems and versioning you use. If your data is web accessible, how will you protect it from malicious deletion or corruption? At the conclusion of your study, where will your data be stored? Describe, if known, how the data will be curated and maintained.

National Institute of Justice

Since 2014, the NIJ has required funding applicants to submit a 1-2 page Data Archiving Plan with all proposals. In the plan, researchers are asked to demonstrate their recognition that data sets resulting from NIJ-funded research must be submitted for archiving (typically to the National Archive of Criminal Justice Data) and to describe how the data will be curated or managed to facilitate replication of results. Per the NIJ's Data Archiving Plans for NIJ Funding Applicant website, the plan must briefly describe:

  • Anticipated manipulations of original, intermediate and final data sets
  • Methods of documentation of such manipulations
  • Preparation of the submission for archiving 

More information is available on the website. A Data Archiving Plan template is also available from the online DMP Tool.

NOAA

Definitions from the NOAA Data Sharing Policy for Grants and Cooperative Agreements, Version 3.0, February 2016:

Environmental Data are recorded and derived observations and measurements of the physical, chemical, biological, geological, and geophysical properties and conditions of the oceans, atmosphere, space environment, sun, and solid earth, as well as correlative data, such as socio-economic data, related documentation, and metadata. Media, including voice recordings and photographs, may be included.

Sharing data refers to making data visible, accessible, and independently understandable to users in a timely manner at minimal cost to users, except where limited by law, regulation, policy or by security requirements. NOAA facilities that archive data and make the data openly available should be considered first for the disposition of the data.

 

USDA-NIFA

Key concepts as defined in the Data Management Plan for NIFA-Funded Research Projects documentation released April 2015:

  • Data (digital and non-digital) are the ultimate outputs of most research investments from NIFA.
  • Appropriate data management is critical to all research enterprises and helps preserve outcomes and outputs of public and private investment.
  • Access and sharing of digital and non-digital data helps increase the scope and outcomes of scientific discoveries, sometimes beyond the initial boundaries of the research.
  • A DMP should be a core component of a research planning process and should contain adequate information for successful implementation.
  • The types of data that need to be stored, preserved, and shared depend on the type of research, the scientific discipline, and financial implications.
  • Adequate resources must be available to implement the DMP.

Essential elements of a USDA NIFA Data Management Plan are described in the sections below.

Expected Data Type

Describe the type of data (e.g. digital, non-digital) and how they will be generated (lab work, field work, surveys, etc.). Are these primary data or metadata? Consider these additional questions:

  • What data will be generated in the research?
  • What data types will you be creating or capturing?
  • How will you capture or create the data?
  • If you will be using existing data, state that fact and include where you got it. What is the relationship between the data you are collecting and the existing data?

Data Format

For scientific data to be readily accessible and usable, it is critical to use appropriate community-recognized standards and machine-readable formats when they exist. The data should preferentially be stored in recognized public databases appropriate for the type of research conducted. Regardless of the format used (notebook, samples, images, spreadsheet, etc.), the data set should contain enough information to allow independent investigators to understand, validate, and use the data.

Describe the format of your data; think about what details (metadata) someone else would need to be able to use these files. Describe the structural standards that you will apply in making data and metadata available.

  • Which file formats will you use for your data and why?
  • What naming conventions will you use?
  • How will you organize your directories?
  • What form will the metadata describing/documenting your data take?
  • How will you create or capture these details?
  • Which metadata standards will you use and why have you chosen them? (e.g. accepted domain-local standards, widespread usage)
  • What contextual details (metadata) are needed to make the data you capture or collect meaningful?
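
To illustrate the naming-convention question above, the sketch below builds and checks file names of the form project_datatype_YYYYMMDD_vNN.ext. This pattern is only an assumed example and is not a NIFA requirement; the function names and the sample project name are hypothetical.

```python
import re
from datetime import date

# Assumed convention: <project>_<datatype>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{8}_v\d{2}\.[a-z0-9]+$")


def build_name(project, datatype, collected, version, ext):
    """Build a file name that follows the assumed convention."""
    return (
        f"{project.lower()}_{datatype.lower()}_"
        f"{collected:%Y%m%d}_v{version:02d}.{ext.lower()}"
    )


def follows_convention(filename):
    """Return True if filename matches the assumed convention."""
    return NAME_PATTERN.match(filename) is not None


name = build_name("soilsurvey", "ph", date(2015, 4, 1), 2, "csv")
print(name)                                       # soilsurvey_ph_20150401_v02.csv
print(follows_convention(name))                   # True
print(follows_convention("final data (2).xlsx"))  # False
```

Whatever convention you choose, stating it as a pattern like this in your DMP makes it easy for collaborators to apply and for others to interpret.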

Data Storage and Preservation

Scientific data should be stored in a safe environment with adequate measures taken for its long-term preservation. Applicants should describe plans for storing and preserving their data during and after the project and specify the data repositories, if they exist. They should outline strategies, tools, and contingency plans that will be used to avoid data loss, degradation, or damage. Topics to consider include:

  • Where will your data be stored?
  • How long will data be kept beyond the life of the project?
  • What additional metadata is saved with the data beyond that mentioned above?
  • What transformations will be necessary to prepare data for preservation?
  • If known, what backup and preservation measures are in place?
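
One commonly used tool for detecting data degradation or damage is a checksum manifest: record a checksum for every file, then recompute later and compare. The following is a minimal sketch using SHA-256 from the Python standard library; the function names are hypothetical, and this is one possible approach rather than a NIFA-prescribed method.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """List files that are missing or whose checksum differs from the old manifest."""
    return sorted(
        name for name, checksum in old_manifest.items()
        if new_manifest.get(name) != checksum
    )
```

Saving the manifest alongside the data, and comparing it against a fresh one after each transfer or at regular intervals, gives you evidence that the preserved files are unchanged.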

Data Sharing and Public Access

Describe your data access and sharing procedures during and after the grant. Provide any restrictions such as copyright, confidentiality, patent, appropriate credit, disclaimers, or conditions for use of the data by other parties.

You should describe:

  • When your data will be available
  • How you will make your data available
    • Email request (avoid this, as research has shown that only 20% of data said to be available this way is actually retrievable)
    • Domain Specific Repository (e.g. GenBank)
    • Institutional Repository
    • Journal
  • Embargo Periods
    • A reasonable delay to publish is acceptable
    • Delay for patents, although provisional patents can be obtained through the STC at UNM
  • Who will hold the intellectual property rights on the data
  • Any issues with private, sensitive or secret data

Roles and Responsibilities

Who will ensure DMP implementation? This is particularly important for multi-investigator and multi-institutional projects. Provide a contingency plan in case key personnel leave the project. Also, what resources will be needed for the DMP? If funds are needed, have they been added to the budget request and budget narrative? Projects must budget sufficient resources to develop and implement the proposed DMP.

Consider the following:

  • Outline the staff/organizational roles and responsibilities for implementing this data management plan.
  • Who will be responsible for data management and for monitoring the data management plan?
  • How will adherence to this data management plan be checked or demonstrated?
  • What process is in place for transferring responsibility for the data?
  • Who will have responsibility over time for decisions about the data once the original personnel are no longer available?

Monitoring and Reporting

Successful projects should monitor the implementation of the DMP throughout the life of the project and after, as appropriate. Implementation of the DMP should be a component of annual and final reports to NIFA (REEport) and include progress in data sharing (publications, database, software, etc.). The final report should also describe the data that was produced during the award period and the components that will be stored and preserved (including the expected duration) after the award ends.

US Department of Energy

Per the Department of Energy's Statement on Digital Data Management, proposals submitted after October 1, 2014 will be required to include a Data Management Plan that addresses the following requirements:

  • DMPs should describe whether and how data generated in the course of the proposed research will be shared and preserved. If the plan is not to share and/or preserve certain data, then the plan must explain the basis of the decision (for example, cost/benefit considerations, other parameters of feasibility, scientific appropriateness, or limitations discussed in #4). At a minimum, DMPs must describe how data sharing and preservation will enable validation of results, or how results could be validated if data are not shared or preserved.
  • DMPs should provide a plan for making all research data displayed in publications resulting from the proposed research open, machine-readable, and digitally accessible to the public at the time of publication. This includes data that are displayed in charts, figures, images, etc. In addition, the underlying digital research data used to generate the displayed data should be made as accessible as possible to the public in accordance with the principles stated above. This requirement could be met by including the data as supplementary information to the published article, or through other means. The published article should indicate how these data can be accessed.
  • DMPs should consult and reference available information about data management resources to be used in the course of the proposed research. In particular, DMPs that explicitly or implicitly commit data management resources at a facility beyond what is conventionally made available to approved users should be accompanied by written approval from that facility. In determining the resources available for data management at Office of Science User Facilities, researchers should consult the published description of data management resources and practices at that facility and reference it in the DMP. Information about other Office of Science facilities can be found in the additional guidance from the sponsoring program.
  • DMPs must protect confidentiality, personal privacy, Personally Identifiable Information, and U.S. national, homeland, and economic security; recognize proprietary interests, business confidential information, and intellectual property rights; avoid significant negative impact on innovation, and U.S. competitiveness; and otherwise be consistent with all applicable laws, regulations, and DOE orders and policies. There is no requirement to share proprietary data.

From the DOE Statement on Digital Data Management

The DOE's suggested elements of a data management plan are provided below.

 

Data Types and Source

From the DOE Suggested Elements for a Data Management Plan:

A brief, high-level description of the data to be generated or used throughout the course of the proposed research and which of these are considered digital research data necessary to validate the research findings.

In describing data types, DMPs may include short descriptions of the kinds of information to be captured, the sources of original or derived data, and file formats to be used. Data types may include text, tabular data, images and multimedia, software and 3D models, reports, surveys, etc. Where known, this section may also include estimates of the number of files or measurements to be captured or the anticipated size of the data set in bytes.

Content and Format

From the DOE Suggested Elements for a Data Management Plan:

A statement of plans for data and metadata content and format including, where applicable, a description of documentation plans, annotation of relevant software, and the rationale for the selection of appropriate standards. (Existing, accepted community standards should be used where possible. Where community standards are missing or inadequate, the DMP could propose alternate strategies that facilitate sharing, and should advise the sponsoring program of any need to develop or generalize standards.)

In describing the content and format of their data, PIs should consider the type and level of detail that someone external to the project would require to use the data or, importantly, to replicate the results of the research. In addition, DMPs should include some detail about how the documentation will be maintained within the project; for example, variable definitions may be recorded within a 'readme' file or as commented code.
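
As an illustration of recording variable definitions, the sketch below keeps a small data dictionary in code and renders it as plain-text readme content. The variable names, units, and file name are hypothetical examples, and the layout is only one of many reasonable choices, not a DOE-specified format.

```python
# Hypothetical variable definitions for an illustrative data file.
DATA_DICTIONARY = [
    {"name": "site_id", "type": "string", "units": "",
     "description": "Sampling site code"},
    {"name": "temp_c", "type": "float", "units": "degrees Celsius",
     "description": "Water temperature"},
]


def render_readme(filename, variables):
    """Return plain-text readme content describing each variable in a file."""
    lines = [f"File: {filename}", "Variables:"]
    for var in variables:
        units = f" [{var['units']}]" if var["units"] else ""
        lines.append(f"  {var['name']} ({var['type']}){units}: {var['description']}")
    return "\n".join(lines)


print(render_readme("measurements.csv", DATA_DICTIONARY))
```

Keeping the definitions in one structured place means the readme can be regenerated whenever a variable is added or renamed, so the documentation stays in step with the data.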

Regarding metadata standards, a large number of widely adopted standards exist and can be implemented to suit general purpose or domain specific documentation needs. While researchers may already be aware of and conversant in the standards within their field, Research Data Services librarians are always available to assist with the selection, description and implementation of appropriate standards.

Sharing and Preservation

Per the DOE Suggested Elements for a Data Management Plan, the section on sharing and preservation should include, when appropriate:

  • The anticipated means for sharing and the rationale for any restrictions on who may access the data and under what conditions;
  • A timeline for sharing and preservation that addresses both the minimum length of time the data will be available and any anticipated delay to data access after research findings are published;
  • Any special requirements for data sharing, for example, proprietary software needed to access or interpret data, applicable policies, provisions, and licenses for re-use and re-distribution, and for the production of derivatives, including guidance for how data and data products should be cited;
  • Any resources and capabilities (equipment, connections, systems, software, expertise, etc.) requested in the research proposal that are needed to meet the stated goals for sharing and preservation. (This could reference the relevant section of the associated research proposal and budget request);
  • Cost/benefit considerations to support whether/where the data will be preserved after direct project funding ends and any plans for the transfer of responsibilities for sharing and preservation;
  • Whether, when, or under what conditions the management responsibility for the research data will be transferred to a third party (e.g. institutional, or community repository);
  • Any other future decision points regarding the management of the research data including plans to reevaluate the costs and benefits of data sharing and preservation.

In developing this section of a DMP, it is recommended that researchers strive to be as open as possible and to make the products of research accessible to the public via efficient and low-cost means. Because the DOE data management policy is a direct response to the 2013 Office of Science and Technology Policy memo, Increasing Access to the Results of Federally Funded Research, proposals and DMPs that limit access to research products may receive poor reviews.

In describing long-term plans for data archiving and preservation, PIs should include information about:

  • Where the data will be stored.
  • How the data will be saved.
  • What additional metadata is saved with the data beyond that mentioned above.
  • If known, what backup and preservation measures are in place.

Protection

From the DOE Suggested Elements for a Data Management Plan:

A statement of plans, where appropriate and necessary, to protect confidentiality, personal privacy, Personally Identifiable Information, and U.S. national, homeland, and economic security; recognize proprietary interests, business confidential information, and intellectual property rights; and avoid significant negative impact on innovation, and U.S. competitiveness.

Sometimes data will contain private information about people, information deemed to be secret by the government, or references to ecologically, culturally or otherwise sensitive places. You probably already know if your data does. If you have an IRB approval, include that here.

If you do have any privacy or sensitive data issues, describe how you secure your data, for instance by using password-protected and encrypted hard drives. Also describe how you will anonymize, obscure or remove this information when you make your data public.
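
One way to obscure identifying information before release is to drop direct identifiers and replace participant IDs with keyed hashes. The following is a minimal sketch under stated assumptions: the field names are hypothetical, and keyed hashing is pseudonymization rather than full anonymization, since it does not address indirect identifiers such as rare combinations of attributes. Your IRB protocol remains the authority on what is required.

```python
import hashlib
import hmac

# Hypothetical column names treated as direct identifiers.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record, secret_key, id_field="participant_id"):
    """Drop direct identifiers and replace the ID with a keyed hash."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    token = hmac.new(
        secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
    )
    # A shortened digest is used for readability; keep the key private.
    cleaned[id_field] = token.hexdigest()[:16]
    return cleaned


record = {"participant_id": 42, "name": "Example Person", "score": 7}
print(pseudonymize(record, b"keep-this-key-out-of-the-public-release"))
```

The same key always yields the same token, so records for one participant can still be linked across files, while anyone without the key cannot easily recover the original ID.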

Rationale

A discussion of the rationale or justification for the proposed data management plan including, for example, the potential impact of the data within the immediate field and in other fields, and any broader societal impact.

US Geological Survey

The United States Geological Survey provides comprehensive data management planning guidance covering the full spectrum of the research data lifecycle. Accordingly, data management plans which fully address the concepts and issues defined by the USGS will be more comprehensive than the two-page, high level overviews requested by other agencies and sponsors. More detailed information and sample plans are available from USGS Data Management.

The template and suggestions provided below are taken from the USGS DMP template provided by the online DMPTool (https://dmptool.org/).

Project and Contact Information

The USGS DMP template generally follows the USGS Science Data Lifecycle Model, a high level view of how data relates to project workflows from data planning to preservation and publishing. This template is not prescriptive but meant as guidance for individuals and Centers/Programs who want to create their own Data Management Plans.

Consider these topics and questions describing very basic information about the project and the appropriate contacts:

  • What is the name of the project? Include any identifiers related to the project (e.g. Project ID, Funding ID etc).
  • What is the name of the Center/Program and Branch that oversees the project?
  • Summary description of the project and reason why the data is being collected.
  • What are the start and expected end dates for the project?
  • Include any web links with more information related to the project, if applicable.
  • Who is the main point of contact for the project and its data? List an alternate point of contact, if any. (Include name, title, e-mail address, agency/organization, phone, and mailing address, as appropriate.)
  • If there are collaborating/funding agencies and organizations, who are they and who are the main points of contact?

Plan and Acquire

Plan and Acquire elements of the USGS Science Data Lifecycle: Plan refers to planning considerations before the handling of the project's data assets. Acquire describes the activities related to new or existing data that are collected or generated.

Consider these topics and questions:

  • How will the data be acquired (newly collected or using existing datasets)?
  • If acquiring existing datasets (e.g. NLCD) include the name, format, a persistent identifier, and source citation, if any.
  • Are there any restrictions or agreements such as Memoranda of Understanding (MOUs) for use and storage?
  • If collecting new data, are there special processes or procedures for collecting the data (e.g. licenses, permissions, equipment, software)?
  • What is the estimated volume of the data collected, transformed, and/or generated (e.g. in megabytes (MB), GB, TB, or PB)?
  • Will the data be static or is there a possibility that new data will continue to be added?
  • Are the appropriate hardware, software, and staff resources part of the budget for data management activities?
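
For the volume question above, a rough projection from collection rate and average file size is often enough. The sketch below shows the arithmetic; the function names and the example figures are hypothetical, and decimal units (1 KB = 1000 bytes) are assumed.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB"]


def human_readable(num_bytes):
    """Express a byte count in decimal units (1 KB = 1000 bytes)."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:.1f} {unit}"
        value /= 1000


def projected_volume(files_per_day, avg_file_bytes, days):
    """Estimate total bytes from a collection rate, average file size, and duration."""
    return files_per_day * avg_file_bytes * days


# Hypothetical project: 48 files per day, 25 MB each, for 365 days.
total = projected_volume(48, 25_000_000, 365)
print(human_readable(total))  # 438.0 GB
```

If derived or transformed products will be kept as well, add their projected size to the total before budgeting for storage.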

Describe and Manage Quality

Describe and Manage Quality elements of the USGS Science Data Lifecycle: Describe emphasizes documentation of every stage of the lifecycle to ensure the data assets and methods can be understood, evaluated for validity, and potentially reused. Manage Quality includes considerations for quality assurance and quality control (QA/QC) measures.

Consider these topics and questions:

  • How many new datasets will be created? List the anticipated title of each dataset.
  • What are the data types and formats in which the data will be maintained?
  • Briefly describe the data processing steps or provide the scientific workflow and identify any software or technology needs where applicable.
  • How will the metadata for each dataset be created? Who will be responsible for the metadata creation and update?
  • Which metadata standard will be used to describe each dataset (e.g. FGDC-CSDGM, ISO 19115 series, or other as appropriate)?
  • What procedures will be used for ensuring data quality (QA/QC)? If using a known standard or protocol, include the citation source.
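
As a small example of a QA/QC procedure, the sketch below flags missing, non-numeric, and out-of-range values in one field of a tabular data set. The function name, field name, and acceptable range are hypothetical; real QA/QC should follow the standard or protocol you cite in your plan.

```python
def qc_flags(rows, field, minimum, maximum):
    """Return (row_index, reason) pairs for missing or out-of-range values."""
    flags = []
    for index, row in enumerate(rows):
        value = row.get(field)
        if value is None or value == "":
            flags.append((index, "missing"))
            continue
        try:
            number = float(value)
        except ValueError:
            flags.append((index, "not numeric"))
            continue
        if not minimum <= number <= maximum:
            flags.append((index, "out of range"))
    return flags


# Hypothetical rows as they might be read from a CSV file.
rows = [{"temp_c": "12.5"}, {"temp_c": ""}, {"temp_c": "abc"}, {"temp_c": "99"}]
print(qc_flags(rows, "temp_c", -5, 40))
# [(1, 'missing'), (2, 'not numeric'), (3, 'out of range')]
```

Flagging rather than silently correcting values keeps a record of what was checked, which can itself be reported in the dataset's metadata.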

Backup, Secure and Preserve

Backup/Secure and Preserve elements of the USGS Science Data Lifecycle: Backup/Secure involves managing risks and accessibility to the data throughout the lifecycle. Preserve highlights important activities that should be taken to ensure long-term preservation of data, metadata, ancillary products, and additional documentation.

Consider these topics and questions:

  • Where will the data be stored in the short-term? Is it properly secured and environmentally controlled?
  • What will be the approach for routine backup of the data (frequency, duration, software, media)? Will the data be stored in multiple places and on different media types (recommended minimum of 3 copies with 1 stored in an offsite location)?
  • Describe any potential access restrictions (for example, because the data contain Personally Identifiable Information (PII)) and any practices to ensure access will be restricted.
  • What will be the final format of the data product and will there be any software needs? Will the data format be appropriate for long-term preservation?
  • Where will the data and metadata be preserved in the long-term and by which sponsoring Program (if in collaboration)? Who will be the point of contact?
  • If costs are associated with long-term storage, how will they be provided for?
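
The backup recommendation in the list above (a minimum of 3 copies, different media types, and 1 copy offsite) can be turned into a simple self-check. The following is a minimal sketch; the class, function, and location names are hypothetical and only illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    location: str    # e.g. "lab server"; all values here are hypothetical
    media_type: str  # e.g. "disk", "tape", "cloud"
    offsite: bool


def check_backup_plan(copies):
    """Return a list of gaps relative to the recommendation in the list above."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than 3 copies")
    if len({c.media_type for c in copies}) < 2:
        problems.append("only one media type")
    if not any(c.offsite for c in copies):
        problems.append("no offsite copy")
    return problems


plan = [
    BackupCopy("workstation", "disk", False),
    BackupCopy("lab server", "disk", False),
    BackupCopy("cloud storage", "cloud", True),
]
print(check_backup_plan(plan))      # []
print(check_backup_plan(plan[:2]))
# ['fewer than 3 copies', 'only one media type', 'no offsite copy']
```

Listing your planned copies in this form, even informally, makes it easy to see whether the plan meets the recommendation before you describe it in the DMP.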

Publish and Share

Publish and Share elements of the USGS Science Data Lifecycle: Publish and Share highlight important considerations related to traditional peer-reviewed publications and dissemination of the data through Web sites, data catalogs, social media and other outlets.

Consider these topics and questions:

  • How will the data itself be shared and made available to the public (e.g. web page, tool or application, data portal, repository, USGS Data Series)? Are there data release policies that need to be followed?
  • Will there be access or use restrictions on the data (e.g. sensitive data, restricted data, privacy, software with license restrictions, etc.)? Provide justification for the restriction citing any policies or legal reasons.
  • How can someone overcome these restrictions (e.g. fees, non-disclosure statements, special authorization, data embargo or hold, MOUs/MOAs)?
  • Identify any anticipated publications or electronic outlets (e.g. peer-reviewed articles, information/fact sheets, web pages) resulting from the data. If a USGS publication, indicate type (e.g. Open File Report, Provisional Release etc).
  • Where will your metadata be stored to provide an access point for discovery by users and harvest by catalogs such as the USGS Science Data Catalog?
  • How and where will you obtain a persistent identifier for the data (e.g. digital object identifier)?

DMP Tool Online

A good resource for developing any data management plan is the DMP Tool Online.

By selecting University of New Mexico from the drop-down list on the sign-in page, you will be able to access templates created by our Data Librarians, as well as links to UNM-specific recommendations and resources.

Sample Data Management Plans