


Beginning January 18, 2011, NSF grant proposals must include a supplementary document of no more than two pages titled "Data Management Plan." Details of this NSF policy are available here.

Other funders, including NIH, have also begun to require data management plans for grant proposals.

For more information on the context of this new NSF requirement, see the comprehensive "Unpacking the NSF Requirement" from the Association of Research Libraries (ARL).

Raynor Memorial Libraries have put together a set of resources to help you understand, plan for, and implement data management plans for your research.

 


Data management plans should describe how research results and data created in the course of grant-funded research will be managed, disseminated, and shared. The data management plan is required: your proposal will not be reviewed unless it includes either a plan or a clear case for why a plan is not necessary. Like the rest of your proposal, the plan is subject to peer review.

In addition to the general requirements for data management plans set forth by NSF, some directorates and offices also include their own specific requirements.

Since the NSF's announcement, many libraries and data centers have drafted guides to help researchers write and implement their data management plans. The Libraries have put together an annotated "guide to the guides" to help you locate the most relevant advice for your own research needs. This list will be continually updated.

ICPSR: Data Management Plan; resources and examples

While ICPSR is a social science organization, the data management framework they have developed is valuable across disciplines. The framework describes what ICPSR has determined to be the key elements of a good data management plan, the relative importance of each element, and the rationale for including this information in your plan, along with examples.

Data Conservancy

The Data Conservancy, an initiative based at Johns Hopkins University and aimed at developing "data curation infrastructure for cross disciplinary discovery of observational data," has developed a Questionnaire (Word doc) for NSF data management plans that they describe as including "common elements across NSF directorates." The Data Conservancy adds this caveat: "It is not intended to address the elements or requirements of a particular directorate which may identify additional conditions."

UCSD, Research Cyberinfrastructure: Example Data Management Plans

Several researchers at UC San Diego have agreed to make their actual proposed data management plans openly available as examples. As of March 2011, examples include proposals in Engineering, Geoscience, Cyberinfrastructure, Integrative Activities, and two proposals that cross multiple directorates and offices.

Documenting your data

An important step toward making your data useful both to you and other researchers is to develop a framework for documenting and describing your data and the context in which it was created. The Pennsylvania State University Libraries suggest that data documentation might include the following:

  • names, labels and descriptions for variables, records and their values
  • explanation of codes and classification schemes used
  • codes of, and reasons for, missing values
  • derived data created after collection, with code, algorithm or command file used to create them
  • weighting and grossing variables created and how they should be used
  • data listing with descriptions for cases, individuals or items studied, for example for logging qualitative interviews

(From "Data Management Planning at Penn State Libraries: Documenting Data")
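
As a minimal sketch of the documentation elements listed above, a codebook can be kept as a machine-readable structure alongside the data and checked against it. The Python example below is illustrative only: the column names, the codes, and the -9 missing-value convention are hypothetical and do not come from Penn State's guidance.

```python
import csv
import io

# Hypothetical survey extract; column names and codes are invented for illustration.
RAW = """respondent_id,age,employment,weight
1,34,1,1.2
2,-9,2,0.8
3,51,-9,1.0
"""

# Codebook: one entry per variable with a label, value codes, and missing-value codes.
CODEBOOK = {
    "respondent_id": {"label": "Anonymous respondent number", "codes": {}, "missing": []},
    "age": {"label": "Age in years at interview", "codes": {}, "missing": ["-9"]},
    "employment": {
        "label": "Employment status",
        "codes": {"1": "employed", "2": "not employed"},
        "missing": ["-9"],
    },
    "weight": {"label": "Sampling weight applied to each case", "codes": {}, "missing": []},
}


def undocumented_columns(csv_text, codebook):
    """Return column names in the data that have no codebook entry."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [name for name in header if name not in codebook]


def count_missing(csv_text, codebook):
    """Count, per variable, how many values match a declared missing-value code."""
    counts = {name: 0 for name in codebook}
    for row in csv.DictReader(io.StringIO(csv_text)):
        for name, value in row.items():
            if name in codebook and value in codebook[name]["missing"]:
                counts[name] += 1
    return counts


print(undocumented_columns(RAW, CODEBOOK))
print(count_missing(RAW, CODEBOOK))
```

Keeping the codebook in a structured form like this makes it possible to notice when a new variable has been added to the data without being documented.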

Describing your data with Metadata

Metadata is the data used to describe your data. Good metadata makes it easier to store and locate your data, and much easier for future researchers to use them. A number of metadata schemas exist to help you organize and structure your data description. Metadata schemas can be viewed at the JISC Digital Media website.

The Libraries at MIT have put together a guide to the most basic elements to document, regardless of discipline. These include Title, Creator, Identifier, Subject, Funders, Rights, Access information, Language, Dates, Location, Methodology, Data processing, Sources, List of file names, File Formats, File structure, Variable list, Code lists, Versions, and Checksums. For more detail, see the MIT Libraries' guide to metadata for data management.
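
One way to capture a few of these elements, including checksums, is a small metadata record stored next to the data files. The following Python sketch is an assumption about how such a record might look, not a format prescribed by the MIT Libraries; the field names, the sample title and creator, and the choice of SHA-256 as the checksum are all illustrative.

```python
import hashlib
import json


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 checksum of some file content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def build_record(title, creator, files):
    """Build a metadata record; `files` maps file name -> file content (bytes)."""
    names = sorted(files)
    return {
        "title": title,
        "creator": creator,
        "file_names": names,
        # The file extension is used here as a rough proxy for file format.
        "file_formats": sorted({name.rsplit(".", 1)[-1] for name in names}),
        "checksums": {name: sha256_hex(files[name]) for name in names},
    }


# Hypothetical dataset: contents are held in memory so the sketch runs anywhere.
record = build_record(
    title="Example stream temperature survey",
    creator="A. Researcher",
    files={"readings.csv": b"site,temp_c\nA,11.5\n", "README.txt": b""},
)
record_json = json.dumps(record, indent=2, sort_keys=True)
print(record_json)
```

Recomputing the checksums later and comparing them with the stored values shows whether a file has changed since the record was written.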

For a more comprehensive overview of metadata in general, NISO distinguishes among three types of metadata: descriptive, structural, and administrative. Descriptive metadata is the information used to search for and locate an object, such as title, author, subjects, keywords, and publisher; structural metadata describes how the components of the object are organized; and administrative metadata refers to technical information, including file type. Two sub-types of administrative metadata are rights management metadata and preservation metadata.

Source: NISO. Understanding Metadata. NISO Press. ISBN 1-880124-62-9. Accessed: 6 March 2011.

The methodology you choose for managing your data will vary depending on the collection method, the nature of the data, and the types of analyses to be applied. Some of the more common methods for managing data are databases, spreadsheets, data management tools, and standard file systems. The lists below summarize the benefits of each approach and provide links to further resources on the Web.

  • Databases and Spreadsheets: These common tools are relatively simple to set up, with spreadsheets being the simplest in most cases. Both offer advantages in managing your data - databases are especially useful for setting up complex relationships and generating queries and reports based on your data and the relationships between your data; spreadsheets are especially helpful for storing numeric and text data. The University of Wisconsin Libraries provide helpful information for both: Databases   |  Spreadsheets
  • e-Science tools such as myGrid
    • Designed specifically for research and collaboration.
    • Streamlined for data-intensive analysis and scalability.
    • Increasingly being adopted by research organizations.
    • http://www.mygrid.org.uk/mygrid-in-use/
  • File Systems such as your own Local Area Network (LAN)
    • Ubiquitous, easy to use, and relatively simple.
    • Easily configurable in terms of access control and sharing.
    • Lend themselves to the structured, hierarchical storage of large numbers of files.
    • Highly portable.

For more information on the basic organization of your data files, see the MIT Libraries guide to "Organizing Files."

Storing and backing up your data

Where data is stored and backed up may depend on funding considerations, collection processes, the need for encryption or increased security, and available resources. Data storage locations may include one or more of the following options: an internal or external hard drive on a personal computer, a departmental or university server, an institutional repository such as e-Publications@Marquette, or cloud storage such as Amazon S3. Subject archives and data repositories, such as GenBank, may also be an option, depending on your discipline, the nature of your data, funding guidelines, and other issues. See the "Sharing Data" tab for more information on external data repositories.

Securing your data

Know the implications of working with confidential, sensitive, or proprietary data. Restrictions on the ownership or sharing of student, patient, or other personal data may be governed by federal HIPAA or FERPA guidelines. Marquette's Office of Research Compliance can help researchers working with sensitive data.


NSF guidelines require grantees to detail how they will disseminate and share their research results: "Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants." (From NSF Award and Administration Guide, Chapter VI.D.4)

Why is sharing data important?

  • To fulfill grant funding (e.g., NIH, NSF) and/or journal requirements (see above).
  • To raise visibility and interest in research and publications.
  • To add value to research.
  • To accelerate discovery rates.
  • To frame research as a public good, promoting community and collaboration.

"Data sharing is essential for expedited translation of research results into knowledge, products and procedures to improve human health."

From NIH Data Sharing Policy, 2003

Many other academic and governmental groups have released their own statements on the importance of data sharing:
  • Open Knowledge Definition (OKD): A volunteer group of scientists, researchers, and others provides a guide to the concept of open data.

  • Panton Principles: Drafted by a group of faculty at Cambridge and refined by the Open Knowledge Foundation Working Group on Open Data in Science, the Panton Principles lay out rationale and criteria for making science data openly available, particularly policies on reuse.

  • Open Data.gov aims to bring together discussion on policy, recommendations for best practices, and other issues relevant to making data created or funded by government open and accessible.

  • CODATA Scientific Data Policy Statements compendium: A compilation of statements that express the policies of a number of organizations on data issues. Most are related to the environmental sciences.

Considerations about sharing data:

Some data may not be shared, based on policies from funding agencies or other relevant bodies. One example is the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule, which protects all "individually identifiable health information" derived from health care records and requires specification of data handling responsibilities. Marquette's Office of Research Compliance can help researchers working with sensitive data.

Some issues you may want to consider:

  • Do your data contain confidential or private information? If so, they may require redaction or anonymization before they can be made public. If the data are anonymized, could individuals still be reidentified?
  • Are your datasets understandable to those who wish to use them? Supplementary materials for describing, deciphering, and contextualizing data should be made available. Consider including metadata, methodology descriptions, codebooks, data dictionaries, and other descriptive material to facilitate their use. (See the "Documenting Data" tab.)
  • Do your datasets comply with the standards in your field regarding description, format, metadata, and sharing?
  • What reuse policies do you wish for your data? Consider the Panton Principles carefully before you attach reuse restrictions.
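
The reidentification question above can be explored, in a very rough way, by counting how many records share the same combination of indirect identifiers. The Python sketch below uses invented records and an assumed set of quasi-identifiers (ZIP code, birth year, sex). It is an illustration only and is not a substitute for a formal de-identification review.

```python
from collections import Counter

# Hypothetical records with names already removed; all values are invented.
RECORDS = [
    {"zip": "53201", "birth_year": 1980, "sex": "F", "score": 12},
    {"zip": "53201", "birth_year": 1980, "sex": "F", "score": 15},
    {"zip": "53233", "birth_year": 1975, "sex": "M", "score": 9},
]

# Assumed indirect identifiers; the right set depends on your data and field.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def group_sizes(records, keys):
    """Count how many records share each combination of the given keys."""
    return Counter(tuple(record[key] for key in keys) for record in records)


def smallest_group_size(records, keys):
    """Size of the rarest combination; 1 means someone is unique on these keys."""
    return min(group_sizes(records, keys).values(), default=0)


def unique_records(records, keys):
    """Return the records whose combination of keys appears only once."""
    sizes = group_sizes(records, keys)
    return [r for r in records if sizes[tuple(r[key] for key in keys)] == 1]


print(smallest_group_size(RECORDS, QUASI_IDENTIFIERS))
print(unique_records(RECORDS, QUASI_IDENTIFIERS))
```

Records that are unique on these fields are the ones most likely to need further generalization (for example, replacing birth year with an age band) before the data are shared.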

How to Share your Data

Consider the following options:

  • Publish your data as supplementary material or a "data publication" in a journal. Check with individual journals about their data policies.
  • Deposit your data in e-Publications@Marquette, Marquette's institutional repository. Contact Rose Fortier, Digital Programs Librarian, to discuss what data sets might be suitable for e-Pubs.
  • Deposit your data in a disciplinary data repository if one exists for your research area. See the list below for examples.

External Data Repositories

The Distributed Data Curation Center (D2C2) at Purdue University Libraries has put together Databib, a listing of data repositories where researchers may be able to deposit and share their research data. The list is both browsable and searchable.