Information Literacy: 21. Data Management

Strategies used to incorporate research skills for the College of Arts and Sciences and the Graduate School of Management.

Data Management

Data management is the practice of collecting, organizing, and accessing data to support productivity, efficiency, and decision-making. Many researchers create and follow a Data Management Plan (DMP), which helps determine how data is collected, documented, organized, stored, accessed, and disseminated. A DMP lays out a controlled progression for curating data within and between each stage of its lifecycle, from creation to completion of the project.

Fig 1. General Lifecycle of Data

The importance of data management grows as the amount of data produced and consumed continues to climb each year. Below are some reasons why data management is important.

Data management will...

  • Establish a path for researchers to follow
  • Provide structure and organization for data in ways that are appropriate to fields of study
  • Improve workflow
  • Refine the quality of curated data and metadata
  • Establish safe backup plans to prevent data loss
  • Clearly state the responsibilities & expectations of individuals working together on a project
  • Help meet grant funder requirements for data sharing
  • Improve scholarly communication through data sharing, reuse, and verification

Information is also lost over time unless it is intentionally captured during the research process. By creating and following a data management plan, researchers work through aspects of their data that they might not otherwise have considered. The following diagram shows how information can be lost over time through information entropy.

Fig 2. Information Entropy

A data management plan (DMP) helps researchers work with, manage, share and archive their data effectively. A DMP typically describes what data will be produced; how each type of data will be organized, documented, standardized, stored, protected, archived, and shared; who will take responsibility for the various activities; and when these activities will take place over the course of the project.

When developing a DMP, the following are some questions to consider:

  • What data will be generated in this project?
  • What data will I be responsible for?
  • How will my data need to connect to other data sets?
  • What formats will be used? (e.g. Excel, Google Docs, MySQL, JPG)
  • How will the data be accessed?
  • What information about the data do others need in order to understand it?
  • Where should the data be stored?
  • How will data be backed up in case of disaster? (e.g. fire, flood)
  • Who has access to the data?
  • How should the data be organized or structured?
  • What file naming convention will be used?
  • Will the data be published or archived at the end of the project?

The California Digital Library has created a free, open-source resource called DMPTool to guide researchers through the process of creating a data management plan, whether as part of a funding requirement or for their own benefit. The brief video above describes DMPTool.

DMPTool is an excellent place to start. This comprehensive, easy-to-use, open-source tool guides you through developing a data management plan tailored to your needs that you can easily export. You'll create an ID and password, so your plan will stay private. Step-by-step instructions let you target your plan for a particular granting body. You can also browse an index page with sample plans, templates, and guidance.

This Primer on Data Management [PDF] from DataONE covers the basics of data management, best practices, and how to consider data management in the larger scheme of your research.

If you have questions about creating a data management plan, please contact John Repplinger (jrepplin@willamette.edu).

https://libapps.s3.amazonaws.com/accounts/4356/images/best-practices-data-management.png

Sample Data Management Plans

Many online resources exist with sample data management plans, some of which are from actual successful proposals. Take a look at the examples below to get a better understanding of what a DMP should include. 

General Concepts for Data Management

Best Practice Tips for Data Management

  • Use a Data Management Plan. Creating a plan will make your research more efficient for yourself and others, better protect your data from loss, and make it more accessible.
  • Keep three copies of your data: a working copy (e.g. laptop drive), local external storage (e.g. USB drive or external drive), and offsite external storage (e.g. network or cloud server). Neglecting this puts your data, and possibly your entire research project, at risk. Ask yourself what it would take to recreate any or all of your research data.
  • Consider data backup frequency. Portions of data may need more frequent backups than others.
  • Check data periodically. Make sure it is all still there by confirming the file sizes and the number of files, and by verifying that the files still open.
  • Consider any privacy concerns if using public servers.
  • Use a file-naming structure. It could include: grant name, experiment name or acronym, researcher's name or initials, date of an interview, status of file (e.g. draft or final), conditions, version number of a file, changes made to a file. 
  • Document your file name structure in a shared document.
  • Avoid generic file names that may conflict if moved to a new location.
  • Avoid special characters in file names.
  • Use leading zeros in file names if using multi-digit versions (e.g. v01 rather than v1) so that files sort in order.
  • Avoid letting folders get too big or files will get lost.
  • Avoid letting the folder structure get too deep.
  • Consider using open file formats instead of proprietary ones. Not everyone has access to the software needed to open Microsoft Word, Excel, or QuickTime files.
  • Consider how long data should be kept. Retention policies may be set by institutions or funding agencies, but corroborating research may need a different type of access to data to promote reuse, support open data initiatives, or protect intellectual property rights.
  • Anticipate future use of your data by others.
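
Two of the tips above, using a file-naming structure and checking data periodically, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order of the name elements, the underscore separator, and the helper names build_file_name, inventory, and compare are hypothetical choices, not a convention prescribed by this guide.

```python
import hashlib
import re
from datetime import date
from pathlib import Path


def build_file_name(project, researcher, day, status, version, ext):
    """Build a name like project_initials_YYYYMMDD_status_v03.ext (assumed layout)."""
    # Zero-pad the version to two digits so that v02 sorts before v10.
    parts = [project, researcher, day.strftime("%Y%m%d"), status, f"v{version:02d}"]
    # Replace anything other than letters, digits, and hyphens with a hyphen
    # so the name contains no spaces or special characters.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "-", p).strip("-") for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")


def inventory(folder):
    """Map each file's relative path to (size in bytes, SHA-256 digest)."""
    root = Path(folder)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Reads the whole file at once; very large files would be
            # better hashed in chunks.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            result[path.relative_to(root).as_posix()] = (path.stat().st_size, digest)
    return result


def compare(old, new):
    """Return (missing, changed) file lists relative to an earlier inventory."""
    missing = sorted(name for name in old if name not in new)
    changed = sorted(name for name in old if name in new and old[name] != new[name])
    return missing, changed


if __name__ == "__main__":
    print(build_file_name("Soil Survey", "JR", date(2024, 3, 5), "draft", 3, "csv"))
    # prints Soil-Survey_JR_20240305_draft_v03.csv
```

One possible routine is to save the result of inventory() when a backup is made and later pass it to compare() along with a fresh inventory of the same folder. Any name in the missing or changed lists points to a file worth restoring from one of the other copies.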