Data Management

Research Data Management Overview

What is Data Management?

Data management is the process of controlling the information generated during a research project. Research projects across disciplines result in data. Data management can ensure the accessibility of data throughout the data’s life cycle.

The Data Life Cycle: An Overview

The data life cycle has eight components:

  1. Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
  2. Collect: observations are made either by hand or with sensors or other instruments, and the data are placed into a digital form
  3. Assure: the quality of the data is assured through checks and inspections
  4. Describe: data are accurately and thoroughly described using the appropriate metadata standards
  5. Preserve: data are submitted to an appropriate long-term archive (i.e. data center)
  6. Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
  7. Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
  8. Analyze: data are analyzed

Source: DataONE (http://dataone.org)
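As an illustration of the "Assure" step, the following is a minimal Python sketch of an automated quality check. It is not taken from DataONE; the function name, the row layout, and the valid range are all hypothetical and would need to be adapted to the data actually being collected.

```python
def check_observations(rows, valid_range):
    """Return (row_index, problem) pairs for missing or out-of-range values."""
    low, high = valid_range
    problems = []
    for index, row in enumerate(rows):
        value = row.get("value")
        if value is None:
            problems.append((index, "missing value"))
        elif not low <= value <= high:
            problems.append((index, "out of range"))
    return problems


# Hypothetical pH readings; the 0-14 range is an assumption for this example.
rows = [
    {"site": "A", "value": 7.1},
    {"site": "B", "value": None},
    {"site": "C", "value": 19.0},
]
problems = check_observations(rows, (0.0, 14.0))
for index, problem in problems:
    print(f"row {index}: {problem}")
```

Checks of this kind are usually run as soon as data are collected, so that problems can be corrected while the original observations are still available.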

 


Data Management Plans and Planning

Agency Requirements

The Office of Management and Budget (OMB) Circular A-110 describes the administrative requirements for grants and proposals awarded to institutions of higher education, hospitals and other non-profit organizations. In 1999, Circular A-110 was revised to include a data sharing component under the provision of the Freedom of Information Act which requires grantees to provide access to research data funded by the federal government in a timely manner.

Generally, the plan must address how research data is to be described, accessed, shared, re-used and redistributed during the length of the project and beyond. Funding agencies have implemented the OMB requirement in different ways. Below are the policies from the agencies that fund the majority of research at FIU.

Centers for Disease Control and Prevention (CDC)

Department of Defense (DOD)

Department of Energy (DOE)

Environmental Protection Agency (EPA)

National Aeronautics and Space Administration (NASA)

National Oceanic and Atmospheric Administration (NOAA)

National Endowment for the Humanities (NEH)

National Institutes of Health (NIH)

National Science Foundation (NSF)

Additional Guidance on Funder Requirements:

Writing a Data Management Plan

A data management plan includes:

  • Description of Data
  • Data Standards
  • Policies for sharing, accessing, and reusing your data
  • Methods for archiving and preserving your data

A data management plan (DMP) is a maximum of 2 pages long. If you do not include a DMP, you must include a statement explaining why one is not needed.

Describing the Data

It is recommended that the description process begin before the collection or creation of data. Data management plans should provide detailed information about the data format, a description of the data collection and analysis plans, and a prediction of the quantity of data to be generated.
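The prediction of data quantity can often be worked out with simple arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the sensor count, sampling rate, record size, and project length are invented figures, not values from any FIU project.

```python
def estimate_volume_bytes(instruments, samples_per_day, bytes_per_sample, days):
    """Return the estimated raw data volume in bytes."""
    return instruments * samples_per_day * bytes_per_sample * days


def human_readable(num_bytes):
    """Format a byte count using decimal (SI) units."""
    size = float(num_bytes)
    for unit in ("B", "kB", "MB", "GB", "TB"):
        if size < 1000 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1000


# Hypothetical project: 12 sensors, one 64-byte reading per minute, for 3 years.
total = estimate_volume_bytes(12, 24 * 60, 64, 3 * 365)
print(human_readable(total))  # 1.2 GB
```

An estimate like this covers raw data only. Derived products, backups, and documentation add to the total, so the figure stated in a plan is usually rounded up.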

Data Standards

Data management plans should indicate the data standards to be utilized for data format, metadata, data collection, etc. Interoperability, discoverability, and accessibility of data are the ultimate goals achieved by following these standards. The library may provide suggestions for best practices. Existing resources, such as the DataONE Best Practices guide, include (but are not limited to) specifications for standards in:

  • File naming conventions
  • Data backup methods
  • Standards in community/discipline terminology for data description (metadata)
  • Standard formats for spatial location and time
  • Extent and resolution of spatial data
  • Methods for organizing data
  • File formats
  • Steps performed in data processing
  • Quality control methods
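Two of the items above, file naming conventions and standard formats for time, lend themselves to a short example. The following Python sketch assumes a made-up naming pattern (project, site, collection date, version) and uses ISO 8601 for timestamps. The pattern and the "fce" and "srs2" labels are hypothetical and are not prescribed by DataONE or FIU.

```python
import re
from datetime import datetime, timezone

# Hypothetical convention: <project>_<site>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<site>[a-z0-9]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2})\.(?P<ext>[a-z0-9]+)$"
)


def build_name(project, site, collected, version, ext):
    """Build a file name that follows the hypothetical convention."""
    return (
        f"{project.lower()}_{site.lower()}_"
        f"{collected:%Y%m%d}_v{version:02d}.{ext.lower()}"
    )


def is_valid_name(filename):
    """Check the pattern and that the date part is a real calendar date."""
    match = NAME_PATTERN.match(filename)
    if match is None:
        return False
    try:
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return False
    return True


collected = datetime(2015, 3, 9, 14, 30, tzinfo=timezone.utc)
name = build_name("fce", "srs2", collected, 1, "csv")
print(name)                   # fce_srs2_20150309_v01.csv
print(collected.isoformat())  # 2015-03-09T14:30:00+00:00
```

Whatever convention a project chooses, documenting it in the plan and validating names automatically makes it easier for later users to sort and interpret the files.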

Strategy for Archiving and Preservation: Short-Term and Long-Term

Short-term and long-term strategies for data storage and preservation are an integral part of data management plans. Information regarding intermediate storage, during the research and data collection phase, should be included. It is also important to note any data transformations that will take place in order to prepare data for long-term preservation and access.
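One practice that a storage strategy may mention is recording checksums, so that files can be verified after they are moved between intermediate and long-term storage. This guide does not require it. The Python sketch below shows the idea using the standard library, with hypothetical function names and a temporary directory standing in for a project folder.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map each file's relative path to its checksum."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """List files that were added, removed, or altered since the manifest."""
    current = build_manifest(directory)
    return sorted(
        name
        for name in manifest.keys() | current.keys()
        if manifest.get(name) != current.get(name)
    )


# Demonstration with a throwaway directory and an invented data file.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "observations.csv"
    data_file.write_text("site,value\nA,1.5\n")
    manifest = build_manifest(tmp)
    data_file.write_text("site,value\nA,9.9\n")  # simulate an unintended change
    altered = changed_files(tmp, manifest)

print(altered)  # ['observations.csv']
```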

Policies: Access and Reuse

Plans should describe how data will be accessed, including access procedures and any necessary equipment, software, and expertise. Policies should also define access timelines, including embargoes and other access restrictions.
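An access timeline can be stated precisely by computing the date on which an embargo ends. The following Python sketch is illustrative only: the function names, the six-month embargo, and the deposit date are assumptions, and actual embargo lengths depend on funder and publisher policy.

```python
import calendar
from datetime import date


def embargo_end(start, months):
    """Return the date on which an embargo of `months` months ends."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day for shorter months (e.g. 31 August + 6 months).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def is_accessible(on, deposited, embargo_months):
    """True if the data may be released on the date `on`."""
    return on >= embargo_end(deposited, embargo_months)


deposited = date(2015, 8, 31)  # hypothetical deposit date
release = embargo_end(deposited, 6)
print(release.isoformat())                             # 2016-02-29
print(is_accessible(date(2016, 1, 15), deposited, 6))  # False
```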

Data Management Plan Tools

DMPTool (https://dmp.cdlib.org) is a service provided by the University of California Curation Center and the California Digital Library. It provides guidelines and resources for writing a data management plan, as well as templates tailored to specific funder requirements. Although templates are a good start, FIU strongly suggests adapting plans to account for university- and center-specific requirements and for services provided by the university, such as data/metadata storage, preservation, and future use.

DataONE (http://dataone.org) is an environmental science resource that may also be used as a best practice guide for scientific data management in general.

Sample Data Management Plans:

Below you will find DMP examples and templates from various institutions across a wide range of disciplines.

Samples of NSF Data Management Plans (UC San Diego)

Data Management Plan Template (University of Nebraska-Lincoln)

DMP Tool

Data Management Plan Examples by Discipline (University of Minnesota)

Odum Institute Data Management Sample Plans (University of North Carolina)

 


Organization, Format, and Description

The organizational structure, format, and description of your data can help secondary users find, identify, select, and obtain the data they require. DataONE has an excellent resource to assist you: the DataONE Best Practices guide.

What format should you use?

Formats that maximize the shareability and reusability of your data are recommended. These include open/non-proprietary formats, formats commonly used in your field, and formats that are not encrypted or compressed. Selecting these types of data formats can also help limit the chance of your data becoming unreadable when a proprietary format is no longer supported.

Here are recommended data format lists that may be helpful in your planning:

UK Data Archive: http://www.data-archive.ac.uk/create-manage/format/formats-table

University of Washington Preferred File Formats: http://digital.lib.washington.edu/preferred-formats.html

Describing your Data

Metadata, or the description of your data, is an important element in ensuring share-ability, usability and discoverability of your data. It provides context and information about what the data means, which can assist researchers outside your project in understanding your data better.

Disciplines often follow their own unique metadata standards. It is recommended that you determine the appropriate metadata schema at the very beginning of your project. The DCC Disciplinary Metadata Tool can be used to identify discipline-specific metadata. In addition, the Three Categories of Metadata, presented by Cornell University Library, provides a table summarizing the goals, elements, and sample implementations of the three categories of metadata.
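To make the idea of a metadata record concrete, the Python sketch below builds a small record with Dublin Core elements using the standard library. It is only an illustration: the wrapper element name "metadata", the function name, and all field values are invented, and a discipline-specific schema may be more appropriate for your project.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Build an XML element holding one dc:* child per field value."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = item
    return root


# All values below are invented for the example.
record = dublin_core_record({
    "title": "Hypothetical water quality observations, 2015",
    "creator": ["Doe, Jane", "Roe, Richard"],
    "date": "2015-03-09",
    "format": "text/csv",
    "rights": "CC BY 4.0",
})
print(ET.tostring(record, encoding="unicode"))
```

Repeating an element, as with the two creators here, is how Dublin Core records multiple values for the same property.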

 


Sharing & Storage

Why is it important to share your data?

Making your data, along with your research publications*, widely accessible through institutional or subject repositories can increase the visibility and prominence of your research and ensure the continued use of the data in your field. In addition you may share data:

*To learn more about how to share your research publications visit [Stephanie’s libguide here]

When planning to share your data, please remember to follow all policies and guidelines for privacy and human subjects data, along with intellectual property rights. Consult with FIU’s Office of Research and Economic Development for details on these policies.

Where to store your data?

There are two types of data storage that occur during the research data life cycle: active storage and archival storage. Active storage is the storage of your raw datasets during the research. FIU researchers typically use resources from the school, college or department for active storage of data.

Archival storage is described as the storage of the final datasets of your research.

FIU Data Storage

FIU Libraries provides support for archival storage of final datasets. The following systems can support research output across disciplines.

dPanther

dPanther is a digital repository system supported by a cloud computing infrastructure with 22 servers, over 220 TB storage space and sufficient redundancy. All original raw data, derivative data products, documentation of data, models, scripts, web visualization applications, reports, publications, documents and other project products (along with appropriate metadata) can be served based on the requirements of your project’s unique data management plan.

Digital Commons

FIU’s institutional repository Digital Commons provides publishing support for reports, documents, publications, images etc. in compliance with funder requirements.

GeoPortal

GeoPortal is a metadata catalog for geospatial, ecological, and other environmental datasets. U.S. Government funded datasets (see also http://www.data.gov/) use Geoportal, which primarily supports the FGDC and ISO 19115 schemas for geospatial data.

Type of Data and Information: Reports, Publications, Documents, Scripts, Models, etc.

FIU Libraries Systems/Services: Digital Commons
Existing Examples:
  • Florida Coastal Everglades Long Term Ecological Research Network: http://digitalcommons.fiu.edu/fce_lter
  • Disaster Risk Reduction (DRR): http://digitalcommons.fiu.edu/drr/
  • Sea Level Rise: http://digitalcommons.fiu.edu/sea_level_rise/
  • South Florida Education Research Conference Proceedings: http://digitalcommons.fiu.edu/sferc/
Data/Metadata Standards & Protocol: Dublin Core Schema; Metadata Encoding and Transmission Standard (METS); OAI-PMH for metadata harvesting

FIU Libraries Systems/Services: dPanther
Existing Examples:
  • Sea Level Rise Bibliography hosted by FIU dPanther
  • Disaster Risk Reduction (DRR): http://dpanther.fiu.edu/dPanther/dpMain/dpCollections_new.html?id=drr&ti...
  • Global Water for Sustainability: http://dpanther.fiu.edu/dPanther/dpMain/dpCollections_new.html?id=glows&...

Type of Data and Information: Geo-spatial Datasets
FIU Libraries Systems/Services: Data and metadata to be hosted in FIU’s Geoportal, and shared at http://www.data.gov/
Existing Examples: FIU GIS Center has been serving geo-spatial datasets for multiple scientific collaborative projects. See also FIU’s Geoportal
Data/Metadata Standards & Protocol: ESRI ArcGIS data formats, e.g. file Geodatabase; ISO 19115 XML geo-spatial metadata schema

Type of Data and Information: Geo-data web download application
FIU Libraries Systems/Services: Web interactive downloading to be hosted by FIU GIS Center using its dPanther framework. Metadata to be shared at http://www.data.gov/
Existing Examples: LIDAR elevation data download for Florida hosted by FIU GIS Center: http://digir.fiu.edu/Lidar/lidarNew.php
Data/Metadata Standards & Protocol: ISO 19115 XML; Open GIS Consortium (OGC) Web Map Service (WMS)

Type of Data and Information: Web Visualization tools
Existing Examples: Sea Level Rise Toolbox developed by FIU GIS Center: http://eyesontherise.org/app/
Data/Metadata Standards & Protocol: Standards: Open GIS Consortium (OGC) Web Map Service (WMS); Tools: HTML5+CSS3, Google Maps APIs, ArcGIS Server APIs, Mootools

Type of Data and Information: Project Websites
FIU Libraries Systems/Services: Hosted by FIU GIS Center
Existing Examples: An Interactive Web Information Management System (IMS) for USAID funded WAWASH project: http://wawash.fiu.edu/drupal-cms/
Data/Metadata Standards & Protocol: Drupal CMS; Microsoft Active Directory (ALDP); Google Map APIs; Bootstrap; Jquery; Microsoft .NET

Discipline Specific Data Repositories:

You may choose, or be required by the funding agency, to archive your data in a discipline specific repository.

There are several resources that can assist you in finding discipline specific data repositories for archiving your research outcomes. These include: