Research Data Management Overview
What is Data Management?
Data management is the process of controlling the information generated during a research project. Research projects across disciplines result in data. Data management can ensure the accessibility of data throughout the data’s life cycle.
The Data Life Cycle: An Overview
The data life cycle has eight components:
- Plan: description of the data that will be compiled, and how the data will be managed and made accessible throughout its lifetime
- Collect: observations are made either by hand or with sensors or other instruments, and the data are placed into digital form
- Assure: the quality of the data is assured through checks and inspections
- Describe: data are accurately and thoroughly described using the appropriate metadata standards
- Preserve: data are submitted to an appropriate long-term archive (i.e. data center)
- Discover: potentially useful data are located and obtained, along with the relevant information about the data (metadata)
- Integrate: data from disparate sources are combined to form one homogeneous set of data that can be readily analyzed
- Analyze: data are analyzed
Source: DataONE (http://dataone.org)
Data Management Plans and Planning
The Office of Management and Budget (OMB) Circular A-110 describes the administrative requirements for grants and proposals awarded to institutions of higher education, hospitals and other non-profit organizations. In 1999, Circular A-110 was revised to include a data sharing component under the provision of the Freedom of Information Act which requires grantees to provide access to research data funded by the federal government in a timely manner.
Generally, the plan must address how research data is to be described, accessed, shared, re-used and redistributed during the length of the project and beyond. Funding agencies have implemented the OMB requirement in different ways. Below are the policies from the agencies that fund the majority of research at FIU.
Additional Guidance on Funder Requirements:
- DMPTool – List of funder requirements and sample plans https://dmptool.org/guidance
- University of Minnesota Libraries Funding Agency and Data Management Guidelines
- ROARMap (Registry of Open Access Repository Mandates and Policies) http://roarmap.eprints.org/
Writing a Data Management Plan
A data management plan includes:
- Description of Data
- Data Standards
- Policies for sharing, accessing, and reusing your data
- Methods for archiving and preserving your data
A data management plan (DMP) is a maximum of 2 pages long. If you do not include a DMP, you must include a statement explaining why one is not needed.
Describing the Data
It is recommended that the description process begin before the collection or creation of data. Data management plans should provide detailed information about the data format, a description of the data collection and analysis plans, and a prediction of the quantity of data to be generated.
Data management plans should indicate the data standards to be utilized for data format, metadata, data collection, etc. Interoperability, discoverability, and accessibility of data are the ultimate goals achieved by following these standards. The library may provide suggestions for best practices. Existing resources, such as the DataONE Best Practices guide, include (but are not limited to) specifications for standards in:
- File naming conventions
- Data backup methods
- Standards in community/discipline terminology for data description (metadata)
- Standard formats for spatial location and time
- Extent and resolution of spatial data
- Methods for organizing data
- File formats
- Steps performed in data processing
- Quality control methods
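File naming conventions are easiest to follow when names are generated rather than typed by hand. The sketch below shows one possible convention, not a DataONE or FIU standard: the field order (project, site, ISO 8601 date, version), the `slugify` and `build_filename` helpers, and the sample values are all assumptions chosen for illustration.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and replace runs of non-alphanumeric characters with '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, site: str, collected: date, version: int, ext: str) -> str:
    """Build a name like 'project_site_YYYY-MM-DD_v01.csv' (hypothetical convention)."""
    if version < 1:
        raise ValueError("version must be 1 or greater")
    parts = [slugify(project), slugify(site), collected.isoformat(), f"v{version:02d}"]
    return "_".join(parts) + "." + ext.lower().lstrip(".")


if __name__ == "__main__":
    # Sample values are made up for the example.
    print(build_filename("Water Quality", "Site 3", date(2015, 3, 9), 2, "CSV"))
    # prints: water-quality_site-3_2015-03-09_v02.csv
```

Putting the date in ISO 8601 form (YYYY-MM-DD) and zero-padding the version number means that an alphabetical file listing is also a chronological one.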
Strategy for Archiving and Preservation: Short-Term and Long-Term
Short-term and long-term strategies for data storage and preservation are an integral part of data management plans. Information regarding intermediate storage, during the research and data collection phase, should be included. It is also important to note any data transformations that will take place in order to prepare data for long-term preservation and access.
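One common way to document that files arrive unchanged when they move from intermediate storage to a long-term archive is to record a checksum for each file before the transfer and compare it afterwards. The following is a minimal sketch of that idea, not a procedure prescribed by any funder or by FIU; the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        path.relative_to(data_dir).as_posix(): sha256_of(path)
        for path in sorted(data_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List files that are missing from `after` or whose checksum differs."""
    return sorted(name for name, checksum in before.items() if after.get(name) != checksum)
```

A manifest built before the transfer can be stored alongside the data, then compared with a manifest built from the archived copy; an empty result from `changed_files` suggests that every original file is present and byte-identical.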
Policies: Access and Reuse
Descriptions of how data will be accessed, access procedures, as well as necessary equipment, software, and expertise should be included in the plans. Policies should also define access timelines, including embargos and/or other access restrictions.
Data Management Plan Tools
DMPTool (https://dmp.cdlib.org) is a service provided by the University of California Curation Center and the California Digital Library. This service provides guidelines as well as resources for writing a Data Management Plan. The service also provides templates tailored to specific funder requirements. Although templates are a good start to creating a data management plan, FIU strongly suggests adapting plans to account for university- or center-specific requirements, as well as services provided by the university, such as data/metadata storage, preservation, and future use.
DataONE (http://dataone.org) is an environmental science resource that may also be used as a best practice guide for scientific data management in general.
Sample Data Management Plans:
Below you will find DMP examples and templates from various institutions across a wide range of disciplines.
Samples of NSF Data Management Plans (UC San Diego)
Data Management Plan Template (University of Nebraska Lincoln)
Data Management Plan Examples by Discipline (University of Minnesota)
Odum Institute Data Management Sample Plans (University of North Carolina)
Organization, Format, and Description
The organizational structure, format, and description of your data can help secondary users find, identify, select, and obtain the data they require. DataONE offers an excellent resource to assist you: the DataONE Best Practices guide.
What format should you use?
Formats that maximize share-ability and reusability of your data are recommended. These include open/non-proprietary formats, commonly used formats in your field, and formats that are not encrypted or compressed. Selecting these types of data formats can also help limit the chance of your data becoming obsolete when a proprietary format is no longer supported.
Here are recommended data format lists that may be helpful in your planning:
UK Data Archive: http://www.data-archive.ac.uk/create-manage/format/formats-table
University of Washington Preferred File Formats: http://digital.lib.washington.edu/preferred-formats.html
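A quick inventory of file extensions can show which files in a project may need conversion before sharing. The sketch below assumes a small, hypothetical set of preferred extensions; it is not taken from the UK Data Archive or University of Washington lists, which should be consulted for the actual recommendations. The function names are likewise illustrative.

```python
from collections import Counter
from pathlib import Path

# Hypothetical examples of open, widely used formats; replace with the
# list recommended for your discipline or repository.
PREFERRED_EXTENSIONS = {".csv", ".txt", ".json", ".xml", ".tif", ".tiff"}


def inventory_formats(data_dir: Path) -> Counter:
    """Count files under data_dir by lowercase extension."""
    return Counter(p.suffix.lower() for p in data_dir.rglob("*") if p.is_file())


def needs_review(data_dir: Path) -> list[str]:
    """Return relative paths of files whose extension is not in the preferred set."""
    return sorted(
        p.relative_to(data_dir).as_posix()
        for p in data_dir.rglob("*")
        if p.is_file() and p.suffix.lower() not in PREFERRED_EXTENSIONS
    )
```

An extension is only a hint about a file's actual format, so the output of `needs_review` is a starting point for a manual check rather than a verdict.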
Describing your Data
Metadata, or the description of your data, is an important element in ensuring share-ability, usability and discoverability of your data. It provides context and information about what the data means, which can assist researchers outside your project in understanding your data better.
Disciplines often follow their own unique metadata standards. It is recommended that you determine the appropriate metadata schema at the very beginning of your project. The DCC Disciplinary Metadata Tool can be used to identify discipline-specific metadata. In addition, the Three Categories of Metadata, presented by Cornell University Library, provides a table summarizing the goals, elements, and sample implementations of the three categories of metadata.
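As one illustration of what a metadata record can look like, the sketch below serializes a few descriptive fields as simple Dublin Core elements in XML. Dublin Core is only one of many schemas and may not be the right one for your discipline; the `dublin_core_record` helper, the `metadata` wrapper element, and the sample values are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields: dict[str, list[str]]) -> str:
    """Serialize {element name: [values]} as simple Dublin Core XML."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, values in fields.items():
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Sample values are made up for the example.
    record = dublin_core_record({
        "title": ["Example water quality observations"],
        "creator": ["Doe, Jane"],
        "date": ["2015-03-09"],
        "format": ["text/csv"],
    })
    print(record)
```

Each element may repeat, which is why the helper takes a list of values per element name, for example several `creator` entries for a dataset with multiple authors.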
Sharing & Storage
Why is it important to share your data?
Making your data, along with your research publications*, widely accessible through institutional or subject repositories can increase the visibility and prominence of your research and ensure the continued use of the data in your field. In addition you may share data:
- To fulfill funder requirements (OSTP Memo “Increasing Access to the Results of Federally Funded Scientific Research”)
- To meet the data archiving requirements of some journals and societies, for example Nature
- To increase citations: sharing detailed research data is associated with an increased citation rate (PLoS One)
*To learn more about how to share your research publications visit [Stephanie’s libguide here]
When planning to share your data, please remember to follow all policies and guidelines for privacy and human subjects data, along with intellectual property rights. Consult with FIU’s Office of Research and Economic Development for details on these policies.
Where to store your data?
There are two types of data storage that occur during the research data life cycle: active storage and archival storage. Active storage is the storage of your raw datasets during the research. FIU researchers typically use resources from the school, college or department for active storage of data.
Archival storage is described as the storage of the final datasets of your research.
FIU Data Storage
FIU Libraries provides support for archival storage of final datasets. The following systems can support research output across disciplines.
dPanther is a digital repository system supported by a cloud computing infrastructure with 22 servers, over 220 TB of storage space, and sufficient redundancy. All original raw data, derivative data products, documentation of data, models, scripts, web visualization applications, reports, publications, documents, and other project products (along with appropriate metadata) can be served based on the requirements of your project’s unique data management plan.
FIU’s institutional repository Digital Commons provides publishing support for reports, documents, publications, images etc. in compliance with funder requirements.
FIU’s Geoportal is a metadata catalog for geo-spatial, ecological, and other environmental datasets. U.S. Government funded datasets (see also http://www.data.gov/) use Geoportal, which primarily supports the FGDC and ISO 19115 schemas for geo-spatial data.
|Type of Data and Information||FIU Libraries Systems/Services||Existing Examples||Data/Metadata Standards & Protocol|
|Reports, Publications, Documents, Scripts, Models, etc.||Digital Commons||Florida Coastal Everglades Long Term Ecological Research Network: http://digitalcommons.fiu.edu/fce_lter ; Disaster Risk Reduction (DRR): http://digitalcommons.fiu.edu/drr/ ; Sea Level Rise: http://digitalcommons.fiu.edu/sea_level_rise/ ; South Florida Education Research Conference Proceedings: http://digitalcommons.fiu.edu/sferc/||Dublin Core Schema; Metadata Encoding and Transmission Standard (METS); OAI-PMH for metadata harvesting|
|dPanther||Sea Level Rise Bibliography hosted by FIU dPanther ; Disaster Risk Reduction (DRR) http://dpanther.fiu.edu/dPanther/dpMain/dpCollections_new.html?id=drr&ti... ; Global Water for Sustainability http://dpanther.fiu.edu/dPanther/dpMain/dpCollections_new.html?id=glows&...|
|Geo-spatial Datasets||Data and metadata to be hosted in FIU’s Geoportal, and shared at http://www.data.gov/||FIU GIS Center has been serving geo-spatial datasets for multiple scientific collaborative projects. See also FIU’s Geoportal||ESRI ArcGIS data formats, e.g. file Geodatabase; ISO 19115 XML geo-spatial metadata schema|
|Geo-data web download application||Web interactive downloading to be hosted by FIU GIS Center using its dPanther framework. Metadata to be shared at http://www.data.gov/||LIDAR elevation data download for Florida hosted by FIU GIS Center: http://digir.fiu.edu/Lidar/lidarNew.php||ISO 19115 XML; Open GIS Consortium (OGC) Web Map Service (WMS)|
|Web Visualization tools||Sea Level Rise Toolbox developed by FIU GIS Center: http://eyesontherise.org/app/||Standards -- Open GIS Consortium (OGC) Web Map Service (WMS) ; Tools -- HTML5+CSS3 , Google Maps APIs, ArcGIS Server APIs, Mootools|
|Project Websites||Hosted by FIU GIS Center||An Interactive Web Information Management System (IMS) for USAID funded WAWASH project: http://wawash.fiu.edu/drupal-cms/||Drupal CMS; Microsoft Active Directory (LDAP); Google Maps APIs; Bootstrap; jQuery; Microsoft .NET|
Discipline Specific Data Repositories:
You may choose, or be required by the funding agency, to archive your data in a discipline-specific repository.
There are several resources that can assist you in finding discipline-specific data repositories for archiving your research outcomes. These include:
- re3data.org: Registry of Research Data Repositories
- Simmons Data Repositories Listing
- University of Oregon Discipline-related repositories