Research Data Management, Sharing, and Preservation
Research Data Management: Policies, Services, Technologies
Mark Ratliff, Digital Repository Architect, Princeton, Office of IT
Data production and processing: Princeton has a very successful high-performance computing center, which obtained a petabyte of hierarchical storage management. Faculty focus is on creating and analyzing data, so Princeton has invested considerably in this area. There are a few central file systems for faculty and students to store data, a Xythos system for smaller datasets, and SharePoint (not usually used for research data).
Publishing and preservation: the DataSpace repository was originally designed to publish and preserve research datasets, allowing faculty to refer to datasets in a stable way. It uses a one-time charge model (aligns well with grant funding). There is not much data in the system to date: twenty datasets so far, in one collection. In talking with the web services group, they discovered that departments redesigning their web sites had hundreds of PDFs of working papers, so they have helped those departments handle them, building a Drupal plugin that lets papers in the repository appear on the web sites.
In an institutional study of Data Disposition for NSF Grant Proposals (Jan 2011 to Oct 2012), DataSpace was mentioned 48 times, but Mark has heard from only two of those.
Outreach: the Library recently named an E-Science Librarian and has begun to plan outreach. One consideration is how to relate research data to the recently adopted Open Access Policy for journal publications.
Research Data Services @ Duke – The good, the bad, and the ugly
Molly Tamarkin, AUL for Information Technology, Duke
The Good: a Mellon grant and a Digital Futures Task Force committee, which asked: What might Duke do to help researchers manage, archive, and share data? What incentives might be provided for better management of research data? How might it be funded? Duke has a policy in the Faculty Handbook requiring data stewardship and retention, and there is growing demand from faculty and schools to improve data management. Until now most management has been informal or ad hoc, with little emphasis on access, curation, discovery, and disposal. In practice only 10-25% of data requests are fulfilled.
Recommendations: consulting services for planning, for infrastructure selection, and for application of relevant standards, practices, policies, and training; managed storage; and ongoing governance. In a later phase, they are interested in management systems for data and research workflows and in creating a data registry service for discovering data in widespread repositories.
The Bad: their request for $560k was not funded. Why? Perhaps because they didn’t dwell enough on the problems this would solve. Where’s the story about the person at Duke who lost everything?
The Ugly: they are drafting a proposal to the NSF MRI program to buy storage with curation, a data registry, etc., to support “research data from creation to curation”. It is an effort by an interdisciplinary group, and ugly because there are lots of players working hard.
Research data challenges: UVA’s approach
Andrew Sallans, Head of Strategic Data Initiatives and the Scientific Data Consulting Group, University of Virginia
Started in May 2010, trying to avoid large-scale faculty advisory groups and reports. The work is structured in three blocks: assessment of research problems at the individual researcher level; data management planning (templates were developed to make the process quick and easy and to help people make the right decisions); and implementation (data in the institutional repository).
Engaging at multiple levels: front-line services (advising on data management planning, consulting on process improvement, training on requirements and best practices); process improvement (DMPTool for efficiency, policy revision and recommendations, connecting stakeholders); infrastructure development (repository services); community-wide (involvement in national initiatives, developing collaborations).
DMVitals: an assessment framework for understanding the capability maturity of data management practices at the individual researcher level.