Managing Research Data: Some Ins and Outs
Joyce Ray, Johns Hopkins University
Geneva Henry, George Washington University
Michele Kimpton, DuraSpace
Melissa Levine, University of Michigan
Based on a book: Research Data Management – Practical Strategies for Information Professionals.
Overview of the volume:
– Policy context
– Managing active data
– Archiving and managing data long-term
– Measuring success
– Case studies
– What’s next
– Planning is essential and ongoing
– Essential infrastructure goes beyond software and your own institution – it includes tools, services, policies, and communities of practice
– Collaboration internally and externally helps to maximize institutional investment
– Value of managing research data is not yet proven
Data Curation for the Humanities (based on work done at Rice)
– Where’s the data in humanities research?
Digital content enables structure to be added to otherwise unstructured resources (metadata, OCR, text markup, georeferencing, 3d recreations of space)
Data enables new research never before possible (social network analysis to discover historic relationships, time and space analysis, closer inspection of historic sites, content analysis for frequency of terms across multiple works)
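As a rough illustration of the "frequency of terms across multiple works" idea, here is a minimal Python sketch. It is not the method used in the Rice projects; the function name `term_frequencies`, the sample titles, and the sample texts are all hypothetical, and the tokenizer is deliberately naive (lowercased runs of letters and apostrophes).

```python
import re
from collections import Counter


def term_frequencies(works, terms):
    """Count how often each term of interest appears in each work.

    works: dict mapping a title to its plain text (hypothetical sample data).
    terms: iterable of single-word terms to look for, case-insensitive.
    """
    wanted = {t.lower() for t in terms}
    result = {}
    for title, text in works.items():
        # Naive tokenizer: lowercase runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts = Counter(tok for tok in tokens if tok in wanted)
        # Report every requested term, including those that never occur.
        result[title] = {t: counts[t] for t in sorted(wanted)}
    return result


# Hypothetical sample texts, for illustration only.
works = {
    "Travel diary A": "The river was wide. We crossed the river at dawn.",
    "Travel diary B": "No river here, only desert and more desert.",
}
freqs = term_frequencies(works, ["river", "desert"])
for title, counts in freqs.items():
    print(title, counts)
```

Real corpora would likely need better tokenization, handling of spelling variants and OCR errors, and normalization by the length of each work before frequencies are compared.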
– Creating sustainable digital content and what it takes to curate it
Digital content is powerful, but enhancements must be maintained and reusable. Balance extensive markup/annotation against minimal additions of interpretation; manage elements associated with a work as separate objects. Example of migrating content across newer versions of TEI.
– teams and infrastructure
Domain expertise and technical expertise needed for success – partnering between academic faculty and librarians and technologists is powerful. Don’t go in with the attitude that this is just a service provided – scholars should be involved in considering markup. Opportunities for learning new skills. Platforms that can handle varied content.
– Project case studies – Our Americas Archive Partnership; Travelers in the Middle East Archive; Shepherd School of Music collection; Rice ephemera archive; Houston Asian American Archives Oral Histories collection.
Michele Kimpton – Archiving research data in the cloud or in a local repository
Did a survey of people in the DuraSpace community around practices and use cases.
Common issues: Where can I put my data for long-term access? How do I make it discoverable, reusable, reproducible? What metadata, provenance, and identifiers should I use? (very much an emerging set of practices); What policies should be in place for archiving and preserving data? (multiple locations? cost associated); How do I fund this?
Data management in DSpace – new features in DSpace 5.0 related to data management and archiving (coming out end of this month). DOI support – EZID, ORCID integration, linked open data support, integrated with DuraCloud.
Data management in Fedora – last week Fedora 4 became available. Supports linked open data; content modeling; versioning; large files; fixity checking; external, asynchronous storage.
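Fixity checking, mentioned above, generally means recomputing a file's checksum and comparing it with a digest recorded at deposit time. The sketch below shows that general idea in Python using only the standard library; it is not Fedora's implementation, and the function names `sha256_of` and `check_fixity` and the demo file contents are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(path, recorded_digest):
    """True if the file's current digest matches the one recorded earlier."""
    return sha256_of(path) == recorded_digest


# Demo with a throwaway file (hypothetical data, for illustration only).
with tempfile.NamedTemporaryFile(delete=False, suffix=".csv") as tmp:
    tmp.write(b"site,year,count\nA,2014,42\n")
    demo_path = tmp.name

recorded = sha256_of(demo_path)          # digest stored at "deposit" time
ok_before = check_fixity(demo_path, recorded)

with open(demo_path, "ab") as fh:        # simulate an unintended change
    fh.write(b"B,2014,7\n")
ok_after = check_fixity(demo_path, recorded)

os.remove(demo_path)
print(ok_before, ok_after)
```

In practice the recorded digest would be stored separately from the file itself, and checks would be run on a schedule across all copies.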
DataONE project – humanities, social sciences, earth science. 80% of files are Excel or comma-delimited – the long tail of data.
Commercial cloud solutions – attract end users because they solve an immediate need without adding a ton of work for the end user: share, collaborate, or meet a mandate from a publisher or funding agency. Little to no preservation practices in place; no stated or unstated long-term data management practices; long-term availability at risk, reliant on investors’ interest and success in the market; lack of trust and control in the academic community.
Publishers are paying for storage of data in Figshare.
Community-based cloud solutions: DuraCloud (in partnership with DPN and Chronopolis); Center for Open Science; Dryad (UNC, based on DSpace); Zenodo.
Questions: Is it open source? Are the policies transparent? What is the governance? Are there policies to preserve the data?
POWRR study from IMLS – comprehensive study of archiving tools.
Availability-Usability Gap – Copyright, open data and the availability-usability gap: challenges, opportunities, and approaches for libraries.
The big ideals: “data as the new gold” – we are building the mines, can we make them safe and stable?
Principles: Denton Declaration 2012. Research data when repurposed has accretive value; publicly funded research should be publicly available for public good (issues with commercial providers); transparency in research is essential to sustain the public trust; validation of research data by peer community essential to the function of responsible research; managing research data is the responsibility of a broad community of stakeholders including researchers, funders, institutions, libraries, archivists, and the public.
Still early times for open access/data: open access funding mandates (NIH 2008, NSF 2010, OSTP 2013); simple things are complex: making sure authors have the rights they need to deposit (they can only pass on what they have to give); trending toward proactive planning in required data plans (data considered at the front end, well before publication).
Issues: technology outpaces law; law is harder than technology because it’s about people; different countries differ; different disciplines differ; hiding data (squeezing the last bit of publication before sharing; disincentives to share); data citation – chain of title; public versus private interest; cost – good metadata is expensive.
Progress: Research Data Alliance – CODATA; DataCite; DataONE – best practices primer; Databib; ORCID; new roles for libraries as hubs of expertise – even if it’s to other parts of the enterprise.