[CSG 2010] Curation, Preservation, & Information Lifecycle Management

Mairead from Penn State is talking about designing and implementing storage architectures and systems to support data curation and preservation needs. Who’s thinking about this, and what are they doing?

Drivers & Incentives – eScience/eResearch. NSF requirement for data management plans. Compliance – e-discovery, FERPA, HIPAA, Sarbanes-Oxley. Institutional record retention regulations and policies. Storage services for libraries, archives, cultural heritage entities. Greater efficiencies.

Expectations (not supported) – storage is cheap; storage is smart; stuff on the internet is persistent; digital is safer than analog; storage providers are curators and preservation experts; repositories take care of preservation; metadata will take care of it; libraries will take care of it; the cloud will take care of it.

The reality – new roles, new responsibilities, new collaborations, practices, and workflows; intellectual capital requirements – digital preservation; is the cloud antithetical to preservation?; increased management requirements; scaling issues with preservation requirements.

Standards/Technologies
iRODS – integrated Rule-Oriented Data System, from SDSC. Second generation of SRB (Storage Resource Broker).
Content addressable storage – fixed content storage, retrieval based on content rather than location
eXtensible Access Method (XAM)
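To make "retrieval based on content rather than location" concrete, here is a minimal Python sketch of the idea. It assumes SHA-256 as the addressing hash and a plain in-memory dict as the backing store; both are illustrative choices, and this is not how any particular CAS product or the XAM API works. The address of an object is the digest of its bytes, so identical content is stored only once, and re-hashing on read doubles as a fixity check.

```python
from __future__ import annotations

import hashlib


class ContentAddressableStore:
    """Toy fixed-content store keyed by SHA-256 digest (hypothetical, in-memory)."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the bytes themselves, not from a path or filename.
        address = hashlib.sha256(data).hexdigest()
        # Fixed content: storing identical bytes again returns the same address
        # and does not create a second copy.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]  # KeyError if nothing has that address
        # Re-hashing on read detects silent corruption of the stored bytes.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError(f"fixity failure for {address}")
        return data


store = ContentAddressableStore()
addr = store.put(b"thesis-final.pdf contents")
assert store.get(addr) == b"thesis-final.pdf contents"
assert store.put(b"thesis-final.pdf contents") == addr  # same content, same address
```

The trade-off is that an object can never be edited in place: any change produces new bytes and therefore a new address, which suits preservation and fixed-content use cases.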

Initiatives –
NSF DataNet – Data Conservancy Project – JHU lead with 23 institutions.
Chronopolis – SDSC, UCSD, UMIACS, NCAR – federated data grid using SRB/iRODS
LOCKSS (Lots of Copies Keep Things Safe) – replication of licensed journals and other content
MetaArchive – a private LOCKSS network
Internet Archive
National Digital Information Infrastructure & Preservation Program (NDIIPP) – Library of Congress project.
California Digital Library
DuraSpace – DuraCloud project to implement a preservation-oriented cloud storage service
HathiTrust – Repository and storage infrastructure initiated for the CIC Google book project
Sun Preservation and Archiving SIG (PASIG)
Storage Networking Industry Association
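Several of these initiatives (LOCKSS, MetaArchive, Chronopolis) rest on keeping multiple copies and periodically checking them against each other. The Python sketch below is a deliberately simplified illustration of that idea, not the actual LOCKSS polling protocol, which is considerably more elaborate. It hashes each replica, treats the strict-majority digest as authoritative, and repairs any copy that disagrees. The function and replica names are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Return the majority digest and the sorted names of replicas that disagree."""
    digests = {name: hashlib.sha256(data).hexdigest() for name, data in replicas.items()}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    # Without a strict majority there is no basis for deciding which copy is good.
    if votes <= len(replicas) // 2:
        raise RuntimeError("no strict majority; cannot decide which copy is good")
    damaged = sorted(name for name, d in digests.items() if d != majority_digest)
    return majority_digest, damaged


def repair(replicas: dict[str, bytes], damaged: list[str]) -> None:
    """Overwrite each damaged copy with the bytes of a replica that matched the majority."""
    good_name = next(name for name in replicas if name not in damaged)
    for name in damaged:
        replicas[name] = replicas[good_name]


# Usage: three hypothetical sites, one holding a copy with a single flipped bit.
original = b"licensed journal issue"
flipped = bytes([original[0] ^ 0x01]) + original[1:]
copies = {"site-a": original, "site-b": flipped, "site-c": original}
digest, bad = audit_replicas(copies)  # bad == ["site-b"]
repair(copies, bad)
```

With only two copies, a disagreement cannot be resolved by voting, which is one concrete reason preservation networks tend to want more than two replicas.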

Penn State activities – Content Stewardship Program – strategic collaboration between Libraries and ITS. Goal – a suite of services to support the lifecycle of the digital object – creation, discovery, access, storage, preservation, and archiving. Hired a Digital Library Architect and a Digital Collections Curator; worked on governance.

Sally Jackson says that the Library School at Illinois now has a program in digital curation.

Cliff – decisions on what to curate, and what to keep, are less binary in digital formats than in print. E.g., Portico for scholarly journals vs. “digital archaeology” status. It’s about risk management and resource allocation. Some of what we’re trying to understand in bit management is really about risk and cost. How many redundant copies do you need? Failure modes are not well understood. Physics labs have some very scary data about undetected bit-flip errors. What does a bit flip cost in a preserved object? If the object is encrypted in clever ways, it can cost a lot!
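Cliff’s question about how many redundant copies you need can be framed as back-of-the-envelope arithmetic. The Python sketch below assumes each copy is lost independently with the same probability p in a given period, so all n copies are lost with probability p^n. That independence assumption is exactly what shared hardware, shared sites, or shared software bugs break, which is one reason poorly understood failure modes matter so much. The function names and numbers are hypothetical.

```python
def prob_all_copies_lost(p_loss: float, copies: int) -> float:
    """Probability that every copy is lost, ASSUMING copies fail independently."""
    if not 0.0 <= p_loss <= 1.0 or copies < 1:
        raise ValueError("need 0 <= p_loss <= 1 and copies >= 1")
    return p_loss ** copies


def copies_needed(p_loss: float, target: float) -> int:
    """Smallest copy count whose joint-loss probability is at or below target."""
    if not 0.0 <= p_loss < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("need 0 <= p_loss < 1 and 0 < target < 1")
    n = 1
    # Add copies until the (independent) joint-loss probability is small enough.
    while prob_all_copies_lost(p_loss, n) > target:
        n += 1
    return n
```

For instance, with a hypothetical 1% annual loss rate per copy and a target of one in ten million, `copies_needed(0.01, 1e-7)` returns 4. Correlated failures would push the real answer higher.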
