Managing very large files in research computing at IU.
Two years ago, a task force on research cyberinfrastructure made recommendations concerning storage: continuing to deliver centralized facilities to support research computing, and providing dependable archival storage, were both identified as important. Large file storage is just one piece of IU's overall storage strategy.
They have about a petabyte of spinning disk available for researchers, as well as 4 petabytes of archival storage (the Massive Data Storage System). The “Data Capacitor” captures data from instrumentation.
The Data Capacitor uses the Lustre parallel file system.
The MDSS is designed to provide a deep store for large files and runs HPSS (High Performance Storage System). Interfaces include FTP, Samba, and tar. Radiology is one of the biggest users, and they are also working with digital library programming. Researchers get 500 GB for free; beyond that, they want to discuss the researcher's needs.
Preservation, curation, and long-term management of data is a big issue: it requires linking librarians, computing support staff, and IT professionals. Serge notes that finding ways to provide persistent URIs for data is important.
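One way to see why persistent URIs matter: data may move between storage tiers (for example, from the Data Capacitor to the MDSS) while the identifier people cite must stay the same. The following is a minimal Python sketch of an identifier resolver under that assumption. Everything in it is hypothetical: the PersistentIdResolver class, the base URI, and the "capacitor:" and "mdss:" location strings are invented for illustration and do not describe an actual IU service.

```python
# Hypothetical sketch of a persistent-identifier resolver: the public URI
# stays fixed while the underlying storage location is allowed to change.


class PersistentIdResolver:
    def __init__(self, base_uri):
        self.base_uri = base_uri.rstrip("/")
        self._locations = {}  # identifier -> current storage location
        self._next = 1

    def mint(self, location):
        """Assign a new stable identifier to a data set and return its URI."""
        pid = f"ds{self._next:06d}"
        self._next += 1
        self._locations[pid] = location
        return f"{self.base_uri}/{pid}"

    def move(self, uri, new_location):
        """Record a new storage location without changing the public URI."""
        self._locations[self._pid(uri)] = new_location

    def resolve(self, uri):
        """Return the current storage location behind a persistent URI."""
        return self._locations[self._pid(uri)]

    def _pid(self, uri):
        prefix = self.base_uri + "/"
        if not uri.startswith(prefix):
            raise KeyError(uri)
        return uri[len(prefix):]


resolver = PersistentIdResolver("https://data.example.edu/id/")
uri = resolver.mint("capacitor:/scratch/project/run42")
resolver.move(uri, "mdss:/archive/project/run42.tar")
print(uri)                    # https://data.example.edu/id/ds000001
print(resolver.resolve(uri))  # mdss:/archive/project/run42.tar
```

The design choice here is simply a level of indirection: citations point at the resolver, so moving data only requires updating one mapping.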
Backup is a serious problem for big data sets: mirroring does not protect you if you accidentally delete something or introduce bad data, because the mirror copies the mistake.
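The mirroring problem can be shown with a small Python simulation that contrasts a mirror, which replays every write and delete, with point-in-time snapshots, which preserve earlier states. The class names and the file name are hypothetical; this is a sketch of the general idea, not a description of how IU's storage systems are configured.

```python
# Hypothetical sketch: a mirror replicates every change, including mistakes,
# while point-in-time snapshots keep earlier states recoverable.
import copy


class MirroredStore:
    """Primary store plus a mirror that receives every write and delete."""

    def __init__(self):
        self.primary = {}
        self.mirror = {}

    def write(self, name, data):
        self.primary[name] = data
        self.mirror[name] = data  # bad data is mirrored too

    def delete(self, name):
        self.primary.pop(name, None)
        self.mirror.pop(name, None)  # ...and so are deletions


class SnapshotStore:
    """Primary store plus immutable point-in-time snapshots."""

    def __init__(self):
        self.primary = {}
        self.snapshots = []

    def write(self, name, data):
        self.primary[name] = data

    def delete(self, name):
        self.primary.pop(name, None)

    def snapshot(self):
        # Deep copy so later changes to primary cannot alter the snapshot.
        self.snapshots.append(copy.deepcopy(self.primary))

    def restore(self, name, index=-1):
        return self.snapshots[index].get(name)


mirrored = MirroredStore()
mirrored.write("scan_001.dcm", b"good data")
mirrored.delete("scan_001.dcm")  # accidental deletion
print("scan_001.dcm" in mirrored.mirror)  # False: the mirror lost it too

versioned = SnapshotStore()
versioned.write("scan_001.dcm", b"good data")
versioned.snapshot()
versioned.write("scan_001.dcm", b"corrupted")  # bad data introduced
print(versioned.restore("scan_001.dcm"))  # b'good data'
```

For very large data sets, keeping full copies per snapshot is expensive, which is part of why backup at this scale remains hard.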