The afternoon workshop, coordinated by Kitty, is on data storage.
There is, of course, a survey to present. Most of the schools are offering multiple kinds of file services with ever-increasing quotas. Only two schools are offering replication technologies (like Apple or Microsoft’s). The predominant technology is direct-attached storage, but there is use of Fibre Channel SAN, and some use of iSCSI SAN. Most folks are using TSM for backup.
Most people said that the Library does not provide any data archiving services.
Unsolved problems include (of course) funding; smart data storage; multi-platform access; replacing current distributed file systems (what’s next?); virtualization and tiering; and keeping up with demand (more, more, more).
Summary – growth in data is a huge problem and an unfunded mandate. Federal requirements for keeping and protecting data for longer periods, along with unmanaged data, are huge issues. Inefficiency is a problem – we’re not aligning data with the right solutions. The technologies for storage don’t knit together well – there’s a duct-tape feeling to the solutions.
Ron Thielen from the University of Chicago is talking about storage.
SAN vs NAS is the wrong question – they’re converging anyway. The real question is what APIs do you want to use to provide access to data – files, blocks, objects.
A file system is really a metadata repository and related APIs. Once a vendor understands that, it enables really interesting things to happen – Xythos is an example of someone who gets that. Typical storage growth figures are quoted as 39% annually – even more worrisome is the percentage of budget devoted to storage. At U Chicago, they’ve seen a 96% compound annual growth rate over the last few years.
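To get a feel for what those two growth rates mean, here is a rough sketch of the compounding arithmetic. The 39% and 96% figures come from the talk; the 100 TB starting footprint and the function names are made up purely for illustration.

```python
import math


def projected_capacity(start_tb, annual_growth, years):
    """Capacity after compounding annual_growth (0.39 means 39%) for `years` years."""
    return start_tb * (1 + annual_growth) ** years


def doubling_time_years(annual_growth):
    """Years needed for the footprint to double at a fixed compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth)


# Hypothetical starting footprint; not a number from the talk.
START_TB = 100.0

for label, rate in (("typical 39%", 0.39), ("U Chicago 96%", 0.96)):
    in_five_years = projected_capacity(START_TB, rate, 5)
    print(f"{label}: doubles every {doubling_time_years(rate):.2f} years, "
          f"{START_TB:.0f} TB grows to {in_five_years:.0f} TB in 5 years")
```

At 96% the footprint roughly doubles every year, versus about every two years at 39%, which is why the budget share is the scarier number.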
Gartner predicts “By 2008, nearly 50% of data centers worldwide will lack the necessary power and cooling capacity to support high-density equipment.”
What’s the storage buzz?
– SMI-S 1.2 (an ANSI standard for storage management) & Aperi (an open source storage management project – part of the Eclipse project).
– Continuous data protection – backs up files as they change.
– Virtualization – heterogeneous (the holy grail), switch-based (Cisco and Brocade – moving virtualization into the SAN itself), HBA (for VMware or blade centers).
– Global Name Spaces (File Virtualization) – put something in front of a bunch of NAS devices so that they look like a single name space. EMC and Brocade have purchased technologies in this area.
– Clustered File Systems and Storage (like Isilon)
– Archival file systems (Archivas, Permabit) – a specialized example of clustered file system.
– Database archiving
– Wide Area File Systems
– Object Based Storage Devices – when you’re storing data on storage devices, some metadata can be managed by the device not the storage system. (why would you want to do this?)
– TPM (Trusted Platform Module) in storage devices – TPM in devices and servers exchange certificates – storage devices can be made to not give up access if they’re not matched with the appropriate servers.
– Solid State & Hybrid
– Intelligent Storage Grids & Storage Autonomics – do self-provisioning based on access to policy rules.
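The global name space item in the list above is the easiest one to picture in code. What follows is only a toy sketch of the idea, not how any vendor's product works: a lookup table sits in front of several NAS exports and maps one virtual path tree onto them using longest-prefix matching. The class name, the `/campus/...` paths, and the `nas1`/`nas2`/`nas3` exports are all hypothetical.

```python
class GlobalNamespace:
    """Toy router that presents several NAS exports as one virtual path tree."""

    def __init__(self):
        # Maps a virtual path prefix to the back-end export that serves it.
        self._mounts = {}

    def add_mount(self, virtual_prefix, backend):
        self._mounts[virtual_prefix.rstrip("/")] = backend.rstrip("/")

    def resolve(self, virtual_path):
        """Return the back-end location for a virtual path (longest prefix wins)."""
        best = None
        for prefix in self._mounts:
            matches = virtual_path == prefix or virtual_path.startswith(prefix + "/")
            if matches and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no back end serves {virtual_path!r}")
        return self._mounts[best] + virtual_path[len(best):]


# Hypothetical layout: three filers behind a single /campus tree.
ns = GlobalNamespace()
ns.add_mount("/campus/home", "nas1:/vol/home")
ns.add_mount("/campus/research", "nas2:/export/research")
ns.add_mount("/campus/research/genomics", "nas3:/export/genomics")

print(ns.resolve("/campus/home/alice/notes.txt"))
print(ns.resolve("/campus/research/genomics/run1.dat"))
```

Clients only ever see `/campus/...`; moving the genomics data to a different filer would mean changing one mount entry rather than every client's configuration.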
Regulatory Effects on Storage –
The new Federal Rules of Civil Procedure are causing much FUD.
– “rules also mean that colleges that are in litigation or that suspect they may soon be in litigation cannot destroy electronic evidence they know would be relevant to a lawsuit.” (Chronicle of Higher Education)
– means universities will have to keep much better track of data.
Greg Jackson notes that this is a risk management issue where we need to be careful about going to great lengths to solve problems technologically instead of planning on some basic procedures that we might take when or if we have to perform under this law.
Use case – VoIP and Unified Messaging – talk about unstructured data!