National Science Foundation workshop on high performance computing, storage, and large databases


This workshop was sponsored by NSF’s Division of Science Resources Statistics, which collects data on the US science and engineering enterprise for use in policy making. The division has collected data on the physical environment for research at higher education and biomedical institutions since 1986, and since 2003 it has been adding survey questions on cyberinfrastructure. The initial effort was to collect data on networking infrastructure, but now the division is also interested in collecting data on high performance computing, storage, and large databases used for science and engineering research. These surveys go to all research-performing institutions with more than $1 million in research expenditures and all biomedical institutions with more than $1 million in NIH funding.

The workshop gathered a group of about fifteen participants from institutions as large as the UW, Penn State, and UNC, as rarefied as Princeton, as specialized as the National Center for Supercomputing Applications (NCSA) and the Scripps Biomedical Institute, and as small as the Mount Desert Island Biological Laboratory, to brainstorm on which data points about these activities would be both meaningful and feasible to collect.

Most of these institutions, unlike the UW, host a central research computing facility where a central IT organization runs large high performance computing resources used by faculty doing research. But even at those institutions, many other research computing efforts on campus are not run by central organizations.

Over the couple of days, what emerged was a way of classifying high performance systems into: Clusters (which can be either tightly or loosely coupled); Massively Parallel Processor (MPP) machines with distributed memory; Symmetric Multiprocessor (SMP) machines with shared memory; and Parallel Vector Processors (PVP), which it was noted aren’t seen much in the US.

Common data that can be collected about those kinds of compute resources include: number of processors (there was an interesting discussion of how to count this in this day of multi-core chips); processor speed; amount of memory per processor; what kinds of interconnects exist between processors; total RAM; total attached disk (and what kind); and an estimate of the total flops the machine is capable of.
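To make that list concrete, here is a minimal sketch of how one such survey record might be represented, along with the usual back-of-the-envelope peak flops estimate (cores × clock rate × floating point operations per cycle). Nothing here comes from the workshop itself: the `HpcSystem` and `SystemClass` names, the field names, and the example numbers are all hypothetical, and counting "processors" as sockets times cores per socket is just one of the possible answers to the multi-core counting question.

```python
from dataclasses import dataclass
from enum import Enum


class SystemClass(Enum):
    """The four categories that emerged at the workshop."""
    CLUSTER = "cluster (tightly or loosely coupled)"
    MPP = "massively parallel, distributed memory"
    SMP = "symmetric multiprocessor, shared memory"
    PVP = "parallel vector processor"


@dataclass
class HpcSystem:
    """Hypothetical survey record for one high performance system."""
    name: str
    system_class: SystemClass
    sockets: int                # physical processor chips
    cores_per_socket: int       # assumption: cores are counted separately
    clock_ghz: float            # processor speed
    flops_per_cycle: int        # per core; depends on the hardware
    memory_gb_per_core: float
    interconnect: str
    disk_tb: float              # total attached disk

    @property
    def total_cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def total_ram_gb(self) -> float:
        return self.total_cores * self.memory_gb_per_core

    @property
    def peak_gflops(self) -> float:
        # Theoretical peak only; sustained performance is normally lower.
        return self.total_cores * self.clock_ghz * self.flops_per_cycle


# Made-up example values, for illustration only.
example = HpcSystem(
    name="example-cluster",
    system_class=SystemClass.CLUSTER,
    sockets=128,
    cores_per_socket=2,
    clock_ghz=2.0,
    flops_per_cycle=2,
    memory_gb_per_core=2.0,
    interconnect="Gigabit Ethernet",
    disk_tb=10.0,
)

print(example.total_cores, example.total_ram_gb, example.peak_gflops)
```

Whether a survey should ask for sockets, cores, or both is exactly the kind of question the counting discussion raised; the sketch simply records both so either figure can be reported.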

Some interesting items pop up in my notes from the two days:

  • The needs of a research data center are qualitatively different from the needs of a business data center in terms of types of facilities, access policies, and tolerance for downtime.
  • Support for data management and use of databases is the fastest-growing demand for help among researchers using high performance computing.
  • UCLA has grown a strong grid computing initiative, which is not only supporting the other UC campuses, but also providing cycles to the Cal State institutions and K-12 schools in California through the “Kids On The Grid” program.
  • Princeton has evolved their academic technology support into a new group in OIT, their central IT organization, to support research computing. That group works very closely with PICSciE, the campus’s new center for computational science and engineering work. The group within OIT concentrates on administering high performance systems that are widely used by researchers. They currently run an IBM Blue Gene, an SGI Altix, and a Beowulf cluster. They’re building a 35-terabyte shared storage facility.
  • One institution is building a brand new 11,000-square-foot data center with 8 tons of cooling capacity, and they figure that amount of capacity will only hold them for a year or two.
  • The University of Houston has an interesting model where the system administrators for their research facility are not university employees but contractors from outside firms. Houston has a lot of people with those skills providing outsourced services to the petroleum industry as well as academia.
  • Purdue is running Condor clustering to make unused cycles from student computing labs available to research efforts.

It was a very interesting couple of days – it was great to meet folks I didn’t know, and to get a feel for what’s happening out there in this fast-changing field.
