CSG Spring 2015 – Research Computing Directions, part 1

The afternoon workshop is on research computing directions. Strategic drivers are big data (e.g. gene sequencing); collaborations; mobile compute; monetization.

Issues: software-defined everything enables you to do things more cheaply; cloud/web-scale IT drives pricing down; mobile devices = sensor nets + ubiquitous connectivity; GPUs/massive parallelism. Containerized and virtualized workloads plus commodity computing allow moving analysis tools to the data. Interconnecting Science DMZs. Federations, security & distributed research tools.

Case Studies – where are we today?

Beth Ann Bergsmark, Georgetown: Ten years ago central IT did very little to align with researchers; conversations always started as security issues. Researchers started realizing that the complexity of what they were building needed IT support. Central IT has since adopted research support – the organization grew to build support across the fabric of the institution, including a group for supporting regulatory compliance. Most research has moved into the on-premise data centers. Putting staff on grants – staff drawn from traditional operations areas. That is a fantastic career path, and it also makes them more competitive for grants. Learning how to create partnerships. Regulatory compliance control complexity continues to grow, but research management software is also maturing; thinking about integration of those apps. Research computing is driving future planning – networking, storage (including curation), compute. Research is driving the need for a hybrid cloud architecture: researchers will go where the opportunities and data are. Watching the open data center initiative closely – AWS hosting public federal data. Portability becomes key. PIs and researchers move – on premise that’s hard; in the cloud it should be easier. Need to build for portability. New funding models for responding to the life cycle of research.

Charles Antonelli – Michigan: Has been doing research support in the units since 1977. There was no central support for research computing except for a 1968-era time-sharing service. In 2008 there was a flurry of clusters built on campus in various units that had HPC needs, one of them Engineering’s. In 2009-10 the first central large cluster was born and has been growing since. Flux cluster: 18k cores with ~2,500 accounts; the primary vehicle for on-campus support of large-scale research computing. It does not yet support sensitive data because it speaks NFS v3; that will be fixed with a new research file system. The cluster is around 70% busy most of the time. Central IT does not provide much consulting help for users – there is an HPC consulting service currently staffed at 20% of one person. Have been looking at the cloud; it is hard to understand how to use licensed software there. Have been using Globus Connect for a long time, and are hooking up groups to the Globus endpoints.
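To make the Globus piece concrete, here is a minimal sketch of the kind of endpoint-to-endpoint transfer this supports, using the Python globus-sdk. The client ID, endpoint UUIDs, and paths below are hypothetical placeholders, and the native-app login flow is an assumption, not Michigan's actual setup.

```python
# Minimal sketch: submit a Globus transfer between two endpoints with globus-sdk.
# CLIENT_ID and the endpoint UUIDs are placeholders, not real values.
import globus_sdk

CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"                  # hypothetical: registered at developers.globus.org
SRC_ENDPOINT = "aaaaaaaa-0000-0000-0000-000000000000"    # hypothetical source endpoint UUID
DST_ENDPOINT = "bbbbbbbb-0000-0000-0000-000000000000"    # hypothetical destination endpoint UUID

# Interactive native-app login to obtain a transfer token.
auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth_client.oauth2_start_flow()
print("Log in at:", auth_client.oauth2_get_authorize_url())
auth_code = input("Paste the authorization code here: ").strip()
tokens = auth_client.oauth2_exchange_code_for_tokens(auth_code)
transfer_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

# Build a transfer client and queue a recursive directory copy.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token)
)
tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DST_ENDPOINT,
                                label="example research data transfer")
tdata.add_item("/scratch/project/results/", "/archive/project/results/", recursive=True)

task = tc.submit_transfer(tdata)
print("Submitted transfer, task id:", task["task_id"])
```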

Charley Kneifel – Duke: Duke Shared Cluster Resource – monolithic cluster, Sun Grid Engine scheduler, solid scientific support staff, problematic financial model, breakaway/splintered clusters spun up by faculty. Then came a new provost and Vice Provost for Research, active faculty dissatisfaction, and a new director of research computing in IT. Now: Duke Compute Cluster, SLURM job scheduler, reinvigorated financial model – cover 80% of need on campus with no annual fees for housing nodes. Moving capex to opex. Faculty who’ve built their own clusters are now interested in collaborating. Going towards: a flexible compute cluster with multiple OSes, virtualized and secure. Additional computing servers/services: specialized services such as GPU clusters or large-memory machines. Flexible storage – long-term, scratch, high-performance SSD. Flexible networking: 10 Gbps minimum, 40 Gbps+ interswitch connections, 20 Gbps+ storage connections, SDN services. Challenges: history, and the wall between the health system and the university. How to get there? Allocations/vouchers from the middle; early engagement with researchers; matching grants; SDN services; cooperation with colleges/departments; support for protected-network researchers; outreach/training – Docker days, meetings with faculty. Requires DevOps – automation, workflow support, Hadoop on demand, a GUI for researchers to link things together. Carrots such as subsidized storage, GPUs, large-memory servers. Cut-and-pastable documents suitable for grant submissions; flexibility; removal of old hardware.
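As a concrete illustration of the SLURM move, here is a minimal sketch of submitting a batch job from Python by piping a job script to sbatch. The partition name, resource requests, and the run_analysis command are hypothetical placeholders, not Duke's actual configuration.

```python
# Minimal sketch: submit a batch job to a SLURM scheduler by piping a job
# script to sbatch. Partition name, resource numbers, and the payload command
# are hypothetical placeholders.
import subprocess

job_script = """#!/bin/bash
#SBATCH --job-name=example-analysis
#SBATCH --partition=common        # hypothetical partition name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --time=02:00:00

srun ./run_analysis --input /work/$USER/data.csv
"""

# sbatch reads the job script from stdin when no filename is given.
result = subprocess.run(
    ["sbatch"],
    input=job_script,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())   # e.g. "Submitted batch job 123456"
```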

Tom Lewis, Chance Reschke – Washington: Conversations with research leaders in 2007-08. 50+ central IT staff were involved and 127 researchers were interviewed, selected according to the number and dollar amount of current grants relative to others, and awards and recognitions. The goal: learn about future directions of research and the roles of technology. What researchers need: IT & data management expertise, data management infrastructure, computing power, communication & collaboration tools, data analysis & collection assistance. That equals cyberinfrastructure. By 2005 data centers were overwhelmed and data science discussions began. By 2007 the VP of Research convened forums to discuss solutions; by 2010 the first set of services rolled out. Why? Competitiveness & CO2 – faculty recruitment & retention, a data center space crisis, the climate action plan, a scaling problem. Fill the gap: speed of science/thought – faculty wanted access to large-scale, responsive, supportable computing as they exceeded the capacity of departments. A set want to run at huge scale – prep for petascale. Need big data pipelines to instruments. Data privacy for “cloudy” workloads. Who’s doing it? UW-IT does most of it through service delivery, mostly through cost recovery; the Libraries work on data curation; the eScience Institute works on big data research. The first investment was to build scale for the large researchers who were asking the Provost. Built credibility, and now getting new users. Faculty pay for the blade, which is kept for 4 years. Just added half an FTE data scientist for consulting.

UCSD: UCSD has 25% of all research funding across all of UC. Most research computing is at the San Diego Supercomputer Center. Two ways to access it: XSEDE (90% of activity), where users get programming support and assistance and there are champions around campus to help; and the Triton Shared Computing Cluster – recharge HPC, where you can buy cycles or buy into the condo. 70% of overall funding comes from the research side; the rest comes from the campus or the UC system. Integrated Digital Infrastructure is a new initiative started by Larry Smarr: SDSC, the Qualcomm Institute, PRISM @ UCSD, the Library, Academic Computing, Calit2. A research data library for long-term data curation is part of that initiative.
