CSG Fall 2016 – Large scale research and instructional computing in the Clouds, part 2

What is Harvard Doing for Research Computing?
Tom Vachon – Manager of Cloud Architecture at Harvard

Research Computing & Instructional Computing
Harvard AWS admin usage averages about $150k/month, research computing ~$90k. There’s Azure usage too.

They have a shared research computing facility with MIT and other campuses. Want to use cloud to burst capacity.

Cloud makes instructional computing easier, particularly in CS-centric classes.

How do you save money in the cloud? Spot Instances (if you can architect the workload to survive termination with almost no notice); auto-turndown (rules can be based on tags); and ongoing cost data: if you don’t tell people how much they’re spending as they spend it, they get surprised.

How does region choice influence cost? The cheapest region may not be the closest, and some regions lack features, e.g. high-bandwidth InfiniBand is only available in 2 Azure regions. Understand cloud-native concepts like AWS Placement Groups to get full bandwidth between instances.

How do you connect to your provider? It cannot be an afterthought. What speed? Harvard has a 40 Gb Direct Connect to AWS. What reliability? (They had issues with Azure VPN appliances that disconnect every six minutes.) Where do you do encryption: network or application? They chose to require application-level encryption (including database connections) and do not encrypt the network links themselves.
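The tag-based auto-turndown idea could look roughly like the sketch below. It is only an illustration: the tag names (`auto-turndown`, `work-hours`), the schedule format, and the instance records are hypothetical, not Harvard’s actual convention, and a real version would read tags from the provider’s API and call its stop operation.

```python
from datetime import datetime

# Hypothetical tag convention (an assumption, not from the talk):
#   auto-turndown = "true"  -> instance opts in to scheduled shutdown
#   work-hours    = "8-18"  -> keep running from 08:00 up to 18:00 on weekdays

def should_stop(tags, now):
    """Return True if an instance with these tags should be stopped at `now`."""
    if tags.get("auto-turndown", "").lower() != "true":
        return False  # instances that have not opted in are left alone
    start, end = (int(h) for h in tags.get("work-hours", "8-18").split("-"))
    is_weekend = now.weekday() >= 5  # Saturday is 5, Sunday is 6
    in_hours = start <= now.hour < end
    return is_weekend or not in_hours

def instances_to_stop(instances, now):
    """Return the ids of all instances the rule says to stop."""
    return [i["id"] for i in instances if should_stop(i["tags"], now)]

# Illustrative inventory; in practice this would come from the cloud API.
instances = [
    {"id": "i-dev", "tags": {"auto-turndown": "true", "work-hours": "8-18"}},
    {"id": "i-prod", "tags": {"env": "prod"}},
]

if __name__ == "__main__":
    print(instances_to_stop(instances, datetime.now()))
```

Making the rule opt-in by tag means nothing is stopped unless its owner has explicitly marked it, which is the safer default when production and experimental instances share an account.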

Cloud requires new tools. How will you handle multiple providers? They’re making golden images for each provider that have very little in them. Ideally have one config management product (they’re consolidating on SaltStack). They use Terraform to run images on multiple vendors; worth buying the enterprise version. Bonus if you can use the same toolset on-premises.

Research Computing in AWS at Michigan
Todd Raeker – Advanced Research Computing Technology Services

What are researchers doing? What kinds of projects?

At Michigan the environment is dominated by the Flux cluster: HPC and HTC, 28k computational cores. Researchers aren’t looking to do large-scale compute in the cloud.

In 2015 there was an AWS program to explore the cloud: they received 20 x $500 AWS credits. Most were small-scale projects. AWS was primarily used for web applications, which are easy to learn and use. Researchers also used AWS for data storage and compute. It was easier to collaborate with colleagues at different institutions, since researchers can manage their own collaborations.

Pros and cons: Can be support intensive, need to train staff. Good for self-sufficient researchers (with savvy grad students). User setup can be made hard. Is it really cheaper?

Duke – 60% of research compute loads were using 1 core and 4 GB of RAM – ideal for moving to the cloud.
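A share like Duke’s can be estimated from scheduler accounting data. The sketch below is only illustrative: the job records and field names (`cores`, `ram_gb`) are made up, not Duke’s data, and the 1 core / 4 GB thresholds simply mirror the figure quoted above.

```python
def fits_small_instance(job, max_cores=1, max_ram_gb=4):
    """True if the job fits within a small cloud instance's cores and RAM."""
    return job["cores"] <= max_cores and job["ram_gb"] <= max_ram_gb

def cloud_candidate_share(jobs):
    """Fraction of jobs that would fit a 1-core / 4 GB instance."""
    if not jobs:
        return 0.0
    return sum(fits_small_instance(j) for j in jobs) / len(jobs)

# Hypothetical accounting records, for illustration only.
sample_jobs = [
    {"cores": 1, "ram_gb": 2},
    {"cores": 1, "ram_gb": 4},
    {"cores": 1, "ram_gb": 3},
    {"cores": 16, "ram_gb": 64},
    {"cores": 1, "ram_gb": 32},
]

if __name__ == "__main__":
    print(f"{cloud_candidate_share(sample_jobs):.0%} of jobs fit a small instance")
```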

Asbed Bedrossian – USC; Jeremy Hallum – Michigan

Containerization can help with the reproducibility of science results. That’s an area where IT can partner to make containers for researchers to distribute. Science is a team sport now, and complexity is increasing rapidly. The challenge is to get the attention of researchers to change how they work: you need to meet people in the field (the agricultural extension metaphor). Central IT can help facilitate that, but you can’t just hang out a shingle and hope they come. Ag Extension agents were about relationships.

Notre Dame starting by helping people with compliance issues in the cloud – preparing for NIST 800-171 using GovCloud.

Getting instructors to use modern tools (like checking content into Git and then doing the builds from there) can help socialize the new environments.

Harvard hopes to let people request provisioning in ServiceNow, then use Terraform to automatically fire off the work. Georgetown is looking at building self-service in Ansible Tower.

Research computing professionals will need to be expert at pricing models.



