Tom Vachon, Harvard

Harvard’s cloud at a glance

  • 684 applications targeted for migration by 7/18, 300+ migrated already
    • Shutting down one on-prem data center
  • 1 VPC per account on average
    • Centrally Billed: 131 Accounts
    • 45 Accounts/VPCs on Direct Connect
    • Looking to make Cloud a University-wide strategic program
  • Cloud Shield – physical firewall
    • Kicked off 7/15 in response to a security breach
    • POC – 11/15 – 2/16
    • Started automation code 3/16
    • 15,000 lines of code
    • Production ready 7/16
    • Design goals
      • Provide highly available and highly redundant AWS network access
      • Provide visibility of traffic into, out of, and between cloud applications
      • Provide next-gen firewall protections
      • Inline web filtering to simplify server configuration
      • Provide multicloud connectivity
    • Tech details
      • Diverse paths and POPs – Boston has 2 direct connects, and a POP in Equinix in Virginia with private network connection to campus
      • Primarily done for visibility
    • Actively discourage host-based firewalls
      • Use security groups instead
      • Don’t use Network ACLs
  • Will provision services with public IPs
    • They have overlapping private address spaces
  • Design manager of managers in Python
    • Create an ops & maintenance free architecture in Lambda
    • Provide REST API through AWS API Gateway
    • Isolate changes by segregating integrations in AWS Lambda
  • Leverage AWS DynamoDB for
    • Schemaless session cache
    • Dynamic reconfiguration
  • Challenges
    • Static DNS names
      • use ELB or ALB for applications
    • Everyone needs to be on Harvard IP space
      • Delegates six /16s for AWS
    • Legacy application stacks
      • Java has a “mostly hate” relationship with DNS
        • Lots of apps cache DNS forever
    • Reduced S3 visibility
    • Inability to do app-by-app identification
      • Grouping by data classifications
    • Items which are unknowingly locked down to AWS IP space
      • e.g. doing a yum update on Amazon Linux from non-AWS IP space
  • Virtual firewalls per VPC were going to cost >$4 million over three years; this model costs $1.6 million over five years
  • Most applications got faster when distributed across this model
    • Less switching in the way
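
The two cost figures above cover different time spans, so it helps to annualize them before comparing. A minimal sketch in Python, using only the totals quoted in the talk (the ">$4 million" figure is treated as a lower bound, and the variable and function names are illustrative):

```python
# Totals quoted in the talk. The per-VPC figure was given as ">$4 million",
# so it is treated here as a lower bound.
virtual_fw_total, virtual_fw_years = 4_000_000, 3
cloud_shield_total, cloud_shield_years = 1_600_000, 5


def annual_cost(total, years):
    """Spread a multi-year total evenly across its years."""
    return total / years


virtual_fw_annual = annual_cost(virtual_fw_total, virtual_fw_years)
cloud_shield_annual = annual_cost(cloud_shield_total, cloud_shield_years)
ratio = virtual_fw_annual / cloud_shield_annual

print(f"Virtual firewalls: at least ${virtual_fw_annual:,.0f} per year")
print(f"Cloud Shield:      ${cloud_shield_annual:,.0f} per year")
print(f"Ratio:             at least {ratio:.1f}x")
```

On these numbers, virtual firewalls per VPC would have cost at least about four times as much per year as the Cloud Shield model.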

Panel Discussion

  • Biggest technical challenges so far?
    • Georgetown – have to run virtual firewalls in HA. Looking at replacing with Trend Micro
    • Harvard – lack of visibility in AWS
    • UNL – Vast offerings from vendors – how to wrap heads around it?
    • How to support on prem and burst out, especially for research instruments?
    • Cornell – Keeping up with the technology. Having people to manage and implement solutions. Encouraging lack of consistency in an effort to use the best new technology to solve problems.
    • Wisconsin – Security is a whole new paradigm to worry about in the cloud.
    • Notre Dame – pace of innovation. Do we prepare for a more rapid pace of change (and those costs) or learn to live with not implementing the latest?




CSG Fall 2016 – Security and Configuration in the Cloud, pt 1

Sarah Christen introduces the workshop.

Bob Winding and Sharif Nijim – Notre Dame

  • Cloud first – even distributed groups going cloud first
  • VPCs: Shared Services VPC with peering to Central Applications or Departmental VPCs; VPN Tunnels over I2 to campus; Be wary of implicit peering through campus routers.
  • 80% central IT, 20% distributed
  • Pauses to assess progress are built into the plan, with sprints to address issues. Inviting Mandiant to campus to help establish 5-year security roadmap.
  • Export controlled data
    • 22 projects on campus dealing with this kind of data
    • Gov Cloud new initiative to support research
    • NIST 800-171 / DFARS 7012 – looks a lot like PCI DSS
      • AWS covers 1/3 of security controls in GovCloud
      • Talked to a half-dozen PIs – experiments generate lots of data, then they move data to a local spot for analysis, or design work that happens locally with specific apps.
      • Developed a compliance matrix and quick start template in CloudFormation
        • Quick start builds shared services and multi-tenant project VPC
      • Want to create an environment in GovCloud that is cloistered for the work until it goes back to the sponsor.
  • VDI – Using graphics intensive applications in the cloud
    • Looked at Frame – delivers screen from remote desktop over video streams. Running pilot in US East
  • Look at the RDP gateway as the audit boundary – doesn’t include the end user device
  • Least privileges in IAM
  • Working with Purdue to look at SaaS providers for security monitoring and log analysis
  • AWS Security
    • Flipped IAM from least privilege to explicit deny of dangerous operations
      • Separation of control on IAM policy creation and application
      • Writing Lambda functions to undo changes that aren’t permitted
  • Organizing security groups
    • Setting standards for common functions, like sysadmin access
    • Engineers have a hard time keeping things simple
    • Databases use security groups for access control, which simplifies auditing
  • Data security
    • Using Tripwire tuned precisely on systems with confidential information
    • Encryption at rest and backups
    • Replication of backups/snapshots to a separate account and region. If a credential is compromised can’t destroy both operational data and backup
  • Future
    • CloudFront WAF
      • Want to fully leverage Amazon’s tools to gain advantage
      • Realize that this increases lock-in with the vendor
    • Host IDS for selected sensitive systems – looking for things that don’t cause choke points
    • Comment from Bruce – “we’re on the verge of a post-firewall world”
      • At AWS have to use IP-address based controls across VPCs and shared services
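
The "explicit deny of dangerous operations" approach described above can be sketched as an IAM policy document built in Python. The JSON layout (Version, Statement, Effect, Action, Resource) is standard IAM policy grammar; the list of denied actions and the function name are assumptions chosen for illustration, not Notre Dame's actual policy.

```python
import json

# Hypothetical list of "dangerous" operations; the talk did not enumerate them.
DENIED_ACTIONS = [
    "iam:CreateUser",
    "iam:CreateAccessKey",
    "cloudtrail:StopLogging",
    "cloudtrail:DeleteTrail",
    "ec2:DeleteVpc",
]


def build_deny_policy(actions):
    """Return an IAM policy document that explicitly denies the given actions."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDangerousOperations",
                "Effect": "Deny",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


policy = build_deny_policy(DENIED_ACTIONS)
print(json.dumps(policy, indent=2))
```

Because an explicit Deny overrides any Allow during IAM policy evaluation, a document like this can sit alongside broad permissions and still block the listed operations.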

Bob Turner – Wisconsin

  • Somewhere between cloud experimentation and cloud aware.
  • Trying to not yet deal with sensitive and restricted data in the cloud
  • Security requirements for accounts and VPCs
    • Working off script based on risk management framework
    • Using it for onboarding people into cloud environments
    • Working on audits and attestations
  • Enforcing cloud controls (will also use for on campus environments)
    • Provisioning/De-provisioning
    • Going to try to use FedRAMP checklist as a guide
    • Approval of risk by Executive able to accept on behalf of University
  • Automated Templates (consultancy model)
    • Create a new account or migrate existing account under master
    • Pre-provisioned equipment templates with logging enabled
    • Configured for Shibboleth
    • Moving towards Duo for MFA
    • Activate AWS Config
    • Use (future) cloud security tool for initial verification and continuous monitoring
  • Things to be concerned about
    • Holding on to root accounts and credentials
    • Challenges of CDM
    • Usual tools are not necessarily available
    • AWS tools have charges
    • Challenge of cloud vendors that don’t support SAML or federation
  • Account management
    • Group email per department, including Office of Cybersecurity Rep
  • Researcher accounts
    • must know their expected data (at present no Restricted or Sensitive data)
      • Google as a government service that has been pretty well vetted by US agencies

Sarah Christen – Cornell

  • Cloud first according to IT Strategic plan written in 2013
  • 54 accounts under master contract, hundreds outside
  • Cloudification services has been an opportunity for central IT to partner with campus
  • Requirements for being on master contract
    • Onboarding discussion
      • How billing works; unit responsibilities – how is this different than the data center?; Security and configuration requirements; Benefits; Discussion about joining tech community; central services available – Container service (will containerize and run code for fee), DevOps service
    • Attestation
      • Explicit agreement to policies
    • Shibboleth
    • Duo for MFA for console access
    • Activation for AWS Config and CloudTrail
    • CloudTrail logs sent to Security Office
  • Onboarding – create account, configure Shib and Duo, lock down root account, standard AD groups (admin, cloud group, security), activate Config and CloudTrail and configure CloudTrail logs to be sent to Security office as well as the VPC owner; activate CloudCheckr and schedule review of how to use.
  • CloudCheckr – allows those with accounts to see usage data; makes recommendations on how to save money; sends monthly invoices; runs continuous vulnerability scan; gives Security a view into all accounts
  • Standard VPC setup –
  • What about research accounts?
    • Easy onboarding without a lot of steps or complication
    • No interference with research, no cost or performance overhead
    • Solutions for export controlled data and other compliance requirements
    • Standard network config not always a good fit
    • Consultation and services – Docker, Data Storage, Training, Devops support

Mark Debonis – VaTech

  • Cloud Aware -> Moving into Cloud Experiment
  • One production VPC in AWS, five pre-production
  • Moving towards both AWS and Azure offerings
  • Manual provisioning process
    • Customer contacts CCS via Service Catalog for Cloud brokerage discussion
    • Difference in Azure (upfront) and AWS billing models – In Azure if you don’t use your commitment in a year you lose it
  • Logins to Azure portal with VT AD account, Redirect to VT ADFS, Login and use Duo, Primary contact manages other users through Azure Admin portal with VT AD accounts

Kevin Murphy – UNL Lincoln

  • Cloud first for SaaS
  • Experimentation for PaaS and IaaS: Rackspace, Azure, AWS
  • One VPC in Azure for disaster recovery (domain controllers, ADFS)
  • VPC in progress for AWS
  • Central IT is pushing cloud strategies, very little departmental participation. Research computing run by CS faculty, not interested in cloud computing.
  • Security requirements: Federated logins (ADFS with Duo) for Azure. Shipping everything from IaaS to Splunk on campus
  • Security requirements – manually creating accounts; No PII data in the cloud
  • Been using Azure StorSimple device – hybrid solution.
  • Moving PCI environment to the cloud with a managed service provider who will take the liability and run on AWS. “not extremely expensive”
  • Challenges: Moving current architecture to IaaS can be prohibitively expensive – people build for peak loads, need to use elastic capabilities. Exploring PaaS options such as Azure Web Apps and DB services. Billing is a challenge.

Bereket Amdemichael, Daniel Tamiru, Georgetown

  • Based their AWS cloud architecture on the work done at the CSG Cloud Architecture Working Group
  • Added a proxy layer.
  • IPSec VPN – Cisco
  • Users only have access to specific VMs – have to access across the VPN
  • VPC and group architecture is a “spirited discussion”
  • When do they (security) need to be alerted when something isn’t right?
  • Using Equinix for high speed transfer to AWS

CSG Fall 2016 – Next-gen web-based interactive computing environments part 2

NYU Sample Cases – Stratos Efstathiadis

Web-based interactive tools supported by NYU Data Services

Most popular tools include Quantitative, Geospatial, Qualitative, Visualization. Courses, boot camps, etc.

Some tools are web-based (RStudio, ArcGIS Online, CART, Qualtrics, Tableau, plotly)

Services provided for tools: Training, Consultations; Pedagogy; Data; Accounts

Geospatial example: ArcGIS Online used by courses in radicalization & religion and ethnic conflict, art & politics in the street. Initial consultation and needs assessment, account creation, training, data gathering, in depth consultations, initial web publishing, training round 2, technical support, lessons learned.

Certificate class in Big Data. Structured around a textbook developed from a set of similar classes. Three options: For credit; certificate class meeting four times a semester for three days each; tailored for an agency. Includes non-NYU students, students will be able to access and analyze protected/confidential data. Work in teams sharing code and data in project spaces; provide substantial analytic and visualization capabilities where everyone in the class can work simultaneously. User experience is important.

Deployed two PoC environments: on-premise (short-term) and on AWS (long-term).

Built NYU Secure Research Data Environment – serve broad communities of NYU scholars and their collaborators including government agencies and private sector; support a wide spectrum of data; provide access to powerful resources; enable collaboration; offer training; offer data curation and publications.

Part of Digital Repository Service: Band 1 (fast temporary storage); Band 2 (storage for ongoing activities); Band 3 (feature rich publication environment); Band 4 (secure data environment).

ARC Connect
Mark Montagues – Michigan

Enable easier access to HPC resources – researchers who had never used the command line. Texas shared code they wrote for the XSEDE portal. Added federated authentication with Shib and mandatory multifactor. ITAR today, HIPAA on roadmap. Mandatory encryption for VNC sessions (no SSH tunnels needed). Web-enabled VNC viewer brings up a desktop. Encryption is enforced and mandatory. gsissh (part of the Globus Toolkit) enables authn between arcconnect web server and cluster node. Environment has RStudio and Jupyter. Researchers can install web apps in their home directory on the HPC cluster.

To take advantage of the infrastructure, web apps need to be able to run nicely behind a reverse proxy. Hoping to automate the environment more in the future.

Charlie suggests that Galaxy is a piece of software that is worth looking at.

CSG Fall 2016 – Next-gen web-based interactive computing environments

After a Reuben from Zingerman’s, the afternoon workshop is on next gen interactive web environments, coordinated by Mark McCahill from Duke.

Panel includes Tom Lewis from Washington, Eric Fraser from Berkeley

What are they? What problem(s) are they trying to solve? Drive scale, lower costs in teaching. Reach more people with less effort.

What is driving development of these environments? Research by individual faculty drives use of the same platforms to engage in discovery together. Want to get software to students without them having to manage installs on their laptops. Web technology has gotten so much better – fast networks, modern browsers.

Common characteristics – Faculty can roll their own experiences using consumer services for free.

Tom: Tools: RStudio, Jupyter; Shiny; WebGL interactive 3d visualizations; Interactive web front-ends to “big data”. Is it integrated with LMS? Who supports?

What’s up at UW (Washington)?

Four patterns: Roll your own (and then commercialize); roll your own and leverage the cloud; department IT; central IT.

Roll your own: SageMathCloud cloud environment supports editing of Sage worksheets, LaTeX documents, and IPython notebooks. William Stein (faculty) created with some one-time funding, now commercialized.

Roll your own and then leverage the cloud – Informatics 498f (Mike Freeman) Technical Foundations of Informatics. Intro to R and Python, build a Shiny app.

Department IT on local hardware: Code hosting and management service for Computer Science.

Central IT “productizes” a research app – SQLShare – Database/Query as a service. Browser-based app that lets you: easily upload large data sets to a managed environment; query data; share data.

Biggest need from faculty was in data management expertise (then storage, backup, security). Most data stored on personal devices. 90% of Researchers polled said they spend too much of their time handling data instead of doing science.

Upload data through browser. No need to design a database. Write SQL with some automated help and some guided help. Build on your own results. Rolling out fall quarter.

JupyterHub for Data Science Education – Eric Fraser, Berkeley

All undergrads will take a foundational Data Science class (CS+Stat+Critical Thinking w/data), then connector courses into the majors. Fall 2015: 100 students; Fall 2016: 500 students; Future: 1000-1500 students.

Infrastructure goals – simple to use; rich learning environment; cost effective; ability to scale; portable; environment agnostic;  common platform for foundations and connectors; extend through academic career and beyond. Student wanted notebooks from class to use in job interview after class was done.

What is a notebook? It’s a document with text and math and also an environment with code, data, and results. “Literate Computing” – Narratives anchored in live computation.

Publishing = advertising for research, then people want access to the data. Data and code can be linked in a notebook.

JupyterHub – manages authentication, spawns single-user servers on demand. Each user gets a complete notebook server.

Replicated JupyterHub deployment used for CogSci course. Tested on AWS for a couple of months, ran Fall 2015 on local donated hardware. Migrated to Azure in Spring 2016 – summer and fall 2016. Plan for additional deployment to Google using Kubernetes.

Integration – Learning curve, large ecosystem: ansible, docker, swarm, dockerspawner, swarmspawner, restuser, etc.

How to push notebooks into student accounts? Used GitHub, but not all faculty are conversant. Interact: works with GitHub repos, starting to look at integration with Google Drive. Cloud providers are working on notebooks as a service – access sample notebooks.

Mark McCahill – case studies from Duke

Rstudio for statistics classes, Jupyter, then Shiny.

2014: containerized RStudio for intro stats courses. Grown to 600-700 student users/semester. Shib at cm-manage reservation system web site, users reserve and are mapped to personal containerized RStudio. Didn’t use RStudio Pro – didn’t need authn at that level. Inside – NGINX proxy, Docker engine, docker-gen (updates config files for NGINX), config file.

Faculty want additions/updates to packages about twice a semester. 2 or 3 incidents/semester with students who wedge their container, causing RStudio to save corrupted state. Easy fix: delete the bad saved-state file.

Considering providing faculty with automated workflow to update their container template, push to test, and then push to production.

Jupyter: Biostats courses started using in Fall 2015. By summer 2016 >8500 students using (with MOOC).

Upper division students use more resources: had to separate one Biostats course away from the other users and resized the VM. Need to have limits on amount of resources – Docker has cgroups. RStudio quiesces process if unused for two hours. Jupyter doesn’t have that.
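
The idle-timeout behaviour mentioned above (RStudio quiescing after two hours, with no Jupyter equivalent at the time) could be approximated by an external culling job. The sketch below shows only the selection logic. The container names, the last-activity mapping, and the function name are hypothetical, and how activity is measured or how containers are stopped is left out.

```python
from datetime import datetime, timedelta

IDLE_LIMIT = timedelta(hours=2)  # mirrors the two-hour RStudio behaviour


def find_idle(last_activity, now, limit=IDLE_LIMIT):
    """Return names of containers with no activity for at least `limit`."""
    return sorted(
        name for name, seen in last_activity.items() if now - seen >= limit
    )


now = datetime(2016, 9, 21, 15, 0)
last_activity = {
    "jupyter-alice": datetime(2016, 9, 21, 14, 30),  # active 30 minutes ago
    "jupyter-bob": datetime(2016, 9, 21, 12, 0),     # idle for 3 hours
    "jupyter-carol": datetime(2016, 9, 21, 13, 0),   # idle for exactly 2 hours
}
print(find_idle(last_activity, now))
```

Keeping the selection as a pure function makes the policy easy to test separately from whatever mechanism actually stops the containers.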


Shiny – reactive programming model to build interactive UI for R.

Making synthetic data from OPM dataset (which can’t be made public), and regression modeling against that data. Wanted to give people a way that allows comparison of results to the real data.



CSG Fall 2016 – Large scale research and instructional computing in the Clouds, part 2

What is Harvard Doing for Research Computing?
Tom Vachon – Manager of Cloud Architecture at Harvard

Research Computing & Instructional Computing
Harvard AWS admin usage averages about $150k/month, research computing ~$90k. There’s Azure usage too.

They have a shared research computing facility with MIT and other campuses. Want to use cloud to burst capacity.

Cloud makes instructional computing easier, particularly in CS-centric classes.

How do you save money in the cloud? Spot Instances (if you can architect workload to survive termination with almost no notice); Auto-turndown (can base rules on tags); Provide ongoing cost data – if you don’t tell people how much they’re spending as they spend it they get surprised. How does region choice influence cost? Cheapest region may not be closest. Certain places might not have the features – e.g. high-bandwidth InfiniBand is only available in 2 regions in Azure. Understand cloud native concepts like AWS Placement Groups to get full bandwidth between instances. How do you connect to your provider? It cannot be an afterthought. What speed? Harvard has 40 Gb of Direct Connect to AWS. What reliability? (They had issues with Azure VPN appliances that disconnect every six minutes.) Where do you do encryption? Network or application? They chose to require application-level encryption (including database connections) and don’t encrypt the network connections themselves.
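
The tag-based auto-turndown rule mentioned above can be sketched as a pure selection function. The tag key `auto-off`, its hour-of-day values, and the instance records are assumptions for illustration. A real job would fetch instances from the provider's API and then stop the ones selected.

```python
def instances_to_stop(instances, hour, tag_key="auto-off"):
    """Pick running instances whose tag opts them into off-hours shutdown.

    `instances` is a list of dicts with "id", "state", and "tags" keys.
    A tag value such as "19" means "stop at or after 19:00".
    """
    selected = []
    for inst in instances:
        value = inst.get("tags", {}).get(tag_key)
        if inst.get("state") != "running" or value is None:
            continue  # skip stopped instances and ones without the tag
        if value.isdigit() and hour >= int(value):
            selected.append(inst["id"])
    return selected


# Hypothetical fleet used only to exercise the rule.
fleet = [
    {"id": "i-001", "state": "running", "tags": {"auto-off": "19"}},
    {"id": "i-002", "state": "running", "tags": {"owner": "teaching"}},
    {"id": "i-003", "state": "stopped", "tags": {"auto-off": "19"}},
    {"id": "i-004", "state": "running", "tags": {"auto-off": "22"}},
]
print(instances_to_stop(fleet, hour=20))
```

Untagged instances are never touched in this sketch, so owners opt in by adding the tag rather than having to opt out.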

Cloud requires new tools. How will you handle multiple providers? They’re making golden images for each provider that have very little in them. Ideally have one config management product (they’re consolidating to SaltStack). Using Terraform to run images on multiple vendors – worth buying the enterprise version. Bonus if you can use same toolset on-premise.

Research Computing in AWS at Michigan
Todd Raeker – Advanced Research Computing Technology Services

What are researchers doing? What kinds of projects?

At Michigan environment is dominated by the Flux cluster: HPC and HTC, 28k computational cores. Researchers aren’t looking to do large-scale compute in the cloud.

In 2015 AWS program to explore cloud. Received 20 x $500 AWS credits. Most were small scale projects. Was primarily used by web applications – easy to learn and use. Researchers used AWS for data storage and compute. Easier to collaborate with colleagues at different institutions – researchers can manage their collaborations.

Pros and cons: Can be support intensive, need to train staff. Good for self-sufficient researchers (with savvy grad students). User setup can be made hard. Is it really cheaper?

Duke – 60% of research compute loads were using 1 core and 4 GB of RAM – ideal for moving to the cloud.

Asbed Bedrossian – USC, Jeremy Hallum, Michigan

Containerization can help with reproducibility of science results. That’s an area where IT can partner to make containers for researchers to distribute.  Science is a team sport now, complexity increasing rapidly. Challenge is to get attention of researcher to change how they work – need to meet person in the field (agricultural extension metaphor). Central IT can help facilitate that, but you can’t just hang out a shingle and hope they come. Ag Extension agents were about relationships.

Notre Dame starting by helping people with compliance issues in the cloud – preparing for NIST 800-171 using GovCloud.

Getting instructors to use modern tools (like checking content into Git and then doing the builds from there) can help socialize the new environments.

Harvard hopes to let people request provisioning in ServiceNow, then use Terraform to automatically fire off the work.  Georgetown looking at building self-service in Ansible Tower.

Research computing professionals will need to be expert at pricing models.



Life in the Cloud


The other day I shared a mind map I was working on with my colleague Jim Loter. I had authored the mind map using NovaMind, a nice piece of software that I’ve used for several years on my Macs. Jim’s a Windows user. There is a Windows version of NovaMind, but Jim turned around and recreated the mind map using MindMeister, a new Web-based collaborative mind-mapping tool that works in the browser. Jim and I are now happily working together on iterating the mind map for a project we’re working on. While MindMeister doesn’t have all the rich functionality of NovaMind, it’s plenty good enough for my rather basic uses, and the ability to work within the browser from any computer I happen to be at and to collaborate easily with my colleagues makes it even better than a desktop program.

So that experience got me thinking about all the work I actually do in the Internet cloud these days. Over the last month or two I’ve been working out of two different office locations at the UW, and I’m finding it much easier to just use the various cloud computing services, like Google Docs and Calendar, Remember The Milk (to-do lists), and now MindMeister for a set of activities where I would previously have used heavyweight desktop applications like Word or Apple’s Pages. I’ve already written about how my blog is now hosted in the cloud. And at this point I’d say that most of my professional collaborative contacts take place in the cloud, through Twitter, Facebook, Skype, or the IM services.

It’s a new computing world out there. Those of us in IT support roles had better get used to it.