Cloud Forum 2016 – Cloud DevOps and Agile, 1 Year In

Melanie McSally, Ben Rota – Harvard

Cloud Program since February 2015 – Migrated 285+ applications (43% of goal), implemented Cloud Shield, and designed and implemented centralized cloud billing. Only 42 apps were lift and shift. Even the simplest migration ends up involving lots of refactoring.

All new applications have been put in the cloud

IdM team realizing $8500/month in savings by using elastic sizing of resources

Lessons learned –

  1. Get security and network design right as early as possible. The goal was to make cloud security as good as or better than on premise.
  2. Moving to cloud is 2 parts culture to 1 part technology. Be prepared to answer basic, non-technical questions – If things are working fine now, why move? Will cloud really save money? I understand cloud is the future, but we’re really busy! Doing things the right way takes too long!
  3. You won’t do as well when you have to split your focus. When things get migrated, the app teams have to manage in two environments. Better to migrate entire portfolios at once.
  4. Everyone is accountable for the cloud – teams need a shared vision, shared goals, and aligned priorities. Corollary: When teams come forward really fast, it’s likely because they have a technical challenge you might not want to touch. Understand the training needs of those you work with before you get there.
  5. Communicate, communicate, communicate! Create a unified baseline understanding. Build partnerships to figure out the questions. Be open and transparent. Address the workforce fears up front.
  6. Don’t max out all the dials at once! Started a new program with new teams, new technology (cloud), new management, and new processes (Scrum). In retrospect, they would have provided more help for the team. They didn’t have developers in their cloud program and would change that if doing it over.
  7. Migrations + engineering + operations = impossible. Recommendation is to create small teams and have them focus on a specific goal. Separated migrations from operations. Operations will quickly consume all capacity.
  8. Cost savings take time to actualize. Learning how to manage costs in the cloud takes time. Could save money if they could close the data center (power and real estate expensive in Cambridge), but in a shared environment that’s hard. Push other benefits of cloud.
  9. Don’t forget about cost management.
  10. Be open to changing your strategy when new information presents a better way.

Cloud Forum 2016 – ERP In The Cloud

Jim Behm (Michigan), David McCartney (Ohio State), Glen Blackler (UC Santa Cruz), Erik Lundberg (Washington)

UMich – Currently running PeopleSoft (Student, Fin, HR), Click Commerce, and Blackbaud. Investigating IaaS for the Student system and planning others.

Ohio State – Currently running PeopleSoft (Finance, HR, and Student); converting Finance to Workday and then HR. Exploring Workday Student. Timeline: 3 years for Finance, 5 years for HR.

UC Santa Cruz – Banner, PeopleSoft, custom IdM on Solaris. Moving it all to AWS by Spring 2018.

Washington – Their most modern ERP is 30+ years old, on a COBOL mainframe. Moving HR/Payroll to Workday, others to follow. Launching in June 2017. Completely restructuring business processes around HR, creating a single service center. Then will tackle Finance. Lesson learned – don’t try implementing software without redoing business processes. Looking at how to create a sustainable organization capable of tackling these huge projects over 15-20 years.

What impact has your cloud move had on your IT staff?

Bentley University: Didn’t take into account the level of effort involved in regression and security testing. Unanticipated costs and resource issues.

Notre Dame moving ERP to AWS. Had a big impact on storage team who don’t need to do what they used to.

Harvard moving PeopleSoft HR into the cloud. Looking at it as a people issue, not a technology issue. Very sensitive data, and the people who manage it on premise are invested but don’t have the skills in the cloud. Don’t want to rely on the cloud team’s expertise. Holding a PeopleSoft Day once a week with a consultant who has expertise moving PeopleSoft to AWS, the cloud team, and the PeopleSoft team, working together to solve problems and remove barriers. Building continuous integration and lots of automation. Arizona doing that too.

Ohio State gutting the data warehouse and rebuilding from scratch. Not sure yet where it will end up.

How have you dealt with information in the cloud and the security ramifications?

Ohio State – Workday is different in terms of access than something like Box. Running into challenges getting enough visibility into the system. Concerns about ability to get logs and information they can consume.

UCSC – people don’t understand the difference between SaaS and IaaS. Having to educate them on the local responsibilities still inherent in moving to AWS.

If you chose SaaS how did you enlist your campus and business partners to sacrifice flexibility of the current way for business standardization? How challenging was this?

At Cornell HR decided to move to Workday without consulting IT initially. Was a wake up call for IT in terms of commoditization.

Cloud Forum 2016 – Migration of OnBase to AWS

Sharif Nijim – Notre Dame

OnBase in AWS? Really? Windows app. AWS does Windows fine, OnBase doesn’t do the cloud fine. Licensing is painful for using elasticity. Looked at Hyland’s own hosted offering, but it was way more expensive.

A few lessons learned: EFS doesn’t do CIFS, so it’s not useful if you want Windows file services in the cloud. There are some products that can help. If they had to do it over again they’d probably run their own Windows file servers in AWS, but they used Panzura because they had some licenses.

Moving the data – had a couple of terabytes. Tried AWS Snowball. Was complete overkill for what they needed. Transferred 16 GB of database in about 17 minutes. S3 multi-part parallel works well, but there were ~7 million small document files. Had to zip it up for transfer optimization and then rehydrate. Then tried Robocopy to trickle data over a couple of weeks. In order to make a choice, had to understand how the application actually works. Document is written to disk and never changes (annotations go in database). OnBase segments by directory loosely the size of a DVD. So it doesn’t matter if it takes a long time to move data, as it never changes.
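The “zip it up, then rehydrate” step above can be sketched with Python’s standard `zipfile` module. This is a minimal illustration, not Notre Dame’s actual tooling: it packs many small files into archives of a bounded size (the 64 MB default is an arbitrary assumption) while keeping paths relative, so the tree can be recreated exactly on the other side. Because OnBase never rewrites a document after it lands on disk, batches like these can be built and shipped at leisure.

```python
import zipfile
from pathlib import Path


def bundle_small_files(src_dir, out_dir, max_batch_bytes=64 * 1024 * 1024):
    """Pack files under src_dir into zip archives of roughly max_batch_bytes each.

    Paths inside each archive are stored relative to src_dir so the original
    directory tree can be rehydrated exactly. Returns the archive paths.
    """
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    archives = []
    batch, batch_bytes = [], 0

    def flush():
        nonlocal batch, batch_bytes
        if not batch:
            return
        archive = out / f"batch_{len(archives):05d}.zip"
        with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                zf.write(path, arcname=path.relative_to(src).as_posix())
        archives.append(archive)
        batch, batch_bytes = [], 0

    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        size = path.stat().st_size
        # Start a new archive when adding this file would exceed the budget.
        if batch and batch_bytes + size > max_batch_bytes:
            flush()
        batch.append(path)
        batch_bytes += size
    flush()
    return archives


def rehydrate(archives, dest_dir):
    """Extract every archive into dest_dir, recreating the original tree."""
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(dest_dir)
```

Fewer, larger objects are what make parallel multipart transfers efficient; millions of tiny objects spend most of their time on per-request overhead.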

OnBase uses NTLM Auth, which doesn’t work well with load balancers, so they had to stand up HAProxy. Hoping that OnBase will implement Shib in the next year or so. Notre Dame’s default procedure is to shut down servers outside of 7am – 7pm, but with OnBase the hash codes for the licenses screw up how the license checks work. Had to get rid of load balancers and elasticity. Still gained separation of the web tier from the app tier.
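A scheduled-shutdown policy like Notre Dame’s 7am – 7pm default reduces to a small decision function, with an opt-out for license-sensitive servers such as OnBase. The sketch below is hypothetical: the tag name `schedule-exempt` and the tag-based exemption are assumptions, and a real scheduler would also need to read tags from the cloud API and start or stop instances.

```python
from datetime import datetime, time
from typing import Mapping

# The 7am-7pm window is the default described in the notes; the tag name
# and the exemption mechanism are hypothetical.
BUSINESS_START = time(7, 0)
BUSINESS_END = time(19, 0)
EXEMPT_TAG = "schedule-exempt"


def should_be_running(now: datetime, tags: Mapping[str, str]) -> bool:
    """Return True if a server should be up at `now` under the business-hours policy."""
    # License-sensitive servers (OnBase, in this case) opt out of the nightly shutdown.
    if tags.get(EXEMPT_TAG, "").lower() == "true":
        return True
    return BUSINESS_START <= now.time() < BUSINESS_END
```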

On June 25, 2016, they cut over production users and nobody noticed. Shut down 25 servers and liberated 2 TB of space in the production environment.

Plea to cloud providers – make it easy to provision automation from the GUI, by exporting to templates or whatever.
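To make the plea concrete, here is what “exporting to a template” could produce for one console-built server: a minimal CloudFormation template in JSON. The top-level keys and the `AWS::EC2::Instance` properties (`ImageId`, `InstanceType`, `Tags`) follow CloudFormation’s documented structure; the function itself, the logical ID, and the AMI ID in the usage line are hypothetical, and this is a hand-written sketch rather than any provider’s export feature.

```python
import json


def instance_template(logical_id: str, image_id: str, instance_type: str,
                      tags: dict) -> str:
    """Emit a minimal CloudFormation template (JSON) describing one EC2 instance."""
    template = {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            logical_id: {
                "Type": "AWS::EC2::Instance",
                "Properties": {
                    "ImageId": image_id,
                    "InstanceType": instance_type,
                    # Sorted so repeated exports produce identical, diffable output.
                    "Tags": [{"Key": k, "Value": v} for k, v in sorted(tags.items())],
                },
            }
        },
    }
    return json.dumps(template, indent=2)
```

For example, `instance_template("OnBaseWeb", "ami-0123456789abcdef0", "m4.large", {"owner": "onbase"})` yields a template that can be checked into version control and redeployed, which is the point of the request.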

Cloud Forum 2016 – Lightning Rounds #1

5 minute lightning rounds

How the Cloud is living up to its promise in Cornell Student Services – Phil Robinson

Might have the largest apps portfolio at Cornell – around 190 apps and sites, POS systems, etc. Compliance requirements including HIPAA. Pain points include lots of technical debt from inherited tech. Lots of time spent keeping up with server patching and upgrades. Looking to leverage elasticity to match student cycle spikes. Built a class roster with scheduler on AWS – scaled to over 1k simultaneous users in July, then scaled down. They have 10 apps in production in AWS. Identified an inspired team member to act as champion, prioritized cloud solutions. “Automate like crazy”
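Matching capacity to student-cycle spikes comes down to a simple calculation that autoscaling policies automate: instances needed for the current load, clamped to a minimum and maximum. The sketch assumes a hypothetical capacity of users per instance and hypothetical bounds; it is not Cornell’s actual configuration.

```python
import math


def desired_instances(concurrent_users: int, users_per_instance: int,
                      min_instances: int = 1, max_instances: int = 20) -> int:
    """Instances needed for the current load, clamped to the group's bounds."""
    needed = math.ceil(concurrent_users / users_per_instance)
    return max(min_instances, min(max_instances, needed))
```

With an assumed 100 users per instance, a July peak of 1,000 simultaneous users calls for 10 instances, and the group falls back to the minimum once registration traffic subsides.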

Using AWS workspaces for Graduate Students in applied social sciences – Chet Ramey & Jeff Gumpf (Case Western)

Pilot project to test virtual desktops via AWS Workspaces. The department was eliminating a computer lab because the building was being remodeled. Workspaces are easy to provision, manage, and use on multiple devices. Each person gets a Workspace, provisioned with stats software and other tools in Spring 2016, paid for by central IT. Originally planned for 3 courses and 26 students. Initial setup took about one hour. After the first week of operation the pilot was expanded to 6 courses and 110 workspaces. Users were provisioned through the AWS Console. Built a master Workspace, created an image and two bundles from it, and used them to provision users. The SPSS installer won’t run on Windows Server, but they got around that. Included the Google Drive client for storage. About $150/student/semester, but with new AWS hourly pricing it would be ~$80.
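As a rough worked example using only the figures above, applying the per-student averages to all 110 workspaces gives the semester totals below. This assumes every workspace costs the average, so treat it as an estimate rather than an actual bill.

```python
def semester_cost(seats: int, cost_per_seat: float) -> float:
    """Total semester cost for a pool of workspaces at an average per-seat cost."""
    return seats * cost_per_seat


SEATS = 110           # workspaces after the pilot expanded
ORIGINAL_COST = 150   # approx. $/student/semester under the original pricing
HOURLY_COST = 80      # approx. $/student/semester with hourly pricing

original_total = semester_cost(SEATS, ORIGINAL_COST)
hourly_total = semester_cost(SEATS, HOURLY_COST)
print(f"original: ${original_total:,}  hourly: ${hourly_total:,}  "
      f"saved: ${original_total - hourly_total:,}")
# → original: $16,500  hourly: $8,800  saved: $7,700
```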

Bringing IT Partners on Campus Along For the Ride – Susan Kelley (Yale)

Technology Architecture Committee – govern design and architecture, approve POCs, encourage documentation of strategies, working groups. Reviewed 31 projects in the last year. Formed a Cloud Working Group – 8 central IT staff and 7 IT partners. Decision 1: AWS and Azure. Med School helped with how to interpret Azure bills. School of Architecture wanted to get out of managing servers locally – used as test case for VPC, within one year migrated all their infrastructure to cloud. Now they go around telling other IT teams what they learned.

Securing Research Data: NIST 800-171 Compliance in the Crowd – Bob Winding (Notre Dame)

Lots of work will need to be compliant by the end of 2017. Research that contains “controlled, unclassified information” – ITAR. Held a workshop with AWS and several other schools. Worked to create a Quickstart Guide and a Purdue Educause paper. GovCloud and Quickstart cover about 30% of the controls mandated. GovCloud is a US-persons-only region, so that helps. Providing a VDI/RPC gateway in the Shared Services VPC – the VDI client is the audit boundary. As long as you run on University-managed equipment you have an isolated environment. Still process-intensive, but you don’t have to worry about infrastructure.

Cloud Adoption: A Developer’s Perspective – Brett Haranin

Cost Engineering in AWS: Do Like We Say, Not Like We Did – Ben Rota

Lessons learned:

  • Be careful that your people don’t confuse “cheap” with “free”
  • For cost estimates, you generally only need to worry about RDS, EC2, and S3
  • Easiest way to save money is to shut down what you don’t need (engineers aren’t used to doing this on premise)
  • Enforce tagging standards that help you understand your spend (including tags for testing)
  • Look out for unattached storage
  • Consider over-provisioning storage rather than buying PIOPS
  • Multi-AZ RDS instances are a low-risk way to get into RI purchases
  • The real bang for the buck in RI purchases comes from doing them at all

How to do? Set up Trusted Advisor or third party tool to help get the view of what’s going on.
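Two of the lessons above, enforcing tagging standards and looking out for unattached storage, can be checked with a few lines of code. The sketch operates on volume records using the field names from EC2’s DescribeVolumes response (`VolumeId`, `State`, `Tags`), where a `State` of `available` means the volume is not attached to any instance. The required tag keys are a hypothetical standard; substitute your own.

```python
from typing import Iterable

# Hypothetical tagging standard; substitute your institution's required keys.
REQUIRED_TAGS = ("owner", "project", "environment")


def audit_volumes(volumes: Iterable[dict], required_tags=REQUIRED_TAGS) -> dict:
    """Flag unattached volumes and volumes missing required tags.

    Each dict uses DescribeVolumes field names; State == 'available'
    means the volume exists (and is billed) but is attached to nothing.
    """
    unattached, missing = [], {}
    for vol in volumes:
        vol_id = vol["VolumeId"]
        if vol.get("State") == "available":
            unattached.append(vol_id)
        keys = {tag["Key"] for tag in vol.get("Tags", [])}
        gaps = [t for t in required_tags if t not in keys]
        if gaps:
            missing[vol_id] = gaps
    return {"unattached": unattached, "missing_tags": missing}
```

In practice the input could come from boto3, e.g. iterating `boto3.client("ec2").get_paginator("describe_volumes").paginate()` and collecting each page’s `Volumes`; the audit logic stays the same whether the data comes from the API, Trusted Advisor exports, or a third-party tool.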

Dirty Dancing In The Cloud – Scotty Logan

Why are we moving to the cloud? Geo-diversity, scalability, etc. Don’t forklift to the cloud. “Go Cortez” – burn your boats behind you. Go to new stuff: DevOps, CI, CD, etc. But you still have firewalls and IP addresses, which are tightly coupled. Use TLS and client certs instead … but my CISO says we need to use static IPs and VPNs! If you have to, use NAT Gateways (an AWS service) … until you can get to the happy place.
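“Use TLS and client certs instead” means a service decides who may connect based on a certificate signed by a CA you control, not on the caller’s IP address. A minimal sketch with Python’s standard `ssl` module, assuming you already operate such a CA; the file names in the usage note are hypothetical. The context is built from `PROTOCOL_TLS_SERVER` rather than `create_default_context` so that only the CA you load is trusted, not the system’s public CA bundle.

```python
import ssl
from typing import Optional


def mtls_server_context(certfile: Optional[str] = None,
                        keyfile: Optional[str] = None,
                        cafile: Optional[str] = None) -> ssl.SSLContext:
    """Server-side TLS context that rejects any client without a valid certificate.

    Only certificates chaining to `cafile` are trusted; if no CA is loaded,
    every client handshake fails (fail closed).
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # a client certificate is mandatory
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)
    if cafile:
        ctx.load_verify_locations(cafile=cafile)
    return ctx
```

Usage might look like `ctx = mtls_server_context("server.pem", "server.key", "campus-ca.pem")` followed by `ctx.wrap_socket(sock, server_side=True)`. The service can then move between subnets, regions, or providers without any firewall rule changes.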

Jetstream: Bob Flynn (Indiana)

Expanding NSF XD’s reach and impact. 70% of NSF researchers claimed to be resource constrained. Jetstream is NSF’s first production cloud facility. Infrastructure running at Indiana and Texas, with dev at Arizona. Built on OpenStack. For researchers needing a handful of cores (1 to 44), developers, and instructors looking for a course environment. Set of preconfigured images (like AMIs) to pick from. Went live September 1, with over 125 XSEDE projects. NSF soliciting allocation requests, including Science Gateways.

Research Garden Path Case Studies – Rob Fatland (Washington)

CloudMaven – a GitHub repo. Don’t recreate solutions. Has AWS procedurals. Has a page on HIPAA compliance.

Prototyping library services using a high-performance NoSQL platform – Erik Mitchell (Berkeley)

Costs about $8 per book to put it in a storage facility. Looked at levels of duplication across two libraries and 27 fields = 378 million data items to compare. Looked at big data solutions. Used BigQuery on Google – a NoSQL database with a GUI and SQL-like query language. Was able to analyze the data easily and discovered lots of places to save effort and money. Not everything needs to be an enterprise service, if the cloud service is easy enough to use at a local level.
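The core of a duplication analysis is normalizing records into comparable keys and counting what both collections hold. The toy sketch below runs locally on three hypothetical fields (`title`, `author`, `year`) instead of BigQuery on 27, and it assumes each copy held by both libraries is one $8 storage placement that could be avoided; both are simplifications for illustration, not Berkeley’s method.

```python
from collections import Counter

COST_PER_BOOK = 8  # dollars to place one volume in a storage facility, per the notes


def match_key(record: dict) -> tuple:
    """Normalize a few bibliographic fields into a comparison key (hypothetical fields)."""
    return tuple(str(record.get(f, "")).strip().lower() for f in ("title", "author", "year"))


def overlap_report(library_a: list, library_b: list) -> dict:
    """Count titles held by both libraries and the storage cost of the duplicate copies."""
    keys_a = Counter(match_key(r) for r in library_a)
    keys_b = Counter(match_key(r) for r in library_b)
    shared = keys_a.keys() & keys_b.keys()
    # Each matched pair of copies is one candidate that need not be stored twice.
    duplicate_copies = sum(min(keys_a[k], keys_b[k]) for k in shared)
    return {"shared_titles": len(shared),
            "duplicate_copies": duplicate_copies,
            "storage_cost_avoided": duplicate_copies * COST_PER_BOOK}
```

The same join-and-count logic maps naturally onto a SQL-like query, which is why a managed query service made the 378-million-item comparison tractable without building enterprise infrastructure.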

Umbrellas for a Rainy Day: Cloud Contracts and Procurement – Sara Jeanes (Internet2)

In the world of cloud, “contract is king” – all you own is the contract. Typical procurement processes are long and cumbersome, which doesn’t work for the cloud. Challenges: timeliness, risk management, price variability, pilots and trials. Possible solutions: consortia, “piggybacking”, community communication and collaboration.

Red Cloud and Federated Cloud – Dave Lifka (Cornell)

Talked to lots of researchers – they liked everything a la carte – cheap computing on demand, with no permanent staffing or aging hardware.  Built Red Cloud, a Eucalyptus stack (100% AWS compatible so you can burst). Gave each of them root, but then they built a subscription model and a for-fee service building VMs for researchers. Available externally as well as internal to Cornell. Aristotle – data building blocks. Bursting to other institutions and then AWS. Building an allocation and accounting system. It’s about time-to-science. Portal to tell people what resource they can get when and at what cost.


Cloud Forum 2016 – Routing to the Cloud

DNS – Notre Dame (Bob Winding)

Found early that everything relies on DNS. Need to integrate with AWS DNS to take advantage of both. How many “views” do you have on campus? Want to resolve all the views, but not undermine virtues of Route 53. Think about query volumes, what do you do on campus? They delegate zone administration with Infoblox on campus, but it doesn’t have great granularity for automation. AWS has great IAM controls for automation, but not granular delegation. They use Infoblox as primary DNS, but looking at creating more authoritative zones in Route 53 so they can take advantage of the automation when spinning up new systems.
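When some zones stay authoritative on campus (Infoblox) and others move to Route 53, campus resolvers have to decide where each query goes. Conditional forwarding typically picks the longest matching zone suffix. The sketch below illustrates that rule only; the zone names and resolver addresses are hypothetical (documentation ranges plus a VPC-style resolver address) and do not reflect Notre Dame’s configuration.

```python
# Hypothetical zones and resolver addresses, for illustration only.
FORWARDERS = {
    "aws.example.edu": "10.20.0.2",   # zone hosted in Route 53, reached via the VPC resolver
    "example.edu": "192.0.2.53",      # campus primary DNS (e.g. Infoblox)
}
DEFAULT_RESOLVER = "198.51.100.53"


def pick_resolver(qname: str, forwarders=FORWARDERS, default=DEFAULT_RESOLVER) -> str:
    """Route a query to the resolver for the longest matching zone suffix."""
    name = qname.rstrip(".").lower()
    best_zone = None
    for zone in forwarders:
        # Match the zone itself or any name beneath it, on a label boundary.
        if name == zone or name.endswith("." + zone):
            if best_zone is None or len(zone) > len(best_zone):
                best_zone = zone
    return forwarders[best_zone] if best_zone else default
```

Carving out a delegated subzone like this lets automation create records in Route 53 through IAM-controlled APIs while the campus system keeps its existing delegation model for everything else.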

What do we do with publicly addressed end-points on campus? Had to have a way of routing public endpoints to private address space. When you put in VPN tunnels you create hidden peering connections via your campus, so you need to put ACLs in place. Need to think about visibility of traffic.
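The “hidden peering” problem arises because two VPCs that each have a VPN tunnel to campus can reach each other through campus routers unless an ACL says otherwise. A first-match, default-deny check using Python’s `ipaddress` module shows the idea; all CIDRs here are hypothetical, with `172.16.0.0/12` standing in for campus space.

```python
import ipaddress

# Hypothetical ACL entries: (source CIDR, destination CIDR, allowed), first match wins.
ACL = [
    ("10.10.0.0/16", "10.20.0.0/16", False),   # VPC A -> VPC B transiting campus: denied
    ("10.10.0.0/16", "172.16.0.0/12", True),   # VPC A -> campus: allowed
    ("10.20.0.0/16", "172.16.0.0/12", True),   # VPC B -> campus: allowed
]


def is_allowed(src_ip: str, dst_ip: str, acl=ACL) -> bool:
    """Evaluate a packet against the ACL; anything unmatched is denied."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for src_net, dst_net, allowed in acl:
        if src in ipaddress.ip_network(src_net) and dst in ipaddress.ip_network(dst_net):
            return allowed
    return False  # default deny
```

Default deny matters here: without it, any VPC-to-VPC path nobody thought to list would quietly work through campus.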

AWS Security – Boston University (Gerard Shockley)

Lessons learned around Security Groups – change philosophy from individuals with desktop access to servers to using a VPN group or a bastion host. A challenge to convince them they can’t have dedicated access. How to reassemble breadcrumbs for forensics? Piecing together VPC flow logs, CloudTrail, and OS logs is a time-consuming challenge.
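One piece of reassembling forensic breadcrumbs is merging events from separate sources into a single timeline. Assuming each source has already been parsed into `(timestamp, source, message)` tuples sorted by time, with timezone-aware timestamps so sources that log in different zones line up, `heapq.merge` combines them without re-sorting everything. Parsing the raw VPC flow log, CloudTrail, and OS log formats is left out of this sketch.

```python
import heapq


def merge_timeline(*sources):
    """Merge per-source event lists (each already sorted by time) into one timeline.

    Each event is a (timestamp, source, message) tuple with a timezone-aware
    datetime, so events from differently configured hosts compare correctly.
    """
    return list(heapq.merge(*sources, key=lambda event: event[0]))
```

For example, a CloudTrail `AuthorizeSecurityGroupIngress` call, the flow-log record of the first accepted SSH packet, and the host’s `sshd` login line would interleave in true order even if the host logs in local time.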

AWS Data Level Security – Harvard 

Have a policy to encrypt everything. Turning encryption at rest on by default. Encrypt database backups (RMAN). Also need to encrypt data in transit, which they haven’t needed to do on premise between non-routable subnets. Needed to work with app owners to make sure data gets encrypted in transit. Some institutions installing TripWire on individual systems. Looking at replicating data to other vendors. Libraries are in a bind because their replication strategies make them unable to trust cloud vendors. There’s some discussion of whether we can urge vendors towards using some of the kinds of archival standards for preservation of digital materials that have evolved in the library world.

Notre Dame refactoring their security groups so that services are in groups and databases are in groups and users are in groups, and they can specify what apps can route traffic to which databases, not relying on IP addresses. That’s hard to do if you have to integrate on premise resources that don’t talk the same kind of security groups.
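Group-based rules, as opposed to IP-based ones, can be modeled as “source group may reach destination group on port”, which mirrors how an AWS security group ingress rule can reference another security group instead of a CIDR. The group names and ports below are hypothetical, and the model ignores protocol and egress rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    source_group: str   # security group allowed to originate traffic
    dest_group: str     # security group receiving traffic
    port: int


# Hypothetical groups: each app tier may reach only its own database group.
RULES = {
    Rule("sg-web-app", "sg-app-db", 5432),
    Rule("sg-admin-users", "sg-web-app", 443),
}


def allowed(src_groups: set, dest_groups: set, port: int, rules=RULES) -> bool:
    """Traffic is allowed if any source group may reach any destination group on port."""
    return any(Rule(s, d, port) in rules for s in src_groups for d in dest_groups)
```

No IP address appears anywhere, which is exactly what becomes hard once on-premise hosts, which have no security group membership, need to participate.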

Caltech killed a $750k on-prem VDI project and is looking at AWS Workspaces very closely.

Most campuses seem to be building infrastructure like identity services into a “core VPC”. There is a 50-something peering limit before you hit some performance limits. One school is only going to peer VPCs for Active Directory and will open public IPs for Shib, LDAP, etc.
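The arithmetic behind the “core VPC” pattern: a full mesh of n VPCs needs n(n−1)/2 peering connections, while hub-and-spoke needs only n−1, all of which land on the core VPC. The core VPC is therefore the one that runs into the per-VPC peering limit first. The default of 50 below is an assumption based on the “50-something” figure in the notes. Note also that AWS VPC peering is not transitive, so spokes cannot reach each other through the core, which suits shared services such as identity.

```python
def peering_counts(n_vpcs: int) -> dict:
    """Peering connections needed to connect n VPCs: full mesh vs. one core VPC as hub."""
    return {
        "mesh_total": n_vpcs * (n_vpcs - 1) // 2,  # every pair of VPCs peers
        "mesh_per_vpc": n_vpcs - 1,
        "hub_total": n_vpcs - 1,                    # each spoke peers only with the core
        "hub_core_vpc": n_vpcs - 1,                 # the core VPC carries every peering
    }


def hub_fits_limit(n_vpcs: int, per_vpc_limit: int = 50) -> bool:
    """Whether the core VPC stays within an assumed per-VPC peering limit."""
    return peering_counts(n_vpcs)["hub_core_vpc"] <= per_vpc_limit
```

With 10 VPCs a mesh needs 45 connections against 9 for a hub; with an assumed limit of 50, one core VPC can serve 50 spokes (51 VPCs in total) before another approach, such as exposing Shib or LDAP on public IPs, is needed.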

Stanford moving their production identity infrastructure to AWS in the next year, in containers. Other schools also heading that direction. Cornell has put AD into AWS, using multiple regions.

Notre Dame looked at AWS’ directory service, but it needed a separate forest from campus, so didn’t meet their needs.

Notre Dame planning to put VPN service into the cloud as well as on premise so it will continue to exist if campus is down. Arizona standing up AD in Azure, bound to campus and setting up some peering to AWS. Boston moving all their AD infrastructure to Azure – looking at Azure AD.  Stanford looked at Azure AD but decided not to use it and are building their own AD in Azure.

IPS/IDS in your VPC? Gerard – cost is “staggering”. Stanford using CoreOS, which can’t be modified while running, and running IDM systems in read-only containers – that provides intrusion prevention.

Cloud Forum 2016 – survey results and Grit!

We’re at Cornell for the 2nd Cloud Forum! We started out last night with a lovely reception and then Bill Allison, Bob Flynn and I had a great dinner at the Moosewood Restaurant.

This morning kicks off with some summaries from Gerard Shockley of the registration survey. There are 92 registrants (which was capped) from 52 institutions and 4 technology providers (AWS, Microsoft, Google, ) and attendees in 83 roles, from CIO to architects to faculty.

  • 75% reported that cloud strategy is a work in progress.
  • 52% using AWS, 17% Azure, 19% Google, 2.7% Oracle, 20% using some form of “community cloud”
  • 71% report no validated cloud exit strategy
  • 30% say they’re “cloud first”, 52% using “opportunistic cloud”
  • 79% report on-premise facilities that are > 5 years old.
  • Most realize that reducing cost is not the main reason to move to the cloud. Improving service, added flexibility and agility, and improving support for research rank high.
  • Staff readiness is the highest ranked obstacle to broad cloud adoption.
  • 34% have signed a NET+ agreement for either IaaS or PaaS.
  • 70% have central IT involved in cloud support for Research Computing
  • 28% say their institution plans on performing clinical research in the cloud
  • 56% say they have signed a HIPAA BAA with a cloud service provider

Next, a session from Sharif Nijim from Notre Dame titled “Grit!”

There’s a shift in how we do things – e.g. from capacity planning to cloud financial engineering. Picking a partner to provide infrastructure services is a whole new level of trust. Hiring staff who can deal with the rate of change in the cloud is critical and hard. We’re all running software that is cloud unfriendly – how many of us are helping the vendors evolve? We’re all prototyping and learning and putting things in production and continuing to learn – sometimes the hard way.