Higher Ed Cloud Forum 2017 – Intro and Multi Account AWS Strategy

Survey Results

46 institutions attending, 4 vendors, 81 unique roles among 90 attendees.

40% cloud first, 12% have a documented cloud exit strategy.

82% AWS, 14% Azure, 4% Google, 2% other

Staff readiness is the #1 obstacle to broad adoption

42% have signed the Internet2 NET+ agreement, 11% have an enterprise agreement with a cloud provider

21% have containers/serverless in production, 9% non-prod, 70% not currently adopting.

Managing and Automating a Multi-Account Strategy in AWS: Brett Bendickenson (Arizona)

Have their own agreement with AWS. Currently have about 75 accounts in their consolidated billing; 24 accounts in central IT.

UITS Cloud Advisory Team — cross-functional group from within UITS to advise and decide on cloud practices and policies.

  • Tagging Policy – extremely important to get right up front. Tags: service, name, environment, createdby, contactnetid, accountnumber, subaccount
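
A tagging policy like this lends itself to automated auditing. A minimal sketch with boto3 (assuming configured credentials; the helper names and lowercase-key convention are illustrative, not Arizona's actual tooling):

```python
# Sketch: audit EC2 instances against a required-tags policy.
# The tag keys come from the notes above; everything else is illustrative.
REQUIRED_TAGS = {"service", "name", "environment", "createdby",
                 "contactnetid", "accountnumber", "subaccount"}

def missing_tags(tags):
    """Given an EC2-style tag list, return required keys that are absent."""
    present = {t["Key"].lower() for t in tags}
    return sorted(REQUIRED_TAGS - present)

def untagged_instances(region="us-east-1"):
    """Scan a region and report instances that violate the policy."""
    import boto3  # imported here so the pure helpers above need no AWS deps
    ec2 = boto3.client("ec2", region_name=region)
    report = {}
    for page in ec2.get_paginator("describe_instances").paginate():
        for reservation in page["Reservations"]:
            for inst in reservation["Instances"]:
                gaps = missing_tags(inst.get("Tags", []))
                if gaps:
                    report[inst["InstanceId"]] = gaps
    return report
```

A nightly run of `untagged_instances` per account would feed the kind of tag monitoring described later in these notes.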

Multi-account strategy. Workloads segregated into production and non-prod accounts. Tipping point was properly restricting everything by permissions – can do it with IAM roles, but it’s a lot of work. Decided on further segregation by teams / technologies, e.g. Kuali, PeopleSoft, IAM. Each has prod and non-prod accounts.

Each account has an account steward (director or dept. head) — responsible for spend, security, etc. Each account has an email list, with the address used for the root login address. Password stored in common vault, secured with MFA hardware token (kept in Ops). Linked to a central billing account. Set of account foundation templates are deployed. Started using AWS Organizations.

Account foundation modeled after the AWS NIST 800-53 Quick Start CloudFormation template. Set of CloudFormation templates which deploy roles, security controls, etc. Sets up an EC2 instance that runs a set of Ansible playbooks that set up Shib, base AWS info, IAM, Logging, Lambda.

Federated Roles – SysAdmin, IAMAdmin, InstanceOps, ReadOnly, BillingPurchasing. Using Grouper for authorizations.

Using federated identities, no IAM users (generally).

CloudTrail enabled in all accounts. Enabled for all regions, records all API calls, sent to a central S3 Bucket in root account. CloudTrail logs also saved to CloudWatch logs in account for local reference.
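
Enabling a trail like this per account can be scripted. A hedged sketch with boto3 (trail naming convention and bucket name are placeholders; the central bucket's policy must already grant CloudTrail write access):

```python
TRAIL_SUFFIX = "-trail"

def trail_name(account_name):
    """Per-account trail naming convention (illustrative)."""
    return account_name + TRAIL_SUFFIX

def ensure_central_trail(account_name, central_bucket="example-central-logs"):
    """Sketch: enable an all-region CloudTrail delivering to a central
    S3 bucket in the root/billing account, as described above."""
    import boto3
    ct = boto3.client("cloudtrail")
    ct.create_trail(
        Name=trail_name(account_name),
        S3BucketName=central_bucket,
        IsMultiRegionTrail=True,          # enabled for all regions
        IncludeGlobalServiceEvents=True,  # capture IAM/STS calls too
    )
    ct.start_logging(Name=trail_name(account_name))
```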

Alarms set for Network ACL changes, Security Group changes, Root Account activity, unauthorized access, IAM Policy changes, access key creation, and CloudTrail changes. (Not all used in non-prod.)
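
Alarms of this kind typically pair a CloudWatch Logs metric filter over the CloudTrail log group with a CloudWatch alarm. A sketch for the security-group case (the log group and SNS topic are assumed to exist; all names are illustrative):

```python
# Filter pattern over CloudTrail events for security-group changes.
SG_CHANGE_PATTERN = (
    '{ ($.eventName = AuthorizeSecurityGroupIngress) || '
    '($.eventName = RevokeSecurityGroupIngress) || '
    '($.eventName = AuthorizeSecurityGroupEgress) || '
    '($.eventName = RevokeSecurityGroupEgress) }'
)

def alarm_on_sg_changes(log_group, sns_topic_arn):
    """Sketch: metric filter + alarm on security-group API calls."""
    import boto3
    logs = boto3.client("logs")
    cw = boto3.client("cloudwatch")
    logs.put_metric_filter(
        logGroupName=log_group,
        filterName="SecurityGroupChanges",
        filterPattern=SG_CHANGE_PATTERN,
        metricTransformations=[{
            "metricName": "SecurityGroupEventCount",
            "metricNamespace": "CloudTrailMetrics",
            "metricValue": "1",
        }],
    )
    cw.put_metric_alarm(
        AlarmName="SecurityGroupChangesAlarm",
        MetricName="SecurityGroupEventCount",
        Namespace="CloudTrailMetrics",
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=[sns_topic_arn],  # notify the account's email list
    )
```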

Lambda Functions – Alarm details (interrogates CloudTrail events and sends the actual API calls that raised the alarm); CreatedBy automated tagging for EC2 instances; OpsWorks tagging helper; Route53 helper (updates DNS); Tag monitoring – checks tags on instance launch (looking at Capital One's open-source Cloud Custodian); AMI lookup
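
The CreatedBy helper follows a common pattern: a Lambda triggered by a CloudWatch Events rule on RunInstances tags new EC2 instances with the caller's identity. A sketch assuming the CloudTrail-via-CloudWatch-Events payload shape (not Arizona's actual code):

```python
def principal_from(detail):
    """Pull a human-readable principal from a CloudTrail event detail.
    Federated logins surface the session name as the ARN's last segment."""
    identity = detail["userIdentity"]
    return identity.get("arn", "unknown").split("/")[-1]

def handler(event, context):
    """Sketch of a CreatedBy auto-tagging Lambda for RunInstances events."""
    import boto3
    detail = event["detail"]
    ids = [item["instanceId"]
           for item in detail["responseElements"]["instancesSet"]["items"]]
    boto3.client("ec2").create_tags(
        Resources=ids,
        Tags=[{"Key": "createdby", "Value": principal_from(detail)}],
    )
```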

Arizona’s code: https://bitbucket.org/ua-ecs/service-catalog

CSG Winter 2017 – Cloud ERP Workshop

Stanford University – Cloud Transformations – Bruce Vincent

Why Cloud and Why now? Earthquake danger; campus space; quick provisioning; easy scalability; new features and functions more quickly

Vision for Stanford UIT cloud transformation program: Starting to behave like an enterprise. Shift most of service portfolio to cloud. A lot of self-examination – assessment of organization and staff. Refactoring of skills.

Trends and areas of importance: Cloud – requires standards, process changes, amended roles; Automation – not just for efficiency – requires API integration; IAM – federated and social identities, post-password era nearing for SSO; Security – stop using address-based access control; Strategic placement of strong tech staff in key positions; timescale of cloud ignores our annual cycles.

Challenges regarding cloud deployments: Business processes tightly coupled within SaaS products, e.g. ServiceNow and Salesforce; Tracking our assets which increasingly exist in disparate XaaS products; Representing the interrelationships between cloud assets; Not using our own domain namespace in URLs.

Trying to make ServiceNow the system of record about assets – need to integrate it with the automation of spinning instances up and down in the cloud.

Cloud ERP – Governance and Cloud ERP – Jim Phelps, Washington

UW going live with Workday in July. Migrating from old mainframe system and distributed business processes and systems. Business process change is difficult. Built an integrated service center (ISC) with 4 tiers of help.

Integrated Governance Model:  across business domains; equal voice from campus; linking business and technology; strategic, transformative, efficient…

Governance Design: Approach – set strategic direction; build roadmap; govern change – built out RACI diagram.

“Central” vs “Campus” change requests – set up a rubric for evaluating: governance should review and approve major changes.

Need for a common structured change request: help desk requests and structured change requests should be easily rerouted to each others’ queues.

Governance seats (proposed): 7 people – small and nimble, but representative of campus diversity.

Focus of governance group needs to be delivering greatest value for the whole university and leading transformational change of HR/P domains. Members must bring a transformational and strategic vision to the table. They must drive continuous change and improvements over time.

Next challenge: transition planning and execution – balancing implementation governance with ISC governance throughout transition – need to have a clear definition of stabilization.

Next steps: determine role of new EVP in RACI; Align with vision of executive director of ISC; provost to formally instantiate ISC governance; develop and implement transition plan; turn into operational processes

UMN ERP Governance – Sharon Ramallo

Went live with PeopleSoft 9.2 on 4/20/2015 – no issues at go-live!

Implemented governance process and continue to operate governance

Process: Planning, Budgeting; Refine; Execution; Refine

  • Executive Oversight Committee – Chair: VP Finance. Members: VP OIT, HR, Vice Provost
  • Operational Administrative Steering Committee – Chair: Sr. Dir App Dev
  • Administrative Computing Steering Committee – people who run the operational teams
  • Change Approval Board

Their CAB process builds a calendar in ServiceNow.

USC Experience in the Cloud – Steve O’Donnell

Current admin systems – Kuali KFS/Coeus, custom SIS (mainframe), Lawson, Workday, Cognos

Staffing and skill modernization: Burden of support shifts from an IT knowledge base to more of a business knowledge base – in terms of accountability and knowledge.  IT skill still required for integrations, complex reporting, etc. USC staffing and skill requirements disrupted.

Challenges: Who drives the roadmap and support? IT Ownership vs. business ownership; Central vs. Decentralized; Attrition in legacy system support staff. At risk skills: legacy programmers, data center, platform support, analysts supporting individual areas.

Mitigation: establishing clear vision for system ownership and support; restructure existing support org; repurpose by offering re-tooling/training; Opportunity for less experienced resources – leverage recent grads, get fresh thinking; fellowship/internships to help augment teams.

Business Process Engineering – USC Use cases

Kuali Deployment: Don’t disrupt campus operations. No business process changes. Easier to implement, but no big bang.

Workday HCM/Payroll: Use delivered business process as starting point. Engaged folks from central business, without enough input from campus at large. Frustrating for academics. Workday as a design partner was challenging. Make change management core from beginning – real lever is conversations with campus partners. Sketch future state impact early and consult with individual areas.

Current Approach – FIN pre-implementation investment

Demonstrations & Data gathering (requirements gathering): Sep – Nov. Led by Deloitte consultants; cover each administrative area; work team identifies USC requirements; Community reviews and provides feedback. Use the services folks, not the sales folks.

Workshops (develop requirements) – Nov – Feb. Led by USC business analysts, supported by Deloitte; Work teams further clarify requirements and identify how USC will use Workday; Community reviews draft and provides feedback

Playbacks (configure): March – May. Co-led by consultants and business analysts; Workday configured to execute high-level USC business requirements; Audience includes central and department-level users

Outcomes: Requirements catalog; application fit-gap; blueprint for new chart of accounts; future business process concepts; impacts on other enterprise systems; data conversion requirements; deployment scope, support model

CIO Panel – John Board; Bill Clebsch; Virginia Evans; Ron Kraemer; Kelli Trosvig

Cloud – ready for prime time ERP or not? Bill – approaching cautiously, we don’t know if these are the ultimate golden handcuffs. How do we get out of the SaaS vendors when we need to? Peoplesoft HR implementation has 6,000 customizations and a user community that is very used to being coddled to keep their processes. ERP is towards the bottom of the list for cloud.

Virginia – ERP was at the bottom of list, but business transformation and merger of medical center and physicians with university HR drove reconsideration. Eventually everything will be in the cloud.

John – ERP firmly at the bottom of the list.

Kelli – at Washington were not ready for the implementation they took on – trusted that they could keep quirky business processes, but that wasn’t the case. Took a lot of expenditure of political capital. Everyone around the table thought it was all about other people changing. Very difficult to get large institutions onto SaaS solutions because the business processes are so inflexible. Natural tendency is to stick with what you know – many people in our institutions have never worked anywhere else. Probably easier at smaller or more top-down institutions.

Ron – Should ask: is higher-ed ready for prime time ERP or not? We keep trying to fix the flower when it fails to bloom. People changing ERPs are doing it because they have to – the data center might be dying, the COBOL programmers might be done. Try to spend time fixing the ecosystem. Stop fixing the damn flower.

Kelli – it’s about how you do systemic change, not at a theoretical level.

Bill – what problem are we trying to solve? Need to be clear when we go into implementations. At Stanford want to get rid of data centers -space at too much of a premium, too hard to get permits, etc.

John – there’s an opportunity to be trusted to advise on system issues, integration, etc.

Kelli & Ron – The financial model of cap-ex vs. op-ex is a critical success factor.

Ron – separating pre-sales versions from reality is critical. That’s where we can play an important role.

John – we have massive intellectual expertise on campus, but we’ve done a terrible job of leveraging our information to help make the campus work better. We’ve got the data, but we haven’t been using it well.

Bernie – we need to start with rationalizing our university businesses before we tackle the ERP.

Ron – incumbent on us to tell a story to the Presidents. When ND looks at moving Ellucian, they think: what if they can stop running things that require infrastructure and licenses on campus? Positions us better than we are today. Epiphany over the last 6 months: we have to start telling stories – we can't just pretend we know the right things to do. Let's start gathering stories and sharing them.

Kitty – Part of the story is about the junk we have right now. The leaders don’t necessarily know how bad the business processes and proliferation of services are.

Cloud Forum 2016 – Cornell’s BI move to the cloud

Jeff Christen – Cornell

Source Systems – PeopleSoft, Kuali, Workday, Longview. Dimensional data marts: finance, student, contributor relations, research admin. BI Tools – OBIEE and Tableau

They do data replication and staging of data for the warehouses. Nightly replication to stage -> ETL -> Data Marts

Why replication/stage? Consistent view of data for ETL processing, protects production source systems; tuning for ETL performance.

Started journey to cloud 2 years ago. Were using Oracle Streams – high maintenance, but met some needs. Oracle purchased a more robust tool and de-supported Streams. ETL tools challenge – were using Cognos Data Manager for 90% of their work, but IBM didn't continue to support it. Replaced it with WhereScape RED, but that requires rewriting jobs. Apps were already moving off-premise: Workday for HR/Payroll, PeopleSoft to AT&T hosting, Kuali financials moving to AWS. Launched pilot project to answer “what would it take to run the data warehouse environment in AWS?”

Small pilot – Kuali warehouse in AWS. Which existing tools will work? Desire to use AWS services such as RDS where possible; Testing of both user query performance and ETL performance.

Why Oracle RDS and not Redshift? Approximately 80% of the Kuali DW is operational reporting. Needs fine-grained security at the database level; A lot of PL/SQL in the current environment; Currently exploring Redshift for non-sensitive high volume data

Some re-architecting: Oracle Streams not supported with Oracle RDS (used Attunity). Oracle Enterprise Manager scheduler not supported with Oracle RDS – using Jenkins (so beautiful and simple); No access to OS on RDS databases – installed Data Manager on separate Linux EC2 instance; Using WhereScape to call Data Manager from the RDS database.

Need to be more efficient. On premise the KDW had two physical servers. Found that inefficiencies in ETL code and some dashboard queries had been masked by the large servers. Prioritizing ETL code conversion by long-running areas helped get AWS within the nightly batch window. Some updates made to dashboards to improve performance or offer better filter options. Hired a database tuning consultant (2 wk) to help with Oracle tuning.

Testing and User Perception. Started with internal unit testing. Internal query execution time comparisons between on premise and AWS. User testing of dashboards on premise versus AWS. Repoint of production OBIEE financial dashboards to AWS for a day (x3). Some queries came back faster, some slower. Went through optimization and tuning to get it comparable across the board.

Cutover to AWS on Sept. 8. Redirected all non-OBIEE ODBC client traffic in October. Agreed to keep the on premise KDW loading in parallel for two month-end closings as a fall back.

Next Steps. Parallel Research Admin Mart already in AWS – expect cutover by end of CY. Need more progress on ETL conversion before moving student and contributor marts. Continue Big Data / non-traditional data investigation (Cloudera on AWS). Redshift for large non-sensitive data sets.

Lessons learned: Off premise hosting does not equal Cloud technology. Often hard to get data out of SaaS apps.

Cloud Forum 2016 – Lightning Rounds #2

Cloud VDI – Bob Winding (Notre Dame)

Use cases they looked at:

  • Classes that need locally installed software
  • Application delivery instead of high-end lab machines
  • Workstations for researchers where the whole project is in the cloud
    • NIST 800-171, ITAR, etc
    • Heavyweight, graphics and processing-intensive work

Looked at: Workspaces (AWS); Microsoft RDP and RDP Gateway, Fra.me, Ericom Blaze and Ericom Connect

Performance is everything – did tests with PTC Creo, Siemens NX10, and Solidworks. Set up test environment in Oregon. Nobody in central IT knew how to operate the software. Found in almost every case that the remote setup was beating local desktop performance. In some cases the local environment crashed under load, but in AWS loaded in under 2 minutes (g2.2xlarge).

Researchers observed that they can transfer

Cloud Governance – Do You Need a Cloud Center of Excellence? Laura Babson (Arizona)

A Cloud Center of Excellence is a group that leads an organization in an area of focus

Establish best practices, governance, and frameworks for an organization

Applications vs Operations – what do you do about tagging, automation, monitoring, security, etc.? Don't want to end up with different ops solutions for different applications.

CoE can help streamline decision making. CCoE can make decision if funding isn’t required, or make a recommendation to a budget committee if funding is required.

Recent decision making: Account strategy – how many accounts and where to put each workload; Campus to Cloud connectivity; Monitoring; Tagging policy

Can help with communication and engagement across the organization

AWS CloudFront – Gerard Shockley (Boston U)

What is a CDN? A geographically dispersed, low-latency, high-bandwidth solution for distributing HTTP and HTTPS content.

Terminology: Distribution (rules that control how CloudFront will access and deliver content); Origin (where the content lives)

Only works with publicly visible infrastructure at AWS

Easy to get metrics and drill down into specifics

DevOps != DevOps – Orrie Gartner (Colorado)

Brought a new data center online 3 years ago to consolidate IT across campus, built a private cloud

Ops and Devs teams work close together, automating everything, fine with accepting higher risks, building strong relations between teams, performing continuous integration and deployments.

Didn’t go well this summer moving to the public cloud – lack of understanding of vision and goals from other silos.

Ensure the entire enterprise strives for the same end goal, and communicate that goal

Created a vision and articulated cloud strategy. 6-phase roadmap to public cloud, includes embracing DevOps culture. A line in the strategic plan encourages every team to articulate how they will embrace DevOps concepts.

Educate Up. Educate Laterally. Educate Down.

Change is not easy – changing culture in the organization. Prosci ADKAR – the model embraced for making organizational change. Small steps, like encouraging process folks to use Jira, the same tool used by the devs and ops folks.

Us versus Them – a View From the Information Security Bleachers – David McCartney (Ohio State)

Security is not the enemy – they’re scared, unaware, and unprepared for the cloud.

Scared – “how can we stop you?”

Unaware – why move? what kind of data? what security is needed (vs. what you think you need)? what did we do to deserve this?

Unprepared – How do current security services expand? What do you mean “no agent”? Logging? Auditing? Access management? Vulnerability scans? incident response? What about regulatory and framework requirements?

Model Us + Them – Embrace security, buy them booze.

Engage security early, sell the opportunity to do something new and exciting, provide options for training and guidance.

MCloud: From Enable to Integrate – Mark Personett (Michigan)

MCloud is an umbrella service. Strictly IaaS – currently offering AWS, but might mean others later

First iteration launched in 2014 – access to UM enterprise agreement, optional consolidated billing; data egress waiver; M Cloud Consulting Service

Working on launching M Cloud AWS Integrate: provisioning – private network space, shibboleth integration, etc; Guardrails – security best practices, common logging, reporting, etc; Foundational services in AWS – AD, Shib, Kerb, DNS, etc; Site to Site VPN services.

Azure Remote App – Troy Igney (Washington U in St. Louis)

Two core requirements when enrollment in a second-year CS class spiked: needed Visual Studio; new computers too expensive. On-prem VDI – too expensive. Off-prem VDI – Azure RemoteApp.

Goal – deliver consistent development environment across a range of BYOD devices.

Challenges: Support an entire class's logons at once. Required Microsoft off-menu configuration.

Advantages – template once and deploy, capacity costs based on current enrollment – dynamically adjust for enrollment changes.

Largest RemoteApp deployment directly supporting classroom delivery.

Microsoft dropped RemoteApp in favor of Citrix virtualization technologies.

Lots of lessons learned supporting remote VDI

Adopting Cloud-Friendly Architecture for On-Premise Services – Eric Westfall (Indiana)

Indiana primarily on premise with an increasing amount of SaaS. Have newer data centers and heavy investment in VMWare. Inevitable to get to hybrid environment, but in the meantime working to be prepared – “cloud-ready” app architecture.

  • 12-factor principles
  • Stateless architecture
  • Microservices
  • Object storage (using S3 API in on-prem solutions)
  • Non-relational databases

Facilitating DevOps culture

Containerization – investing heavily in Docker. Adopting Docker Data Center

Hope it will allow them to take advantage of existing infrastructure investments; give dev and ops staff opportunities to experiment with cloud services; allow modernization of app architecture and delivery practices; and prepare for the inevitable future.

Cloud Initiative and Research – Steve Kwak (Northwestern)

Cloud Governance – October 2015. IT Directors from the schools and enterprise IT. Hired a consultant to help develop governance.

Cloud Architecture and Consulting Team – April 2016 – 5 initial team members. Set up initial environments at AWS and Azure, worked through billing and accounts, and provided consulting.

Running cloud days and “open mic” sessions with AWS.

Research environments – 3 centrally managed – HPC (heavy upfront investment for dedicated compute, always a queue); Social Science cluster (aging infrastructure, limited support); Research data storage (separate storage from HPC). Looking to burst HPC to the cloud and move the other two.

Genomics pilot in AWS. Hired a 3rd-party team to put the architecture together.

HPC Environment – working on targeting specific workloads in the cloud with the scheduler, and figuring out bursting.

Controlled Approach to Research Computing in AWS – Paul Peterson (Emory)

Mindset of security team – need a similar set of controls in cloud as on-premise. This is quite challenging.

Started working to build Research Cloud. Collected 24 use cases and put them in three categories, divided into 2 VPC types. Worked with AWS professional services to build out VPCs. Pilot started this summer, going to end of year.

Type 1 VPC – one availability zone, no Internet gateway – access only through Emory. Single sign-on with Shib.

Type 2 has two availability zones, and an Internet gateway.

Goal of project team is to make requests for VPCs easy. Automation is key.

Generate VPC service – created an inventory of accounts, LDS groups, Exchange distribution lists, and CIDR ranges.

Service gets next available account, adds admins to LDS group, creates SAML provider, creates account alias, selects CloudFormation template, gets next available CIDR range, creates stack and computes subnets for the account. Takes less than 5 minutes.
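
A flow like this might be sketched in boto3 roughly as follows (a sketch only: the LDS and CIDR-inventory steps are stubbed since those systems are Emory-internal, and the template names, bucket URL, and alias convention are invented):

```python
def next_free_cidr(allocated=()):
    """Toy CIDR inventory: hand out successive 10.N.0.0/16 blocks."""
    used = set(allocated)
    for n in range(256):
        candidate = f"10.{n}.0.0/16"
        if candidate not in used:
            return candidate

def add_admins_to_lds_group(account_id, admins):
    """Stub for the LDS (directory) step, which is institution-internal."""

def load_idp_metadata():
    """Stub: return the Shibboleth IdP metadata XML as a string."""
    return "<EntityDescriptor/>"

def provision_vpc(account_id, vpc_type, admins):
    """Sketch of the generate-VPC flow described above."""
    import boto3
    cidr = next_free_cidr()
    add_admins_to_lds_group(account_id, admins)
    iam = boto3.client("iam")
    iam.create_saml_provider(Name="shibboleth",
                             SAMLMetadataDocument=load_idp_metadata())
    iam.create_account_alias(AccountAlias=f"research-{account_id}")
    template = "type1-vpc.yaml" if vpc_type == 1 else "type2-vpc.yaml"
    boto3.client("cloudformation").create_stack(
        StackName=f"research-vpc-{account_id}",
        TemplateURL=f"https://example-bucket.s3.amazonaws.com/{template}",
        Parameters=[{"ParameterKey": "VpcCidr", "ParameterValue": cidr}],
    )
```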

We Demand, On-Demand: Berkeley Analytics Environments, VDI and the Cloud – Bill Allison (Berkeley)

Central IT budgets getting cut 10% year-over-year.

VDI use cases have been mostly around desktop apps, not research. Funded a pilot through December. User and use-case driven (faculty oriented) – need to tell the story from a faculty perspective. Research IT group is like field workers, most with PhDs.

Analytics Environment on Demand – not a change in the way you compute, at least on the surface. Use the skills you know already. Creating an abstraction layer.

Art of Letting Go – Relationship advice for dev and ops in the cloud – Bryan Hopkins (Penn)

Team lead for cloud app dev team. Cloud First program – replace homegrown frameworks with off-the-shelf frameworks; replace waterfall with agile; replace monoliths with integrations and composed apps

Three things we’ve learned so far: 1. Have a clear try-and-scrap phase in R&D – give it leeway. 2. Accept that interests and traditional roles will collide. Dev team can help with platform tasks, ops team can help with dev. Everyone cares about Jenkins. Bring them together. 3. Let go of notions of perfection and clean lines. Off-the-shelf means you get what’s on the shelf.

Cloud Forum 2016 – Cloud DevOps and Agile, 1 Year In

Melanie McSally, Ben Rota – Harvard

Cloud Program since February 2015 – Migrated 285+ applications (43% of goal), implemented Cloud Shield, designed and implemented centralized cloud billing. Only 42 apps were lift and shift. Even the simplest migration ends up requiring lots of refactoring.

All new applications have been put in the cloud

IdM team realizing $8500/month in savings by using elastic sizing of resources
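
Elastic sizing of this kind is often implemented with scheduled Auto Scaling actions. A hedged sketch (group name, sizes, and schedule are illustrative, not Harvard's actual configuration), plus a back-of-envelope savings estimator:

```python
def monthly_savings(hourly_rate, instances_idle, idle_hours_per_day):
    """Rough monthly savings from shutting down idle capacity (30-day month)."""
    return round(hourly_rate * instances_idle * idle_hours_per_day * 30, 2)

def schedule_offhours_scaling(group_name):
    """Sketch: scheduled Auto Scaling actions that shrink a group overnight
    and restore capacity in the morning."""
    import boto3
    asg = boto3.client("autoscaling")
    asg.put_scheduled_update_group_action(
        AutoScalingGroupName=group_name,
        ScheduledActionName="scale-down-evening",
        Recurrence="0 23 * * *",  # 11pm UTC daily
        MinSize=1,
        DesiredCapacity=1,
    )
    asg.put_scheduled_update_group_action(
        AutoScalingGroupName=group_name,
        ScheduledActionName="scale-up-morning",
        Recurrence="0 11 * * *",  # 11am UTC daily
        MinSize=2,
        DesiredCapacity=4,
    )
```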

Lessons learned –

  1. Get security and network design right as early as possible. Goal was to make cloud security as good or better than on premise
  2. Moving to cloud is 2 parts culture to 1 part technology. Be prepared to answer basic, non-technical questions – If things are working fine now, why move? Will cloud really save money? I understand cloud is the future, but we're really busy! Doing things the right way takes too long!
  3. You won’t do as well when you have to split your focus. When things get migrated, the app teams have to manage in two environments. Better to migrate entire portfolios at once.
  4. Everyone is accountable for the cloud – teams need a shared vision, shared goals, and aligned priorities. Corollary: When teams come forward really fast, it’s likely because they have a technical challenge you might not want to touch. Understand training needs of those you work with, before you get there.
  5. Communicate, communicate, communicate! Create a unified baseline understanding. Build partnerships to figure out the questions. Be open and transparent. Address the workforce fears up front.
  6. Don’t max out all the dials at once! Started with new program with new teams, new technology (Cloud), new management, and new processes (Scrum). In retrospect would have provided more help for the team. They didn’t have developers in their cloud program – they would change that if doing it over.
  7. Migrations + engineering + operations = impossible. Recommendation is to create small teams and have them focus on a specific goal. Separated migrations from operations. Operations will quickly consume all capacity.
  8. Cost savings take time to actualize. Learning how to manage costs in the cloud takes time. Could save money if they could close the data center (power and real estate expensive in Cambridge), but in a shared environment that’s hard. Push other benefits of cloud.
  9. Don’t forget about cost management.
  10. Be open to changing your strategy when new information presents a better way.

Cloud Forum 2016 – ERP In The Cloud

Jim Behm (Michigan), David McCartney (Ohio State), Glen Blackler (UC Santa Cruz), Erik Lundberg (Washington)

UMich – Currently running PeopleSoft (Student, Fin, HR), Click Commerce, Blackbaud. Investigating IaaS for the Student system and planning others.

Ohio State – Currently running PeopleSoft (Finance, HR, and Student); converting Finance to Workday and then HR. Exploring Workday Student. Timing: 3 years for Finance, 5 years for HR.

UC Santa Cruz – Banner, PeopleSoft, custom IdM on Solaris. Moving it all to AWS by Spring 2018.

Washington – Most modern ERP is 30+ years old on Cobol mainframe. Moving HR/Payroll to Workday, others to follow. Launching in June 2017. Completely restructuring business processes around HR, creating a single service center. Then will tackle Finance. Lessons learned – don’t try implementing software without redoing business process. Looking at how to create sustainable organization capable of tackling these huge projects over 15-20 years.

What impact has your cloud move had on your IT staff?

Bentley University: Didn’t take into account the level of effort involved in regression and security testing. Unanticipated costs and resource issues.

Notre Dame moving ERP to AWS. Had a big impact on storage team who don’t need to do what they used to.

Harvard moving PeopleSoft HR into the cloud. Looking at it as a people issue, not technology. Very sensitive data, and the people who manage it on premise are invested but don't have the skills in the cloud. Don't want to rely on the cloud team's expertise. Holding PeopleSoft Day once a week with a consultant who has expertise moving PeopleSoft to AWS, the cloud team, and the PeopleSoft team, working together to solve problems and remove barriers. Building continuous integration and lots of automation. Arizona doing that too.

Ohio State gutting the data warehouse and rebuilding from scratch. Not sure yet where it will end up.

How have you dealt with information in the cloud and the security ramifications?

Ohio State – Workday is different in terms of access than something like Box. Running into challenges getting enough visibility into the system. Concerns about ability to get logs and information they can consume.

UCSC – people don't understand the difference between SaaS and IaaS. Having to educate them on the local responsibilities still inherent in moving to AWS.

If you chose SaaS how did you enlist your campus and business partners to sacrifice flexibility of the current way for business standardization? How challenging was this?

At Cornell HR decided to move to Workday without consulting IT initially. Was a wake up call for IT in terms of commoditization.

Cloud Forum 2016 – Migration of OnBase to AWS

Sharif Nijim – Notre Dame

OnBase in AWS? Really? Windows app. AWS does Windows fine, OnBase doesn’t do the cloud fine. Licensing is painful for using elasticity. Looked at Hyland’s own hosted offering, but it was way more expensive.

A few lessons learned: EFS doesn't do CIFS – not useful if you want Windows file service in the cloud. There are some products that can help. If they had to do it over again they'd probably run their own Windows file servers in AWS, but they used Panzura because they had some licenses.

Moving the data – had a couple of terabytes. Tried AWS Snowball. Was complete overkill for what they needed. Transferred 16 GB of database in about 17 minutes. S3 multi-part parallel works well, but there were ~7 million small document files. Had to zip it up for transfer optimization and then rehydrate. Then tried Robocopy to trickle data over a couple of weeks. In order to make a choice, had to understand how the application actually works. Document is written to disk and never changes (annotations go in database). OnBase segments by directory loosely the size of a DVD. So it doesn’t matter if it takes a long time to move data, as it never changes.
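
The zip-then-upload approach above can be sketched with boto3's managed transfer, which does the multi-part parallel upload automatically (archive name and tuning values are illustrative):

```python
import zipfile

def bundle(paths, archive):
    """Zip many small files into one archive before transfer;
    rehydrate (unzip) on the far side, as described above."""
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        for p in paths:
            zf.write(p)
    return archive

def upload_bundle(archive, bucket, key):
    """Multi-part parallel upload via boto3's managed transfer."""
    import boto3
    from boto3.s3.transfer import TransferConfig
    cfg = TransferConfig(multipart_chunksize=64 * 1024 * 1024,  # 64 MB parts
                         max_concurrency=10)
    boto3.client("s3").upload_file(archive, bucket, key, Config=cfg)
```

Since OnBase documents never change once written, this transfer can also trickle over weeks (as Notre Dame did with Robocopy) without consistency worries.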

OnBase uses NTLM Auth, which doesn’t work well with load balancers, so had to stand up HAProxy. Hoping that OnBase will implement Shib in the next year or so. Notre Dame default procedure is to shut down servers outside of 7am – 7pm. But with OnBase the hash codes for the licenses screw up how the license checks work. Had to get rid of load balancers and elasticity. Still gained separation of web tier from app.

June 25 2016 cut over production users and nobody noticed. Shut down 25 servers and liberated 2 TB of space in production environment.

Plea to cloud providers – make it easy to provision automation from the GUI, by exporting to templates or whatever.

Cloud Forum 2016 – Lightning Rounds #1

5 minute lightning rounds

How the Cloud is living up to its promise in Cornell Student Services – Phil Robinson

Might have the largest apps portfolio at Cornell – around 190 apps and sites, POS systems, etc. Compliance requirements including HIPAA. Pain points include lots of technical debt from inherited tech. Lots of time spent keeping up with server patching and upgrades. Looking to leverage elasticity to match student cycle spikes. Built a class roster with scheduler on AWS – scaled to over 1k simultaneous users in July, then scaled down. They have 10 apps in production in AWS. Identified an inspired team member to act as champion, prioritized cloud solutions. “Automate like crazy”

Using AWS workspaces for Graduate Students in applied social sciences – Chet Ramey & Jeff Gumpf (Case Western)

Pilot project to test virtual desktops via AWS Workspaces. The department was eliminating a computer lab as its building was being remodeled. Workspaces are easy to provision, manage, and use on multiple devices. Each person gets a Workspace, provisioned with stats software and other tools in Spring 2016, paid for by central IT. Originally planned for 3 courses and 26 students. Initial setup took about one hour. After the first week of operation the pilot was expanded to 6 courses and 110 workspaces. Users were provisioned through the AWS Console. Built a master Workspace, created an image and two bundles from it, and used them to provision users. Problem with the SPSS installer – it won’t run on Windows Server. Got around that. Included the Google Drive client for storage. About $150/student/semester, but with new AWS hourly pricing would be ~$80.
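The hourly-billing estimate is easy to sanity-check with a small model. A sketch – the base fee, hourly rate, and usage figures below are hypothetical placeholders, not actual AWS WorkSpaces prices:

```python
def semester_cost_hourly(base_fee: float, hourly_rate: float,
                         hours_per_month: float, months: int) -> float:
    """Cost of one WorkSpace on hourly billing for a semester.

    base_fee and hourly_rate are invented placeholders; hourly WorkSpaces
    billing combines a monthly infrastructure fee with a per-hour charge.
    """
    return months * (base_fee + hourly_rate * hours_per_month)

# Hypothetical: $7.25/mo base + $0.30/hr, 40 hrs/mo of use, 4-month semester
cost = semester_cost_hourly(7.25, 0.30, 40, 4)   # lands near the ~$80 estimate
```

The point of the model is the shape, not the numbers: light student usage makes the hourly plan win over AlwaysOn, and the break-even moves with hours of actual use.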

Bringing IT Partners on Campus Along For the Ride – Susan Kelley (Yale)

Technology Architecture Committee – govern design and architecture, approve POCs, encourage documentation of strategies, working groups. Reviewed 31 projects in the last year. Formed a Cloud Working Group – 8 central IT staff and 7 IT partners. Decision 1: AWS and Azure. Med School helped with how to interpret Azure bills. School of Architecture wanted to get out of managing servers locally – used as test case for VPC, within one year migrated all their infrastructure to cloud. Now they go around telling other IT teams what they learned.

Securing Research Data: NIST 800-171 Compliance in the Cloud – Bob Winding (Notre Dame)

Lots of work will need to be compliant by end of 2017. Research that contains “controlled unclassified information” – ITAR. Held a workshop with AWS and several other schools. Worked to create a Quickstart Guide and a Purdue Educause paper. GovCloud and the Quickstart cover about 30% of the mandated controls. GovCloud is a US-persons-only region, so that helps. Providing a VDI/RPC gateway in the Shared Services VPC – the VDI client is the audit boundary. As long as you run on University-managed equipment you have an isolated environment. Still process-intensive, but you don’t have to worry about infrastructure.

Cloud Adoption: A Developer’s Perspective – Brett Haranin

Cost Engineering in AWS: Do Like We Say, Not Like We Did – Ben Rota

Lessons learned:

  • Be careful that your people don’t confuse “cheap” with “free”
  • For cost estimates, you generally only need to worry about RDS, EC2, and S3
  • Easiest way to save money is to shut down what you don’t need (engineers aren’t used to doing this on premise)
  • Enforce tagging standards that help you understand your spend (including tags for testing)
  • Look out for unattached storage
  • Consider over-provisioning storage rather than buying PIOPS
  • Multi-AZ RDS instances are a low-risk way to get into RI purchases
  • Real bang for buck in RI purchases is to do them at all

How to do this? Set up Trusted Advisor or a third-party tool to help get a view of what’s going on.
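The “look out for unattached storage” bullet is one of the easiest to script. A minimal sketch that filters records shaped like the EC2 DescribeVolumes response – here it runs on sample data rather than a live API call:

```python
def unattached_volumes(volumes):
    """Return volumes not attached to any instance.

    EC2 reports an unattached EBS volume with State == 'available';
    each dict below mimics one entry from a DescribeVolumes result.
    """
    return [v for v in volumes if v.get("State") == "available"]

sample = [
    {"VolumeId": "vol-1", "State": "in-use",    "Size": 100},
    {"VolumeId": "vol-2", "State": "available", "Size": 500},  # orphaned: paying for nothing
]
orphans = unattached_volumes(sample)   # -> the vol-2 record only
```

Run something like this (fed from real API output) on a schedule, and the tagging standards above tell you whose orphan each volume is.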

Dirty Dancing In The Cloud – Scotty Logan

Why are we moving to the cloud? Geo-diversity, scalability, etc. Don’t forklift to the cloud. “Go Cortez” – burn your boats behind you. Go to new stuff: DevOps, CI, CD, etc. But you still have firewalls and IP addresses – tightly coupled. Use TLS and client certs instead … but my CISO says we need to use static IPs and VPNs! If you have to, use NAT Gateways (an AWS service) … until you can get to the happy place.

Jetstream: Bob Flynn (Indiana)

Expanding NSF XD’s reach and impact. 70% of NSF researchers claimed to be resource constrained. Jetstream is NSF’s first production cloud facility. Infrastructure runs at Indiana and Texas, with dev at Arizona. Built on OpenStack. For researchers needing a handful of cores (1 to 44), devs, and instructors looking for a course environment. Set of preconfigured images (like AMIs) to pick from. Went live September 1, with over 125 XSEDE projects. NSF is soliciting allocation requests, including Science Gateways. jetstream-cloud.org

Research Garden Path Case Studies – Rob Fatland (Washington)

CloudMaven – a GitHub repo. Don’t recreate solutions. Has AWS procedural guides. Has a page on HIPAA compliance.

Prototyping library services using high performance NoSQL platform Erik Mitchell (Berkeley)

Costs about $8 per book to put it in a storage facility. Looked at levels of duplication across two libraries and 27 fields = 378 million data items to compare. Looked at big data solutions. Used BigQuery on Google – a NoSQL database with a GUI and an SQL-like query language. Was able to analyze the data easily and discovered lots of places to save effort and money. Not everything needs to be an enterprise service, if the cloud service is easy enough to use at a local level.
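The core of the analysis is a field-by-field overlap count between two catalogs. A toy version of that comparison – the record shape and field names here are invented, and at 378 million comparisons you would do this in BigQuery, not in a loop:

```python
def overlap_by_field(lib_a, lib_b, fields):
    """Per field, count records with the same id holding the same
    value in both catalogs (a miniature duplication analysis)."""
    b_by_id = {r["id"]: r for r in lib_b}
    counts = {f: 0 for f in fields}
    for rec in lib_a:
        other = b_by_id.get(rec["id"])
        if other is None:
            continue
        for f in fields:
            if rec.get(f) == other.get(f):
                counts[f] += 1
    return counts

a = [{"id": 1, "title": "Moby Dick", "year": 1851},
     {"id": 2, "title": "Walden",    "year": 1854}]
b = [{"id": 1, "title": "Moby Dick", "year": 1851},
     {"id": 2, "title": "Walden",    "year": 1999}]
overlap_by_field(a, b, ["title", "year"])   # {'title': 2, 'year': 1}
```

High overlap counts are exactly the “places to save effort and money” – duplicated holdings you don’t need to store twice at $8/book.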

Umbrellas for a Rainy Day: Cloud Contracts and Procurement – Sara Jeanes (Internet2)

In the world of cloud, “contract is king” – all you own is the contract. Typical procurement processes are long and cumbersome, which doesn’t work for the cloud. Challenges: timeliness, risk management, price variability, pilots and trials. Possible solutions: consortia, “piggybacking”, community communication and collaboration.

Red Cloud and Federated Cloud – Dave Lifka (Cornell)

Talked to lots of researchers – they liked everything a la carte – cheap computing on demand, with no permanent staffing or aging hardware.  Built Red Cloud, a Eucalyptus stack (100% AWS compatible so you can burst). Gave each of them root, but then they built a subscription model and a for-fee service building VMs for researchers. Available externally as well as internal to Cornell. Aristotle – data building blocks. Bursting to other institutions and then AWS. Building an allocation and accounting system. It’s about time-to-science. Portal to tell people what resource they can get when and at what cost.

 

Cloud Forum 2016 – Routing to the Cloud

DNS – Notre Dame (Bob Winding)

Found early that everything relies on DNS. Need to integrate with AWS DNS to take advantage of both. How many “views” do you have on campus? Want to resolve all the views, but not undermine virtues of Route 53. Think about query volumes, what do you do on campus? They delegate zone administration with Infoblox on campus, but it doesn’t have great granularity for automation. AWS has great IAM controls for automation, but not granular delegation. They use Infoblox as primary DNS, but looking at creating more authoritative zones in Route 53 so they can take advantage of the automation when spinning up new systems.

What do we do with publicly addressed end-points on campus? Had to have a way of routing public endpoints to private address space. When you put in VPN tunnels you create hidden peering connections via your campus, so you need to put ACLs in place. Need to think about visibility of traffic.

AWS Security – Boston University (Gerard Schockley)

Lessons learned around Security Groups – change philosophy from individuals with desktop access to servers to using a VPN group or a bastion host. A challenge to convince them they can’t have dedicated access. How to reassemble breadcrumbs for forensics? Correlating VPC flow logs, CloudTrail, and OS logs is a time-consuming challenge.
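Stitching those breadcrumbs together usually starts with parsing. A minimal parser for the default (version 2) VPC flow log record layout, run on a made-up sample line rather than real log data:

```python
# Field order of the default v2 VPC flow log record format.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr",
          "srcport", "dstport", "protocol", "packets", "bytes",
          "start", "end", "action", "log_status"]

def parse_flow_log(line: str) -> dict:
    """Split one default-format VPC flow log record into a dict,
    converting the numeric fields to ints."""
    rec = dict(zip(FIELDS, line.split()))
    for key in ("srcport", "dstport", "protocol", "packets",
                "bytes", "start", "end"):
        rec[key] = int(rec[key])
    return rec

sample = ("2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.7 "
          "49152 443 6 10 8400 1475222400 1475222460 ACCEPT OK")
rec = parse_flow_log(sample)   # rec["action"] == "ACCEPT", rec["dstport"] == 443
```

From parsed records you can join on interface id and time window against CloudTrail events – the joining, not the parsing, is where the time goes.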

AWS Data Level Security – Harvard 

Have a policy to encrypt everything. Turning encryption at rest on by default. Encrypt database backups (RMAN). Also need to encrypt data in transit – haven’t needed to do that on premise between non-routable subnets. Needed to work with app owners to make sure data gets encrypted in transit. Some institutions are installing Tripwire on individual systems. Looking at replicating data to other vendors. Libraries are in a bind because their replication strategies make them unable to trust cloud vendors. There’s some discussion of whether we can urge vendors toward using some of the archival standards for preservation of digital materials that have evolved in the library world.

Notre Dame refactoring their security groups so that services are in groups and databases are in groups and users are in groups, and they can specify what apps can route traffic to which databases, not relying on IP addresses. That’s hard to do if you have to integrate on premise resources that don’t talk the same kind of security groups.
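The Notre Dame pattern – groups referencing groups instead of IP addresses – looks like this in an EC2 ingress rule. The group IDs are invented; the structure mirrors what `AuthorizeSecurityGroupIngress` accepts, built here as plain data rather than an API call:

```python
def allow_from_group(port: int, source_group_id: str) -> dict:
    """Ingress rule: allow TCP on `port` from members of another
    security group. No IP addresses appear anywhere in the rule, so
    instances can come and go without the rule changing."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }

# e.g., let the app tier's group reach the database tier on 5432
rule = allow_from_group(5432, "sg-0appexample")
```

This is also why the pattern breaks at the campus boundary: on-premise resources aren’t members of any security group, so traffic from them has to fall back to IP-based rules.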

CalTech killed a $750k VDI on-prem project and is looking at AWS Workspaces very closely.

Most campuses seem to be building infrastructure like identity services into a “core VPC”. There is a 50-something peering limit before you hit some performance limits. One school is only going to peer VPCs for Active Directory and will open public IPs for Shib, LDAP, etc.

Stanford moving their production identity infrastructure to AWS in the next year, in containers. Other schools also heading that direction. Cornell has put AD into AWS, using multiple regions.

Notre Dame looked at AWS’ directory service, but it needed a separate forest from campus, so didn’t meet their needs.

Notre Dame planning to put VPN service into the cloud as well as on premise so it will continue to exist if campus is down. Arizona standing up AD in Azure, bound to campus and setting up some peering to AWS. Boston moving all their AD infrastructure to Azure – looking at Azure AD.  Stanford looked at Azure AD but decided not to use it and are building their own AD in Azure.

IPS/IDS in your VPC? Gerard – cost is “staggering”. Stanford using CoreOS, which can’t be modified while running, and running IDM systems in read-only containers – that provides intrusion prevention.

Cloud Forum 2016 – survey results and Grit!

We’re at Cornell for the 2nd Cloud Forum! We started out last night with a lovely reception and then Bill Allison, Bob Flynn and I had a great dinner at the Moosewood Restaurant.

This morning kicks off with some summaries from Gerard Shockley of the registration survey. There are 92 registrants (which was capped) from 52 institutions and 4 technology providers (AWS, Microsoft, Google, …), and attendees in 83 roles, from CIOs to architects to faculty.

  • 75% reported that cloud strategy is a work in progress.
  • 52% using AWS, 17% Azure, 19% Google, 2.7% Oracle, 20% using some form of “community cloud”
  • 71% report no validated cloud exit strategy
  • 30% say they’re “cloud first”, 52% using “opportunistic cloud”
  • 79% report on-premise facilities that are > 5 years old.
  • Most realize that reducing cost is not the main reason to move to the cloud. Improving service, added flexibility and agility, and improving support for research rank high.
  • Staff readiness is the highest ranked obstacle to broad cloud adoption.
  • 34% have signed a Net+ agreement for either IaaS or PaaS.
  • 70% have central IT involved in cloud support for Research Computing
  • 28% say their institution plans on performing clinical research in the cloud
  • 56% say they have signed a HIPAA BAA with a cloud service provider

Next, a session from Sharif Nijim from Notre Dame titled “Grit!”

There’s a shift in how we do things – e.g. from capacity planning to cloud financial engineering. Picking a partner to provide infrastructure services is a whole new level of trust. Hiring staff who can deal with the rate of change in the cloud is critical and hard. We’re all running software that is cloud unfriendly – how many of us are helping the vendors evolve? We’re all prototyping and learning and putting things in production and continuing to learn – sometimes the hard way.