Why you should care – Bruce Vincent and Scotty Logan – Stanford
How do we reconcile: desire for continuous functional improvement; need for efficient deployment workflow; platform variations; desire for portability; expectation of zero service disruption? Can’t disrupt ongoing practices. Outage windows – “is never good for you?”
The problem – how to manage change. Streamlining deployment. Going right to live – scary thought?
State of the art 2011 – (cloud implementation)
Containers are a game changer: Application consistency; portability; rapid prototyping, testing, deployment; disposable servers. There were always problems in making sure that the environment is the same in dev and prod. Developers can’t deal with the complexity.
Version upgrades can be done discretely, tested, and staged. Orchestration builds entire environment automatically. Container OS is tiny and disposable, so almost no sysadmin or patching required. Very cost effective and no hypervisor overhead. Docker supported on AWS, Google Compute, OpenStack, and soon Azure.
Your whole stack as code: Programming professionals are driving DevOps as a new standard in software engineering practice. Continuous Integration; Blue-Green deployment; You get more productivity from your developers with DevOps; As a nice additional benefit, good developers want to work in your shop.
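Note on blue-green deployment: a minimal sketch of the idea, not the speakers' implementation. It assumes two identical environments named "blue" and "green" and a caller-supplied health check. The class name, version strings, and health check are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class BlueGreenRouter:
    """Toy router that sends all traffic to exactly one of two environments."""
    # Hypothetical starting state: both environments run version "v1".
    environments: Dict[str, str] = field(
        default_factory=lambda: {"blue": "v1", "green": "v1"}
    )
    live: str = "blue"

    @property
    def idle(self) -> str:
        """Name of the environment that is not currently serving traffic."""
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str, health_check: Callable[[str], bool]) -> bool:
        """Install `version` on the idle side and switch traffic only if healthy."""
        target = self.idle
        previous = self.environments[target]
        self.environments[target] = version
        if not health_check(version):
            # Failed check: restore the idle side; the live side was never touched.
            self.environments[target] = previous
            return False
        self.live = target  # the cutover is a single pointer change
        return True


router = BlueGreenRouter()
ok = router.deploy("v2", health_check=lambda v: True)
failed = router.deploy("v3-broken", health_check=lambda v: "broken" not in v)
```

Because the new version is installed only on the idle side, a failed health check never affects users, which is how this pattern addresses the expectation of zero service disruption.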
Using Terraform to script virtual data center at AWS.
Organizational Skills and Issues – Charlie Kneifel
Innovation doesn’t move fast enough – need a balance between the right amount of process and room for innovation. Duke has a group that meets on a weekly basis. Duke has made progress in automation and reaped some paybacks.
DevOps maturity model: Duke case study/demo – Mark McCahill
DevOps won’t happen overnight.
The basics – already had in place: virtualized compute, virtualized storage, Puppet configuration management, SVN/Git repository, ticketing system.
Standardization – lovingly hand-crafted systems created by artisan sysadmins fail. CVL and CM-manage illustrated that standard build processes work. Clockworks team: 2 devs + 3 team leads (Linux, Windows, Monitoring) + architect.
Clockworks – configure & provision custom VMs. ServiceNow ticket process to handle whatever we haven’t yet automated. Chaos control opportunities: TSM backup configuration; self-service Shib SP registration; self-service Comodo site cert signing (Locksmith).
Next Steps: Stevedore: Automate Drupal and WordPress via Docker container orchestration. Containers: data, MySQL, PHP, Apache; site cert creation and installation; Shib SP registration.
IDM in containers: Kerberos KDC container now in test. Continuous builds via Jenkins; automate testing; retain old container so we can roll back.
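Note on build, test, retain, roll back: a rough sketch of the flow only. It does not call Jenkins or Docker. The class, the `keep` limit, and the image tags are hypothetical, and the test result is passed in as a plain boolean.

```python
from typing import List, Optional


class ContainerReleases:
    """Tracks promoted image tags and retains older ones for rollback."""

    def __init__(self, keep: int = 3) -> None:
        self.keep = keep               # how many promoted images to retain (assumed)
        self.history: List[str] = []   # oldest first, current image last

    def promote(self, image_tag: str, tests_passed: bool) -> bool:
        """Promote a freshly built image only if its automated tests passed."""
        if not tests_passed:
            return False
        self.history.append(image_tag)
        self.history = self.history[-self.keep:]  # drop images beyond the limit
        return True

    @property
    def current(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def rollback(self) -> Optional[str]:
        """Discard the current image and return to the previously retained one."""
        if len(self.history) < 2:
            raise RuntimeError("no older container retained to roll back to")
        self.history.pop()
        return self.current


releases = ContainerReleases()
releases.promote("idm:build-41", tests_passed=True)
releases.promote("idm:build-42", tests_passed=True)
rejected = releases.promote("idm:build-43", tests_passed=False)
restored = releases.rollback()
```

Rollback here is just a matter of pointing back at an image that already exists, which is why retaining the old container is what makes it cheap.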
Antikythera – DevOps automation isn’t just for admin & web sites. Research computing provisioning proof of concept – compute, storage, apps/containers… and SDN as it is more widely deployed. Lets you have clear provenance of code and datasets for any specific job.
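Note on provenance: the talk did not say how Antikythera records it. One common approach, shown here purely as an assumption, is to store content hashes of the code and datasets alongside each job. The function names, job ID, and sample data are hypothetical.

```python
import hashlib
import json
from typing import Dict


def digest(data: bytes) -> str:
    """SHA-256 hex digest of a blob of code or data."""
    return hashlib.sha256(data).hexdigest()


def job_manifest(job_id: str, code: bytes, datasets: Dict[str, bytes]) -> str:
    """Build a JSON record tying a job to the exact code and datasets it used."""
    record = {
        "job_id": job_id,
        "code_sha256": digest(code),
        "datasets_sha256": {name: digest(blob) for name, blob in datasets.items()},
    }
    # sort_keys makes the manifest byte-for-byte reproducible for equal inputs
    return json.dumps(record, sort_keys=True)


manifest_a = job_manifest("job-001", b"print('hi')", {"input.csv": b"a,b\n1,2\n"})
manifest_b = job_manifest("job-001", b"print('hi')", {"input.csv": b"a,b\n1,2\n"})
manifest_c = job_manifest("job-001", b"print('hi')", {"input.csv": b"a,b\n1,3\n"})
```

Identical inputs give identical manifests, and any change to the code or a dataset changes a hash, so the manifest identifies exactly what a given job ran against.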
Summary – migrate ticket-driven artisan-crafted work and processes to self-service apps. Orchestrate automation via self-service app APIs; automation dashboards for both research and admin computing.
Bill Allison – Berkeley: Moving to Continuous
Case study: The Berkeley Desktop
Fall 2011 – OMG v0.1: Everything is broken or breaking all the time. No time for staff to work on solutions. Compromised machines. Standard image doesn’t work on laptops. Standard image too hard to change.
Imaging a machine took 4 hours of senior tech time. Varying hardware standards, no significant automation, manual work, no checklists.
Too busy to improve.
Split Desktop Design and Engineering from Ops and support.
Tackle the things that increase costs: Labor, productivity loss; change; variance
Now have 11-12k computers under management, around 5k with the full Berkeley desktop.
Artifacts are public in GitHub so others can use them.