CSG Fall 2016 – ITIL and DevOps

Why is this important?

  • Does ITIL make sense in an era of continuous delivery and integration?
  • Will the volume of applications and sites overwhelm the management methodology?
  • Distributed IT is not well versed in ITIL
  • Does DevOps include formal review? Shouldn’t Tier 0 sites and apps get reviewed for changes?

Survey results

  • Almost all respondents have a formal Change process and board
  • Divided on whether PaaS/SaaS need formal change reviews
  • Some said that only major changes go through formal change management
  • Most respondents not mature yet with DevOps practices
  • Some groups doing agile development, but not all

Harvard is trying to reinvent ITIL for the cloud environment: since it’s all software now, release management practices are more appropriate than change management.

Would be good to have changes (even pre-approved ones) logged in ServiceNow so incidents could be correlated with changes.
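As a rough illustration of that correlation idea, here is a minimal sketch that, given an incident, lists the changes applied to the same service shortly before it was opened. The `Change` and `Incident` records, their field names, and the 24-hour window are all assumptions made for the example; this is not the ServiceNow data model or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    # Hypothetical record for a logged change (pre-approved or not).
    change_id: str
    service: str
    applied_at: datetime


@dataclass
class Incident:
    # Hypothetical record for a reported incident.
    incident_id: str
    service: str
    opened_at: datetime


def changes_preceding(incident, changes, window=timedelta(hours=24)):
    """Return changes to the incident's service applied within `window`
    before the incident was opened, most recent first."""
    earliest = incident.opened_at - window
    hits = [
        c for c in changes
        if c.service == incident.service
        and earliest <= c.applied_at <= incident.opened_at
    ]
    return sorted(hits, key=lambda c: c.applied_at, reverse=True)
```

The lookup only works if every change is logged, including the pre-approved ones. A change that was never recorded cannot be matched to an incident.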

In new cloud deployments people aren’t patching, but blowing machines away and deploying new ones. How does change process handle that?

Notre Dame trying to eliminate human access to the cloud console for production systems

Nobody in the room is doing continuous deployments to ERP systems

Cornell – with self-healing infrastructure they may not even know there’s an outage.


What Have We Learned From ITIL?

I’ve been at least peripherally involved with IT Service Management methodologies at two institutions over the past decade. At the UW I was responsible for the creation of the first IT Service Catalog and began the process of creating a service request process. Here at Chicago one of my first accomplishments was finishing up the creation of our IT Service Catalog and selecting and procuring an ITSM tool. I’m now working with others to improve our implementation and use of that tool.

In that context I’ve been thinking about ITIL and how it fits with our evolving notions of agile development, DevOps, and what Gartner calls “bi-modal IT”. ITIL and other now traditional ITSM methodologies were built to combat the chaotic world of IT in the 1980s and 90s as information technology began to rule the world. ITIL offers standardized processes designed to give IT organizations control over their work and the environments they manage.

As is the way with methodologies, even though ITIL was meant as a set of methods you could pick from and tailor to your needs, it has been taken as more of a religious crusade. It’s not uncommon for IT shops to build up a massive set of bureaucratic processes based on the ITIL language that, instead of providing responsive IT, become yet another way for IT to stand in the way of people getting work done.

I’m not a huge fan of ITIL and its methods, and I think it’s largely become an artifact of an older way of thinking about IT processes, but I do think there are some valuable lessons we’ve learned from ITIL over the years, and we shouldn’t overlook those in our rush to the newer ways of working. So I’m going to attempt to document some of the things I’ve personally found valuable from working with ITIL. Feel free to add to the list or differ in the comments.

  • Incidents and Problems are not the same thing. Incidents are reports of someone having a hard time. Problems are things that have gone wrong and need fixing.
  • Service Requests are not incidents – they are requests from people for the things that you regularly provide – the things in your Service Catalog. A request for a new email account is a service request. A request for a project to build a new email infrastructure is not a service request (unless that’s the business you’re in). Service requests should be able to be automated and not treated as artisanal creative work.
  • Life goes better when you have some process to control changes in your IT environment. In the agile and DevOps world continually deployed changes are governed by automated tests and easy rollbacks, which are considered pre-approved changes in the ITIL world.
  • It’s important to measure your service usage. Metrics can help plan for capacity, help understand how people are using services (and where there are issues), and help you know when it’s time to retire or replace a service instead of continuing to invest in it.
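To make the point about pre-approved changes concrete, here is a minimal sketch of a deployment gate: the change goes out without manual review only if the automated tests pass, and it is rolled back if a post-deploy health check fails. The function name, its callable parameters, and the log messages are hypothetical and stand in for whatever your pipeline actually provides.

```python
def deploy_standard_change(run_tests, deploy, rollback, health_check, change_log):
    """Apply a pre-approved change guarded by tests and an automatic rollback.

    All four callables are placeholders supplied by the caller;
    `change_log` is any list-like record of what happened.
    """
    if not run_tests():
        # Tests are the approval step: a failure means nothing is deployed.
        change_log.append("rejected: tests failed")
        return False
    deploy()
    if not health_check():
        # Easy rollback is what makes skipping manual review tolerable.
        rollback()
        change_log.append("rolled back: health check failed")
        return False
    change_log.append("deployed")
    return True
```

Every outcome is written to the log, so the change record still exists even though no board reviewed the change.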

What else?

CSG Winter 2015 – DevOps workshop

Why you should care – Bruce Vincent and Scotty Logan – Stanford
How do we reconcile: desire for continuous functional improvement; need for efficient deployment workflow; platform variations; desire for portability; expectation of zero service disruption? Can’t disrupt ongoing practices. Outage windows – “is never good for you?”
The problem – how to manage change. Streamlining deployment. Going right to live – scary thought?
State of the art 2011 – (cloud implementation)
Containers are a game changer: Application consistency; portability; rapid prototyping, testing, deployment; disposable servers. There were always problems in making sure that the environment is the same in dev and prod. Developers can’t deal with the complexity.
Version upgrades can be done discretely, tested, and staged. Orchestration builds entire environment automatically. Container OS is tiny and disposable, so almost no sysadmin or patching required. Very cost effective and no hypervisor overhead. Docker supported on AWS, Google Compute, OpenStack, and soon Azure.
Your whole stack as code: Programming professionals are driving DevOps as a new standard in software engineering practice. Continuous Integration; Blue-Green deployment; You get more productivity from your developers with DevOps; As a nice additional benefit, good developers want to work in your shop.
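For readers unfamiliar with blue-green deployment, here is a minimal sketch of the idea: two identical environments exist, the new version is installed on the idle one, and traffic switches over only after a smoke test passes. The `BlueGreenRouter` class and its method names are invented for illustration and do not correspond to any tool mentioned in the talk.

```python
class BlueGreenRouter:
    """Toy model of two environments with one receiving live traffic."""

    def __init__(self, live_version):
        self.environments = {"blue": live_version, "green": None}
        self.live = "blue"

    @property
    def idle(self):
        # The environment that is not currently serving traffic.
        return "green" if self.live == "blue" else "blue"

    def release(self, version, smoke_test):
        """Install `version` on the idle environment and switch traffic
        to it only if `smoke_test(version)` returns True."""
        target = self.idle
        self.environments[target] = version
        if not smoke_test(version):
            return False  # live traffic never saw the bad version
        self.live = target
        return True

    def rollback(self):
        """Point traffic back at the other environment."""
        if self.environments[self.idle] is None:
            raise RuntimeError("no previous version to roll back to")
        self.live = self.idle
```

Because the previous version stays installed on the now-idle environment, rolling back right after a release is just another traffic switch rather than a redeploy.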
Using Terraform to script virtual data center at AWS.
Organizational Skills and Issues – Charlie Kneifel
Innovation doesn’t move fast enough. The challenge is balancing the right amount of process against room for innovation. Duke has a group that meets on a weekly basis, and has made progress in automation and reaped some paybacks.
DevOps maturity model: Duke case study/demo – Mark McCahill
DevOps won’t happen overnight.
The basics, already in place: virtualized compute, virtualized storage, Puppet configuration management, SVN/Git repository, ticketing system
Standardization – lovingly hand-crafted systems created by artisan sysadmins fail. CVL and CM-manage illustrated that standard build processes work. Clockworks team: 2 devs + 3 team leads (Linux, Windows, Monitoring) + architect.
Clockworks – configure & provision custom VMs. ServiceNow ticket process to handle whatever we haven’t yet automated. Chaos control opportunities: TSM backup configuration; self-service Shib SP registration; self-service Comodo site cert signing (Locksmith).
Next Steps: Stevedore: Automate Drupal and WordPress via Docker container orchestration. Containers: data, MySQL, PHP, Apache; site cert creation and installation; Shib SP registration.
IDM in containers: Kerberos KDC container now in test. Continuous builds via Jenkins; automate testing; retain old container so we can roll back.
Antikythera – DevOps automation isn’t just for admin & web sites. Research computing provisioning proof of concept – compute, storage, apps/containers… and SDN as it is more widely deployed. Lets you have clear provenance of code and datasets for any specific job.
Summary – migrate ticket-driven artisan-crafted work and processes to self-service apps. Orchestrate automation via self-service app APIs; automation dashboards for both research and admin computing.
Bill Allison – Berkeley: Moving to Continuous
Case study: The Berkeley Desktop
Fall 2011 – OMG v0.1: Everything is broken or breaking all the time. No time for staff to work on solutions. Compromised machines. Standard image doesn’t work on laptops. Standard image too hard to change.
Imaging a machine took 4 hours of senior tech time. Varying hardware standards, no significant automation, manual work, no checklists.
Too busy to improve.
Split Desktop Design and Engineering from Ops and support.
Tackle the things that increase costs: Labor, productivity loss; change; variance
Now have 11-12k computers under management, around 5k with the full Berkeley desktop.
Artifacts are public in GitHub so others can use them.