CSG Fall 2016 – ITIL and DevOps

Why is this important?

  • Does ITIL make sense in an era of continuous delivery and integration?
  • Will the volume of applications and sites overwhelm the management methodology?
  • Distributed IT is not well versed in ITIL
  • Does DevOps include formal review? Shouldn’t Tier 0 sites and apps get reviewed for changes?

Survey results

  • Almost all respondents have a formal Change process and board
  • Divided on if PaaS/SaaS need formal change reviews
  • Some said that only major changes go through formal change management
  • Most respondents not mature yet with DevOps practices
  • Some groups doing agile development, but not all

Harvard working on trying to reinvent ITIL in the cloud environment – since it’s all software now, release management practices are more appropriate than change management.

Would be good to have changes (even pre-approved ones) logged in ServiceNow so incidents could be correlated with changes.
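A rough illustration of that correlation idea, as a minimal sketch only: the record fields, the IDs, and the 24-hour look-back window are assumptions made for the example, and nothing here uses the actual ServiceNow API. Given logged changes, it lists the ones applied to the same service shortly before an incident was opened.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Change:
    change_id: str       # hypothetical identifier format
    service: str
    applied_at: datetime


@dataclass
class Incident:
    incident_id: str     # hypothetical identifier format
    service: str
    opened_at: datetime


def changes_preceding(incident, changes, window=timedelta(hours=24)):
    """Return changes to the same service applied within `window` before the incident."""
    return [
        c for c in changes
        if c.service == incident.service
        and timedelta(0) <= incident.opened_at - c.applied_at <= window
    ]


# Made-up sample data: two email changes a week apart and one unrelated change.
changes = [
    Change("CHG-1", "email", datetime(2016, 9, 1, 9, 0)),
    Change("CHG-2", "email", datetime(2016, 8, 25, 9, 0)),
    Change("CHG-3", "lms", datetime(2016, 9, 1, 10, 0)),
]
incident = Incident("INC-7", "email", datetime(2016, 9, 1, 13, 30))
suspects = changes_preceding(incident, changes)
print([c.change_id for c in suspects])
```

This only works if pre-approved and automated changes are logged too, which is the point made above.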

In new cloud deployments people aren’t patching, but blowing machines away and deploying new ones. How does change process handle that?

Notre Dame trying to eliminate human access to the cloud console for production systems

Nobody in the room is doing continuous deployments to ERP systems

Cornell – with self-healing infrastructure they may not even know there’s an outage.

Tom Vachon, Harvard

Harvard’s cloud at a glance

  • 684 applications targeted for migration by 7/18, 300+ migrated already
    • Shutting down one on-prem data center
  • 1 VPC per account on average
    • Centrally Billed: 131 Accounts
    • 45 Accounts/VPCs on Direct Connect
    • Looking to make Cloud a University-wide strategic program
  • Cloud Shield – physical firewall
    • Kicked off 7/15 in response to a security breach
    • POC – 11/15 – 2/16
    • Started automation code 3/16
    • 15,000 lines of code
    • Production ready 7/16
    • Design goals
      • Provide highly available and highly redundant AWS network access
      • Provide visibility of traffic into, out of, and between cloud applications
      • Provide next-gen firewall protections
      • Inline web filtering to simplify server configuration
      • Provide multicloud connectivity
    • Tech details
      • Diverse paths and POPs – Boston has 2 direct connects, and a POP in Equinix in Virginia with private network connection to campus
      • Primarily done for visibility
    • Actively discourage host-based firewalls
      • Use security groups instead
      • Don’t use Network ACLs
  • Will provision services with public IPs
    • They have overlapping private address spaces
  • Design manager of managers in Python
    • Create an ops & maintenance free architecture in Lambda
    • Provide REST API through AWS API Gateway
    • Isolate changes by segregating integrations in AWS Lambda
  • Leverage AWS DynamoDB for
    • Schemaless session cache
    • Dynamic reconfiguration
  • Challenges
    • Static DNS names
      • use ELB or ALB for applications
    • Everyone needs to be on Harvard IP space
      • Delegates six /16s for AWS
    • Legacy application stacks
      • Java has a “mostly hate” relationship with DNS
        • Lots of apps cache DNS forever
    • Reduced S3 visibility
    • Inability to do app-by-app identification
      • Grouping by data classifications
    • Items which are unknowingly locked down to AWS IP space
      • e.g. running a yum update against Amazon Linux from non-AWS IP space
  • Virtual firewalls per VPC were going to cost >$4 million over three years; this model costs $1.6 million over five years
  • Most applications got faster when distributed across this model
    • Less switching in the way
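
The "manager of managers" design in the list above (Python in Lambda, a REST API through API Gateway, integrations kept separate) might look roughly like the sketch below. This is not Harvard's code: the routes and integration functions are hypothetical, and only the event and response field names follow the API Gateway Lambda proxy format.

```python
import json


# Hypothetical integrations. In the design described above each one would be
# segregated into its own Lambda function so a change to one cannot break another.
def firewall_status(params):
    return {"integration": "firewall", "status": "ok"}


def dns_status(params):
    return {"integration": "dns", "status": "ok"}


# Hypothetical route table mapping (method, path) to an integration.
ROUTES = {
    ("GET", "/firewall/status"): firewall_status,
    ("GET", "/dns/status"): dns_status,
}


def handler(event, context):
    """Lambda entry point for an API Gateway proxy integration event."""
    route = (event.get("httpMethod"), event.get("path"))
    integration = ROUTES.get(route)
    if integration is None:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown route"})}
    result = integration(event.get("queryStringParameters") or {})
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }


# Local invocation with a hand-built event; no AWS account is needed for this.
response = handler({"httpMethod": "GET", "path": "/firewall/status"}, None)
print(response["statusCode"], response["body"])
```

Session state and dynamic configuration, which the notes say live in DynamoDB, are left out to keep the sketch self-contained.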

Panel Discussion

  • Biggest technical challenges so far?
    • Georgetown – have to run virtual firewalls in HA. Looking at replacing with Trend Micro
    • Harvard – lack of visibility in AWS
    • UNL – Vast offerings from vendors – how to wrap heads around it?
    • How to support on prem and burst out, especially for research instruments?
    • Cornell – Keeping up with the technology. Having people to manage and implement solutions. Encouraging lack of consistency in an effort to use the best new technology to solve problems.
    • Wisconsin – Have to worry about security in a whole new paradigm in the cloud.
    • Notre Dame – pace of innovation. Do we prepare for a more rapid pace of change (and those costs) or learn to live with not implementing the latest?

 

 

What Have We Learned From ITIL?

I’ve been at least peripherally involved with IT Service Management methodologies at two institutions over the past decade. At the UW I was responsible for the creation of the first IT Service Catalog and began the process of creating a service request process. Here at Chicago one of my first accomplishments was finishing up the creation of our IT Service Catalog and selecting and procuring an ITSM tool. I’m now working with others to improve our implementation and use of that tool.

In that context I’ve been thinking about ITIL and how it fits with our evolving notions of agile development, DevOps, and what Gartner calls “bi-modal IT”. ITIL and other now traditional ITSM methodologies were built to combat the chaotic world of IT in the 1980s and 90s as information technology began to rule the world. ITIL offers standardized processes designed to give IT organizations control over their work and the environments they manage.

As is the way with methodologies, even though ITIL was meant as a set of methods you could pick from and tailor to your needs, it has been taken as more of a religious crusade. It’s not uncommon for IT shops to build up a massive set of bureaucratic processes based on the ITIL language that, instead of providing responsive IT, become yet another way for IT to stand in the way of people getting work done.

I’m not a huge fan of ITIL and its methods, and I think it’s largely become an artifact of an older way of thinking about IT processes, but I do think there are some valuable lessons we’ve learned from ITIL over the years, and we shouldn’t overlook those in our rush to the newer ways of working. So I’m going to attempt to document some of the things I’ve personally found valuable from working with ITIL. Feel free to add to the list or differ in the comments.

  • Incidents and Problems are not the same thing. Incidents are reports of someone having a hard time. Problems are things that have gone wrong and need fixing.
  • Service Requests are not incidents – they are requests from people for the things that you regularly provide – the things in your Service Catalog. A request for a new email account is a service request. A request for a project to build a new email infrastructure is not a service request (unless that’s the business you’re in). Service requests should be able to be automated and not treated as artisanal creative work.
  • Life goes better when you have some process to control changes in your IT environment. In the agile and DevOps world continually deployed changes are governed by automated tests and easy rollbacks, which are considered pre-approved changes in the ITIL world.
  • It’s important to measure your service usage. Metrics can help plan for capacity, help understand how people are using services (and where there are issues), and help you know when it’s time to retire or replace a service instead of continuing to invest in it.

What else?

CNI Fall 2013 – Creating A Data Interchange Standard For Researchers, Research, And Research Resources: VIVO-ISF

Dean B. Krafft, Brian Lowe, Cornell University

What is VIVO?

  • Software: an open-source semantic-web-based researcher and research discovery tool
  • Data: Institution-wide, publicly-visible information about research and researchers
  • Standards: A standard ontology (VIVO data) that interconnects researchers

VIVO normalizes complex inputs, connecting scientists and scholars with and through their research and scholarship.

Why is VIVO important?

  • The only standard way to exchange information about research and researchers across diverse institutions
  • Provides authoritative data from institutional databases of record as Linked Open Data
  • Supports search, analysis, and visualization of data
  • Extensible

An HTTP request can return HTML or RDF data
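
A minimal sketch of what that could look like from a client, assuming the server picks the representation from the Accept header (standard HTTP content negotiation). The URL is a made-up placeholder and the media types are an assumption, not something stated in the talk. The request is only built here, not sent.

```python
from urllib.request import Request


def profile_request(url, want_rdf=False):
    """Build a request for the same URL that asks for either HTML or RDF."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(url, headers={"Accept": accept})


# Hypothetical profile URL; substitute a real VIVO individual page.
req = profile_request("https://vivo.example.edu/individual/n1234", want_rdf=True)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the fetch.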

Value for institutions and consortia

  • Common data substrate
  • Distributed curation beyond what is officially tracked
  • Data that is visible gets fixed

US Dept. of Agriculture implementing VIVO for 45,000 intramural researchers to link to Land Grant universities and international agricultural research institutions.

VIVO exploration and Analytics

  • Structured data can be navigated, analyzed, and visualized within or across institutions.
  • VIVO can visualize strengths of networks
  • Create dashboards to understand impact

Providing the context for research data

  • Context is critical to find, understand, and reuse research data
  • Contexts include: narrative publications, research grant data, etc.
  • VIVO dataset registries: Australian National Data Registry, Datastar tool at Cornell

Currently hiring a full-time VIVO project director.

VIVO and the Integrated Semantic Framework

What is the ISF?

  • A semantic infrastructure to represent people based on all the products of their research and activities
  • A partnership between VIVO, eagle-i, and ShareCenter
  • A Clinical and Translational Information Exchange Project (CTSAConnect): 18 months (Feb 2012-Aug 2013), funded by NIH

People and Resources – VIVO interested primarily in people, eagle-i interested in genes, anatomy, manufacturer. Overlap in techniques, training, publications, protocols.

ISF Ontology about making relationships – connecting researchers, resources, and clinical activities. Not about classification and applying terms, but about linking things together.

Going beyond static CVs – distributed data, research and scholarship in context, context aids in disambiguation, contributor roles, outputs and outcomes beyond publications.

Linked Data Vocabularies: FOAF (Friend of a Friend) for people, organizations, groups; VCard (Contact info) (new version); BIBO (publications); SKOS (terminologies, controlled vocabularies, etc).

Open Biomedical Ontologies (OBO family): OBI (Ontology for Biomedical Investigations); ERO (eagle-i Research Resource Ontology); RO (Relation Ontology); IAO (Information Artifact Ontology – goes beyond bibliographic)

Basic Formal Ontology from OBO – Process, Role, Occurrent, Continuant, Spatial Region, Site.

Reified Relationships – Person-Position-Org, Person-Authorship-Article. The plain RDF subject/predicate/object model breaks down for some things, like trying to model different position relationships over time. So the relationship is reified: it gets treated as an entity of its own, with its own metadata, described by additional triples. Allows aggregation over time, e.g. a Position can be held over a particular time interval. Allows building of a distributed CV over time. Allows aggregating name change data over time by attaching time properties to multiple VCards.
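
To make the reification idea concrete, here is a small sketch using plain Python tuples as triples. The identifiers and property names are invented for illustration and are not the actual VIVO-ISF terms. The Position is a node of its own, so start and end dates can hang off it and be queried.

```python
from datetime import date

# Hypothetical identifiers and property names. Real VIVO-ISF data would use
# full URIs and the ontology's own properties.
triples = [
    ("person:jane", "holdsPosition", "position:1"),
    ("position:1", "inOrganization", "org:chemistry"),
    ("position:1", "title", "Assistant Professor"),
    ("position:1", "start", date(2005, 7, 1)),
    ("position:1", "end", date(2011, 6, 30)),
    ("person:jane", "holdsPosition", "position:2"),
    ("position:2", "inOrganization", "org:chemistry"),
    ("position:2", "title", "Associate Professor"),
    ("position:2", "start", date(2011, 7, 1)),
]


def objects(subject, predicate):
    """All objects of triples matching the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def positions_on(person, day):
    """Titles the person held on `day`; a missing end date means still current."""
    held = []
    for position in objects(person, "holdsPosition"):
        start = objects(position, "start")[0]
        end = (objects(position, "end") or [date.max])[0]
        if start <= day <= end:
            held.append(objects(position, "title")[0])
    return held


print(positions_on("person:jane", date(2008, 1, 1)))
print(positions_on("person:jane", date(2013, 12, 9)))
```

A direct person-to-organization triple could not carry the dates, which is why the intermediate Position node is needed.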

Beyond publication bylines – What are people doing? Roles are important in VIVO ISF. Person-Role-Project. Roles and outputs: Person-Role-Project-document, resource, etc.

Application examples: search (beta.vivosearch.org) can pull in data from distributed software (e.g. Harvard Profiles) using VIVO ontologies.

Use cases: Find publications supported by grants; discover and reuse expensive equipment and resources; demonstrate importance of facilities services to research results; discover people with access to resources or expertise in techniques.

Humanities and Artistic Works – performances of a work, translations, collections and exhibits. Steven McCauley and Theodore Lawless at Brown.

Collaborative development – DuraSpace VIVO-ISF Working Group. Biweekly calls Wed 2 pm ET. https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology+Working+Group

Linked Data for Libraries

December 5, 2013 Mellon made a 2 year grant to Cornell, Harvard, and Stanford starting Jan 2014 to develop a Scholarly Resource Semantic Information Store (SRSIS) model to capture the intellectual value that librarians and other domain experts add to information resources, together with the social value evident from patterns of research.

Outcomes: Open source extensible SRSIS ontology compatible with VIVO, BIBFRAME and other ontologies for libraries.

Sloan has funded Cornell to integrate ORCID more closely with VIVO. At Cornell they’re turning MARC records into RDF triples indexed with Solr – beta.blacklight.cornell.edu

 

CSG Fall 2012 – Balancing Central and Distributed Services

Bernie Gulacheck from Minnesota is leading a discussion on Central and Distributed Services. This is not a new topic, but the context has changed. We’ve seen the delivery of technology services change over the years. In the late ’80s and early ’90s distributed service units in Libraries, Administration, and Academic Computing were amalgamated into central IT units. Then the conversation shifted to the current landscape of distributed technology units and a central unit. Along the service continuum, the usual pattern was that new technology emerged in the distributed units and was later centralized for economies of scale. The cloud shifts this dynamic: both central and distributed units can now move services to the cloud or bring new ones into being there.

We’d like to believe that each unit that manages its own technology services is focused on its mission so as to create complementary and not duplicative services – sometimes that’s the case, sometimes it isn’t. What are the elements that facilitate this model? One comment is that what works is transparency – letting the deans and administrators know what is being offered centrally and going through the services each school is offering to see where there is duplication. The service catalog was very important in making this happen. Making that visible allows the conversation about efficiency and making sure that the quality of central services is acceptable to the schools.

Cornell has a structure where the distributed technology leaders also report in to an associate CIO in the central office – they are learning how to build the trust and efficiency in the group. They are building a brand of IT@Cornell that encompasses the entire concept, and that’s starting to work. The services organization is trying to lower cost and maximize efficiencies in order to provide the best service possible to demonstrate the utility value of central services so they don’t have to be duplicated locally.

Kitty notes that we have to be conscious that some services can only be delivered by the person who sits right next to the user – need to know the people, who’s got grant deadlines, etc. Also it’s a challenge for us to make core services easy enough to use.

Bernie notes that often the cloud services are superior to what we can offer, but the factors preventing us from moving in that direction are some of the same factors that prevent the distributed units from moving services to the center.

Elazar notes that trust is a key factor – they’re rolling out a new desktop support environment that will cover the whole institution, and it’s the same with consolidating data centers. Ron Kraemer notes that little things count – like referring to services as the “OIT” data center instead of the “Notre Dame” data center.

Tom says that the ability to present central services as something that distributed units can just use, much as they do the cloud, is important.

Sometimes the actual consolidation of services, even when everyone agrees it makes sense, can be perceived as threatening people’s jobs, which makes it hard to make progress.

Tracy notes that including people from the units within the central organization as much as possible can help build the relationships. Also you have to build a story and stick to it that gives people hope and a sense of purpose – where is the evolution of their position?

The concept of the say/do ratio is important – ideally would be 1:1.

Developing soft skills in the organization is important.

Bill notes that they started something called the Stanford Technical Leaders Program, where they brought in MOR to help build skills with 13 technical people from the central unit and 13 distributed people from around the campus. Last year they put on an un-conference, and registration fills within minutes after they open the web site. Once it gets to management it’s a failure – want to build the soft skills and the relationships.

It’s important to be honest that jobs are shifting and new skills will be needed – it won’t always be possible to retrain people, and in some cases groups will shrink.

At Brown they looked and found that they’re 49% central and 51% distributed, and in many cases the distributed people are being paid better than in central IT.

Tom notes that governance has helped, but that input from distributed units hasn’t always come through those administrative processes. Being able to prioritize and schedule work realistically is important.

Bill talks about “getting beyond polite.” He was told that his (Bill’s) presence in the room was too loud, and without him in the room the discussion gets more down to earth.

I noted that often people ask for the help of the central unit in solving problems but we don’t have the capacity to deliver help in a timely manner. Bernie then asks what happens when we build services that have been requested by the distributed technology services but the units then opt out and complain about cost increases? Chuck has found that an effective technique is to let the unit lead the project and be responsible for it end-to-end, including the announcement. Ilee says that making sure that the distributed units are involved in the definition of the services and that having a way to communicate with the deans is important.

Where we’re still in hot water is where we’ve over-promised and underestimated the complexity of replacing local services with central services, which burns our goodwill chips. We don’t want to stifle innovation in the units.

There are often pressures on the CIO to optimize cost in IT, but deans and other leaders can be hesitant to have conversations about the steps necessary to achieve those savings.

It might be possible to give schools score cards about where they are in comparison to each other and central units – has to be done independently (e.g. by the finance unit). Can help deans make decisions on how to allocate resources.

Having visibility into all the IT requests can help people understand what is happening and alert people to potential duplications of effort.

At one institution they don’t use the word “distributed” but use “federated”.

One person notes that if you have distributed people also report in to the central unit, you let the units off the hook a bit – can be a double-edged sword.

CSG Fall 2012 – Projecting Infrastructure into the Cloud

Tom Barton from Chicago and Michael Gettes from Carnegie Mellon are leading a discussion on Projecting Infrastructure into the Cloud.

Identity Federation & Attribute Release – Federated Access anyone? Release directory info! In InCommon, identity providers get into the federation, but not always service providers, so get your SP into the federation. At CMU they release directory information. For everyone: eduPersonPrincipalName (which for them is an email address) and eduPersonScopedAffiliation. For non-students: givenName, surname, commonName, email. This allows for very quick integration of cloud providers. Will this work for others? Ken notes that projects such as VIVO have lots of data with no access control.
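
The release policy described for CMU can be sketched as a simple filter. This is an illustration only: a real identity provider expresses this in its own attribute-filter configuration, and the `is_student` flag and sample values are assumptions for the example.

```python
# Attribute sets taken from the notes above.
EVERYONE = {"eduPersonPrincipalName", "eduPersonScopedAffiliation"}
NON_STUDENT_EXTRA = {"givenName", "surname", "commonName", "email"}


def released_attributes(user):
    """Return only the attributes this policy would release for `user`."""
    allowed = set(EVERYONE)
    # Hypothetical flag; if it is missing, assume student and release the minimum.
    if not user.get("is_student", True):
        allowed |= NON_STUDENT_EXTRA
    return {k: v for k, v in user.items() if k in allowed}


# Made-up directory entries.
staff = {
    "eduPersonPrincipalName": "staff1@example.edu",
    "eduPersonScopedAffiliation": "staff@example.edu",
    "givenName": "Pat",
    "surname": "Lee",
    "commonName": "Pat Lee",
    "email": "staff1@example.edu",
    "is_student": False,
}
student = dict(staff, eduPersonScopedAffiliation="student@example.edu", is_student=True)

print(sorted(released_attributes(staff)))
print(sorted(released_attributes(student)))
```

Defaulting to the smaller set when status is unknown is a design choice in this sketch, not something stated in the discussion.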

Contracts – we spend lots of time on compliance and security, but not on functionality and defining the relationship. CMU and PSU are requiring their vendors to join InCommon. One comment is that vendors are increasingly resistant to joining InCommon.

There’s a bunch of discussion about things that are beyond identity – how do we deprovision users, how do we communicate limitations, where things are easier or harder in the cloud. Kitty notes that in some contract negotiations with cloud vendors they are requiring targets about load and latency testing from different points in the world.

[CSG Winter 2011] Higher ed from both sides now

Greg Jackson (Educause)

Collaboration – we don’t do it very well across our organization.
– We sign NDAs for No Benefit
– We let vendors pick us off
– We keep our cake (we hold on to resources we really should be sharing)

Battles – we fight those we can’t win. Prevalence will sometimes win out over quality.
– Google is going to win
– The CFO is going to win
– Verizon/AT&T/Sprint are going to win
– Oracle is going to win – not everything, but everything it cares about
We don’t engage very well if we characterize them as evil

Optimization
– Being different from peers isn’t the same as being ahead of peers. No competitive advantage to how we use IT at our institutions.
– Being ahead of peers isn’t the same as winning.
– Distinctiveness yields value, but it also consumes it
– It doesn’t matter what computer you use, because standardization has largely been achieved
– When standardization fails, idiosyncrasy accelerates

Tracy notes that we’re different because our environments require us to be.
Greg – we don’t want to aspire to mediocrity. We shouldn’t innovate in different directions just for the sake of different directions.

Management
– We reject cost accounting
– We prefer tactics to strategies
– We send good money after bad
– We prefer right to timely
– We eat (or alienate) our seed corn
– We mistake users for customers

Association
– We squabble (especially in public)
– We waste too much time on governance
– We spread ourselves too thinly
– We obsess