[CSG Spring 2010] SaaS requirements for higher ed

Tracy Futhey is leading a conversation on SaaS requirements for higher education.

The sub-team spent the summer gathering documents on shared services from various campuses. In August it started looking at email and hosting, engaged a team from NACUA in October, came up with an email issues matrix in November, worked out a model contract in March, and drafted an RFP model in April.

Strategies adopted by sub-team
– Avoid a hardcore technical requirements list (outsourcing a service/function means not dictating technical solutions)
– Recognize/Leverage limitations on free services (build RFP with expectation of payment for services)
– Assume reuse; organize materials accordingly
– Admit Rumsfeld was right: “there are also unknown unknowns, the ones we don’t know we don’t know”.

Issues spreadsheet – five big issues – Data Stewardship, Privacy, Integration, Functionalities, Service Level

Working with Educause to distribute as open source documents.

What may be next?
– Assess interest in glomming onto a common RFP (CSG + …?)
– Finalize plan for Educause to hold docs
– Issue common RFP in June/July?
– Responses in August?
– Campus discussions in fall? Vendor negotiation? (not clear vendor(s) will be responsive to our concerns, or that we will like the responses)
– Decisions by Jan 1, 2011?
– Pilots during spring 2011?
– Fall 2011 go-live dates?

[CSG Spring 2010] Service Management – Service Lifecycle Cradle 2 Grave

Romy Bolton (Iowa) and Bernard Gulachek (Minnesota) are talking about service lifecycle.

At Minnesota they think a lot about service positioning – not just reacting to perceived need. An unquenchable appetite with limited resources is not a good recipe. They tried to apply a general administrative services framework for the institution to decide where services should be placed along a continuum from distributed to centralized, and developed principles and examples to help communicate with people in the distributed units.

At Iowa they started a “Project Review” process in the late 90s. Tuesday afternoon meetings – employee time with the directors and CIO, open to everybody. They re-tooled the project framework in 2007 and added service lifecycle management in 2008, using a light ITIL framework.

Emphasis on service definition, publication, end user request, provisioning. They still have project review, plus a project called Discovery to explore ideas, ITS Spotlight to call attention of staff to services. IT admins on campus have regular monthly meetings with 100+ people. Beginning to work on Do It Yourself provisioning tool.

Service definition starts in the project planning phase – a rough sketch of this as a structured record follows the list:
– identify service owner and provider
– identify KPIs for service
– Reassess risks and cost-benefit for service
– Identify criticality of service on scale of 1-4
– Update 5 yr TCO and funding source
– Document service milestones
– Update status in ITS Service Catalog as appropriate
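A service definition like the one above could be captured as a structured record. This is a minimal sketch with hypothetical field names and values – not Iowa’s actual catalog schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ServiceDefinition:
    """Hypothetical record capturing the service-definition items listed above."""
    name: str
    owner: str                       # identified service owner
    provider: str                    # identified service provider
    kpis: List[str] = field(default_factory=list)    # KPIs for the service
    criticality: int = 4             # the 1-4 scale (direction of the scale assumed here)
    five_year_tco: float = 0.0       # 5-yr total cost of ownership
    funding_source: str = ""
    milestones: List[str] = field(default_factory=list)
    catalog_status: str = "planned"  # status in the ITS Service Catalog

# Example with made-up values:
email = ServiceDefinition(
    name="Email",
    owner="Director, Collaboration Services",
    provider="ITS Enterprise Services",
    kpis=["99.9% availability", "mailbox provisioned within 1 hour"],
    criticality=1,
    five_year_tco=2_500_000,
    funding_source="common good",
    milestones=["pilot", "general availability"],
    catalog_status="published",
)
```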

Iowa uses SharePoint as its intranet and for publishing the service catalog, and Drupal for IKE (their knowledge management site). They’re just building out the self-provisioning service.

Tom Barton notes that there’s something called the Service Provisioning Markup Language (SPML) – sort of languishing, but maybe some new energy is flowing into it.

Iowa – triggers for Service Review: User needs; environmental change (e.g. the cloud for email); financial; security event; hardware refresh; new software version; end of life for product. Review is not a small effort. Business and Finance office helps gather info. Includes: Service Overview, Customer Input, Financial Resources, Utilization and customer base, service metrics, market analysis, labor resource, recommendations. Owned by the senior directors.

At Minnesota they do annual service reviews of all of their common good services – “just began to enforce that”, in part borne out of frustration at not being able to sunset services. Two or three people focus on this, working with service owners. The current example is what services continue as they roll out Google Apps.

Service Performance and Measurement

Designed for strategic conversations with stakeholders that go beyond the operational. They began gathering availability data about a year ago – looking at whether services are alive. Klara notes that defining whether a service is up can be complex, but it can be easier to simply measure whether a user can access a service. They have a systems status page showing current status – a mixture of automation and human intervention. They use Cisco’s Intuity product to track monthly/annual measures and give roll-ups of the information to deans and IT leaders, including comparisons with Gartner or Burton benchmarks where available. They publish the cost of services annually, so people understand what they’re paying for and how that’s changed over time. http://www.apdex.org is a new alliance for understanding application performance measurement.
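For reference, the Apdex approach mentioned above reduces user-facing performance to a single score between 0 and 1. A minimal sketch of the published formula – the half-second threshold here is just an example value:

```python
def apdex(response_times_s, threshold_s=0.5):
    """Apdex score: (satisfied + tolerating/2) / total samples.

    satisfied:  response time <= T
    tolerating: T < response time <= 4T
    frustrated: response time > 4T (contributes zero)
    """
    satisfied = sum(1 for t in response_times_s if t <= threshold_s)
    tolerating = sum(1 for t in response_times_s if threshold_s < t <= 4 * threshold_s)
    return (satisfied + tolerating / 2) / len(response_times_s)

# Example: mostly fast requests, one tolerable, one very slow
print(round(apdex([0.2, 0.3, 0.4, 1.1, 3.0], threshold_s=0.5), 2))  # 0.7
```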

At Stanford they’ve established Business Partners – senior people who know the organization who act as the pipeline in to the service managers. They meet with clients at a senior level.

[CSG Spring 2010] Service Management – CIO Panel

The morning is all about service management topics. My notes are going to be pretty sketchy because I’m coordinating the workshop and giving several presentations, but I’ll do my best and put up the slides from my parts.

Klara (Chicago) notes that culture is key in trying to implement service management. Steve (Iowa) agrees. At Iowa they built lightweight service and project management frameworks because that’s what the culture would tolerate. It’s a trigger-based process. Different events are recognized by service managers or owners and then initiate a review of a service. They put a lot of accountability on the service owner – they have to bring the right metrics forward. The review process gives them a chance to have some oversight of those metrics.

Bill Clebsch (Stanford) – doesn’t like ITIL or anything that looks like it comes from the outside to tell the organization what to do. So tries to talk first about accountability – that’s how they brought time tracking into the organization. Before that they did metrics – “a star performer’s best friend”. Put up customer-facing metrics, work they did with MIT. That was foundational to moving culture to more of a performance orientation. They’re a big Remedy shop. Every help desk at the university runs through their Remedy. Often people’s only knowledge of the organization is the service desk, so that’s a good place to start. Now working on change management. Remedy is good if you want to drink the kool-aid. They started a service portfolio effort about three years ago. Budget cuts are the best friend for getting these things done – makes your own organization aware, and makes your clients aware that they can’t behave in aberrant ways. Setting ambitious goals is good.

Kerry (Carnegie-Mellon). In addition to culture, timing is key. The service portfolio effort started in CMU’s central IT organization when Joel first became CIO – he didn’t understand what services were being provided. It was beginning to have success when an external advisory board visited – CMU was growing from a start-up into a global enterprise, which changed the conversation. “Who is responsible for a service” was a hard question. They started answering it with “whoever Kerry or Joel calls to fix it.”

Bill – every year do an extensive client survey, and scores have gone way up in recent years, as have metrics and employee surveys. Having the organization much more outwardly focused matters as much as the data. Sense of ownership is huge.

Klara – Chicago is not as mature, yet deans still want to give things up to IT.

Steve – the role of technology is changing, which makes people more willing to cede control of it to the central IT group. Bill – when things get boring or risky, hand off to central IT.

A question about the relationship of project management to services. At CMU the transition from project to service was difficult because they didn’t yet know how to declare a project done. They’re now paying a lot of attention to review of projects and transition to services. Klara – important to be mindful about how to operationalize a project – bring in the other stakeholders like service desk, operations, etc.

Question about how to decide to stop doing services – Steve- service reviews help, when utilization is declining and other alternatives exist, then there can be a project to shut down the service. Bill – have a dedicated service portfolio team that looks at what services should be brought up and shut down. They have actually shut down some services. Team is made up of service managers, some directors, some executive directors. We’re moving into an era of being more service brokers than providers, and will do less provisioning. That will require a different kind of service managers. They have a few people in the organization who are explicitly service managers, with no other role.

Question about cost of services. At Iowa they allocated all of the IT costs to services – it was a lot of work, but the data was very interesting and started good discussions. In the process of trying to automate that. Tension between being efficient and being able to invest to help research and teaching to be better.

Time tracking is essential to doing costs of services.

Critical to not let the perfect be the enemy of the good. Shel notes that Bell Labs decided to go to activity-based accounting and four years later the internal accounting department had grown to 450 people.

Shel – you have to make the judgement on what your allocation model is for given services. You may not make the perfect decision, but you need to decide.

[CSG Spring 2010] Storage Futures – Cloud Options discussion

Shel Waggener – Link campus into cloud providers?
– Duraspace integration?
– UC Systemwide storage solution
– Purchase mass storage from commercial provider e.g. Amazon
– Let everybody do their own.

File Sharing through cloud: Institutional sharing?
– Eliminated Xythos (done)
– Common contract with Dropbox?

Student and faculty portfolios?
– Alumni offerings

Bernard – in the context of the move to Google, they’ve clarified policies around PHI, ITAR data, and FERPA.

One institution reports that as far as their CISO is concerned, if it’s verifiably sufficiently encrypted, they’d regard it the same as shredded paper.

[CSG Spring 2010] Storage Strategies

Storage strategy survey results. Responses on who manages storage are split roughly evenly among central IT, distributed units, both, and not sure.

What’s provided centrally? All offer individual file space. Most offer backups for distributed servers and departmental file space. Half offer desktop backups.

Funding models – just about all have some variety of pay for what you use. Most have some common goods, and about half have base plus cost for extra.

About half do full cost recovery including staff time.

Challenges – data growth is top, tiered storage is next, along with centralizing and virtualization.

Biggest explicit challenges : Data growth, perception of cost, research storage.

Storage at Iowa
Central file storage: base entitlement of 1-5 GB for individuals and 1 GB per FTE for departments. 4-hour recovery objectives. 99.97% uptime. 89% participation. Enterprise level, high availability.

They started with one-price-fits-all network file storage, then offered some lower-cost network storage (e.g. without replication or backup), and now they’ve got lowest-cost bare server storage – lots of enthusiasm for that model.

http://its.uiowa.edu/spa/storage/

Low-cost SAN for servers runs $0.36 – $1.68 per year, depending on service level. Cost recovery covers hardware and software only – no staff time or data center charges.

Storage Census 2010

51% of storage being used by research. 35% Admin and Overhead (including email), 11% Teaching, 3% Public Service.

72% of storage holds backups rather than online data.

Next steps: identify and promote research solutions; build central backup service; build, promote archival solutions.

Storage @ U Virginia – Jim Jokl

Hierarchical Storage Manager Services: Storage for long-term research data (centrally funded but not well marketed); Library materials (funding via Library contributions to infrastructure); RESSCU (off-campus service for departmental disaster recovery backups).

Enterprise Storage – based on NetApp clusters. NFS and CIFS for users, iSCSI and SAN internally. Works really well, highly reliable, replicated. Mostly used for central services. For departments it’s $3.20 to $3.50 per GB per year, depending on backups. Lots of incidental sales to people who want a gigabyte or so for additional email quota. Doesn’t work for people who want a lot of storage.

New mid-tier storage service – focus on a reasonable and affordable storage service for departments and researchers.
Requirements: reliable, low cost, low overhead, self service. Unbundled services – optional remote replication and backups. Access via NFS and CIFS. Snapshots – users deal with their own restores. Offering Linux and Windows versions. Doing group files based on their groups infrastructure. Using RAIDKING disk arrays, with Btrfs on Fedora for the Linux side and Windows Server for the Windows side.

Cost model – one hour of labor plus $0.34/GB/yr (RAID 5, but not replicated). Next year they expect to drop the price by 50%. Currently about 22 TB is leased on NFS, with only marginal Windows use to date. All of the complaints about the costs of central storage have gone away. Research groups are interested in buying big chunks.
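Back-of-the-envelope on what that rate means at the capacity quoted – the 22 TB figure, the $0.34 rate, and the expected 50% drop are from the talk; the arithmetic (and the decimal-TB simplification) is mine:

```python
rate_per_gb_yr = 0.34          # current mid-tier rate: RAID 5, not replicated
leased_tb = 22                 # roughly what's leased on NFS today
leased_gb = leased_tb * 1000   # decimal TB, for simplicity

current_annual = leased_gb * rate_per_gb_yr
next_year_annual = current_annual * 0.5          # expected 50% price drop

print(f"~${current_annual:,.0f}/yr now, ~${next_year_annual:,.0f}/yr after the drop")
# ~$7,480/yr now, ~$3,740/yr after the drop
```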

Shel Waggener – Berkeley Storage & Backup Strategy

Shel says scale matters and no matter who says they’re doing it better faster cheaper, without scale they’re not.

2003 – every department runs own storage – including seven within central IT.
2004 – data center moves creates opportunity for common architecture
2006 – dedicated storage group formed. No further central storage purchases supported except through the storage team.
2007 – Hitachi wins bakeoff. 250 TB. Email team works with storage group to move from direct-attached to SAN
2010 – over 500 hosts using pool – 1.25 PB expanding to 3 PB this year.

SAN-based approach. Lots of serial attached SCSI disk – moving away from fiber-channel.

The cheapest storage is now 25 cents per gigabyte per month. The most expensive tier (now $4.00/GB/month) bears the cost of the expensive infrastructure that the other tiers leverage.

Cheap disk is reliable in terms of failure rate, but recovery time is longer.

At these storage costs, they don’t have quotas for email.

One advantage is paying for today’s storage today. Departments buy big arrays and use 5% in the first two years, which is much more expensive. But that’s what’s supported by NIH and NSF.
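A rough illustration of the “pay for today’s storage today” point. The $0.25/GB/month rate is Berkeley’s cheapest tier from above, but the department array price and utilization are hypothetical, chosen only to show the shape of the comparison:

```python
# Hypothetical department purchase: a 20 TB enterprise array bought up front
array_cost = 60_000          # assumed purchase price, not a quoted figure
used_gb = 1_000              # only ~5% of capacity actually used in the first two years
years = 2

cost_per_used_gb_yr_buy = array_cost / (used_gb * years)     # $30 per used GB per year

# Central pool alternative: pay only for what you use, at the cheapest tier
central_rate_gb_month = 0.25
cost_per_used_gb_yr_pool = central_rate_gb_month * 12        # $3 per used GB per year

print(cost_per_used_gb_yr_buy, cost_per_used_gb_yr_pool)     # 30.0 vs 3.0
```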

Backing up 338 users’ desktops (in IST) takes up 1.3 TB.

[CSG Spring 2010] Storing Data Forever

Serge from Princeton is talking about storing data. There’s a piece by MacKenzie Smith called Managing Research Data 101.

What do we mean by data? What about transcribing obsolete formats? Lot of metadata issues. Lots of issues.

What is “forever”? Serge thinks we’re talking about storing for as long as we possibly can, which can’t be precisely defined.

Why store data forever?
– because we have to – funding agencies want data “sharing” plans – e.g. NIH data sharing policy (2003). NIH says that applicants may request funds for data sharing and archiving.
Science Insider May 5 – Ed Seidel says NSF will require applicants to submit a data management plan. That could include saying “we will not retain data”.

– Because we need to encourage honesty – e.g. did Mendel cheat?
– Because, like open source, it helps uncover mistakes or bugs.
– Open data and access movement – what about research data?

Michael Pickett asks who owns the data? At Brown, the institution claims to own the data.

Cliff Lynch notes that most of the time the data is not copyrightable, so that “ownership” comes down to “possession”.

There’s a great deal of variation by branch of science on what the release schedules look like – planetary research scientists get a couple of years to work their data before releasing to others, whereas in genomics the model is to pump out the data almost every night.

Current storage models
– Let someone else do it
– Government agency/lab/bureau e.g. NASA, NOAA
– Professional society

Dryad is an interesting model – if you publish in a participating journal you can deposit your data there. That’s like GenBank.

DuraSpace wants to promote a cloud storage model based on DSpace and Fedora.

There are a number of data repositories that are government sponsored that started in universities.

Shel says that researchers will be putting data in the cloud as part of the research process, but where does it migrate to?

Serge’s proposal – Pay once, store endlessly (Terry notes that it’s also called a ponzi scheme).

Total cost of storage:
– I = initial cost
– d = rate at which storage costs decrease yearly, expressed as a fraction
– r = how often, in years, storage is replaced
– T = total cost to store data forever

T = I + (1−d)^r · I + (1−d)^(2r) · I + …

If d = 20% and r = 4, T ≈ 2·I.

If you charge twice the cost of initial storage, you can store the data forever.
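A quick numeric check of the series with the parameters quoted above – the sum converges to roughly 1.7 times the initial cost at d = 20% and r = 4, which is why charging about twice the initial outlay covers replacements indefinitely, with some margin:

```python
def endowment_multiple(d, r, cycles=100):
    """Sum of I + (1-d)^r * I + (1-d)^(2r) * I + ..., expressed as a multiple of I."""
    return sum((1 - d) ** (k * r) for k in range(cycles))

print(round(endowment_multiple(d=0.20, r=4), 2))   # ~1.69
print(round(endowment_multiple(d=0.15, r=4), 2))   # slower price declines cost more: ~2.09
```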

They’re trying to implement this model at Princeton, calling it DataSpace.

People costs (calculated per gigabyte managed) also go down over time.

Cliff – there was a task force funded by NSF, Mellon, and JISC on sustainable models for digital preservation – http://brtf.sdsc.edu

[CSG Spring 2010] Staffing for Research Computing

Greg Anderson from Chicago is talking about funding staff for research computing.

Most people in the room raise their hand when asked if they dedicate staff to research computing on campus.

At Illinois they have 175 people in NCSA, but it doesn’t report to CIO.

Shel notes that employees have gotten stretched into doing lots of other things besides just providing research support. They’re trying to rein that back in through their career classification structures by requiring people to classify themselves. Now there are 300 generalists classified as such.

At Princeton they’ve started a group of scientific sysadmins. The central folks are starting to help with technical supervision, creating some coherence across units. At Berkeley the central organization buys some time from some of the technical groups to make sure that they’re available to work with the central organization. Groups don’t get any design or consultation help unless they agree to put their computers in the data center.

At Columbia they have a central IT employee who works in the new center for (social sciences?) research computing – it’s a new model.

Greg asks how people know what the ratio of staff to research computing support should be and how do they make the case?

Shel asks whether anybody has surveyed grad students and postdocs about the sysadmin work they’re pressed into doing. He thinks that they’re seeing that work as more tangential to their research than they did a few years back.

Dave Lambert is talking about how the skill set for sysadmin has gotten sufficiently complex that the grad student or postdoc can’t hope to be successful at it. He cites the example of finding lots of insecure Oracle databases in research groups.

Klara asks why we always put funding at the start of the discussion of research support? Dave says it’s because of the funding model for research at our institutions. The domain scientists see any investment in this space by NSF as competing directly with the research funding. We need to think about how we build the political process to help lead on these issues.

[CSG Spring 2010] Research Computing Funding Issues

Alan Crosswell from Columbia kicks off the workshop on Research Computing Funding Issues. The goals of the session are: what works, what are best practices, what are barriers or enablers for best practices?

Agenda:
– Grants Jargon 101 – Alan
– Funding infrastructure, primarily data centers – Alan
– Funding servers and storage – Curt
– Funding staff – Greg
– Funding storage and archival life cycle – Serge and Raj
– Summary and reports from related initiatives – Raj

Grants Jargon
– A-21: principles for determining costs applicable to grants, contracts and other agreements with educational institutions – what costs are allowable and unallowable.
– You can’t charge people different rates for the same service.
– Direct costs – personnel, equipment, supplies, travel, consultants, tuition, central computer charges, core facility charges.
– Indirect costs, a/k/a Facilities and Administration (F&A) – overhead costs such as heat, administrative salaries, etc.
– Negotiated with the federal government. Columbia’s rate is 61%. PIs see this as wasted money.
– Modified total direct costs – subtractions include equipment, participant support, GRA tuition, alteration or renovation, and subcontract amounts over $25k.
Faculty want to know why everything they need isn’t included in the indirect cost. Faculty want to know why they can buy servers without paying overhead, but if they buy services from central IT they pay the overhead. Shel notes that CPU or storage as a service is the only logical direction, but how do we do that cost effectively under A21? Dave Lambert says that they negotiated a new agreement with HHS for their new data center. Dave Gift says that at Michigan State they let researchers buy nodes in a condo model, but some think that’s inefficient and not a good model for the future.
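A simplified sketch of how the direct/indirect math plays out on a hypothetical budget, using the 61% rate quoted for Columbia. The budget figures are invented, and the subcontract treatment (only the first $25k of each counts toward the base) follows the usual A-21 reading rather than anything stated in the session:

```python
F_AND_A_RATE = 0.61   # Columbia's negotiated rate, per the session

# Hypothetical direct-cost budget (all figures invented for illustration)
direct = {
    "personnel": 200_000,
    "equipment": 80_000,       # excluded from the F&A base
    "gra_tuition": 40_000,     # excluded from the F&A base
    "supplies": 20_000,
    "subcontract": 100_000,    # only the first $25k counts toward the base
}

total_direct = sum(direct.values())

# Modified total direct costs: strip the exclusions listed above
mtdc = (total_direct
        - direct["equipment"]
        - direct["gra_tuition"]
        - max(direct["subcontract"] - 25_000, 0))

indirect = mtdc * F_AND_A_RATE
print(f"Direct ${total_direct:,}, MTDC ${mtdc:,}, F&A ${indirect:,.0f}, "
      f"total ${total_direct + indirect:,.0f}")
# Direct $440,000, MTDC $245,000, F&A $149,450, total $589,450
```

This is the faculty complaint in a nutshell: a server bought as equipment sits outside the F&A base, while the same capacity purchased as a central IT service is a direct charge that draws the full overhead rate.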
Alan asks whether other core shared facilities like gene sequencers are subject to indirect costs.

Campus Data Center Models
– Institutional core research facility – a number that grew out of former NSF supercomputer centers.
– Departmental closet clusters – sucking up lots of electricity that gets tossed back into the overhead.
– Shared data centers between administration and research – Columbia got some stimulus funding for some renovation around NIH research facilities.
– Multi-institution facilities (e.g. RENCI in North Carolina, recent announcement in Massachusetts)
– Cloud – faculty go out with credit card and buy cycles on Amazon
– Funding spans the gamut from fully institutionally funded to fully grant funded.

Funding pre-workshop survey results
– 19 of 22 have centrally run research data centers, mostly (15 counts) centrally funded (9 counts of charge-back, 3 counts of grant funding)
– 18 of 22 respondents have departmentally run research data centers, mostly (14 counts) departmentally funded (3 counts of using charge back, 4 counts of grant funding)
– 14 have inventoried their research data centers
– 10 have gathered systematic data on research computing needs

Dave Lambert – had to create a cost allocation structure for the data centers for the rest of the institution to match what they charge grants for research use, in order to satisfy A21’s requirement to not charge different rates.

Kitty – as universities start revealing the costs of electricity to faculty, people will be encouraged to join the central facility. Dave notes that security often provides another incentive for people because of the visibility of incidents. At Georgetown they now have security office (in IT) review of research grants.

Curt Hillegas from Princeton is talking about server and short- to mid-term storage funding – working storage, not long-term archival storage.
– Some funding has to kick-start the process – either an individual faculty member or central funding. Gary Chapman notes that there’s an argument to be made for centrally funded interim support to keep resources going between grant cycles.

Bernard says that at Minnesota they’ve done a server inventory and found that servers are located in 225 rooms in 150 different buildings, but only 15% of those are devoted to research. Sally Jackson thinks the same is approximately true at Illinois. At Princeton about 50% of computing is research, and that’s expected to grow.

Stanford is looking at providing their core image as an Amazon Machine Image.

At UC Berkeley they have three supported computational models available and they fund design consulting with PIs before the grant.

Cornell has a fee-for-service model that is starting to work well. At Princeton that has never worked.

Life Cycle management – you gotta kill the thing, to make room for the new. Terry says we need a “cash for computer clunkers” program. You need to offer transition help for researchers.