Oren’s Blog

Campus Safety and Security, pt. 2

UVa events – Marge Sidebottom and Virginia Evans (UVA)

How do we determine where high-risk areas are on any given day, and are they located in the right places for the controversy that might accompany any given guest speaker? Beginning to populate a system to record those. Look at controversial speakers, as well as protests. The lone wolf terrorist is the other common concern – may find information that helps to plan better. Departments are expected to look at issues within their own areas and mitigate them – if they can't, the issue escalates to the threat assessment team, which meets weekly.

Aug 11 & 12 – protest by white supremacists and neo-nazis. There were lots of advance preparations by the city and the campus. This capped a series of events over the previous months in different parks. Several hundred showed up at UVa on Friday night with lit torches and surrounded a small number of students. Violence broke out, but police dispersed the activity. By late morning Saturday there were thousands in a small area of downtown Charlottesville, including heavily armed alt-right protesters. Then the car ramming event happened, and then the police helicopter crashed.

The University had begun planning three weeks prior to the event. Had 2 meetings a week of the emergency incident management team, and the President held daily meetings. There is a city/county/university EOC structure. The city decided to have their EOC in a different location, which compromised communications. University teams went on 12-hour shifts beginning Friday morning.

When protesters moved on campus, the events developed very rapidly. It became clear that they were not following the plan they had committed to.

Having the EOC stood up was very useful. Had the University’s emergency management team in a separate room, so they could be briefed regularly. At 11:50 on Saturday, cancelled activities on campus starting at noon to not have venues that presented opportunities for confrontations. Worked carefully with a long-planned wedding at the chapel, but it did take place. They were unaware of admissions tours that were going on – once they found out, rallied faculty to accompany student guides and families, and then ended tours early.

Taking care of the needs for mental health attention for participants is important.

John DiFava (MIT Chief of Police)

MIT culture – it can’t happen here, and it won’t happen here. Also the culture of the city of Cambridge is very open and loose. Campus police used to be focused on friendly service and would call in external agencies when in need. Times have changed – policing on campus is just as complex and demanding as any other type of policing. Universities are no longer isolated.

Columbine massacre had a tremendous impact – Officers followed procedure to establish perimeter and wait for tactical units to arrive. Now they are taught to make entry.

9/11 attacks had a significant impact on policing. MIT police lost all of their officers to other jurisdictions immediately. Interagency cooperation was inadequate. Created a cascading effect – the cavalry was out of town, so had to rely on local resources.

New reality – had to be able to function without assistance; aid would not arrive as quickly and in the quantity it once did.

Steps taken to improve capability and performance – a comprehensive approach: Recruitment process, promotional system, supervision, training improvements – do in-service training with Cambridge Police and Harvard; firearms requalification three times a year (twice during the day, once in low light); specialized training for every officer; active shooter training (with Cambridge PD, Harvard, and MSP).

Work with Institute entities – Emergency management reports to Police.

Emergency Communication: Interface Between Public Safety and IT
Andy Birchfield, Jeff McDole, Andy Palms: University of Michigan

Certain emergency phases – Pre-incident planning, inbound emergency notification, emergency assessment, emergency alert operation, emergency notification delivery. The value to the community of notifications is based on total time of all phases.

Pre-incident planning: Activities include: message templates; policy and procedure; establish expectations and know your community; analysis of delivery modes with recognition of delivery times for each mode; evaluation of lessons learned; training and exercises; prepare infrastructure.

Inbound emergency notification: Making it simple, do it like they do it every day (students choose their cell phones over using the emergency blue phones); Get as much information as possible: video, audio (phone), text; Enable people to contact us in the ways they know — social media, apps, etc; coverage and capacity; knowing where the person is.

Emergency Assessment: Issues include confirmation, authorization, timeliness. If you can get a message out within 8–10 minutes of an incident, you're doing well.

Emergency alert operation: additional modes and desired content will delay message creation; decisions and effort slow the operator. Hick's Law: the time it takes a person to make a decision increases logarithmically with the number of possible choices.
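Hick's Law can be sketched numerically; the coefficients below are illustrative, not measured from any real alerting console:

```python
import math

def decision_time(n_choices, base=0.2, per_bit=0.3):
    """Hick's Law: T = a + b * log2(n + 1).

    base (a) and per_bit (b) are hypothetical constants in seconds;
    real values would have to be measured for a given operator and UI.
    """
    return base + per_bit * math.log2(n_choices + 1)

# Doubling the template count only adds one "bit" of decision time:
t7 = decision_time(7)    # 0.2 + 0.3 * 3 = 1.1
t15 = decision_time(15)  # 0.2 + 0.3 * 4 = 1.4
```

The logarithmic growth is the practical takeaway: trimming a template list from 21 choices to 10 saves less operator time than you might hope, which is why simplifying the decision itself matters more than shrinking the menu.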

Emergency notification delivery: Speed is the priority. Issues: Get people to sign up for the right service(s) – there is not a single mode; infrastructure coverage. They can get delivery to every email inbox in Ann Arbor (~105k) in about 7 minutes, but email is not the only mode. They have apps with push notifications – time of delivery is right around 10 seconds. The future is focused messages to appropriate recipients, by topic, or location, by individual choice.
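The email figure implies a sustained delivery rate; a quick back-of-envelope check (the recipient count and duration are from the talk, the sketch is mine):

```python
def delivery_rate(recipients, total_seconds):
    # messages delivered per second, assuming a roughly constant rate
    return recipients / total_seconds

# ~105k inboxes in about 7 minutes works out to 250 messages/second
rate = delivery_rate(105_000, 7 * 60)
```

Comparing that against push notifications (delivery in ~10 seconds regardless of audience size) makes the case for multiple modes: email scales linearly with audience, push does not.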

Emergency Notification Systems: Ludwig Gantner, Andrew Marinik, VA Tech

VTAlerts – designed with redundancy. Goal is that every member of the community will be notified by at least one channel. Originally built in-house, but now a complex hybrid environment with some local and some vendor channels in the cloud.

New beginning – recognition of prioritized support for public safety. Group within IT expanded to include more channels: VT Alerts, blue light telephones, next generation 911, security camera system. Having one group responsible gives one point of contact in IT for public safety officials. Having dedicated staff allows for much better response times. They’ve removed dependencies on single individuals.

Communication – notification and collaboration – use the ticketing system.

Sustainable support – important to be proactive rather than reactive in public safety. New monitoring capabilities, improved redundancy, long term planning, channel development.

Collaboration – IT recognizes technical needs; public safety prioritizes items.

ENS philosophy: What is happening, where is it happening, what do we want you to do about it? They have 21 templates.
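The what/where/what-to-do philosophy maps naturally onto message templates; a minimal sketch (the field names and wording are hypothetical, not VT's actual template format):

```python
# A pre-approved skeleton: the operator only supplies the specifics,
# which keeps decision time down during an incident.
TEMPLATE = "{what} at {where}. {action}"

def render_alert(what, where, action):
    return TEMPLATE.format(what=what, where=where, action=action)

msg = render_alert(
    what="Police activity",
    where="Main St near the stadium",
    action="Avoid the area until further notice.",
)
```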

Current Challenges: How do we institutionalize the process to avoid backsliding when people change? What are appropriate success metrics for system evaluation? What are the cyber-security concerns of the components and system as a whole?

Evolving Radio Technologies – Glenn Rodrigues, UC Boulder

LMR (land mobile radio) project at CU Boulder. Business problem: lack of ability to communicate between Public Safety officials and leadership during planned events and unplanned incidents; Officers don’t feel safe doing their job without proper communications. Plan of action: Complete LMR audit for University; short term fixes; long term fixes.

Audit: requirements – contractor had to be vendor neutral; LMR customer interview and use case mapping; technical recommendations backed with data. Output: Clients – CUPD + 9 other business units. Biggest problem was coverage inside buildings (and system overloads). Tech assessment: most equipment was over 10 years old and malfunctioning, no real resource dedicated to monitor and engage with customers, most portable radios were not optimal. Business assessment: lack of policy enforcement (internal and external); lack of visibility of individual unit needs; lack of engagement with business partners. Plans: stabilize current LMR system under limited budget in 3 months by replacing high risk or failed equipment, leverage existing University assets (monitoring, backup power). Longer term: want to patch LMR into the campus fiber backbone. RFI in process.

John Board – Duke

Had the opportunity to green-field a managed, networked camera system. Lawyers were concerned about lack of standardization and maintenance of existing cameras. Started with parking decks. Goal was evidentiary, not live surveillance. Budgeted cost actually included maintenance, ongoing verification, and network and storage costs. All cameras installed and operationally verified by OIT. Cisco VSOM, decent API. 1024 cameras in operation now.

The institution is zealous about privacy. They have a policy about access to live and stored images; have a retention policy; there's a committee that decides where cameras go (you can't put up cameras outside the system). Challenge around need vs demand.

Wanting to do automated image analysis to verify that cameras are working, e.g. deviation of sample image vs reference image. EE faculty proposed writing an algorithm for this. After some experimentation came to an algorithm that filters ~80% of good cameras, while reliably identifying 100% of bad cameras. By using 3-day averages, safely filters 95% of good cameras – declaring victory!
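A hedged sketch of the sample-vs-reference comparison described above; images are flattened pixel lists and the threshold is invented, since the actual algorithm wasn't published:

```python
from statistics import mean

def frame_deviation(sample, reference):
    # mean absolute pixel difference between a sample frame
    # and a known-good reference frame from the same camera
    return mean(abs(s - r) for s, r in zip(sample, reference))

def camera_healthy(daily_frames, reference, threshold=30.0):
    # Average the deviation over several days (the talk used 3-day
    # averages) so transient scene changes don't flag a good camera.
    return mean(frame_deviation(f, reference) for f in daily_frames) <= threshold
```

The design trade-off is the one in the talk: a loose threshold passes some bad cameras, a tight one flags good ones; averaging over days is what let them filter 95% of good cameras while still catching 100% of the bad ones.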

Va Tech – Crowd Monitoring and Management on Game Day – Major Mac Babb

Stadium holds 66k people. Originally built in 1965. Hokie Village across the street, 20 parking lots, most of which are licensed for alcohol.

Unified Command off 7th floor of stadium. 160 Officers, Office of Emergency Management, Communications / Dispatch, Rescue, Fire, Game Operations, Event Staff/Security/ADA Services (545 event personnel), Parking, and Stadium Facilities and Grounds Ops.

Technology Assistance – CAD terminals and radio dispatch. See same screens as regional center. Access to around 400 cameras around campus. Weather systems fed into ops center, Veoci incident management program, Athletics comms channels, social media, emergency notification system. Supported by security center at public safety building.

Team Tops Technology – University of Washington's Approach to Crisis Communications – Andy Ward

Seattle Crisis Communication Team – News & Information, Police, Marketing, IT, Emergency Management, Housing & Food Services, UW Medical Center.

Roles – Initiator, Incident Commander (for communications), Communicator, Monitors

Crisis communications toolkit – UW Alert Blog (wordpress.com) — can send messages to banners on the UW home page and to the hot-line telephone. UW Alert (e2Campus) sends text and email messages. UW Alert Facebook and Twitter channels. There's an outdoor alert system (Talkaphone) and an indoor alert system (PA capabilities on the fire alarm system; the problem is they have to send to all buildings at once). Plan to use the Red Cross Safe & Well system to account for people.

97% of the time the crisis communication team is activated by campus police — twenty-some people calling into a conference bridge. The initiator briefs the team, primarily the incident commander, who decides what action to take. The person who initiates the call should be ready to send out the first message. Decide which tool(s) to use to send the alert, and then the team stays on the bridge after the message is sent.

Police are not the incident commanders for communications.

When incident is over, they send out an all-clear message.

IT’s role during an incident: Monitor technology performance; Troubleshoot immediately; Provide technical expertise; Provide depth to the team.

Police have ability to send messages if immediacy is needed.

Subteams from all 3 campuses meet and recommend policies.


CSG Fall 2017 – Campus Safety & Security, pt 1

We’re at Virginia Tech this time. The topic of this special day-long workshop at CSG is about Campus Safety and Security and what we’ve learned in the ten years since the VaTech shootings and in the wake of other major events at our campuses in terms of mass notifications and using technology to protect the people at our institutions.

Scott – The technology is easy once we've communicated what the capabilities and limitations of the systems are, so realistic expectations can enable planning.

VaTech President formed a working group as an outcome of event in 2007: Telecom Infrastructure Working Group. Looked at 14 major university and regional systems. Involved over 80 committed professionals and faculty from IT, law enforcement and administration, with contributions of more than 60 additional individuals. Examined: Performance, stress-response and interoperability of all communications for multiple areas. Notifications to community, internal communications, etc. Who is the community, how are they notified? What’s the risk of sending targeted communications. It’s increasingly feasible to know locations of individuals – do we track that and attempt to target notifications to that? Nuances of what the event is has importance. How many preformed message templates should you have? Important to vet the accuracy of the information being communicated – time for analysis, but how much time do you take?

In the analysis, the technology was only involved in the response — the mitigation, preparedness, and recovery involved other parts of the institution.

WebEx with Klara Jelinkova from Rice – Hurricane Harvey Response

Wed Aug 23 – Harvey strengthens to tropical storm
Thursday strengthens to Cat 1
Friday goes to Cat 4 and makes landfall.

When it happens that quickly, you have what you've got: They had a service list with criticality and an emergency preparedness plan for when people can't come to work. Primary datacenter can operate for 10 days without power, and they needed it. The secondary network is on a medical backbone.

Planning – moving to VOIP, not all data available in off site tape backup, so did a quick emergency backup to AWS Glacier (which challenged the firewall) – now looking at getting rid of tape entirely. Also looking at backup of HPC and research data — the researchers are supposed to pay for it, but nobody does. Moving major systems to cloud.

New plans they need: load balancers are dependent on OIT datacenters being operational – looking at a redesign in the cloud. IDM utilizes SMU for continuity, but needs to move to cloud for scaling. Have a sophisticated email list service – everybody wanted to use it rather than the broad blast emergency notification system. Realized that the list service is more critical than the alert system.

CISO was flooded and evacuated, so the learning management person ended up running the IT crisis center.

Institutional lessons: Standing Crisis Management Team – good; includes student representation. Contracts – where are you on the list for food and fuel delivery? Things that matter: flushing toilets, drinking water, food, payroll (people go to the cash economy, so make sure they have funds), network, communications services.

Knowing where your people are and what they are facing matters – where do they live, mash that up with flooded areas – can they get to work, do they have internet, etc.? Loaded everybody from the ERP, geocoded addresses, put them on a map, and overlaid intelligence. Had needs assessment tools: housing assessments, childcare, etc. (forms built in Acquia). A lot of the hourly workers are not English speakers and don't have smartphones (or know how to get to the resources). Put students to work in phone banks to call every person who didn't respond to surveys. Put together departmental reports that they sent daily. Had fewer requests for temporary housing than offers to house people. Assessed impact of damage on specific courses. Was used to figure out when they were ready to reopen.
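The address-to-flood-zone mashup reduces to point-in-polygon tests once addresses are geocoded; a minimal ray-casting sketch (Rice's actual work used GIS tooling, not hand-rolled code, and real flood zones are far more complex polygons):

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: cast a ray to the right of (x, y) and count
    how many polygon edges it crosses; odd count means inside.
    polygon is a list of (x, y) vertex tuples."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# hypothetical flood zone and geocoded employee location
flood_zone = [(0, 0), (1, 0), (1, 1), (0, 1)]
affected = point_in_polygon(0.5, 0.5, flood_zone)
```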

What worked: collecting data centrally but distributing initial assessment to divisions for analysis and followup. Didn’t sweat getting the data perfect initially. Gave deans and VPs sense of ownership. Brought in an academic geospatial research team for analysis that helped work with IT.

Quality of HR data was an issue.

Melissa Zak, UC Boulder, Ass't Vice Chancellor of Safety – Digital Engagement

October 5 – 3 significant events. Pre-event: strategy relations functional exercises, prior trainings, EMPG/EMOG/ECWG process and plans, alert notification systems, success of cyber teams (including law enforcement).

Somebody parked at stadium and started chasing people with a machete. Low threshold event because there was a small population present, but included community members there for treatment. One person on dispatch – requires a lot of multitasking at the best of times. First alerts went out within 15 minutes of first report to dispatch.

2nd event at 1 pm – coffee shop employee called corporate office about the first event, and they directed closing all the shops in the city, which led to reports of active harmer events at multiple shops across the city. Social media begins to erupt from campus. Sent out an alert that it was all clear, that there was no incident.

3rd event – 7:37 pm another alert went to one student from another college about an event. But then people started wondering whether the alert system had been hacked. Really highlights the impact of messages spreading by social media – students will drive event.

What went right? Great communication partnership with CUPD, CU, Boulder Police, Coroner, and CU Athletics.

What didn’t go as well? Messaging and clarity of messages. Community notification channels are important. If you have lots of people subscribed, it takes time to receive messages, and they may not arrive in order. Have now realized that sending notifications every 15 minutes is the best cadence. Now have a policy to send notifications informing people of any major deployment of police.

How do we deal with people who mainly communicate via social media channels?

Communication resource limitations – need to invoke more resources than just the one dispatcher.



CSG Spring 2017 – Challenges of Shifting IT to a Trusted Business Transformation Partner

Jim Phelps (Washington) is setting the tone for the workshop on shifting IT to a business partner.

Crisis in retail – new Nordstrom building in Seattle with the network density of a data center and the reconfigurability of a maker space. User-centered design and hyper-personalization. Built on internet of things, machine learning and AI, all designed for end user on mobile devices. Driven by big data analysis in close to real time. Incredibly tight link between IT and business. Autonomous systems like Amazon Echo.

Not just retail – example of Pacific Northwest National Labs personalized app.

Technical Drivers and Cultural Drivers lead to Business Transformation

Comparing our crisis to their crisis – Googling "crisis in retail" gives 266k results; "crisis in higher education" gives 385k. We have ~45% more crisis!

What does digital mean for higher education? What can we do with AI, IOT, hyper-personalization to be user-centric and personalized? What help do our business partners want with this transition?

HBR Analytics asked business leaders what will be IT’s most important contribution to the business over the next three years?

Lowest ranked: Lead and implement most IT projects. What they really want is IT to drive business innovation, manage security and risk, support business-led IT initiatives and establish architectures to support digital. Looking for a different engagement model with evangelizing, consulting, brokering, coaching and (last) delivering.

Impacts of the distributed university – fragmentation in leadership and mission, as well as IT and the business. IT's leadership challenge: working across IT teams to better align, working to overcome barriers between IT and the business; working upward to enable better-informed, more unified leadership; working across units to create shared language and definitions.

Tom Lewis (Washington) – A UCD Approach to the Five Methods of Engagement

Evangelizing – keep abreast of emerging digital trends and educate business partners on opportunities – know campus needs to identify emerging trends to pinpoint the right ones, and work with campus educate partners.

Consulting – Offer advice and frameworks to enable successful business leadership of technology investments. They offer a User Centered Design framework to help people focus on their users. Example of customer journey mapping. Points of engagement: project planning by helping draft project charter and scope; research design; data analysis. Helped team identify research questions, define scope, redirect focus from artifact to gathering insights, provide step-by-step advice.

Brokering – know campus need to provide internal connections; work with vendors to provide external connections; work with campus to provide leadership of SaaS investments. Example of Canvas selection and implementation. Identify opportunities through knowledge of campus needs. Validate campus needs and understand priorities.  Work intensively with the vendor.

Coaching – develop employee skills and share expertise with others.

Takeaways: Know campus needs, work with campus, work with service owners (internal or vendors).

Harvard IT Academy – Trusted Advisor, Facilitated by Deirdre O’Shea

IT Academy – Reskilling our IT professionals for a changing IT landscape, started summer 2015.  Skills identified – having a service mindset, being a trusted advisor, foundational knowledge of agile, ITIL, security, and project management. This year will start to identify technical skills by job families. Four levels in each competency – create common language, take foundational knowledge to think through how to apply it, take concepts and implement them for your team, expert, where you teach others. 52 IT facilitators across Harvard, 135 level 1 classes as of 4/30/17, 2,918 participant completions in Level 1.

Methods – co-facilitation, interactive dialog & exercises, challenge & support, materials, action plan. 3 year investment $1.5 million. Majority is for content licensing and bringing a vendor for ITIL certification. Two full-time staff.

Service Mindset – first class they rolled out. Trusted Advisor starts here. Three competencies: Accountability, Collaborative Partnerships, and Empathy. If you aren’t putting users at the center, you can’t become a trusted advisor.

Trusted Advisor – three competencies: effective communication, connecting, proactive problem solving. Introduction activity – who do you consider a trusted advisor? What characteristics did they demonstrate? (table exercise). Effective communication: factors that influence communication, active listening, miscommunication/ladder of inference, information exchange, questioning, exploring differences.

Active listening – create the right environment, listen until you no longer exist, paraphrase, perception check.

Connecting – partnership spiral, positioning ourselves as a value added partner, trust & credibility, developing & improving relationships.

View building trust as a Marathon of Sprints – good work sustained over time.

Proactive problem solving – identify future needs, influence strategies, motivate users to problem solve, benefits vs. features. Feature describes “what”; benefit describes “so what?”. A feature is what something is, a benefit is what something does.

Bring it together with a case study. Interview stakeholders, build advice, etc.

Christina Tenerowicz  (Colorado) – Business Analysis Relationship at CU

Business Analysis & Solution Architecture – started in Research Administration, now moved to central IT.

21 people in the group.

What’s successful? Business partnership, leadership, and technology. Everyone is accountable for successful delivery of a project. Program manager for each vertical – Research Admin, HR, Student Services, Academic Admin, Advancement and Athletics, Finance. Meet monthly with directors, do a multi-year roadmap showing business benefits along with costs and resources. They offer Business Needs Discovery and Requirements as a service. After a go-live you need to be there for post-implementation support and adoption.

Challenges – Relationship management; continual care and feeding; communication; educating and coaching leadership on business analysis.

Paul Erickson “IT as either an adoption agency or a hospice”

Louis King – we don’t look where we can divest.

Mojgan Amini – Started putting all IT staff through Lean Six Sigma training, and invited business partners which really helped the conversation.

CSG Spring 2017 – Automating Campus Network Configuration

We’re at Yale for the Spring CSG meeting. It’s a beautiful, sunny New England spring day!

The first workshop is on automating campus network configuration, provisioning, and monitoring (Workshop Presentations).

Mark McCahill – Duke – Thinking about network automation/monitoring

Campus wireless is one of the most complicated things we run. Campus APs – averages ~6k in 250 buildings across our campuses. RF spectrum issues. How reproducible are trouble reports?

How many staff support your network – 7.5 in engineering/architecture, 10 field staff, average.

We have not converged on network management tools at all.

Monitoring taxonomy – how can we categorize tools? Data gathering, analysis, alerting, trending.

Automation strategy – understand the environment – monitoring!
Ideal end-state – standardized process, consistent quality, reduced cycle time/increased productivity.

User centric monitoring of the wireless network

Users don’t tell us that much. Should I even tell IT there’s a problem? It’s not a good experience just because they don’t complain.

Crowd-sourced monitoring – Boomerang: JavaScript in a web page that attempts to download files of various sizes – can figure out latency and performance.

via.oit.duke.edu – zero-install. Duke's Shibboleth page includes Boomerang code. Results reported to via.oit.duke.edu, stored in a MySQL db. Self-service diagnostic testing available at via to check performance to various data centers.
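Boomerang-style measurement can separate latency from bandwidth by timing downloads of different sizes; a simplified model (t = latency + size/bandwidth) that ignores TCP slow start and noise:

```python
def estimate_link(size1, time1, size2, time2):
    """Solve t = latency + size/bandwidth from two (size, time) samples.

    Sizes in bytes, times in seconds; returns (latency_s, bandwidth_Bps).
    Real measurements are noisy, so in practice you need many samples
    and robust statistics, which is where the big-data angle comes in.
    """
    bandwidth = (size2 - size1) / (time2 - time1)
    latency = time1 - size1 / bandwidth
    return latency, bandwidth

# synthetic example: a 1 MB/s link with 50 ms of latency
lat, bw = estimate_link(10_000, 0.06, 100_000, 0.15)
```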


You get into big data fairly quickly – it’s a statistics game. Put pages at your cloud and different data centers to measure to them individually.

Where are the trouble spots? Key questions: what are the chances of a good connection? Which wireless segments are overloaded? Instead of depending on vendor tools, use R to analyze data from Boomerang. You can do statistical process control to gather objective measures.
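Statistical process control here means flagging measurements that fall outside control limits derived from historical data; a minimal sketch using classic 3-sigma Shewhart limits (Duke did this in R, the Python below is just an illustration):

```python
from statistics import mean, stdev

def control_limits(history, k=3):
    # Shewhart-style limits: mean +/- k standard deviations
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s

def out_of_control(history, value, k=3):
    # True when a new measurement falls outside the control limits,
    # i.e. the wireless segment deviates from its normal behavior
    lo, hi = control_limits(history, k)
    return not (lo <= value <= hi)
```

This is what "objective measures" buys you: a segment is flagged because its latency left its own historical band, not because someone complained.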

How to monitor when they can't connect? Simulate users with strategically situated Raspberry Pi devices that do the EAP+PEAP authenticate & DHCP dance to get on the network. Source: https://github.com/duke-automation/raspi – dumping data into Splunk for analysis.
C program makes wpa_supplicant API calls to repeatedly cycle WiFi connection monitoring. Found bimodal distribution of DHCP response times. Also found no correlation between sites. Raspberry Pi tracking wlan interface drops, link quality and signal level.

Next steps – more Raspberry Pis in the field and more monitoring. Check http performance with boomerang on the Pis. Look more at DNS and number of SSIDs detected – could be rogue SSIDs.

Network data collection is a ‘big data’ problem, which is great for statistical analysis. Will use Apache Spark cluster to speed longitudinal analysis.

Should have an iOS app that says “this isn’t good for me right here right now”. Yale has one. https://github.com/YaleSTC/wifi-reporter

Eric Boyd, Michigan – perfSONAR overview

In the context of ScienceDMZ – how do you make sure you’re getting the end-to-end performance?

perfSONAR – enables fault isolation, verify correct operation, widely deployed in ESnet and other networks.

Problem statement – while networks interconnect, each network is owned by a separate organization – how do we troubleshoot across them? Performance issues are prevalent and distributed. Local testing will not find everything. "Soft failures" are different and often go undetected. Elephant flows (giant research loads) vs. mouse flows (web, email): with packets dropping at 0.0046%, you only get 5% of optimal speed.
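The loss-rate figure reflects the Mathis model of TCP throughput, BW ≤ MSS/(RTT·√p): achievable speed collapses with the square root of the loss rate. A sketch of the computation (the MSS and RTT values are illustrative, not from the talk):

```python
import math

def mathis_throughput_bps(mss_bytes, rtt_seconds, loss_rate):
    # Mathis model: achievable TCP throughput <= (MSS / RTT) * (1 / sqrt(p))
    # where p is the packet loss probability (e.g. 0.0046% -> 0.000046)
    return (mss_bytes * 8 / rtt_seconds) / math.sqrt(loss_rate)

# e.g. a 1460-byte MSS on a 50 ms cross-country path with 0.0046% loss
bw = mathis_throughput_bps(1460, 0.05, 0.000046)
```

On a 10 Gbps research path that result is a tiny fraction of capacity, which is why "soft failures" that drop a handful of packets are devastating for elephant flows while mouse flows never notice.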

perfSONAR is open source, supported by ESnet, GEANT, Internet2.

Something will break – sometimes things break, + human error. 3 phases to deployment: get system up and running; holy cow, we have a lot of network problems; how do we keep it good?

Distributed information sharing mechanism. Can plug any tool into it. DNS lookups, building HTTP tool now. Using a $200 box to deploy on a network to measure performance. Trying to automate things – you don't want to spend > 0.5 FTE on performance monitoring.

William Diegard – Rice – Network automation topics

You should do perfSONAR.

But some commercial products: Splunk, Extrahop, Deepfield. Not talking about the single largest automation system we run: the wireless controller.

What is automation? Anything that lets you spend time doing new things or be more efficient.

Splunk – MapReduce for your log file processing. The mother of all grep tools. Rice using it to track things and automate things like DMCA violations. Automatic system to look for POE shutdown on Cisco access switches. Monitor Data Transfer Node activities.

Extrahop – Application Performance Monitor: a passive network traffic tap grabs "wire data" and reveals it. Makes you realize how little you know. Does a bunch of statistical analysis. Answers the question: "why is it slow?" But – you have to care to take the time to look. Can deploy it in the cloud too. Rice uses it to measure eduroam performance, among other things.

Deepfield – it’s an internet2 service you probably already have. Looks at traffic that Internet2 sees, shows where traffic is going, but you don’t see everything from your border. Does nice categorization.

Sean Dilda – Duke – Cartographer

Why? Network changes faster than diagrams. Troubleshooting network problems is hard! What port is this computer on? What VLAN? What firewalls?

What does it do? Logs into every switch/router every three hours and builds an internal map of the network. Can look up by IP and hostname, or look up by MAC, or see switch/router interface stats. They also pull in building data, link to floor plans and Google Maps. Can get summaries of VRF data, show VLAN stats (including same VLAN number on different LANs). It maps Layer 2 layouts. Great for showing how things plug together for local support staff or new network engineers. Will map routes from source to destination in a nice graphic layout.
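Mapping a route from source to destination over the discovered topology is a shortest-path search over the link graph Cartographer builds; a minimal BFS sketch (Cartographer's real data model, with VLANs, VRFs, and interfaces, is far richer than this adjacency map):

```python
from collections import deque

def find_path(links, src, dst):
    """Breadth-first search over a switch adjacency map.

    links: dict mapping a device name to the set of its neighbors.
    Returns the hop-by-hop path as a list, or None if unreachable.
    """
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neighbor in links.get(node, ()):
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None

# hypothetical three-switch topology
topology = {"sw-a": {"sw-b"}, "sw-b": {"sw-a", "sw-c"}, "sw-c": {"sw-b"}}
route = find_path(topology, "sw-a", "sw-c")
```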

Who can use it? All IT staff across the university, and anyone with access to IPAM. (based on Grouper groups).

Use it to allow local staff with IPAM permissions to change port VLAN, bounce wired port, clear ARP entry, block IPs from network.

New tool: Planisphere. Combines data from Cartographer, DHCP, wireless/RADIUS, device registration, endpoint management, VMware, Cisco, etc. Can gather a lot of data about end devices.

Next steps: F5 load balancers, firewall rules and network ACLs, IPS blacklists, Planisphere metrics.

Plan to distribute source.

Scotty Logan – Stanford – Network Delegation and Automation

9 pairs of physical firewalls, 600 virtual firewalls. Half the rules change per year. 65k firewall rules. Only 4-5k changes are manual. Firewall automation first deployed 2007.

1300 Local network admins active in last 30 days. Only 1200 people in IT job roles per HR.

SNSR – “snoozer” – self registration of devices.

If you come in via VPN or VDI, it shows who is associated with the session, to look up groups for authorization. Now have a web page for firewall requests; it creates ServiceNow requests, and if you have permission the rule gets updated within an hour without manual intervention (or with an approval loop in ServiceNow).

Device compliance DB – fed from devices, BigFix. VLRE (Very Lightweight Reporting Engine) runs on Macs and Windows and reports the status of machines: do you have the firewall on, is the disk encrypted, etc.? Started deployment of 802.1x with a dedicated RADIUS pool. Added integration with the compliance API to see if a device must be compliant and whether it is.
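A "very lightweight reporting engine" of the kind described can amount to a handful of boolean checks rolled into one JSON document per machine. A sketch under that assumption (field names and check names are invented, not VLRE's actual schema):

```python
# Illustrative sketch, not the real VLRE agent: run a few compliance
# checks and emit a single JSON status report for this machine.
import json
import platform

def compliance_report(checks):
    """checks: mapping of check name -> callable returning bool."""
    results = {name: bool(fn()) for name, fn in checks.items()}
    return json.dumps({
        "host": platform.node(),
        "os": platform.system(),
        "checks": results,
        # The machine is compliant only if every individual check passes.
        "compliant": all(results.values()),
    })
```

The real agent would plug in platform-specific probes (firewall state, disk encryption) and POST the document to the compliance DB.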

OpsWare – automated switch and router management. Backup switch configurations nightly (can do diffs on them), scheduled config changes, check all devices for specific settings.
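The nightly backup-plus-diff workflow is tool-agnostic: keep each night's running config and compare. A minimal sketch of the diff step using the standard library (OpsWare itself does much more, including pushing scheduled changes):

```python
# Sketch of the config-diff idea: given yesterday's and today's running
# config for a device, emit a unified diff so changes can be reviewed.
import difflib

def config_diff(old_cfg, new_cfg, name="switch"):
    return "".join(difflib.unified_diff(
        old_cfg.splitlines(keepends=True),
        new_cfg.splitlines(keepends=True),
        fromfile=f"{name}@yesterday",
        tofile=f"{name}@today",
    ))
```

Run across every device each night, an empty diff means "no change"; anything else goes to a human for review.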

Matt Brooks – CMU – Controlling Network Access

Limited release of .1x, mostly in common spaces and where people float between buildings. WPA2 Enterprise; pushing people towards it from the clear-text network. Controls IP assignments via DHCP, most outlets are deactivated by default, self-service portal for activating outlets. “Quick-reg” network for on-boarding.

Updating switch and router configs – NetMRI from Infoblox used for regular backups of running configs from every switch and firewall via TFTP; Visual diff tool to review changes; Password changes; software upgrades.  CMU NetConf – initial switch config, interface config changes day-to-day via self-service portal.

Scotty Logan – Dirty Dancing in the Cloud

Why are we moving to the cloud? Geo-diversity? Scalability? Cost? Availability?

Don’t do: artisanally crafted services; manual testing; manual deployment; tightly coupled services.

Do do: DevOps, loosely coupled services.

Firewalls and IP addresses are not loosely coupled!
Difficult to get contiguous elastic IPs from AWS. So people do VPC VPNs, and Direct Connect and private routing. Like dumping a 1950s appliance in your brand new kitchen.

If only we had… Inter-networking, and transport layer security, and strong authentication… oh wait – we do!

But… my CISO says we need to use static IPs – then you need to talk to your CISO.

If you have to, use NAT gateways or NAT instances with Elastic IPs

Amazon now supports /56 IPv6 subnet for VPCs.

Azure only allows 200 Mbps per link, which HPC jobs can blow out very quickly. Duke doesn’t think that extending the campus data center into the cloud is a good idea.

NetDB – Delegated administration. All 1300 local network admins can control firewalls, metadata, delegate domains, carve up address space, etc for devices via self-service.

William Diegard 

Needed to replace the stack. Ended up with Infoblox. Talked about outsourcing DNS, which was a hugely traumatic conversation. A good thing about picking Infoblox was the conversations around campus. Infoblox allows for self-service and visibility of networking to campus. The training session was done by the user support team, not networking. They don’t have as much need for frequent firewall rules as other schools due to the broad segmentation of the network.

Matt Brooks 

Application suite – all home grown. Moving more towards Infoblox. Using NetReg as the IPAM system – registration of machines on networks, tracks network and switch metadata. CANDO – tracks structured cabling on campus. Allows (central IT) users to request and sometimes modify outlet configs, and configure interfaces on systems you own in the data center (add a VLAN to your trunk, etc.). NetConf – switch auto-builder does automated provisioning of new switches, and PortAdmin does automated configuration of interfaces based on activations in CANDO.

Staffing and Skill Sets for Network Automation Teams

Matt Brooks – CMU

How is the team structured? 9 network ops engineers, two network design engineers, 3-4 network software engineers. Developers and network engineers are paired in offices.

What do we look for in developers? Pick two: developer experience (required) and one of sysadmin or networking experience. Must be genuinely interested in learning the third skill, which they’ll pick up by working outages for things that aren’t theirs and working on technologies they don’t currently know. Look for generalists, not specialists – truly curious and self-driven people. The team runs the servers its own software runs on.

War stories

Mark – In early days of perfSONAR discovered that part of backbone wasn’t as strong as it should be. perfSONAR made one of the core routers slow down so much that hospital VOIP didn’t function. Silver lining was that it pointed out some bad router configs.

William – When you start giving client services access to tools, people can do powerful things. Someone wiped out the entire admin database, which caused all the ports to drop.

Scotty – Guy who runs DNS infrastructure pushed out a software change that took down DNS.

Eric – One of perils of network automation is you concentrate your mistakes. What was a local problem becomes a global problem. Automated config of Internet2 switches, in deployment paused between steps to check accuracy, which caused a race condition that erased all the rules on the network. Took Internet2 network down for 20 minutes.

Scotty – maybe we should be applying webscale iterative deployment and testing to our switches, where we have thousands of devices.

Matt – When moving some addresses to a new space, engineer copied a SQL block out of a wiki, but the where clause was outside the highlighted code area on the wiki, so the statement got applied all across the network. Took a full weekend to restore.

Also – tied into IdM systems. Deprovision outlets and systems that a person is individually responsible for. A glitch in the IdM system caused 1500 active accounts to be de-activated; 4k devices were deleted from the network very rapidly. Cobbled together a script to figure out what happened and then put the data back in place. Managed to do it in an hour.

CSG Winter 2017 – Recommendations/guides for updating IT skill portfolio

Paul Erickson – Nebraska

Framing the issue – we don’t have the skill sets or expertise for a cloud world. Run, grow, transform – on-prem, the traditional focus is on “run.” How do you change a working environment and shift investment/resources (how do you change an engine while the car is running)?

What skill sets are we missing? Process management (granting/revoking; provisioning; integration; authorizations/permissions, vendor coordination, managing interdependencies); Integration; Product/Service management; Client relationship management.

People who contributed in the past might not have the skills to take us into the future. How do we offer them opportunities that honor their contributions and allow them to grow? Adapt and evolve in an environment of continuous change.

Identify ideal employee skills. Help those who are great technologists make the transition.

Denise Cunningham – Columbia

Head of HR for technology division.

One reason people resist change is that they focus on what they have to give up instead of what they have to gain. Important to keep this at the front of mind.

A framework for Organizational Performance & Change: Burke-Litwin Model


External environment (e.g. the cloud) impacts the organization. The spine of the model – external environment influences leadership, which influences management practice, then work unit climate, then motivation, then individual.

Focus on Work Unit Climate: What it feels like to work here; nature of our interaction with each other; interpersonal relations in the group; what we focus on and consider important.

What factors influence Work Unit Climate? Leadership and management practices. Work unit climate is the most direct factor in performance.

There’s a learning climate or a performance climate. Learning: emphasis on improving skills and abilities; stresses process and learning; motivated to increase competence and change. Performance: emphasis is on demonstrating skills; stresses outcomes and results; people are afraid to make mistakes or change.

Goals: Learning: quality, trying new things, original ideas, effort. Performance: following standard procedures; high performance standards; getting the task done on time.

Feedback: Learning climate: supportive/coaching role; improving work quality; two-way feedback, questions encouraged. Performance climate: evaluative role; level of competence compared to other employees, one-way feedback, questions discouraged.

When implementing change employees want to hear about it from their manager.

There is no correlation between being a strong individual contributor and being a strong leader.

Changing organizational culture can take 12-18 months. Or two years in higher education. Can’t do it at all without leadership being a part of it.

When people say they’re going to get in trouble, that can be a rationale for not changing. How do you make sure new staff don’t become part of a dysfunctional culture? Ask questions at hiring about the core values – Zappos does this well. Build the values into the performance appraisal.

CSG Winter 2017 – Cloud ERP Workshop

Stanford University – Cloud Transformations – Bruce Vincent

Why Cloud and Why now? Earthquake danger; campus space; quick provisioning; easy scalability; new features and functions more quickly

Vision for Stanford UIT cloud transformation program: Starting to behave like an enterprise. Shift most of service portfolio to cloud. A lot of self-examination – assessment of organization and staff. Refactoring of skills.

Trends and areas of importance: Cloud  – requires standards, process changes, amended roles; Automation – not just for efficiency – requires API integration; IAM – federated and social identities, post-password era nearing for SSO; Security – stop using address based access control; Strategic placement of strong tech staff in key positions; timescale of cloud ignores our annual cycles.

Challenges regarding cloud deployments: Business processes tightly coupled within SaaS products, e.g. ServiceNow and Salesforce; Tracking our assets which increasingly exist in disparate XaaS products; Representing the interrelationships between cloud assets; Not using our own domain namespace in URLs.

Trying to make ServiceNow the system of record about assets – need to integrate it with the automation of spinning instances up and down in the cloud.

Cloud ERP – Governance and Cloud ERP – Jim Phelps, Washington

UW going live with Workday in July. Migrating from old mainframe system and distributed business processes and systems. Business process change is difficult. Built an integrated service center (ISC) with 4 tiers of help.

Integrated Governance Model:  across business domains; equal voice from campus; linking business and technology; strategic, transformative, efficient…

Governance Design: Approach – set strategic direction; build roadmap; govern change – built out RACI diagram.

“Central” vs “Campus” change requests – set up a rubric for evaluating: governance should review and approve major changes.

Need for a common structured change request: help desk requests and structured change requests should be easily rerouted to each others’ queues.

Governance seats (proposed): 7 people – small and nimble, but representative of campus diversity.

Focus of governance group needs to be delivering greatest value for the whole university and leading transformational change of HR/P domains. Members must bring a transformational and strategic vision to the table. They must drive continuous change and improvements over time.

Next challenge: transition planning and execution – balancing implementation governance with ISC governance throughout transition – need to have a clear definition of stabilization.

Next steps: determine role of new EVP in RACI; Align with vision of executive director of ISC; provost to formally instantiate ISC governance; develop and implement transition plan; turn into operational processes

UMN ERP Governance – Sharon Ramallo

Went live with 9.2 Peoplesoft on 4/20/2015 – no issues at go-live!

Implemented governance process and continue to operate governance

Process: Planning, Budgeting; Refine; Execution; Refine

  • Executive Oversight Committee – Chair: VP Finance. Members: VP OIT, HR, Vice Provost
  • Operational Administrative Steering Committee – Chair: Sr. Dir App Dev
  • Administrative Computing Steering Committee – people who run the operational teams
  • Change Approval Board

Their CAB process builds a calendar in ServiceNow.

USC Experience in the Cloud – Steve O’Donnell

Current admin systems  – Kuali KFS/Coeus, custom SIS (Mainframe), Lawson, Workday, Cognos

Staffing and skill modernization: Burden of support shifts from an IT knowledge base to more of a business knowledge base – in terms of accountability and knowledge.  IT skill still required for integrations, complex reporting, etc. USC staffing and skill requirements disrupted.

Challenges: Who drives the roadmap and support? IT Ownership vs. business ownership; Central vs. Decentralized; Attrition in legacy system support staff. At risk skills: legacy programmers, data center, platform support, analysts supporting individual areas.

Mitigation: establishing clear vision for system ownership and support; restructure existing support org; repurpose by offering re-tooling/training; Opportunity for less experienced resources – leverage recent grads, get fresh thinking; fellowship/internships to help augment teams.

Business Process Engineering – USC Use cases

Kuali Deployment: Don’t disrupt campus operations. No business process changes. Easier to implement, but no big bang.

Workday HCM/Payroll: Use delivered business process as starting point. Engaged folks from central business, without enough input from campus at large. Frustrating for academics. Workday as a design partner was challenging. Make change management core from beginning – real lever is conversations with campus partners. Sketch future state impact early and consult with individual areas.

Current Approach – FIN pre-implementation investment

Demonstrations & Data gathering (requirements gathering): Sep – Nov. Led by Deloitte consultants; cover each administrative area; work team identifies USC requirements; Community reviews and provides feedback. Use the services folks, not the sales folks.

Workshops (develop requirements)- Nov – Feb. Led by USC business analysts, supported by Deloitte; Work teams further clarify requirements and identify how USC will use Workday; Community reviews draft and provides feedback

Playbacks (configure): March – May. Co-led by consultants and business analysts; Workday configured to execute high-level USC business requirements; Audience includes central and department-level users

Outcomes: Requirements catalog; application fit-gap; blueprint for new chart of accounts; future business process concepts; impacts on other enterprise systems; data conversion requirements; deployment scope; support model

CIO Panel – John Board; Bill Clebsch; Virginia Evans; Ron Kraemer; Kelli Trosvig

Cloud – ready for prime time ERP or not? Bill – approaching cautiously, we don’t know if these are the ultimate golden handcuffs. How do we get out of the SaaS vendors when we need to? Peoplesoft HR implementation has 6,000 customizations and a user community that is very used to being coddled to keep their processes. ERP is towards the bottom of the list for cloud.

Virginia – ERP was at the bottom of list, but business transformation and merger of medical center and physicians with university HR drove reconsideration. Eventually everything will be in the cloud.

John – ERP firmly at the bottom of the list.

Kelli – at Washington they were not ready for the implementation they took on – trusted that they could keep quirky business processes, but that wasn’t the case. It took a lot of expenditure of political capital. Everyone around the table thought it was all about other people changing. Very difficult to get large institutions onto SaaS solutions because the business processes are so inflexible. The natural tendency is to stick with what you know – many people in our institutions have never worked anywhere else. Probably easier at smaller or more top-down institutions.

Ron – Should ask is higher-ed ready for prime time ERP or not? We keep trying to fix the flower when it fails to bloom. People changing ERPs are doing it because they have to – data center might be dying, cobol programmers might be done. Try to spend time fixing the ecosystem. Stop fixing the damn flower.

Kelli – it’s about how you do systemic change, not at a theoretical level.

Bill – what problem are we trying to solve? Need to be clear when we go into implementations. At Stanford they want to get rid of data centers – space is at too much of a premium, it’s too hard to get permits, etc.

John – there’s an opportunity to be trusted to advise on system issues, integration, etc.

Kelli & Ron – The financial model of cap-ex vs. op-ex is a critical success factor.

Ron – separating pre-sales versions from reality is critical. That’s where we can play an important role.

John – we have massive intellectual expertise on campus, but we’ve done a terrible job of leveraging our information to help make the campus work better. We’ve got the data, but we haven’t been using it well.

Bernie – we need to start with rationalizing our university businesses before we tackle the ERP.

Ron – incumbent on us to tell a story to the Presidents. When ND looks at moving Ellucian they think what if they can stop running things that require infrastructure and licenses on campus? Positions us better than we are today. Epiphany over the last 6 months: We have to start telling stories – we can’t just pretend we know the right things to do. Let’s start gathering stories and sharing them.

Kitty – Part of the story is about the junk we have right now. The leaders don’t necessarily know how bad the business processes and proliferation of services are.

CSG Winter 2017 – New Models for Supporting the Academic Enterprise

How do we tie IT Strategic Plan to Teaching & Learning Mission?

Can IT move beyond its traditional role to expand its presence in and support for the academic enterprise?

Marin Stanek – UC Boulder

New IT strategic plan – the first one to focus on the academic mission.

Evolving role of IT – from being the fixer to being a focuser. Creating new systems and services. Evolving toward listening to campus, leading to further evolution toward competence. We have the capacity to understand multiple agendas and focus on the overarching mission.

Focus on students – analytics, retention, etc. A rising rhetoric. Chancellor’s goal – increase the grad rate from 68% to 80% in four years.

Went from a strategic plan with 20-some chapters to one that has the meat in four pages – it’s all about students. Small changes turn into larger results. Utilized LMS to put content first for student welcome. Brought innovative classroom techniques to administrative purpose.

Retention: large lecture redesign. Packed lecture halls with mediocre technology experiences. Identified 30 gateway courses that are strong predictors of student success. The IT redesign team is engaged. Look at analysis and data to enhance the learning experience and student engagement. E-Bio class – 20% of students take this class. Held a design thinking challenge to understand student behaviors. Discovered that the TA plays a pivotal role in student success – how quickly TAs responded to student questions was the critical issue.

Strategy on a Page / Strategy, It’s Personal – Tom Lewis & Phil Reid, University of Washington

Example: When things go sideways – initiatives get started with no clear goals or clear points of contact. End result – still planning for the plan after 1.5 years. (names scrubbed to protect the innocent).

Strategic goal – strategy on a page. A way to articulate value and for partners to understand and align. Three columns: Change drivers; Initiatives; Outcomes.

Ideas –

Supporting the Academic Enterprise in New Ways: Ben Maddox, NYU

The teaching & learning mission is rife with … opportunity

Case Study 1: all politics are local – learning analytics exploration:

context: Hosted university-wide event to gauge interest (standing room only); distributed instructional technology team; no learning analytics data steward; new leadership (president, provost, CIO)

Identified willing partner to build vocabulary around learning analytics that make sense to faculty; Developed working group and business case; built a site.

Challenge: learning analytics is a sprawling, undefined space. Sudden moves in the space freak people out. Local interests may not transfer to broader needs.

Merits: academic sponsorship; justification for dedicated FTE; credibility through local partnership; leverages standing governance structure to define broader needs.

Strategic Support for Education from IT at Duke – John Board

25% of all Duke students take the assembly-language-based intro to computer architecture. 40% of all students take intermediate programming (and over half are women). Failure to persuade many under-represented students to go further. Teaching very large classes of 220 a semester is not in the ethos of ECE and CompSci. The modest disagreement: programming should be fun to draw people into the field, vs. programming classes should train people to be “real” programmers. The standard curriculum instills almost no practical systems knowledge. Faculty are looking to IT to help remedy this. Most of the knowledge of real computing is in IT! Can be used to improve the skill set of students who are going to be in the field in the real world. IT developed extra-curricular courses for students on developing code.

Advice: don’t have separate advisory groups for admin and academic IT – it’s all connected.

Strategic planning process: 25 faculty (and even more staff from central and distributed IT units) populating 7 working groups: living and learning; research computing support; communications and infrastructure; IT security; administrative and business systems; support models, procurement and licensing; mobile and web

Many recommendations: help people use tech more effectively; prov; support innovation in research and education

Under innovation, relevant points: support the evolving computing needs of our researchers; improve Duke’s competency in data analytics;

Technology engagement center: a windowless telephone bunker has been transformed into a bank of 3D printers. Co-lab with app developers, creating APIs, video production operations; mini courses on many topics; hardware hacking (Arduino, sensors, IoT); research computing – led to graduates who wanted to donate specifically to IT.

What are the merits and challenges of integrated models, where IT partners with units that support instructional spaces, pedagogy, and assessment, to provide unified instructional support to campus?

Phil Reid: Why unified T&L support, and why IT?

Goal – promote and support innovation in teaching and learning

Barrier: faculty motivation to change (and you can’t blame them – incentives aren’t aligned)

Ideas to overcome barrier:

  • inspirational leaders in novel pedagogy
  • better student learning outcomes
  • improved efficiency
  • disruptive technology

Instructional systems are the “ERP” of teaching and learning

Improving the student experience

Improving the faculty experience

What faculty want is one stop shopping – pedagogy, technology, classrooms, assessment/measurement – they want the Genius Bar

Marin Stanek – How do we bring people together?

There are simple tools that seem like magic to campus, e.g. tapping into IT project management discipline for transformative academic projects. Advantages: creates structure; sets expectations for timelines, resources, and responsibilities of the partnering department; executive sponsorship helps momentum, buy-in, and hand-off of initiatives. The IT project portfolio now has a preponderance of initiatives for teaching and learning.

Example – Pathway to Space (a new minor in Aerospace, designed to pull in non-engineering majors). Utilized the project portfolio process: project definition/charter doc; schedule, budget, timeline; exec sponsorship; watch for warning signs; change management process; communicate! transparency & updates; crossing the chasm – handing off the creation or building it into the team.

Ben Maddox: Running the Governance Gauntlet

Context: university-wide service pilot for instructional tech support; added 10 new instructional technologists based at the schools (“a distributed model, centrally convened”); added instructional tech committee to standing governance structure; new role (joint to IT & Provost) convenes monthly meeting; group sets and recommends shared service model.

Challenge: requires increased coordination and strong sponsorship. For schools that were less resourced, there was Provost support, with management from central IT.

Deans had to write proposals to Provost to ask for the instructional support.

Jenn Stringer (Berkeley) – Academic Innovation Studio (AIS): A Collaborative Service Model

Faculty were getting “no, but” instead of “yes, and”

Space + Partners + Commitment + Trust = AIS (no unit names included). Open to every faculty, instructor, etc.

2k sq ft of space. 4 partners deliver service: research IT; Ed Tech Services; Center for Teaching & Learning; Library; Collaborative Services (google, box, etc).

Commitment is key – part was not branding it as IT space. It’s faculty space. Everybody was at the table to design the space. Face-to-face time built trust.

Oren Sreebny – Central IT and the University Innovation Sector


Marin –

Challenge: No clear career path for research computing professionals

No formal educational track; reward system missing; limited career path

Solution: Create an MA in research computing and a formal collaboration between Research Computing & the Libraries. Develop and advance data science and digital scholarship through discovery & reuse.

Certificate in Cybersecurity

Challenge: further develop Cybersecurity track utilizing existing interdisciplinary telecom program. Use existing grad school structure to minimize admin hurdles. Tap into existing courses to create certificate program.

A staff member was teaching a course at another university – there was no clear reward structure for him to teach on campus. The story is unfolding; it requires tenacity from the professionals involved, but it also requires an incentive structure, and it needs to happen at speed to keep momentum.

Ben – Supporting Teaching & Learning by Teaching

Consultations for teaching and learning with technology increased by more than 60%. The Center for the Advancement of Teaching had no tech curriculum. The new instructional tech groups had lots of instructional experience. Faculty collaborators value team members with teaching experience. Appetite for sharing.

Created online interactive tutorials for T&L services. The Center for the Advancement of Teaching uses the instructional tech teams to build new tech-oriented curriculum; the Provost agreed to sponsor 2 university-wide events per year. Made schools aware that staff were interested in teaching opportunities.

Evan – Duke – Technology classes at Co-Lab

Co-Lab is a technology innovation incubator to encourage students. Started with challenges, but they weren’t as effective as hoped. Flipped it around to ask for ideas first. Turned it into more of a grants program, but a persistent problem is that they didn’t have as many students with development skills as they thought. Roots program – teaches Python, HTML, web development, etc. https://colab.duke.edu/roots – taught by IT professionals. Faculty began to notice – told them that students were less technical than they used to be. Worked with faculty to develop an intro to Linux course that they use as an informal prerequisite. Going to do a git class for a physics course.

Duke Digital Initiative – innovation funding for faculty. Over 20 proposals from faculty, funded 10 of them. Why IT? Who else knows how to program a drone, take 360 degree video, and put it on a web site?

A Day in the life of Rob Fatland, Cloud Czar – Tom Lewis

Cloud and Data Research Computing – originated out of UW E-Science institute. Out and about on campus every day, looking for researchers to help. Build – Test – Share

Success stories: ORCA transit data – patterns of how people commute. Digital curation at the library – LIDAR data. Genomics – cut the cost per genome from $60 to $15 with help from AWS. Democratizing data and software: cloud plus GitHub plus Software Carpentry workshops.

Supporting the continuum of research computing – Oren


Data for Researchers – Jenn

Providing learning data to researchers from learning records store. Data warehouse for the interactivity data from your learning systems. Things you mine to get information on student success. Berkeley has a billion records from 2.5 years of data from LMS. Researchers want to mine the data to get insights into how people learn. Most data governance organizations are not thinking about this kind of data at all. There are standards around this data – two competing: xAPI, Caliper.

Take log data and convert it into standardized statements – pushing for vendors to hand data over in that format. Canvas doesn’t (yet), so UCB has to convert.
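The conversion step amounts to mapping raw log events onto the xAPI "actor / verb / object" statement shape. A hedged sketch (the input event fields are invented, and real xAPI statements per the ADL spec carry more structure than this):

```python
# Illustrative sketch: turn a raw LMS log event into an xAPI-style
# statement. Input field names are hypothetical, not Canvas's schema.
def to_xapi(event):
    return {
        # Who did it
        "actor": {"mbox": "mailto:%s" % event["user_email"]},
        # What they did, expressed as a verb IRI
        "verb": {"id": "http://adlnet.gov/expapi/verbs/" + event["action"]},
        # What they did it to
        "object": {"id": event["resource_url"]},
        "timestamp": event["time"],
    }
```

A batch job over the raw logs would emit one statement per event into the learning record store.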

Learning Record Store: AWS based Learning Record Store; Multi-tenant LRS that can support multiple institutions; Scalability and cost; Faster deployments – lower dev/ops overhead; Lambda architecture which encompasses both Batch and real-time interaction. Have an API for researchers who go through proper approval process to get de-identified data.

Are we telling students what we do with their data? They’ve created an agency dashboard for students (not in production yet). Allows students to opt-in or out of use of their data (where appropriate). Lots of discussion of data ownership, but regardless, they want transparency and agency.

UC Learning Data Privacy Principles: pulled together leaders from across the UC system. Working to draft principles. Something to point procurement and vendors to.

Learning Data Recommended Practices – been circulating them, taking to committees, etc to socialize and increase awareness.

John – Using infrastructure for faculty research

There are faculty who want to use the infrastructure for research. NSF did us a favor with the first round of CC-NIE proposals – thinking about SDN in particular. Insisted the PI had to be the university CIO. An unexpected benefit was having regular meetings on progress. Regular conversation on new opportunities for cyberinfrastructure grants. IT staff get opportunities to have time bought out to work on interesting problems. Faculty develop respect for the expertise of IT. OIT is thinking about hiring a full-time grant writer on the staff.

Cloud Billing Challenges

Bob Flynn, Indiana University

Microsoft Azure – the challenges. Pluses – account management; identity management; networking; security management; incident response.

Minus – billing. Have to make a pre-commit for your enrollment ($100/month). Everything that happens at your campus later is on the same bill, and the enrollment owner pays it. The first user that burns the $1200 gets it for free (unless they figure out a way to rebill). Ongoing usage – does central IT (or Procurement) have to do rebilling? How does the account holder track their usage? Azure Marketplace purchases are sent to the enrollment admin, not the one using them. There are issues with research and education credits. The solution? Started with resource groups and tags. Limited to 15 tags per resource group, and not all Azure tools are resource-group ready. Notifications come to the subscription owner. Started looking at allowing users to have their own subscription. VNet peering allows you to centrally manage the campus network connection. PO number added to the subscription name. Bell Techlogix is pulling the PO # via API – they’re building a portal for account owners and setting alerts at PO thresholds.

Nicole Rawleigh, Cornell University

Have 65 accounts under AWS billing. In August 2015 they manually billed four financial accounts; by Sept 2016 they billed 45 financial accounts for 65 AWS accounts. Separating internal CIT costs from external units. Switched to doing multiple financial system edocs created manually. One consolidated bill, but there can also be multiple other bills/credits. Credits are applied manually to accounts. Going to automation! API between AWS and Kuali Financial; a batch job runs once a month. Outstanding challenges: invoice attachment (they use CloudCheckr so users can see invoice charges); making sure that resources are correctly tagged; one financial edoc per financial system account, not per AWS account; the batch error report is hard to deal with; automates the consolidated bill only.
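The core of the rebilling step is rolling consolidated-bill line items up by a cost-allocation tag so each internal financial account gets billed once a month. A sketch of that aggregation (field names are illustrative, not the actual AWS/Kuali integration):

```python
# Hypothetical sketch: aggregate cloud bill line items by an internal
# cost-allocation tag; untagged spend is surfaced separately so it can
# land on the error report instead of silently disappearing.
from collections import defaultdict

def rollup_by_account(line_items, tag="FinancialAccount"):
    """line_items: [{"cost": float, "tags": {tag: account, ...}}, ...]"""
    totals = defaultdict(float)
    untagged = 0.0
    for item in line_items:
        acct = item.get("tags", {}).get(tag)
        if acct:
            totals[acct] += item["cost"]
        else:
            untagged += item["cost"]  # mistagged resources, needs follow-up
    return dict(totals), untagged
```

A monthly batch job would feed the totals into one financial-system edoc per account, matching the workflow described above.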

Erik Lundberg, University of Washington

Using DLT / Net+ for AWS. DLT provides a great billing front-end. Individual AWS accounts are associated with separate POs and get invoiced and paid directly. People can create a blanket PO on their university budget. Invoicing is all electronic and automatic (through Ariba). Next steps – get AWS Educate and research grants covered under the DLT contract.


Cloud Forum 2016 – Research In The Cloud

Daniel Fink from Cornell – Computational Ecology and Conservation using Microsoft Azure to draw insights from citizen science data.

Statistician by training. Citizen science and crowd sourced data.

Lab of Ornithology: Mission – to interpret and conserve the earth’s biological diversity through research, education, and citizen science focused on birds.

Why birds? They are living dinosaurs! > 10k species in all environments. Very adaptable and intelligent. Sensitive environmental indicators. Indian Vulture – 30 million in 1990, virtually extinct today. Most easily observed, counted, and studied of all widespread animal groups.

eBird. Global bird monitoring project – citizen science for people to report what they see and interact with the data. 300k people have participated, and it's still experiencing huge growth.

Taking the observation data and turning it into scientific information. Understanding distribution, abundance, and movements of organisms.

Data visualizations: http://ebird.org/content/ebird/occurrence/

Data – want to know every observation everywhere, with very fine geographic resolution. Computationally fill gaps in observations, and reduce noise and bias in data using models.

Species distribution modeling has become a big thing in ecology. Link populations and environment – learn where species are seen more often or not. Link ebird data with remote sensing (satellite) data. Machine learning can build models. Scaling to continental scale presents problems. Species can use completely different sets of habitats in different places, making it hard to assemble broad models.

SpatioTemporal Exploratory Model (STEM) – divide (partition the extent into regions), train & predict models within regions, then recombine. Works well, but computationally expensive. On premise, for species in North America: fit 10-30k models, 6k CPU hours, 420 hours wall clock (12 nodes, 144 CPUs). Can't scale – also dealing with a growing number of observations in eBird (30%/year), and moving to larger spatial extents.
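The divide/train/recombine pattern can be sketched in miniature. This is a toy illustration, not the published STEM method: the "local model" here is just a regional mean (STEM uses much richer learners and randomized, overlapping partitions), but the structure – partition the extent, fit per region, average local predictions at a query point – is the same, and it is what makes the workload embarrassingly parallel.

```python
import random

# Toy observations: (x, y, observed_abundance) uniformly over a unit extent.
random.seed(0)
points = [(random.random(), random.random(), random.random()) for _ in range(500)]

def fit_stem(points, grid=4):
    """Partition the extent into a grid and fit one local model per cell."""
    models = {}
    for gx in range(grid):
        for gy in range(grid):
            cell = [o for x, y, o in points
                    if gx / grid <= x < (gx + 1) / grid
                    and gy / grid <= y < (gy + 1) / grid]
            if cell:
                models[(gx, gy)] = sum(cell) / len(cell)  # local model = cell mean
    return models

def predict(models, x, y, grid=4):
    """Recombine: average the local models in and around the point's cell."""
    gx, gy = min(int(x * grid), grid - 1), min(int(y * grid), grid - 1)
    neighbors = [m for (cx, cy), m in models.items()
                 if abs(cx - gx) <= 1 and abs(cy - gy) <= 1]
    return sum(neighbors) / len(neighbors)

models = fit_stem(points)
print(predict(models, 0.5, 0.5))  # local abundance estimate near the center
```

Because each cell's model is fit independently, the 10-30k model fits mentioned above can be farmed out across a cluster, which is exactly what made the Hadoop/Spark move pay off.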

Cloud requirements: on-demand, reliably provisioned resources; open-source software (Linux, Hadoop, R); sufficient CPU & RAM to reduce wall-clock time; a system that can scale in the future. Started shifting the workload about 1.5 years ago. Using MapReduce and Spark has been key, but they aren't typical research computation tools.

In Azure: using HDInsight and Microsoft R Server – 5k CPU hours, 3 hours wall clock.

Applications – Where populations are, When they are there, What habitats are they in?

Participated in the State of North America's Birds 2016 study. Magnolia Warbler – wanted to summarize abundance in different seasons. The entire population concentrates in a single area of Central America in the winter that is a tenth the size of the breeding environment – a risk factor. Then looked to see if the same is true of 21 other species. Still see immense concentration in forested areas of Central America – Yucatan, Guatemala, Honduras. First time there is information to quantify the risk. Now looking at assessments for climate change and land use.

50 species of water birds use the Pacific Flyway. Concentration point in the California Central Valley, which historically had a huge amount of wetlands, but now has less than 5% of what there was. BirdReturns – a Nature Conservancy project for dynamic land conservation. Worked with rice growers in the Sacramento River Valley to determine what time of year would be most critical for those populations. The limit is water cover on the land. There's an opportunity to ask farmers to add water to their paddies a little earlier in the spring and later in the fall, through cash incentives. Rice farmers submit bids, and TNC selects bids based on abundance estimates (most birds per habitat per dollar). They've added 36k acres of habitat since 2014.

Quantifying habitat use. Populations use different habitats in different seasons; seeing a comprehensive picture of that is new and very interesting. Surprising observation of wood thrushes using cities as habitat during fall migrations. Is it a fluke caused by observation bias? Is it common across multiple species?

Compared habitat use of resident species vs. migratory species: 20 neotropical migrants and 10 resident species. Found residents have pretty consistent habitat use, but migrants show seasonal differences, with a real association with urban areas in the fall. Two interpretations: 1) cities might provide important refuges for migrant species, or 2) cities are attracting species but are ecological traps without enough resources. Collaborators are setting up studies to find out. One hypothesis is that the birds are attracted to lights.

Heath Pardoe from NYU School of Medicine – Cloud-based neuroimaging data analysis in epilepsy using Amazon Web Services.

The Comprehensive Epilepsy Center at NYU is a tertiary center, after the local physician and local hospital. Epilepsy is the primary seizure disorder (defined as having two unprovoked seizures in a lifetime). Many different causes and outcomes; figuring out the cause is a primary goal. There are medications and therapies, but the only known cure is surgery, removing a part of the brain. MRI plays a very big role in pre-surgical assessment. The ketogenic diet is quite effective in reducing seizures in children. Implanted electrodes can be effective, zapping when a seizure is likely in order to control brain activity. Research is ongoing on the use of medical marijuana to treat seizures. Medication works well in 2/3 of people, but 1/3 will continue to have seizures. The first step is an MRI scan to find lesions; radiologists evaluate the scans to identify them.

Human Epilepsy Project – study people as soon as they're diagnosed with epilepsy to develop biomarkers for epilepsy outcomes, progression, and treatment response. Tracking patients 3-7 years after diagnosis. Imaged at diagnosis and at three years. Patients maintain a daily seizure diary on an iOS device. Take genomics and detailed clinical phenotyping. 27 epilepsy centers across the US, Canada, Australia, and Europe.

Analyzing images to detect brain changes over time requires parallel processing of MRI scans. Using StarCluster to create a cluster of 20-200 EC2 machines (it load-balances, manages nodes, and turns them off when not used). Occasionally utilize compute-optimized EC2 instances for computationally demanding tasks. Recently developed an MRI-based predictive modeling system using OpenCPU and R.
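The workload pattern here is a simple fan-out: one independent job per MRI scan, distributed across however many nodes are available. As a hedged stand-in (the real setup uses StarCluster and a grid scheduler across EC2 nodes, not a thread pool, and `process_scan` is a placeholder for the actual imaging pipeline):

```python
from concurrent.futures import ThreadPoolExecutor

def process_scan(scan_id):
    # Placeholder for a per-scan pipeline step (e.g. cortical surface
    # modelling); in reality each job runs for hours on its own node.
    return scan_id, "done-" + scan_id

scans = ["scan-%03d" % i for i in range(8)]

# Fan the independent per-scan jobs out across workers and collect results.
with ThreadPoolExecutor(max_workers=4) as ex:
    results = dict(ex.map(process_scan, scans))

print(len(results))  # → 8
```

Because scans don't depend on each other, the cluster can elastically grow from 20 to 200 nodes and shrink back, which is what makes the on-demand EC2 model a good fit.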

Have a local server in the office running x2go that people connect to from workstations; from that server they upload to the EC2 cluster. There are more than 10 million data points in an MRI scan. Cortical surface modelling delineates different types of brain matter; then you can measure properties to discriminate changes. To compare different patients you need to normalize, by computationally inflating brains like balloons – called coregistration.

There are more advanced types of imaging.

Some studies done with these techniques: using MRI to predict postsurgical memory changes; brain structure changes with antiepileptic medication use.

Work going on – image analysis as a web service: age prediction using MRI. Your brain shrinks as you age. If there's a big difference between your neurologic age and your chronological age, that can be indicative of poor brain health.
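The "brain age gap" marker described above reduces to a simple subtraction once a model exists. A minimal sketch, where the linear shrinkage model and all numbers are invented for illustration (the real service fits its model to imaging data):

```python
def predicted_brain_age(gray_matter_volume_ml):
    # Toy fit, purely illustrative: assume volume shrinks ~1.5 ml per year
    # from a 700 ml baseline at age 20. A real model is fit to MRI features.
    return 20 + (700 - gray_matter_volume_ml) / 1.5

def brain_age_gap(volume_ml, chronological_age):
    """Positive gap = brain looks 'older' than the patient; the marker."""
    return predicted_brain_age(volume_ml) - chronological_age

print(brain_age_gap(640, 50))  # → 10.0 (brain appears 10 years 'older')
```

Serving this behind a web endpoint (OpenCPU exposing an R model, in the talk's setup) is what lets the same model be run consistently instead of dying on a grad student's machine.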

Difficulty reproducing results is an issue in this field. Usually models sit on a grad student's computer, never to be run again. Heath developed a web service running on EC2 that can be called to run a model consistently.

Cloud Forum 2016 – Cornell’s BI move to the cloud

Jeff Christen – Cornell

Source systems – PeopleSoft, Kuali, Workday, Longview. Dimensional data marts: finance, student, contributor relations, research admin. BI tools – OBIEE and Tableau.

They do data replication and staging of data for the warehouses. Nightly replication to stage -> ETL -> data marts.

Why replication/stage? A consistent view of data for ETL processing; protects production source systems; tuning for ETL performance.
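The value of the stage layer is easy to show in miniature: the ETL reads a snapshot, so a source-system change mid-load can't produce an inconsistent mart, and the production database is never queried by the transforms. A hedged sketch with invented table shapes:

```python
# Illustrative nightly flow: replicate -> stage -> ETL -> mart.
source = [{"id": 1, "amount": 100}, {"id": 2, "amount": 250}]

def replicate(src):
    """Nightly copy into the stage: a consistent snapshot of the source."""
    return [dict(row) for row in src]

def etl(stage):
    """Transform the staged rows into a dimension-friendly mart shape."""
    return [{"id": r["id"], "amount_usd": r["amount"], "load": "nightly"}
            for r in stage]

stage = replicate(source)
mart = etl(stage)

# A late change to the source does not affect the load already in flight,
# and the ETL never touched the production system directly.
source.append({"id": 3, "amount": 999})
print(len(mart))  # → 2
```

The same separation is why the stage can be indexed and tuned purely for ETL performance without any impact on the source systems.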

Started the journey to the cloud 2 years ago. Were using Oracle Streams – high maintenance, but it met some needs; then Oracle purchased a more robust tool and de-supported Streams. ETL tools challenge – were using Cognos Data Manager for 90% of their work, but IBM didn't continue to support it. Replaced it with WhereScape RED, which requires rewriting jobs. Apps were already moving off-premise: Workday for HR/payroll, PeopleSoft to AT&T hosting, Kuali financials moving to AWS. Launched a pilot project to answer "what would it take to run the data warehouse environment in AWS?"

Small pilot – Kuali warehouse in AWS. Which existing tools will work? Desire to use AWS services such as RDS where possible; Testing of both user query performance and ETL performance.

Why Oracle RDS and not Redshift? Approximately 80% of the Kuali DW is operational reporting. Needs fine-grained security at the database level, and there's a lot of PL/SQL in the current environment. Currently exploring Redshift for non-sensitive high-volume data.

Some re-architecting: Oracle Streams not supported with Oracle RDS (used Attunity). Oracle Enterprise Manager scheduler not supported with Oracle RDS – using Jenkins (so beautiful and simple); No access to OS on RDS databases – installed Data Manager on separate Linux EC2 instance; Using WhereScape to call Data Manager from the RDS database.

Need to be more efficient. On premise the KDW had two physical servers, and some inefficiencies in ETL code and dashboard queries had been masked by those large servers. Prioritizing ETL code conversion by long-running areas helped get AWS within the nightly batch window. Some updates were made to dashboards to improve performance or offer better filter options. Hired a database tuning consultant (2 weeks) to help with Oracle tuning.

Testing and user perception. Started with internal unit testing and internal query-execution-time comparisons between on premise and AWS, then user testing of dashboards on premise versus AWS. Repointed production OBIEE financial dashboards to AWS for a day (three times). Some queries came back faster, some slower; went through optimization and tuning to get it comparable across the board.

Cutover to AWS on Sept. 8. Redirected all non-OBIEE ODBC client traffic in October. Agreed to keep the on-premise KDW loading in parallel for two month-end closings as a fallback.

Next Steps. Parallel Research Admin Mart already in AWS – expect cutover by end of CY. Need more progress on ETL conversion before moving student and contributor marts. Continue Big Data / non-traditional data investigation (Cloudera on AWS). Redshift for large non-sensitive data sets.

Lessons learned: Off premise hosting does not equal Cloud technology. Often hard to get data out of SaaS apps.