Higher Ed Cloud Forum: When can a computer improve your social skills?

Ehsan Hoque (University of Rochester)

Behavior mining -> Applications -> Deployment

Automated Prediction of Interview Performance -> My Automated Conversation Coach (MACH) -> ROCSpeak.com

MACH – My Automated Conversation coacH — originated from people with Asperger’s wanting help developing conversational skills.

Originally a research application; a grant from Azure funded development of a cloud version. As people use the framework, the data is fed back into the model, which improves its performance.

In the end, it’s not the specific cloud functionality but the interaction with the people at the vendor that makes things work.


Cloud Compute Services Expansion – Lessons Learned

Mark Personett – University of Michigan

A project to enable all three campuses at Michigan to access cloud infrastructure with AWS, Azure, and Google.

Enterprise agreement, shortcode billing, training, consulting, preconfigured security/network settings, Shibboleth integration, reporting. What it’s not: cloud strategy, governance, or operations.

Lessons learned:

The BAA doesn’t cover every service – the BAA is just a legal document. There are account and billing differences across providers.

AWS at U-M: the BAA is separate from the EA, and there’s a separate process to add units to the BAA. Single sign-on is not as integrated. No inherent account hierarchy.

GCP: billing accounts and “projects” are separate concepts. Billing sub-accounts. GCP is API and API is GCP. The API Explorer is extremely helpful in writing API calls.
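
For example, a minimal Python sketch of the kind of call the API Explorer helps you compose — listing projects via the Cloud Resource Manager API. This assumes google-api-python-client is installed and Application Default Credentials are configured; it is illustrative, not anything shown in the talk.

    # List GCP projects via the Cloud Resource Manager API.
    from googleapiclient import discovery

    service = discovery.build("cloudresourcemanager", "v1")
    request = service.projects().list()
    while request is not None:
        response = request.execute()
        for project in response.get("projects", []):
            print(project["projectId"], project.get("lifecycleState"))
        # Page through results the same way the API Explorer does.
        request = service.projects().list_next(request, response)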

Azure: resource groups vs. subscriptions is not always clear (finding that they need a subscription for each resource group in the general case). Office 365 challenges – if alumni get synced to your Azure AD they get rolled into your instance and under your terms. VPN – there are tiers of VPN gateways – if you exceed the bandwidth it resets your tunnel with no warning.

 

Higher Ed Cloud Forum: Beyond the Architecture — Rethinking Responsibilities

Glenn Blackler (UC Santa Cruz)

Cloud-First! Now What…?

Santa Cruz’s approach – hardware infrastructure was going to turn into a pumpkin in spring 2018. “Screw it – we’re all in, let’s jump.”

What’s our approach? How can existing teams support this change? Program work vs. migration specific work. Our focus – enterprise applications.

Defining the program: Plan for a quick win (build confidence, get familiar, identify training needs). Go big – went from a small PHP app to identity management infrastructure. All in! — moved PeopleSoft and Banner. Run concurrent migrations.

But really … why? Need to continually talk to customers about why they’re doing it. The benefits of cloud migration aren’t apparent – you have to sell it. The pitch: elasticity, DR/BR, accommodation (additional test environments); modernized tools and team structures; sustainability.

Teams – separation of duties – now have separation between sysadmins, app admins, and developers. It has always been a handoff-driven, ticket-driven organization. Don’t know what the org looks like in the new world – took really smart people, put them in a room, and told them to figure it out. Core team includes app and sys admins, plus less frequent contributions from security, DBA, networking, and devs.

Looking at Cloud Engineering Team that incorporates OS Setup/Config/App Config/Maintenance. DBA team still a bit separate. Security contributing across the board, but not necessarily hands on all the time. Teams are learning new things about each other that they didn’t know in the ticket-driven world.

Future – shared responsibilities mean fewer handoffs; engineers with wider breadth of skills; improved cross-team collaboration through shared code base; continuous improvement through evolving technical design and available services; adjusted job titles and responsibilities; ITS reorganization; budget impact, review of recharge model.

New ways of collaborating: Sys and App admins using a single git repository for code. Shared tools/technologies, password management; cross-functional tier 1 support;

Lessons learned – don’t lock decisions down too early, use governance to end debates, identify project goals that foster exploration (within the timeline), use consultants carefully. Traditional PM will not work; push boundaries of what is possible; required vs. ideal – compromise is important; don’t compare with mature on-premises architecture; be prepared for rumors.

Not everyone is on the bus – what about those who don’t want to get on?

Higher Ed Cloud Forum – Lightning Round #1

Phil Robinson – Cloud Progress at Cornell Student Services IT

First AWS account – July 2015 – adopted a cloud first strategy. Now have about 30 apps on AWS (migrations, rewrites, new apps). Automate with Jenkins and Ansible. Retiring on-prem VMs.

Custom class-roster app, used by students to decide what to take. Added a central syllabi feature this year. Using SNS+SQS as a message bus, orchestrating events; CloudFront delivery for syllabi; on-the-fly ClamAV scans on upload; Elasticsearch for searching; SES for notifications by email. Developed in 3632 hours.
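
A minimal boto3 sketch of the SNS + SQS message-bus pattern described above; the topic, queue, and event payload names are hypothetical, not Cornell’s actual identifiers.

    import json
    import boto3

    sns = boto3.client("sns")
    sqs = boto3.client("sqs")

    # Publisher: emit a "syllabus uploaded" event onto the bus.
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:class-roster-events",
        Message=json.dumps({"event": "syllabus_uploaded", "s3_key": "syllabi/CS1110.pdf"}),
    )

    # Worker: poll the SQS queue subscribed to that topic and process events
    # (e.g. kick off a ClamAV scan or an Elasticsearch index update).
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/syllabus-scan-queue"
    resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])        # SNS envelope
        event = json.loads(body["Message"])   # original event payload
        print("processing", event["event"], event["s3_key"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])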

Looking towards containerizing and VDI.

Gerard Schockley – BU iPaaS RDS AWS

iPaaS ODS in RDS – an integration service designed to integrate many data feeds using the SnapLogic platform. ODS = Operational Data Store. Using AWS Aurora.

Bob Winding – Cloud Automation Journey

Most fully automated in the GovCloud project. CloudFormation (VPCs, IAM, security groups, centralized alerts); Ansible and CloudFormation for server builds; console federation with ADFS; consistent process for all project accounts; new project account in a couple of hours; decentralized maintenance of CF templates.
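
As a rough sketch of what driving that kind of baseline from boto3 can look like (stack name, template file, parameters, and region are hypothetical, not the actual project templates):

    import boto3

    cfn = boto3.client("cloudformation", region_name="us-gov-west-1")

    with open("project-baseline.yaml") as f:
        template_body = f.read()

    # Deploy a baseline stack (VPC, IAM roles, security groups) into a new project account.
    cfn.create_stack(
        StackName="project-baseline",
        TemplateBody=template_body,
        Capabilities=["CAPABILITY_NAMED_IAM"],  # the stack creates named IAM roles
        Parameters=[{"ParameterKey": "ProjectName", "ParameterValue": "example-project"}],
    )
    cfn.get_waiter("stack_create_complete").wait(StackName="project-baseline")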

Penn –

What does “cloud native” mean at Penn?

Case study 1 – online giving portal: Data ETL (Talend) to Postgres RDS (fundraising metadata); S3 / CloudFront; to Oracle on-prem. Near real-time.

Case study 2: Service ordering (VDI and backup requests). On-prem PowerShell makes changes in AD groups, sends messages through SQS.

Case study 3 – Device registration. On-prem registration; API keys handled in Lambda.

Sara Jeanes – Considerations in moving HPC workloads to the cloud

Initial framing questions: Do they have a preference for which cloud provider (do they have credits, different tech); Is there a multi-cloud resiliency need?

Workload questions: Can it be interrupted (use spot instances)? For large workloads, firewall considerations (Science DMZ).

Jeff Minelli – Penn State – CloudCheckr enabling transparency at Penn State

Gain insights into financial transparency, spend optimization, resource utilization and right-sizing, cost allocation, best practices, security & compliance, collection and unification of AWS API data, continuous monitoring, reporting and alerts

Working with CloudCheckr to enable SAML. Basic group email notifications. Configuration of $100 spending alerts.

Trying to get CloudCheckr into InCommon.

Network Firewall Policies for Hybrid Cloud – Brian Jemes – University of Idaho

In the cloud, managing firewalls with server tags. Gets complicated when managing across on-prem and cloud. On-prem they have Cisco tools to manage ASA firewalls.

Options: manage hybrid cloud policy in on-prem firewall; manage hybrid policies with traditional firewalls in cloud; develop a hybrid tool.

Looking at a startup called Bracket Computing – cloud firewall policy manager. brkt.com – Provides micro-segmentation.

John Bailey – Washington University (St. Louis). Cloud IAM

Balance between security and usability. Enhancing usability with SPNEGO integrated auth, which leverages the Kerberos token from machine login to perform a web SSO login, making the web login invisible to the customer.
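
From the client side, the pattern looks roughly like the sketch below: the Kerberos ticket obtained at machine login is used to negotiate authentication with the web SSO endpoint, so the user never sees a password prompt. The URL is hypothetical and the requests-kerberos library is just one way to exercise it.

    import requests
    from requests_kerberos import HTTPKerberosAuth, OPTIONAL

    # Negotiate (SPNEGO) auth against a web SSO endpoint using the machine's Kerberos ticket.
    resp = requests.get(
        "https://login.example.edu/idp/profile/SAML2/Redirect/SSO",
        auth=HTTPKerberosAuth(mutual_authentication=OPTIONAL),
    )
    print(resp.status_code)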

Lou Tiseo – how categorizing resources helps to understand cloud usage

Requiring seven different tags. Using Cloudyn management dashboard. Helped save costs by using reserved instances.

Chris Malek (Caltech) – Automation tools for AWS ECS and Batch

deployfish – configures almost all aspects of an ECS service (load balancing, app autoscaling, volumes, environment, etc.). They’ve open-sourced it. Create, inspect, scale, update, destroy, and restart ECS services with single commands; manage multiple environments (test, QA, prod, etc.). Integrates directly with Terraform. YAML-driven.

batchbeagle — a tool for managing AWS Batch. Create, update, disable, and destroy queues. Create, update, disable, and destroy compute environments. Create job descriptions. Submit and manage jobs, etc.

Amanda Tan – Washington

Enabling cost notifications on AWS. Cost monitoring is difficult – it should be zero effort. Two-pronged approach: auto-tag resources, and send a daily email notification with total spend and resource usage. A CloudFormation template sets up CloudWatch Events, which invokes the auto-tag Lambda function. AutoTag tags resources with owner and principal-id. Notification works off DLT billing records, provided in S3 buckets twice a day.
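
A minimal sketch of the auto-tag idea: a CloudWatch Events rule on CloudTrail RunInstances calls invokes a Lambda handler that tags the new instances with the caller’s identity. Tag keys and the exact event handling here are illustrative, not the project’s actual code.

    import boto3

    ec2 = boto3.client("ec2")

    def handler(event, context):
        detail = event["detail"]
        principal = detail["userIdentity"].get("principalId", "unknown")
        arn = detail["userIdentity"].get("arn", "unknown")
        # RunInstances responseElements carries the IDs of the instances just launched.
        instance_ids = [
            item["instanceId"]
            for item in detail["responseElements"]["instancesSet"]["items"]
        ]
        ec2.create_tags(
            Resources=instance_ids,
            Tags=[{"Key": "Owner", "Value": arn},
                  {"Key": "PrincipalId", "Value": principal}],
        )
        return {"tagged": instance_ids}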

 

 

Self Service at Yale

Rob Starr, Jose Andrade, Louis Tiseo (Yale University)

The community told them they needed to be able to spin machines up and down at will for classes, etc. Started with a big local OpenStack environment, now building it out on AWS.

Wanted to deliver agility, automate and simplify provisioning, shared resources, and support structures, and reduce on-premises data centers (one data center by July 2018).

Users can self-service request servers, etc. Spinup – CAS integration, patched regularly, AD, DNS, Networking, Approved security, custom images.

Self-service platform – current manual process takes (maybe) 5 days. With Self-Service, it takes 10 minutes. Offering: Compute, Storage, Databases, Platforms, DNS

All created in the same AWS account. All servers have private IP addresses.

ElasticSearch is the source of truth.

Users don’t get access to the AWS console, but can log into the machines.

Built initial iteration in 3 months with 3 people. Took about a year to build out the microservices environment with 3-4 people. Built on PHP Laravel.

Have a TryIt environment that’s free, with limits.

Have spun up 1854 services since starting, average life of server is 64 days.

Higher Ed Cloud Forum 2017 – Intro and Multi Account AWS Strategy

Survey Results

46 institutions attending, 4 vendors, 81 unique roles among 90 attendees.

40% cloud first, 12% have a documented cloud exit strategy.

82% AWS, 14% Azure, 4% Google, 2% other

Staff readiness is the #1 obstacle to broad adoption

42% have signed the I2 Net+ agreement, 11% have enterprise agreement with cloud provider

21% have containers/serverless in production, 9% non-prod, 70% not currently adopting.

Managing and Automating a Multi-Account Strategy in AWS: Brett Bendickenson (Arizona)

Have their own agreement with AWS. Currently have about 75 accounts in their consolidated billing; 24 accounts in central IT.

UITS Cloud Advisory Team — cross functional group from within UITS to advise and decide on cloud practices and policies.

  • Tagging Policy – extremely important to get right up front. Service, name, environment, created by, contactnetid, accountnumber, sub account

Multi-account strategy. Workloads segregated into production and non-prod accounts. Tipping point was properly restricting everything by permissions – can do it with IAM roles, but it’s a lot of work. Decided on further segregation by teams / technologies, e.g. Kuali, PeopleSoft, IAM. Each has prod and non-prod accounts.

Each account has an account steward (director or dept. head) — responsible for spend, security, etc. Each account has an email list, with the address used for the root login address. Password stored in common vault, secured with MFA hardware token (kept in Ops). Linked to a central billing account. Set of account foundation templates are deployed. Started using AWS Organizations.
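
Under AWS Organizations, creating a new member account in that model can be scripted; a minimal boto3 sketch is below, with the email-list address and account name purely hypothetical.

    import boto3

    org = boto3.client("organizations")

    # Create a member account; the list address doubles as the root login address.
    resp = org.create_account(
        Email="aws-peoplesoft-prod@lists.example.edu",
        AccountName="peoplesoft-prod",
    )
    status_id = resp["CreateAccountStatus"]["Id"]

    # Check creation status (poll until State is SUCCEEDED in real use).
    status = org.describe_create_account_status(CreateAccountRequestId=status_id)
    print(status["CreateAccountStatus"]["State"])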

Account foundation modeled after the AWS NIST 800-53 Quickstart CloudFormation template. A set of CloudFormation templates deploys roles, security controls, etc. Sets up an EC2 instance that runs a set of Ansible playbooks that set up Shib, base AWS info, IAM, logging, and Lambda.

Federated Roles – SysAdmin, IAMAdmin, InstanceOps, ReadOnly, BillingPurchasing. Using Grouper for authorizations.

Using federated identities, no IAM users (generally).

CloudTrail enabled in all accounts. Enabled for all regions, records all API calls, sent to a central S3 Bucket in root account. CloudTrail logs also saved to CloudWatch logs in account for local reference.
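
A minimal boto3 sketch of that trail configuration — multi-region, global service events, delivery to a central bucket plus a local CloudWatch Logs copy. Bucket, role, and log group names are hypothetical.

    import boto3

    ct = boto3.client("cloudtrail")

    ct.create_trail(
        Name="org-trail",
        S3BucketName="central-cloudtrail-logs",  # central bucket in the root account
        IsMultiRegionTrail=True,
        IncludeGlobalServiceEvents=True,
        CloudWatchLogsLogGroupArn="arn:aws:logs:us-east-1:111122223333:log-group:CloudTrail/local:*",
        CloudWatchLogsRoleArn="arn:aws:iam::111122223333:role/CloudTrail_CloudWatchLogs_Role",
    )
    ct.start_logging(Name="org-trail")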

Alarms set for changes in network ACLs, security group changes, root account activity, unauthorized access, IAM policy changes, access key creation, and CloudTrail changes. (Not all are used in non-prod.)
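
One metric-filter/alarm pair of that kind might look like the sketch below (security group changes, assuming the CloudTrail CloudWatch Logs group from above; names and the SNS topic are hypothetical).

    import boto3

    logs = boto3.client("logs")
    cw = boto3.client("cloudwatch")

    # Turn matching CloudTrail log events into a custom metric.
    logs.put_metric_filter(
        logGroupName="CloudTrail/local",
        filterName="SecurityGroupChanges",
        filterPattern='{ ($.eventName = AuthorizeSecurityGroupIngress) || ($.eventName = RevokeSecurityGroupIngress) }',
        metricTransformations=[{
            "metricName": "SecurityGroupEventCount",
            "metricNamespace": "CloudTrailMetrics",
            "metricValue": "1",
        }],
    )

    # Alarm whenever any such event occurs in a 5-minute window.
    cw.put_metric_alarm(
        AlarmName="SecurityGroupChangesAlarm",
        MetricName="SecurityGroupEventCount",
        Namespace="CloudTrailMetrics",
        Statistic="Sum",
        Period=300,
        EvaluationPeriods=1,
        Threshold=1,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:111122223333:security-alerts"],
    )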

Lambda functions – alarm details (interrogates CloudTrail events and sends the actual API calls that raised the alarm); CreatedBy automated tagging for EC2 instances; OpsWorks tagging helper; Route53 helper (updates DNS); tag monitoring – checks tags on instance launch (looking at Cloud Custodian from Capital One (open source)); AMI lookup.
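
The “alarm details” idea, roughly sketched: when an alarm fires, look up the recent CloudTrail events that could have raised it and publish the raw API calls to an SNS topic. The event-name filter, time window, and topic are illustrative only.

    from datetime import datetime, timedelta
    import json
    import boto3

    cloudtrail = boto3.client("cloudtrail")
    sns = boto3.client("sns")

    def handler(event, context):
        end = datetime.utcnow()
        start = end - timedelta(minutes=15)
        resp = cloudtrail.lookup_events(
            LookupAttributes=[{"AttributeKey": "EventName",
                               "AttributeValue": "AuthorizeSecurityGroupIngress"}],
            StartTime=start,
            EndTime=end,
        )
        # Each Events entry carries the full API call record as a JSON string.
        calls = [json.loads(e["CloudTrailEvent"]) for e in resp["Events"]]
        sns.publish(
            TopicArn="arn:aws:sns:us-east-1:111122223333:security-alerts",
            Subject="API calls behind alarm",
            Message=json.dumps(calls, default=str),
        )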

Arizona’s code: https://bitbucket.org/ua-ecs/service-catalog

Campus Safety and Security, pt. 2

UVa events – Marge Sidebottom and Virginia Evans (UVA)

How do we determine where the high-risk areas are on any given day, and are they located in the right places for the controversy that might accompany any given guest speaker? Beginning to populate a system to record those. Look at controversial speakers, as well as protests. The lone-wolf terrorist is the other common concern – may find information that helps to plan better. Expect local teams to look at issues within their own areas and mitigate them – if they can’t, it escalates to the threat assessment team, which meets weekly.

Aug 11 & 12 – protest by white supremacists and neo-Nazis. There were extensive advance preparations by the city and the campus. This culminated a series of events over the previous months in different parks. Several hundred showed up at UVa on Friday night with lit torches and surrounded a small number of students. Violence broke out, but police dispersed the activity. By late morning Saturday there were thousands in a small area of downtown Charlottesville, including heavily armed alt-right protesters. Then the car-ramming event happened, and then the police helicopter crashed.

The University had begun planning three weeks prior to the event. Had two meetings a week of the emergency incident management team, and the President held daily meetings. There is a city/county/university EOC structure. The city decided to have their EOC in a different location, which compromised communications. University teams went on 12-hour shifts beginning Friday morning.

When protesters moved on campus, the events developed very rapidly. It became clear that they were not following the plan they had committed to.

Having the EOC stood up was very useful. Had the University’s emergency management team in a separate room, so they could be briefed regularly. At 11:50 on Saturday, cancelled activities on campus starting at noon so as not to have venues that presented opportunities for confrontations. Worked carefully with a long-planned wedding at the chapel, which did take place. They were unaware of admissions tours that were going on – once they found out, they rallied faculty to accompany student guides and families, and then ended tours early.

Taking care of the needs for mental health attention for participants is important.

John DiFava (MIT Chief of Police)

MIT culture – it can’t happen here, and it won’t happen here. Also the culture of the city of Cambridge is very open and loose. Campus police used to be focused on friendly service and would call in external agencies when in need. Times have changed – policing on campus is just as complex and demanding as any other type of policing. Universities are no longer isolated.

The Columbine massacre had a tremendous impact – officers had followed procedure to establish a perimeter and wait for tactical units to arrive. Now they are taught to make entry.

The 9/11 attacks had a significant impact on policing. MIT police lost all of their officers to other jurisdictions immediately. Interagency cooperation was inadequate. Created a cascading effect – the cavalry was out of town, so they had to rely on local resources.

New reality – had to be able to function without assistance; aid would not arrive as quickly or in the quantity it once did.

Steps taken to improve capability and performance – a comprehensive approach: Recruitment process, promotional system, supervision, training improvements – do in-service training with Cambridge Police and Harvard; firearms requalification three times a year (twice during the day, once in low light); specialized training for every officer; active shooter training (with Cambridge PD, Harvard, and MSP).

Work with Institute entities – Emergency management reports to Police.

Emergency Communication: Interface Between Public Safety and IT
Andy Birchfield, Jeff McDole, Andy Palms: University of Michigan

Certain emergency phases – Pre-incident planning, inbound emergency notification, emergency assessment, emergency alert operation, emergency notification delivery. The value to the community of notifications is based on total time of all phases.

Pre-incident planning: Activities include: message templates; policy and procedure; establish expectations and know your community; analysis of delivery modes with recognition of delivery times for each mode; evaluation of lessons learned; training and exercises; prepare infrastructure.

Inbound emergency notification: Making it simple, do it like they do it every day (students choose their cell phones over using the emergency blue phones); Get as much information as possible: video, audio (phone), text; Enable people to contact us in the ways they know — social media, apps, etc; coverage and capacity; knowing where the person is.

Emergency Assessment: Issues include confirmation, authorization, timeliness. If you can get a message out in 8-10 minutes of an incident, you’re doing well.

Emergency alert operation: additional modes and desired content will delay message creation; decisions and effort slow the operator; Hick’s Law: the time it takes for a person to make a decision as a result of the possible choices they have: increasing the number of choices will increase the decision time logarithmically.
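
The form of Hick’s Law usually cited (a standard statement of the Hick–Hyman law, not from the talk) is:

    % Decision time T grows logarithmically with the number of
    % equally likely choices n; b is an empirically fitted constant.
    T = b \log_2(n + 1)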

Emergency notification delivery: Speed is the priority. Issues: Get people to sign up for the right service(s) – there is not a single mode; infrastructure coverage. They can get delivery to every email inbox in Ann Arbor (~105k) in about 7 minutes, but email is not the only mode. They have apps with push notifications – time of delivery is right around 10 seconds. The future is focused messages to appropriate recipients, by topic, or location, by individual choice.

Emergency Notification Systems: Ludwig Gantner, Andrew Marinik, VA Tech

VTAlerts – designed with redundancy. Goal is that every member of the community will be notified by at least one channel. Originally built in-house, but now a complex hybrid environment with some local and some vendor channels in the cloud.

New beginning – recognition of prioritized support for public safety. Group within IT expanded to include more channels: VT Alerts, blue light telephones, next generation 911, security camera system. Having one group responsible gives one point of contact in IT for public safety officials. Having dedicated staff allows for much better response times. They’ve removed dependencies on single individuals.

Communication – notification and collaboration – use the ticketing system.

Sustainable support – important to be proactive rather than reactive in public safety. New monitoring capabilities, improved redundancy, long term planning, channel development.

Collaboration – IT recognizes technical needs; public safety prioritizes items.

ENS philosophy: What is happening, where is it happening, what do we want you to do about it? They have 21 templates.

Current Challenges: How do we institutionalize the process to avoid backsliding when people change? What are appropriate success metrics for system evaluation? What are the cyber-security concerns of the components and system as a whole?

Evolving Radio Technologies – Glenn Rodrigues, CU Boulder

LMR (land mobile radio) project at CU Boulder. Business problem: lack of ability to communicate between Public Safety officials and leadership during planned events and unplanned incidents; Officers don’t feel safe doing their job without proper communications. Plan of action: Complete LMR audit for University; short term fixes; long term fixes.

Audit: requirements – the contractor had to be vendor-neutral; LMR customer interviews and use-case mapping; technical recommendations backed with data. Output: clients – CUPD + 9 other business units. Biggest problem was coverage inside buildings (and system overloads). Tech assessment: most equipment was over 10 years old and malfunctioning, no real resource dedicated to monitoring and engaging with customers, most portable radios were not optimal. Business assessment: lack of policy enforcement (internal and external); lack of visibility into individual unit needs; lack of engagement with business partners. Plans: stabilize the current LMR system under a limited budget in 3 months by replacing high-risk or failed equipment; leverage existing University assets (monitoring, backup power). Longer term: want to patch LMR into the campus fiber backbone. RFI in process.

John Board – Duke

Had the opportunity to green-field a managed, networked camera system. Lawyers were concerned about the lack of standardization and maintenance of existing cameras. Started with parking decks. Goal was evidentiary, not live surveillance. Budgeted cost actually included maintenance, ongoing verification, and network and storage costs. All cameras installed and operationally verified by OIT. Cisco VSOM, decent API. 1024 cameras in operation now.

The institution is zealous about privacy. They have a policy about access to live and stored images; have a retention policy; there’s a committee that decides where cameras go (you can’t put cameras up outside the system). Challenge around need vs. demand.

Wanting to do automated image analysis to verify that cameras are working, e.g. deviation of sample image vs reference image. EE faculty proposed writing an algorithm for this. After some experimentation came to an algorithm that filters ~80% of good cameras, while reliably identifying 100% of bad cameras. By using 3-day averages, safely filters 95% of good cameras – declaring victory!
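
A minimal sketch of that kind of check — comparing a current sample frame against a stored reference frame and flagging large deviations. The metric (mean absolute pixel difference) and threshold are illustrative, not the actual algorithm the EE faculty produced.

    import numpy as np
    from PIL import Image

    def deviation(sample_path, reference_path):
        # Assumes both frames share the same resolution.
        sample = np.asarray(Image.open(sample_path).convert("L"), dtype=np.float32)
        reference = np.asarray(Image.open(reference_path).convert("L"), dtype=np.float32)
        return float(np.mean(np.abs(sample - reference)))

    def looks_bad(sample_path, reference_path, threshold=40.0):
        # Large average deviation suggests the camera is misaimed, blocked, or dead.
        return deviation(sample_path, reference_path) > threshold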

Va Tech – Crowd Monitoring and Management on Game Day – Major Mac Babb

Stadium holds 66k people. Originally built in 1965. Hokie Village across the street, 20 parking lots, most of which are licensed for alcohol.

Unified Command runs off the 7th floor of the stadium. 160 officers, Office of Emergency Management, Communications / Dispatch, Rescue, Fire, Game Operations, Event Staff/Security/ADA Services (545 event personnel), Parking, and Stadium Facilities and Grounds Ops.

Technology Assistance – CAD terminals and radio dispatch. See same screens as regional center. Access to around 400 cameras around campus. Weather systems fed into ops center, Veoci incident management program, Athletics comms channels, social media, emergency notification system. Supported by security center at public safety building.

Team Tops Technology – University of Washington’s Approach to Crisis Communications – Andy Ward

Seattle Crisis Communication Team – News & Information, Police, Marketing, IT, Emergency Management, Housing & Food Services, UW Medical Center.

Roles – Initiator, Incident Commander (for communications), Communicator, Monitors

Crisis communications toolkit – UW Alert Blog (wordpress.com) — can send messages to banners on the UW home page and to the hotline telephone. UW Alert (e2Campus) sends text and email messages. UW Alert Facebook and Twitter channels. There’s an outdoor alert system (Talkaphone) and an indoor alert system (PA capabilities on the fire alarm system; the problem is they have to send to all buildings at once). Plan to use the Red Cross Safe & Well system to account for people.

97% of the time the crisis communication team is activated by campus police — 20-some people calling into a conference bridge. The initiator briefs the team, primarily the incident commander, who decides what action to take. The person who initiates the call should be ready to send out the first message. Decide which tool(s) to use to send the alert, and then the team stays on the bridge after the message is sent.

Police are not the incident commanders for communications.

When incident is over, they send out an all-clear message.

IT’s role during an incident: Monitor technology performance; Troubleshoot immediately; Provide technical expertise; Provide depth to the team.

Police have ability to send messages if immediacy is needed.

Subteams from all 3 campuses meet and recommend policies.