Oren’s Blog

CSG Spring 2018: Student success and student information strategy

Lech Maj (NYU)

Becky Joffrey (Cornell)

Student success – the ability to have a 360-degree view of the student lifecycle

Survey data:

How involved is the IT team with student success initiatives – 66 (out of 100)

How involved is the IT team with learning analytics: 70 (out of 100)

NYU – Bernie Savarese (Assistant VP for Student Success)

Reports through Enrollment Management (which includes Peoplesoft, Fiscal aspects, and Recruitment)

Student Success is everyone’s business!

Why? Position & Rank; Financial Stability; Perceived Value & Alumni Engagement; Deliver on Promises

First year retention and Graduation rate make up almost 1/3 of US News ranking. Found NYU was lagging peers in those measures.

Financial loss – 1st-year attrition: net loss of $16 million

Goals: First year retention – 96% by 2020, six year graduation 90% by 2026: requires keeping 50 extra students per year.

Student Success Steering Committee, with a Technology Task Force.

Guiding Principles: Use technology to drive relationships; Make a big complex place feel small: find and support students who need us most; identify and remove unintended barriers; find and surface evidence for continuous improvement.

Chose Starfish as a technology platform – launching in Fall 2018. Why a platform? See the whole student, aggregate critical information and systems, coordinate care, simplify resource referral, identify leading vs lagging indicators, deliver on promises.

Platform goals for Year One: All undergraduates in all schools; raise flags and alerts; predictive analytics and risk scores; appointment scheduling; shared notes; include student affairs/services

Need to be able to close loops – e.g. if a faculty member raises a concern about a student, they need to know what happened.

Students have the ability to see what’s going on in a dashboard.

Becky Joffrey – Cornell’s Student Engagement Ecosystem

Current State: Role-based infrastructure (point solutions with a single role (club member, dorm resident, job seeker) in each system). Data lives within each transactional application. Spend a lot of time marrying data from systems, but still struggle.

Desired state: Person-based infrastructure – many roles that change over time, data moves and grows with a person; invest in understanding constituents. Duh – CRM.

Ended up with 23 Salesforce orgs – that’s no way to implement CRM. Caused the Provost to start a project to think about student experience globally. Move from departmental intent to institutional intent.

Led to strong steering structure and governance. Provost funded, steering committee led by a dean and the Vice Provost of academic affairs, experience working group, analytics working group.

Initiative: Modernize technology to support student experience; focus on student services, advising, student activities, and analytics. Goal is to connect all parts of the student experience. Audiences include students and the people at Cornell who support them.

experience.cornell.edu – Discovery website for student opportunities. When you click to apply for an experience, it takes you into Salesforce. A rich dashboard experience for the student that integrates and orchestrates all the different experiences. Advisors also have a dashboard. Other web sites can use the information to display filtered views of opportunities.

Putting finding tools in Drupal sites (Opportunities, Resources, Clubs, Events, People), and “Doing Things” in Salesforce (transactions)

The benefit is data: You can see who is doing X, but more importantly you can see who is NOT doing X; Data is collected via natural points of engagement vs. surveys and notes; Data benefits the entire institution, not just individual unit. Prioritize apps that will glean the richest sets of data.

Tips: Find a point of gravity that brings campus together; start with users’ problems; identify an urgent need; build a horizontal solution, not a vertical one; consider breadth of tools available and how they integrate; create an extendable architecture

UC Berkeley – Oliver Heyer – BOAC and the Data Loch

Early Work: CalCentral – collects data from a variety of sources to allow students to do the work they need to do. Became student front-end to new Peoplesoft SIS.

Cloud LRS (learning record store) in AWS. Pulls in feeds of data for storage and analysis. Built an LTI tool (student privacy dashboard) on Caliper data from Canvas. Not in production yet.
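
For context, a Caliper event is just a JSON activity statement the LMS emits into the learning record store. A minimal sketch of what one such record looks like (every identifier and URL below is invented for illustration, not Berkeley's actual data):

```python
# Simplified Caliper-style activity record, the kind of JSON a Canvas
# live-events feed delivers into a learning record store. All values
# here are illustrative only.
caliper_event = {
    "@context": "http://purl.imsglobal.org/ctx/caliper/v1p1",
    "type": "NavigationEvent",
    "action": "NavigatedTo",
    "actor": {"id": "urn:example:user:12345", "type": "Person"},
    "object": {
        "id": "https://canvas.example.edu/courses/101/pages/syllabus",
        "type": "WebPage",
    },
    "eventTime": "2018-05-01T17:00:00.000Z",
}
```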

Athletic Study Center didn’t have a view into Canvas data. Putting advisors into Canvas as observers didn’t provide a manageable way to provide information.

Berkeley is framing student success around issues of diversity and inclusion.

BOAC/Data Loch solves some important problems: Canvas Big Data > UCB Data Lake > Learning Analytics; Custom cohorts; An early warning system; Stability, security, and scalability. Data on about 900 students is being used by 40 advisers. Storing cached data from APIs in AWS (live use of local APIs didn’t scale)

Goals: Bird’s eye view on learning and other data emerging from varied sources; data collection layer; data processing layer (Redshift, Spark, Athena); deliver insights as a service (descriptive, predictive, prescriptive)

Using AWS Glue for ETL into data catalog, which can be queried by Redshift Spectrum and then tables extracted into Redshift.
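
A rough sketch of that pattern, assuming a Glue crawler has already cataloged the Parquet files in S3; the connection string, IAM role, and schema/table names below are all invented:

```python
# Sketch of the pipeline described above: Glue crawls Parquet in S3 into
# the Data Catalog; Redshift then queries it in place through Spectrum
# and materializes hot slices locally. All identifiers are invented.
import psycopg2

conn = psycopg2.connect("host=redshift.example.edu dbname=dataloch user=etl password=secret")
cur = conn.cursor()

# Expose the Glue Data Catalog database as an external schema in Redshift.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS canvas_spectrum
    FROM DATA CATALOG DATABASE 'canvas_events'
    IAM_ROLE 'arn:aws:iam::123456789012:role/SpectrumRole'
""")

# Scan the Parquet event log in place and extract a slice into a local
# Redshift table for fast repeated queries.
cur.execute("""
    CREATE TABLE requests_spring18 AS
    SELECT user_id, url, ts
    FROM canvas_spectrum.requests
    WHERE ts >= '2018-01-01'
""")
conn.commit()
```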

Largest dataset – Canvas event logs: 650 GB, ~4.5 billion request records, compressed into Parquet. Did a table scan with Redshift – cost $1.25 and took 5 minutes.

Next steps: Expansion of advising to College of Engineering in fall 2018. How to tell story to faculty? EDW collaboration – could move into data lake. Where does application live? Is it yet another place to go? Implications for campus data and cloud platform strategy in general?

MyUW: Supporting the Student Lifecycle – Jim Phelps (Washington)

Used User-Centered Design, including a student diary study where they asked students to record their information needs.

Findings: information overload; critical information hard to find; time management is difficult; information needs are dynamic, but predictable

Design goals: Personal, critical, curated, relevant, timely

Arrive at: actionable and personalized content on “cards”

Show students what they most need based on time in the quarter. e.g. where are my classes in weeks 1 and 2, when can I register in week 5, when are my finals in week 8.
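
The idea is easy to picture as a card-selection function keyed on the week of the quarter; a toy sketch (the real MyUW card logic and card names are certainly richer than this):

```python
# Toy version of the time-based card logic described above; card names
# are invented for illustration.
def cards_for_week(week):
    cards = ["notices", "academics"]          # always-on cards
    if week <= 2:
        cards.append("where_are_my_classes")  # start of quarter
    if week == 5:
        cards.append("registration_opens")    # mid-quarter
    if week >= 8:
        cards.append("final_exam_schedule")   # end of quarter
    return cards

print(cards_for_week(1))  # ['notices', 'academics', 'where_are_my_classes']
```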

Aggregates data from multiple sources.

Understanding the student lifecycle experience – transitioning to UW, exploring majors, in major, transitioning to profession

Understanding the co-curricular experience – Present interest, social catalyst, internalized motivation, major blocker?, information seeking, participation in co-curricular. How do we build a social component to help students connect?

Husky experience toolkit – tailored messages.

Assessment – continue to assess effectiveness and usability of MyUW and the Husky Experience Toolkit. Surveys, log analysis, “guerrilla” user studies. Feeds back into the user-centered design process.

UC San Diego – Harnessing the power of analytics for student success – Amin Qazi and Christopher Rice

What is a student? Matriculated; non-matriculated; extension; undergraduate/graduate/professional; how long is one considered a student?

What constitutes success? better grades; improved time-to-graduation; retention; quality of experience; getting needed courses; improved job performance or advancement; personal satisfaction

What do students want? High quality degrees that are more career oriented and time compressed; sequential programs linking graduate offerings; dual graduate programs; online degrees from reputable institutions; stackable progression models

Building advanced data capability to: prepare the university for the application of AI and Machine Learning; guide reallocation of scarce resources in data driven ways; harness automation; empower units to harness the power of analytics

Overview of next generation data warehouse – layered architecture. Core data in middle, applications feed data in and out, connected by APIs.

Platform predictive capabilities (built on SAP HANA) – also working with Google and Amazon.

Bringing all information from university into a single data warehouse. Activity hubs (employee, student, academic activity, facilities, financial activity, advancement & alumni activity). Working on student activity hub first. Real-time data, personalized messaging and interactions, complete data integration, next-generation data science. Three classes of analytics: institutional, academic, learning.

Curated view of data, de-identified – demographics, enrollment, majors/minors, retention, student statistics per term, etc. Reports are generated by people in business units, delivered by Tableau or Cognos or API. Multiple levels of security. 10 years of data from SIS, seven years of data from LMS, rolling out now.

Goal is to have four activity hubs up by end of year, and sunset the enterprise data warehouse in two years.

Retrospective and Predictive analytics for student success.

Architecture is not enough. You must build a culture around analytics: Communities of practice; data governance committee; missionary work; easy to use platforms & tools; pushing analytics to the edges, away from ITS.

Strategic Academic Program Development (use data to build RFPs) – meritocracy of ideas, reach across campus, experiment (fail, learn, repeat), focus on what is best for the student?


CSG Spring 2018 – Strategically reallocating/restructuring IT resources

Paul Erickson (Nebraska, Lincoln)

Stage Set

Jim Phelps (Washington)

Laundry list of technologies: hyper personalization; AI and ML; IoT; Autonomous systems & robotics; everything on every device, everywhere; virtual and augmented reality; Big data, data driven organizations; hyper connected world

Cultural impacts of digital transformation – will transform all aspects of work and society in the same way that the industrial revolution and electrification transformed work/society; DX will lead to whole new technologies never seen before; There will be whole new classes of jobs, skills, and competencies; This is a period of great disruption; We cannot predict the winners and losers, in either technology or business; we need to be adaptable; we need resources to invest and investigate during this time of change; higher education will be disrupted too; the winners in higher education may not exist yet.

Change Management will be a critical competency. Example of Nordstrom adapting their culture of perfection to an agile IT process.

Are we a commodity provider or a business transformation partner? If it’s a commodity we can expect budgets to keep shrinking and visibility to go lower. If we are business transformation partner it implies whole new sets of skills. Staff must have customer experience and business skills. Teams need to respond quickly to business changes and opportunities. We need to be change leaders. IT needs to be transparent and build trust and actively manage relationships with the business. We need to understand the business well enough to bring opportunities to the table.

Transformation of IT’s stack itself: DevOps and all it implies (full-stack teams, job rotation, joint meetings, “servant leadership”, etc), chaos monkey. Culture shift – learning, trust, collaboration.

“We need to become IDEO consultants for campus, not Gartner consultants” Matthew Rascoff – Duke

Steve Fleagle – Iowa – OneIT

Board hired Deloitte to look at IT improvements. Adopted 4 recommendations. Proposed a 3 year self-implementation plan – 16 projects with a savings target of $3.6m.

Structure – project teams; coleaders from central and distributed units; program office to oversee projects; Steering committee

Communication – web site, kickoff meetings with each college, town hall meetings, IT leader retreats, monthly newsletter mailings, administrative updates.

Nine of 16 projects complete, seven active projects on track for closeout by end of FY18, 27.39 FTE redirected or eliminated. Two biggest efforts were desktop/device management and data center consolidation.

What went well? Executive support: Provost, new president; Clear direction from and access to the Board; Intentional transparency; High staff engagement: license to collaborate; Momentum: strong kickoff and early wins; Structure (program office, steering committee, change management)

What could have gone better? Challenging to convey what the Board wanted and articulate parameters; generating campus buy-in was a long, exhausting process (lack of synergy at senior leadership levels; collegiate IT leaders caught between Deans and CIO); mixed messages about involvement of other regent schools; not enough focus on cultural/political factors (us vs. them mentality; when everyone agreed it was easy; changing roles felt threatening and caused resistance).

Lessons: Describe the future state clearly or people won’t let go of the past; executive support is critical; a mandate is a blessing and a curse (you still need buy-in); people need to know how change impacts them personally; helpful to identify common ground and differences up front; good project managers are helpful; figure out who has skin in the game and work through concerns one conversation at a time. People really like success stories of how other people have gone through things.

Next Chapter: IT integration with health care IT

Brett Blackman – Nebraska – OneIT

President asked to look at efficiencies in IT across Nebraska, which is 4 campuses. Drivers: scale/efficiencies, improve IT security, improve/maintain services, cut $6m in permanent budget.

Strategy – formed team structures to review IT organization (March 2017) – combined 360 central IT staff and 200 distributed IT staff. Learned from peers. Communication – be transparent with staff. Implemented new org model (September 2017) – balanced between scaled services and forward-facing campus services; 80% of staff had some change to their job; community of practice teams.

Five skill areas across the whole consolidated org: IT strategy and planning; Client Services; Security; Enterprise services; Infrastructure. Specific academic and application services exist at the individual campus level.

Outcomes – OneIT is the foundation for $6m in savings – reduced staffing largely through attrition; IT efficiencies; Procurement (joint contracts); Aligned distributed IT (administrative and academic); unifying central ITS campus budgets. Improved security; Improved/maintained services.

Lessons learned: Communication; Culture…change is hard. Enabling leadership at all levels; transition planning

Not everyone believes in collaboration when it means giving up local control.

IT, Procurement, and General Counsel at Nebraska (Paul Erickson)

As IT moves off-premise, more services are governed by contracts – IT has traditionally struggled with Ts&Cs. Not really commodity buy, so Procurement and GC weren’t comfortable with the issues. Had a multi-year effort to articulate ideal Ts&Cs.

Enterprise Architect & Strategy @ UW (Jim Phelps)

Shifting EA practice – was downward facing at tech, now very business architecture focused. How to link strategy management to investment planning to project portfolio management. EA value proposition – help set and then lead vision of change. Creates need for lots of workforce development.

IT Help desk Consolidation & Opportunities for Innovation – Phoebe Johnson (Minnesota)

Help desk Alignment – the who, what, where, and when. In 2012 there were more than 73 IT help desks across the University system.

IT Support Alignment: Cost savings; Great service; Regional zone support.

2 success stories: Liberal Arts Technology & Innovation Services – allows them to invest in relationships with faculty, students, and staff and foster a culture of innovation. College of Food, Agriculture & Natural Resource Sciences – Went from 8 end user support people to 1.5, freeing up people for academic and research tech and app development. Allowed them to focus on online courses.

Technology Advisory Council – Phoebe Johnson (Minnesota)

System-wide membership, deep technology expertise.

Purpose: Provide UMN staff and faculty with expert guidance & advice; reduce institutional cost; avoid redundant technologies; improve alignment with UMN standards, policies & practice. Engage in activities like work in RFP processes.

What we do: expert consultation and support to units; connecting people with similar needs; A portfolio of technologies.

How it works – inspiration, research, assessment, selection, purchase & implementation

Sarah Christen – Cornell

Wanted cloud initiative to partner with the rest of campus, not just central IT. Started talking about developers moving into a “broker” role – but that wasn’t a popular term. Moving from operators/administrators to innovators, developing an infinite number of solutions to meet problems. Creating infrastructure as code, and operations tasks are automated (which takes innovation), highly variable technology stacks – we consult rather than run. Updates are frequent and automated (but still planned). New products are purchased frequently and need to be integrated into larger solutions.

Central IT as a transformation partner: focus on partnership – don’t reinvent the wheel when central IT can fill the gap. Staff dedicated to helping campus make the transition from the data center to the cloud – documented best practices, refactoring applications, collaboration venues, help/support tickets. Goal is to help with transition and training so the team can support and maintain their services.

Staff transitions: Desktop engineering (Appstream, VDI, CM and JAMF); Storage (starting to backend backups to the cloud, file replications in AWS); DBAs (embracing RDS, exploring new DB platforms, moving away from Oracle).

CSG Spring 2018 – Life in a Post Password World

We’re at Carnegie Mellon for the Spring CSG meeting. The first workshop is about “Life in a Post Password World.”

Password Security: How Safe Are Our Passwords – Richard Biever, Duke

Intro to passwords: How are they stored? What are hashes? What are the problems with hashes? (not all created equal – see NTLM in Windows)

What risk are we attempting to remediate (e.g. phishing or cracking?) (see https://haveibeenpwned.com/ )
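
The site's companion Pwned Passwords service is easy to query safely from code: its range API implements k-anonymity, so only the first five hex characters of the password's SHA-1 ever leave your machine. A small sketch:

```python
# Check a password against the Pwned Passwords corpus via the k-anonymity
# range API: send a 5-char SHA-1 prefix, match the suffix locally.
import hashlib
import requests

def pwned_count(password):
    sha1 = hashlib.sha1(password.encode()).hexdigest().upper()
    prefix, suffix = sha1[:5], sha1[5:]
    resp = requests.get(f"https://api.pwnedpasswords.com/range/{prefix}")
    resp.raise_for_status()
    for line in resp.text.splitlines():
        candidate, count = line.split(":")
        if candidate == suffix:
            return int(count)   # times this password appears in breaches
    return 0

print(pwned_count("password123"))  # a depressingly large number
```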

Password Cracking: Methods of Attack: Brute Force; Brute Force with a mask (i->1, e->3, etc); Dictionary; Rainbow Tables (precomputed hashes) – GPUs make it easier to compute.
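
A toy illustration of the first two styles against an unsalted MD5 hash; real tools like hashcat do this on GPUs at billions of guesses per second (the target password here is invented):

```python
# Dictionary attack (with a simple e->3 style substitution rule) and plain
# brute force against an unsalted MD5 hash.
import hashlib
import itertools
import string

target = hashlib.md5(b"cat3").hexdigest()   # pretend this was exfiltrated

# Dictionary attack: try each word plus a few mask-style variants.
for word in ["dog", "cat", "horse"]:
    for variant in (word, word + "3", word.replace("e", "3")):
        if hashlib.md5(variant.encode()).hexdigest() == target:
            print("dictionary hit:", variant)

# Brute force: enumerate every lowercase+digit string up to length 4.
alphabet = string.ascii_lowercase + string.digits
for length in range(1, 5):
    for combo in itertools.product(alphabet, repeat=length):
        guess = "".join(combo)
        if hashlib.md5(guess.encode()).hexdigest() == target:
            print("brute-force hit:", guess)
```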

Vulnerabilities: Length, Type (passwords or passphrases), complexity

Password policies and entropy – higher entropy -> harder to guess, but harder to use. NIST 800-63 defines some standards. User-chosen passwords are less entropic: a user-chosen 20-character password has less entropy than a randomly generated 8-character password.

Attack Dynamics: offline attacks against exfiltrated hashes. Microsoft hashes are easiest. Modern GPUs are fast – a P50 does 14.7 billion hashes/sec – defeats 33.5 bits of entropy every second! Gets even faster if you use cloud GPUs.
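
The arithmetic behind that claim is worth spelling out; a quick back-of-envelope in Python:

```python
# 14.7 billion hashes/second corresponds to log2(14.7e9) ~ 33.8 bits of
# entropy defeated per second, which is where the figure above comes from.
import math

RATE = 14.7e9                                   # hashes/sec on one modern GPU
print(math.log2(RATE))                          # ~33.77 bits per second

def seconds_to_exhaust(bits, rate=RATE):
    """Worst-case time to enumerate a keyspace of the given entropy."""
    return 2 ** bits / rate

print(seconds_to_exhaust(33.5))                       # under a second
print(seconds_to_exhaust(8 * math.log2(95)) / 3600)   # random 8-char printable ASCII: ~125 hours
```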

Moving forward

Authentication strength: More than password strength: MFA for everyone. Vision – when you hit a shib web site, you should be able to use 2FA plus certificates. There’s a new standard called WebAuthn where you can use your phone as a token.

Ongoing projects: MFA for everyone – complete; MFA for VPN – complete; evaluating current password policy for security + user-friendliness; certificate management as a new factor in authentication strength.

Mark McCahill – Investigating Facial Recognition – now on smartphones (wouldn’t depend on just that one factor); needs a high-res camera, computer, fast network. Person database with GPU and machine learning model to return inferences based on the person database. Duke researcher Guillermo Sapiro’s group: ~150 ms latency for facial recognition inference run on 100k faces.

Early POC project – facial recognition on doors – a cheap sensor (Raspberry Pi + 8-megapixel camera module + power supply, ~$100 total) can stream video to the inference engine and unlock the door. Issues: live detection via stereoscopic images? Gesture? Others? Consistent illumination? Neural Compute Stick – gigaflops on USB. Intel USB Movidius – 1 watt of power, 100 GFLOPS, 10 inferences/second in continuous inference mode, $79 retail. Move processing to the edge.

Survey Results – Tim Gleason (Harvard)

  • Do you believe that a long multi-word passphrase is sufficient? 78% said no
  • Do you use a personal Password Manager? 71% yes – LastPass most often cited.
  • How many passwords are saved in your personal password manager? Median 249 (was 80 at the 2016 meeting).
  • Is your password manager protected with 2FA – 54% yes.
  • What types of second factors are permitted?
    • Push on phone 23%
    • text message 17%
    • telephone call 21%
    • hardware token 22%
    • u2f fido tokens 11%
    • other 3%
  • do you use certs for personal authentication? 20% yes
  • Have you had a central AD hack? 25% yes
  • Has your institution been bitten by payroll bank account transfer attacks? 58% yes

Tim Gleason – Harvard

Password policy and standards (https://policy.security.harvard.edu): Includes data classifications based on risk, password requirements, multifactor services, network positioning and protections. NIST 800-63B provides a reference model. Harvard’s policy states “all users are responsible for protecting their Harvard passwords…” Policy requires that different passwords be used for Harvard and non-Harvard accounts, and no shared passwords.

Three options for password complexity: HarvardKey (web auth system); Passwords > 20 characters; or < 20 characters with a bunch of complex options. They don’t expire passwords.
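
A hypothetical checker in the spirit of that tiered approach: long passphrases pass on length alone, shorter passwords need complexity. The cutoffs and character classes below are illustrative, not Harvard's exact rules:

```python
# Illustrative two-tier password check: length alone for passphrases,
# mixed character classes for shorter passwords. Thresholds are invented.
import re

def acceptable(pw):
    if len(pw) >= 20:        # long passphrase: length alone suffices
        return True
    if len(pw) >= 10:        # shorter password: require 3 of 4 classes
        classes = [r"[a-z]", r"[A-Z]", r"[0-9]", r"[^A-Za-z0-9]"]
        return sum(bool(re.search(c, pw)) for c in classes) >= 3
    return False

print(acceptable("correct horse battery staple"))  # True (length)
print(acceptable("Tr0ub4dor&3"))                   # True (complexity)
```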

Deployed DUO in 2016, required for most services in 2017. LastPass Enterprise – free for Harvard affiliates.

Support Challenges: Single identity for life philosophy; Identity proofing is a distributed function between offices; HarvardKey enrollment.

Password recovery is generally self-service; lost or misplaced DUO tokens can require helpdesk interaction; user community is 24/7 challenge for identity proofing.

4 methods utilized by support teams for remote ID: phone number, in-person, asking the person to take a selfie and compare it to their official photo, trusted third party.

Considerable room for improvement in the user support experience.

A Decade of PKI – Jim Jokl, UVa

Why PKI? Stronger normal authentication for common applications, for use by everyone. Also strong authentication for sensitive data access.

Digital Certificate – binds a person’s public key to their identity, signed by a certification authority.

Chose to do two different Certificate Authorities – a standard assurance CA, targeted for standard apps; and a High Assurance CA – offline CA using hardware crypto modules, uses a hardware token, requires in-person identity proofing. A few thousand in use at any one time.
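
What "issuing a certificate" means mechanically: bind the user's public key to their identity and sign the result with the CA's key. A minimal sketch with the Python cryptography package (names, lifetime, and key sizes are illustrative, not UVa's configuration):

```python
# Minimal CA issuance: sign a certificate that binds a user's public key
# to their identity. All names and parameters here are illustrative.
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa

ca_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
user_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

ca_name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Example Standard Assurance CA")])
user_name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "mst3k@example.edu")])

cert = (
    x509.CertificateBuilder()
    .subject_name(user_name)                   # who the cert identifies
    .issuer_name(ca_name)                      # who vouches for it
    .public_key(user_key.public_key())         # the key being bound
    .serial_number(x509.random_serial_number())
    .not_valid_before(datetime.datetime.utcnow())
    .not_valid_after(datetime.datetime.utcnow() + datetime.timedelta(days=365))
    .sign(ca_key, hashes.SHA256())             # CA signature creates the binding
)
print(cert.subject)
```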

High Assurance CA applications – VPN (each user gets a custom network filter for which apps they can access); System admins (ssh and web authentication for network management)

Standard Assurance CA apps – web authentication, VPN, Wireless, S/MIME for signed and encrypted email (not common nor encouraged).

Developed a provisioning tool that provisions certificates along with wireless and VPN settings, security settings, and network registration.

WebSSO – most people have certificates on their devices so web authentication is easy to use. Still not used for most web logins.

Started process to migrate from home-grown CAs to SecureW2. Commercial product for standard assurance provisioning and CA. SecureW2 hosted web services provides: provisioning, configuration, SAML Authn.

Goal to switch to InCommon/Comodo for issuing certs. May end up using SecureW2’s CA instead.

Passwords Are Weeds – Stories from The Farm – Scotty Logan (Stanford)

June 2013 – HIPAA breach. Moved to required laptop encryption, then device management

Mid 90s – Built WebAuth SSO.

In 2011 built a two-step authentication system.

2013 – made everyone change passwords, and then AD was hacked so made them do it again. Made two-step mandatory for everyone. Late 2014 switched to Duo (keeping their old UI).

2014 – Meeting about IPv6 addressing and 802.1x for network authentication. Decided to use certificates. One per person, or one per person per device? UX advantages (no need to transfer keypairs between devices, a lost device doesn’t affect other devices, can identify device in addition to person and associate device status with cert).

2016 – WiFi and Radius – passwords still terrible, but Duo mitigates Phishing. Still using WebAuth but no developers left; Increased use of SAML 2.0 from external providers; certificates for WiFi authentication and device management in place.

Built CertCache – another CA – root private key never stored as a whole; CloudPath sub-CA issues certs to device/person pairs. Data for associated devices stored with each certificate. CloudPath calls webhooks when a cert is issued / revoked. Use AWS API Gateway to transform the URL into an SQS message – no active code, configuration in Terraform. CertCache receives notifications from SQS, queries CloudPath for certificate details, stores in MySQL. Certificate status set to “unknown” while BigFix updates details. Cert authn only allowed if status is “ok” or “unknown” (within seven days).
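
A sketch of how such a CertCache worker might look; the queue name, CloudPath endpoint, and table schema are all invented for illustration:

```python
# Hypothetical CertCache worker: API Gateway has already turned the
# CloudPath webhook into an SQS message; this loop drains the queue,
# fetches cert details, and records them with status "unknown".
import json
import boto3
import pymysql
import requests

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="certcache-events")["QueueUrl"]
db = pymysql.connect(host="certcache-db", user="certcache", password="secret", database="certs")

while True:
    messages = sqs.receive_message(QueueUrl=queue_url, WaitTimeSeconds=20).get("Messages", [])
    for msg in messages:
        event = json.loads(msg["Body"])
        # Ask CloudPath for the full certificate record (hypothetical endpoint).
        detail = requests.get(f"https://cloudpath.example.edu/api/certs/{event['certId']}").json()
        with db.cursor() as cur:
            # New certs start as "unknown" until BigFix reports device status.
            cur.execute(
                "REPLACE INTO certs (serial, user, device, status) VALUES (%s, %s, %s, %s)",
                (detail["serial"], detail["user"], detail["device"], "unknown"),
            )
        db.commit()
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
```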

Late 2016 – concern about authentication after an earthquake. Switch to SAML 2, ditch WebAuth. What about WebLogin? It becomes just another SAML relying party.

2017 – work on migrating to containers on AWS.

Current Status – RADIUS: Cert authn VPN profiles in production – authz: CertCache (device status), LDAP (account status); Containerized; Still on campus but so is VPN service – going to investigate RADSec; VPN logs go to SUNAC for finer-grained network access.

Current Status – WebSSO – Migrating everything possible to SAML 2.0; WebLogin behind the IdP, still only on campus; WebSSO and supporting services running in AWS, but masters still on campus.

Going to disable text message and voice message 2FA (easily hacked formats) for some populations.

Phillip Kobezak, VaTech

Got into PKI around 2006 – a lot of documentation and procedures, three CAs. Used Aladdin eTokens for personal digital certificates – required in-person identity proofing. Used for limited populations. Started having support issues because the vendor used specific functions in browsers that were falling out of support. In 2009 started using Vasco tokens with one-time passwords.

Did a separate CA for wireless certs, operated for several years as primary wireless authentication until eduroam became popular. Now is shut down, but still use a separate identifier for the network.

Personal Digital Certs – now issuing self-service distributed online. Including key escrow. Uses: S/MIME email and project documentation signatures; Encryption of PDFs, including portfolios.

Two Factor deployment with Duo: 2013 AD compromise pointed out the need for stronger auth. Duo on enterprise directory, AD, and VPN for all users, including alumni. Still need to address desktops/laptops.

Path forward: evaluation of additional password-less approaches. Specifically interested in device registration with certs.


CSG Winter 2018 – Research and Teaching & Learning IT: Partnering with the Library

This morning’s workshop on partnering between IT and Libraries features Jenn Stringer/Chris Hoffman (Berkeley), Jennifer Sparrow/Joe Salem (Penn State), Diane Butler (Rice), Cliff Lynch (CNI), Louis King (Yale), David Millman (NYU)

The morning is starting off with some thoughts from Cliff Lynch (CNI):

Reminders of some things many haven’t lived through: In the early 90s there was a call not only for collaboration between IT and Libraries, but serious talk of merging. It was tried at a few institutions, like Columbia University. The takeaway was that it’s fairly crazy at large institutions. The mission expansion of each has been in differing rather than overlapping areas. But it’s been successful at a number of liberal arts organizations.

When CNI was founded it was totally viewed as a collaboration between the CIO and the head of the Library at member institutions. In the early 2000s that makeup was changing. The representation was the head of the library and someone doing research or academic computing, or doing digital work in the libraries. Led to increasing disengagement of the CIOs. Starting around 2000 started putting on executive roundtables with the intent of re-engaging the CIOs. It was fairly easy in the first few years to come up with topics in that sweet spot, but it got harder. If you look back from 1990 – 2005 you see that Libraries had low levels of technical expertise. At the same time libraries had developed some internal expertise in technologies important for digital humanities, data curation, etc, where there is now more competence than in the central IT org, which has structured its mission around infrastructure, compliance, etc. Libraries continue to rely on IT for fundamental infrastructure.

If you look at the landscape, how much IT capability is native to the library, and how much replicates or complements the expertise in IT? This is hugely inconsistent. If you polled the CSG campuses you’d be surprised at the degree of variation in organic IT expertise in the library.

Collaborations involving the library have become much more multilateral rather than bilateral with IT – involving partners like University Presses, Museums, research data management, digital scholarship centers (often involving an academic school or department), geospatial centers, maker spaces.

Don’t forget collaboration on institutional policies. Data governance, privacy and reuse of student data and analytics, responsibility of university to preserving scholarly products. Had a recent roundtable looking at policy implications of adoption of widespread cloud platforms.

This area does not lend itself to checklists.

UC Berkeley – Chris Hoffman

A history of good intentions – Museum Informatics Project – Housed in Library, Digital collections and DAMS. Complicating factors: Sustainability, budget cuts, grant funding; priorities; loss of key champions; culture.

Collectionspace – managing collections for museums.

Research Data Management – an impetus for change. New drivers (DMP requirements), new change leaders, new models for partnership. Benchmarking justified need. Broad definition of research data – all digital parts of a research project. Priority to nurture collaboration between IT and Library. Co-funded a position for program manager. Campus-wide perspective, investing in understanding and bridging cultures.

What’s next? More challenging tests to partnerships, RDM 2.0, Visualization and makerspaces, more fundamental technologies? (archival storage, virtual teaching and research environments); strategic alignment?

NYU – Stratos Efstathiadis, David Millman, David Ackerman

Research technology works closely with LIbraries.

Data Services – estab. 2008, 11 FTE. Consultation and instructional support for scholars using quantitative, qualitative, survey design, and geospatial software and methods. Joint service of IT and Libraries.

Digital Library Technology Services – estab ca. 2000. Digital content publication and preservation. New services to support current scholarly communication. R&D to develop new services and partnerships, 19 FTE.

Research Data Management Services – estab 2015. 2 FTE. Promulgate best practices in data organization, curation, description, publication, compliance, preservation planning, and sharing.

Research Cloud Services – new collaboration built on other preexisting services. Interconnected research storage environment. Reimagine a spectrum of cloud storage from dynamic to published final products. Provides a backbone for researchers but also Library collections and workflows.

Yale – Louis King

Considerable history at Yale in working in digital transformation space.

Office of Digital Assets and Infrastructure – Sept 2008. Work closely with Library and ITS. Focus on Digital Assets & Infrastructure. Take advantage of disciplinary approach of libraries and technical capacity of IT.

Looking for ways to gain efficiencies and lower overhead for people who want to manage digital content.

Had some substantial initial success, but changes: Initial provost sponsor left Yale, 2009 financial crash, VP retired, two library director transitions, transition in IT director, emerging digital systems in Library.

Late 2012 relaunch as Yale Digital Collections Center, but closed in 2015. But it catalyzed momentum towards digital transformation at Yale. Established the foundation for many successful current and future collaborations.

Rice University – Diane Butler

Library and IT have been partners for a long time. For a very short time, the organizations were merged. Research IT and the library have been partners since 2012 and informally even further back. Began with the library providing the service and IT providing the core infrastructure, but it has morphed into a collaborative partnership.

Areas of collaboration: Data Management (through Library). Provide consultation, including creating DMP, describing and organizing data, storing data, and sharing data. Training, Access to resources such as platform for sharing and preserving publications and small-to-medium datasets. Still an area for work as faculty aren’t very engaged.

Digital Scholarship: Service provided by library with IT providing infrastructure. Preserving scholarship, navigating copyright and open access, managing and visualizing data, digitizing materials, consultation, etc.  Research IT has history in supporting engineering and sciences, but not so much in humanities.

Digital Humanities: Imagine Rio Project. Most successful collaborative project to date. Architecture and history professors joined together to imagine Rio de Janeiro: a searchable atlas of the social and urban evolution of Rio.

Positive outcomes: Research IT had not supported Humanities or qualitative social sciences previously. Success of project has brought in more funding. Research IT now has 2 facilitators that are working with faculty in those disciplines.

At Rice the board has come up with some base funding for research computing, so that all of the work doesn’t have to be funded by grants.

Penn State – Joe Salem and Jennifer Sparrow

Strong history of working with libraries, IT, and student services on accessibility issues. Thinking about spaces in place and how to leverage institutional spaces. Built a “blue box” classroom.

Worked on the Dreamery – a co-learning space for bringing emerging technologies onto campus.

Driving strategic initiatives: Collaborative, technology-infused space. Inherited a space called the Knowledge Commons. Includes a corner with staffing from both Libraries and Academic Tech. Service partnership profile has grown from just a focus on media, to overall platform for supporting students. Work on curricular support together – open educational resources and portable content. Instructional design is a focus.

Learning Spaces committee – Provide leadership in innovative instruction.

What makes the partnership work … or not? What does each side bring to the table?

Chris – Berkeley

Visualization service at Berkeley. HearstCAVE: connected virtual spaces over the Pacific Research Platform around archaeological preservation. Thinking about how it connects with data science.

Makerspaces at UCB – pockets of excellence and experimentation. Jacobs Institute for Design Innovation. Talking with the library and ETS to look at space.

Hooking the two together in a Center for Connected Learning.

Research Data Management at Yale
Much Ado about Something: Complex funder requirements; reliable verification of results; reuse of data in new research.

What are the responsibilities and rights of the University and faculty regarding research data? They put out a Yale Research Data & Materials Policy, developed over 2+ years with collaboration across the university. There is significant collaboration in support of that policy – Library and IT collaboration: Research Data Strategic Initiative Group, Research Data Consultation Group, Yale Center for Research Computing.

Recommendation: Research Data Service Unit; reports within Library – assessment, coordination, outreach and communication. Federated support model for all research data support services – research technology, data management, metadata, outreach & communications, customer relations, education and training, research data administrative analytics.

NYU – David Millman

Bottom-up requirements – survey local researchers: IT/Lib complementary styles, contacts. Survey peers: IT/Lib coordinated.

Executive review: Dean, AVP-level

NYU – research repository service identification. Umbrella of services across the research lifecycle: creation, manipulation, publication, etc. Holistic – customer focus. 1. HPC storage; 2. “medium” performance storage (CIFS, NFS); 3. “published” storage – preserved, curated, citable.

IT/Library crossover strategy questions: business of universities: long-term preservation of scholarship. Any updates on our participation in digital preservation facilities? Some of our colleagues have recommended highly distributed protocols for better preservation. How do we approach this?


CSG Winter 2018 – Much Ado About GDPR

We’re in sunny, warm LA for the Winter CSG meeting, hosted by USC.  Last night, Asbed coordinated a group to go out for tacos at http://chichenitzarestaurant.com/ , which was excellent!

This morning we’re starting off with a workshop on GDPR, featuring: Sharif Nijim (Notre Dame), Jim Behm (Michigan), Paul Erickson (Nebraska), Alan Crosswell (Columbia), and Kitty Bridges (NYU)

GDPR = General Data Protection Regulation – 127 days until enforcement on May 25

Membership survey:
87% think GDPR is an institutional risk
58% identified as beginners in GDPR
70% either don’t know or don’t think their institution will be compliant
41% have engaged outside counsel
50% General Counsel and IT partnership to lead compliance initiative.

What is GDPR? Alan Crosswell.

EU regulation on personal data protection, applicable to people, products, or services. Replaces old regulations dating back to 1995. Covers: personal data (relating to people). Examples: IP address, genetic data, health data, research data, video surveillance. Who is covered? EU individuals or any company that offers products/services to EU individuals or collects/processes their personal data (includes non-EU citizens located in EU).

Requirements: Identify personal data; data protection by design; individual rights on data usage (transparency, right to data erasure, right to data portability, etc); obtain proper consent (opt-in); withdrawn consent and the right to be forgotten (opt-out); breach notification; designate a data protection / privacy officer (DPO).

What does it mean for a student to have the right to be forgotten?

Penalty: Failing to report breaches within 72 hours can result in a maximum fine of 20 million euros or 4% of organizational annual revenue – whichever is greater.

Preparing for GDPR – key steps: Promote awareness; discover PII you hold; implement data protection by design; identify the legal basis for processing personal data; review procedures for communicating personal data, individual data rights, data consent, guardian consent for minors’ data, data breach detection, response, and notification; designate a data protection / privacy officer.

Definitions:
EU Individual – physically located in an EU member state, both EU citizens and non-EU citizens.
Personal Data – relating to an identified natural person: name, ID number, location data, online identifier, address, email, passport, cookies, driver’s license, etc.
Consent: freely given, unambiguous indication of the data subject’s wishes.

Question: does this include firewall logs? General agreement that it does.

Comment: This is subject to legal jurisdiction, and the thought that this is generally applicable to everything we do might not be correct.

GDPR Scenarios

Recruiting: NYU recruiter holding open house in Paris for EU people to find out about NYU. Recruiter gathers name, interests, and hands over wifi credentials. Need to give an explicit consent form, saying which elements are collected, what they’ll be used for, and how long they’ll be retained. Has to be provided in the native language. (Is your admissions prospecting software aware of and planning on how to handle GDPR? That’s institutional data – it’s an indemnification issue. What kind of contract language do you have?).

Admissions: Need name, national ID, country of origin, addresses, high school transcripts, etc. to make effective admissions decisions. Also use that information for research (see Unizin). How is consent for data tracked through the various systems? (Common App Organization GDPR adjustment ETA? – “early spring”).

Question – has anyone reached out to European universities on what they’re doing to prepare for GDPR?

Matriculation – example of alleged assault from student abroad. What happens if student exercises GDPR rights to not share data back to the US? Could contracts with partners abroad be affected if we don’t behave according to GDPR? Example of LMS vendor that is spinning up version of LMS in the EU specifically for GDPR – do we keep our data there for EU citizens?

Research – What about information about researchers kept on servers? Do legal federations with agreements help us? GEANT did a study on GDPR impact on Edugain. Emerging attribute release agreements help with GDPR compliance. GEANT is submitting a new code of conduct for GDPR – a way of publishing attributes in an open and transparent way. Coming out later this year. Transparency, documentation, and incident response are critical pieces.

Alumni and Benefactors – What data are collected and where is it? What if they want to be removed? Compliance might be viewed as a revenue issue. There is a notion in GDPR of “legitimate interest” but that isn’t a blanket clause.

Comment: We should follow advice of counsel on how to approach GDPR. It may not be worth a lot of worry at this point about how much this impacts us. Just because it’s over the Internet doesn’t make it different than any other issue between countries and how citizens are treated. We all need to decide what our risk posture will be.

How many campuses operate summer camps with people under 16 from EU countries?

If institutions are backing away from collecting citizenship data (from concern about undocumented people), does that impede our effort to comply?

Educause and GDPR: Trying to curate best resources – see page at: https://library.educause.edu/topics/policy-and-law/eu-general-data-protection-regulation-gdpr

Good to start with JISC resources. https://www.jisc.ac.uk/gdpr

Territoriality – we higher ed institutions generally have enough business links that we should surmise that GDPR might apply in some way.

Educause is working with other US higher ed groups (NACUA, etc) on GDPR guidance. It’s slow going, and all organizations are struggling with what advice to give members.

Notre Dame – Initial meeting with General Counsel (8/2017); Elevated to information governance committee (9/2017); Assigned to IT by institutional risk committee (10/2017); Compliance questionnaire circulated (11/2017); Questionnaire data aggregated and analyzed (1/2018) – Hard to collect data across the institution – will need help from general counsel in complying with collection. The vision is that data stewards will be accountable for the data in their areas. Impossible to collect every last piece of data, but important to show due diligence and have a process for dealing with issues.

NYU – Have hired external counsel – issuing questionnaires. In data collection mode, focused on central administrative entities. Don’t yet know what the institutional posture will be. General Counsel will advise. Will likely think of this as responsibility of business offices, who have been involved in discussions. IT is a key partner, knowing how things are connected together. First thing to focus on: Documenting identity data; movements of data between systems; prioritizing what to worry about first (biggest risk). Especially tricky areas are warehousing, analytics, and logs. Logs: operational logs (IP addresses, MAC addresses, authentication logs, DB logs, application logs) used for troubleshooting and trending. Can they even be made anonymous? Audit logs – understanding who has access, understanding really how long things need to be kept in identifiable form.

Nebraska: Bringing together multiple conversations around GDPR – General Counsel coordinating. Work in progress – expect to at least have a posture before the deadline. NACUA webinar was very helpful. Distance ed group started early on. Good test of relationships across campus – IT as implementer. Research group is interested in GDPR to help guide data governance. Indemnification – example of a SaaS contract where the vendor struck out “global standards.”

UMich – Started this past summer – taking a “cautious approach”. Concern about the extent to which regulations will apply to US institutions. Group led by General Counsel with representation across campus. Counsel has hired a consultant to help guide campus through the process. There’s enough gray areas that it’s unlikely that campuses will be held accountable in May. For state institutions, it may be the state that is accountable, not the institution. Might not be the case at Michigan.

Rice – Chief Compliance Officer leads a working group with the CISO. Creating an institutional web site for information.

UVa and Va Tech – very early in process. Conversations with General Counsel.  State AG’s office has hired counsel who should be issuing guidance for state higher ed.

Ron – IT is the only organization that touches every other organization in multiple domains – so it falls to us to be of service.

Minnesota – Counsel leading effort, still assessing impact and how much needs to be done.

Iowa – In due diligence approach, with Counsel taking lead. Will be naming a privacy officer. Creating a plan for operations that take place in the EU, which is a relatively small set.

CMU – very early on. Taking gap analysis approach.

Sharif – taking the approach that much institutional data is “legitimate interest” vs. asking specific consent. But that still requires transparency. How far does legitimate interest go?

Maybe this worry is overblown (like we did with CALEA)? It’s primarily targeting Googles and Facebooks, not higher ed.

Should we be reviewing cloud contracts for how GDPR is or isn’t covered? Could Educause help come up with a checklist for review? To what extent does it affect Net+ contracts? (e.g. LMS).  We could have an area on the CSG site for sharing information.

We may be likely to see something analogous in the US, so this won’t be wasted effort. Much of what we need to do for GDPR are just good enterprise data practices.

It’s not an IT project, it’s about institutional risk. Should be part of that regular assessment process.


Higher Ed Cloud Forum: Getting Harvard’s Enterprise Systems to the Cloud

Ben Rota, Harvard

How a crisis optimized our organizational structure

Phase One Org Structures (as seen retroactively):

February 2015 – Trying to change culture as well as technology. People had expectations that were impossible to meet. Original cloud team was drawn from infrastructure and IdM – no developers or applications people.

May 2015 – Restructured to focus on migrating a single application and supporting Public Affairs.  Successful migration, but had tension between operational work and further migrations.

June 2015 – split into multiple, smaller scrum teams to support more simultaneous projects. Lacked cohesiveness, plus operations were still killing them.

Septemer – December 2015 – team was demoralized. operational work continued to be a problem – pile of cleanup work. No ability to reduce tech debt.

December 2015 – PeopleSoft group decided that 9.2 upgrade would be done in the cloud. Cloud team didn’t have enough resources available to help, but their consultant could help.

June 2016 – PeopleSoft team realizes consultant didn’t work out. Cloud program put .5 FTE on the project.

December 2016 — PeopleSoft migration at significant risk – migration team created to respond to impending crisis.

Application Portfolio Teams – co-located, cross-functional groups for portfolio migration projects. How’s it going? Migrations are accelerating. PeopleSoft, Oracle Financials, and Grants Management are migrated. Close to migrating the Alumni Affairs and Development system. Troubleshooting migration problems has gotten easier – co-location smooths communication. Shared goals break down silos.

Organizing work around HUIT-managed applications runs the risk of neglecting the “long tail” of smaller applications and systems. Too many product owners in the kitchen – how do you prioritize work when you have competing interests? Operational work vs. migration work is still a challenge, but now it’s more about prioritization within a team rather than across silos. DevOps still has too many definitions! Had a day-long workshop open to all of HUIT in a facilitated discussion about what they hope to get out of this effort.

Higher Ed Cloud Forum – Tools for predicting future conditions: weather & climate

Toby Ault, Marty Sullivan – Cornell

Numerical models of climate science. Most of fluid dynamics in models for weather and climate are physical equations and solvers are “Dynamical Cores” that tells you about the flows of fluid in 3d space.

Continuum of scale needs to be accommodated – done through parametrization. Want to be able to sample as many parameterization schemes as possible.

Interested in intermediate time scales (weeks to months) that have been difficult to model. There’s uncertainty arising from different models, so having multiple models with multiple parameterizations that get averaged together with machine learning can have huge payoffs in accuracy.

Are the most useful variables for stakeholders the most predictive?

Weather simulation in the cloud:

Infrastructure as code makes things much easier. Able to containerize the models (which include lots of old code), so people don’t need to know all the nuts and bolts. Using lots of serverless – makes running web interfaces extremely easy.

Democratization of science – offered through a web interface. People can configure and run models for their locations.

Lots of orchestration behind the scenes: Deploying with CloudFormation, using ECS, etc.
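
One plausible shape for that orchestration layer: a thin Python function launching a containerized model run as an ECS task. The cluster, task definition, and network details below are placeholders, not Cornell's actual setup:

```python
# Sketch of kicking off one containerized model run on ECS. All names
# (cluster, task definition, subnet, bucket) are invented.
import boto3

ecs = boto3.client("ecs")

def launch_run(config_s3_uri):
    return ecs.run_task(
        cluster="wx-model-cluster",
        taskDefinition="wx-model:7",        # container wraps the legacy model code
        launchType="FARGATE",
        overrides={
            "containerOverrides": [{
                "name": "model",
                # The container reads its parameterization scheme from S3.
                "environment": [{"name": "CONFIG_URI", "value": config_s3_uri}],
            }]
        },
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0abc123"],
                "assignPublicIp": "ENABLED",
            }
        },
    )

launch_run("s3://wx-configs/run-2018-05-01.yaml")
```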

Higher Ed Cloud Forum: Desktop as a Service – Moonshot to production in 6 months

Deanna Mayer, Brady Phipps — UMUC University College

Primarily online programs, 90+ programs and specializations, 80k students worldwide, 140+ classroom and services locations in 20 countries. Heavily into IT outsourcing – started with a VDI vendor, but they couldn’t scale. Needed non-device specific VDI that didn’t require an install.

Student requirements: fully integrated, one-click classroom experience; access across program, not limited to single course; secure environment providing immersive experience; ability to scale; single sign-on; rich metrics and analytics. Huge spikes in usage on Sunday nights before assignments were due.

January – April 2016 did an RFP. No vendor met all requirements. Most vendors focused on a single image across a corporation. Partnered with Amazon in April; project approved in June. Flew a local solutions architect to Seattle to sit with AWS side-by-side for three weeks. Ten people on the project team in a room focused on the problem, due by October. Initial launch to 400 students in August. Cut the cord with the legacy vendor in May – moved over 60 courses. Now have over 10k students using it. 22.5 hours/month average usage, 25% drop in student support requests.

Launched AccelerEd, a new company with Aloft, a cloud services unit.

Higher Ed Cloud Forum: When can a computer improve your social skills?

Ehsan Hoque (University of Rochester)

Behavior mining -> Applications -> Deployment

Automated Prediction of Interview Performance -> My Automated Conversation Coach (MACH) -> ROCSpeak.com

MACH – My Automated Conversation coacH — originated from people with Asperger’s wanting help developing conversational skills.

Originally a research application, got a grant from Azure to develop a cloud version. As people use the framework, the data gets fed back into the model, which improves the performance.

At the end, it’s not the specific cloud functionality but the interaction with the people at the vendor that makes things work.

Higher Ed Cloud Forum: Epidemic Modeling in The Cloud: Projecting the Spread of Zika Virus

Matteo Chinazzi (Northeastern University)

MOBS lab — part of Network Science Institute at Northeastern, modeling contagion processes in structured populations, developing predictive computational tools for analysis of spatial spread of emerging diseases.

Heterogeneous interdisciplinary research group – physicists, economists, computer scientists, biologists, etc.

GLEAM – Global epidemic and mobility model – integrates different data layers – spatial, mobility, population data. For Zika, had to introduce mosquito data, temperature data, and economic data (living conditions).

Practical challenges:

  • unknown time and place of introduction of Zika in Brazil (Latin square sampling + long simulations (4+ years))
  • Parameters need to be calibrated and estimated: prediction errors add stochasticity at runtime.
  • Intrinsic stochasticity due to epidemic and traveling dynamics
  • Need quick iterations between different code implementations

Each simulation takes 6-7 minutes, and they need > 200k simulations. Each scenario generates about 25 TB of data, needed in a day. Tried on-premise, but not enough compute cores, resources were shared and bursty, and there was no reliable solution to analyze data at scale.

Migration to GCP – prompt replies and assistance from customer support (“your crazy quota increase request has been approved”)

Compute Engine – ability to scale in terms of compute cores – up to 30k cores consumed simultaneously. Can keep data without saturating on-prem NFS partitions. BigQuery – ability to scale in terms of data processing. In < 1 day they can run simulations and analyze outputs.

Workflow steps: Custom OS images for each version of the model; startup scripts to initialize model parameters, execute runs, perform post-processing, and move results to a bucket; a Python script to launch VMs, check logs, run analysis on BigQuery, export data tables to a bucket, and download selected tables to the local cluster. Other scripts create a PDF with simulation results.
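
A condensed sketch of the BigQuery leg of that workflow (project, dataset, table, and bucket names invented):

```python
# Aggregate raw simulation output in BigQuery, then export the summary
# table to Cloud Storage for the local cluster to download.
from google.cloud import bigquery

client = bigquery.Client(project="zika-sims")
dest = bigquery.TableReference.from_string("zika-sims.gleam.weekly_summary")

# Aggregate the raw output the worker VMs loaded into BigQuery.
config = bigquery.QueryJobConfig(destination=dest, write_disposition="WRITE_TRUNCATE")
client.query("""
    SELECT run_id, week, SUM(new_infections) AS infections
    FROM `zika-sims.gleam.raw_output`
    GROUP BY run_id, week
""", job_config=config).result()

# Ship the summary to the bucket as CSV shards.
client.extract_table(dest, "gs://zika-results/weekly/*.csv").result()
```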

Numbers: 750k+ instances launched, 300 TB of data analyzed, 10M+ global epidemics simulated, 110+ compute years

Lessons learned: Use preemptible VM instances (~1/5 of price, predictable failure rate); use custom machine types; run concurrent loading jobs on BigQuery; use Google Cloud Client Library for Python – from simulations to outputs with no human interventions; Be aware of API rate limits.