CSG Fall 2016 – ITIL and DevOps

Why is this important?

  • Does ITIL make sense in an era of continuous delivery and integration?
  • Will the volume of applications and sites overwhelm the management methodology?
  • Distributed IT is not well versed in ITIL
  • Does DevOps include formal review? Shouldn’t Tier 0 sites and apps get reviewed for changes?

Survey results

  • Almost all respondents have a formal Change process and board
  • Divided on whether PaaS/SaaS need formal change reviews
  • Some said that changes are only managed for major changes
  • Most respondents not mature yet with DevOps practices
  • Some groups doing agile development, but not all

Harvard working on trying to reinvent ITIL in the cloud environment – since it’s all software now, release management practices are more appropriate than change management.

Would be good to have changes (even pre-approved ones) logged in ServiceNow so incidents could be correlated with changes.

In new cloud deployments people aren’t patching, but blowing machines away and deploying new ones. How does change process handle that?

Notre Dame trying to eliminate human access to the cloud console for production systems

Nobody in the room is doing continuous deployments to ERP systems

Cornell – with self-healing infrastructure they may not even know there’s an outage.


Tom Vachon, Harvard

Harvard’s cloud at a glance

  • 684 applications targeted for migration by 7/18, 300+ migrated already
    • Shutting down one on-prem data center
  • 1 VPC per account on average
    • Centrally Billed: 131 Accounts
    • 45 Accounts/VPCs on Direct Connect
    • Looking to make Cloud a University-wide strategic program
  • Cloud Shield – physical firewall
    • Kicked off 7/15 in response to a security breach
    • POC – 11/15 – 2/16
    • Started automation code 3/16
    • 15,000 lines of code
    • Production ready 7/16
    • Design goals
      • provide highly available and highly redundant AWS network access
      • Provide visibility of traffic into, out of, and between cloud applications
      • Provide next-gen firewall protections
      • Inline web filtering to simplify server configuration
      • Provide multicloud connectivity
    • Tech details
      • Diverse paths and POPs – Boston has 2 direct connects, and a POP in Equinix in Virginia with private network connection to campus
      • Primarily done for visibility
    • Actively discourage host-based firewalls
      • Use security groups instead
      • Don’t use Network ACLs
  • Will provision services with public IPs
    • They have overlapping private address spaces
  • Design manager of managers in Python
    • Create an ops & maintenance free architecture in Lambda
    • Provide REST API through AWS API Gateway
    • Isolate changes by segregating integrations in AWS Lambda
  • Leverage AWS DynamoDB for
    • Schemaless session cache
    • Dynamic reconfiguration
  • Challenges
    • Static DNS names
      • use ELB or ALB for applications
    • Everyone needs to be on Harvard IP space
      • Delegates six /16s for AWS
    • Legacy application stacks
      • Java has a “mostly hate” relationship with DNS
        • Lots of apps cache DNS forever
    • Reduced S3 visibility
    • Inability to do app-by-app identification
      • Grouping by data classifications
    • Items which are unknowingly locked down to AWS IP space
      • e.g. doing a yum update of Amazon Linux from a non-AWS IP space
  • Virtual firewalls per VPC were going to cost >$4 million over three years, this model costs $1.6 million over five years
  • Most applications got faster when distributed across this model
    • Less switching in the way
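
The "manager of managers" pieces above (Lambda behind API Gateway, DynamoDB as a schemaless session cache) could look roughly like this. This is a minimal sketch with illustrative names, not Harvard's actual code; the table object is injected so integrations stay isolated and the logic can run without AWS:

```python
import json
import time

def make_handler(session_table):
    """Build a Lambda-style handler. session_table is any object with
    get_item/put_item (e.g. a boto3 DynamoDB Table, or a fake in tests),
    so each integration stays segregated behind a small interface."""
    def handler(event, context=None):
        # API Gateway proxies the request to this function.
        key = {"session_id": event["session_id"]}
        cached = session_table.get_item(Key=key).get("Item")
        if cached is None:
            # Schemaless session cache: store whatever this integration needs.
            cached = {**key, "created": int(time.time()), "state": "new"}
            session_table.put_item(Item=cached)
        return {"statusCode": 200, "body": json.dumps({"state": cached["state"]})}
    return handler
```

Injecting the table is what makes the "isolate changes by segregating integrations" goal testable: swapping DynamoDB for another store only touches the object passed in.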

Panel Discussion

  • Biggest technical challenges so far?
    • Georgetown – have to run virtual firewalls in HA. Looking at replacing with Trend Micro
    • Harvard – lack of visibility in AWS
    • UNL – Vast offerings from vendors – how to wrap heads around it?
    • How to support on prem and burst out, especially for research instruments?
    • Cornell – Keeping up with the technology. Having people to manage and implement solutions. Encouraging lack of consistency in an effort to use the best new technology to solve problems.
    • Wisconsin – Have to worry about security a whole new paradigm in the cloud.
    • Notre Dame – pace of innovation. Do we prepare for a more rapid pace of change (and those costs) or learn to live with not implementing the latest?



CSG Fall 2016 – Security and Configuration in the Cloud, pt 1

Sarah Christen introduces the workshop.

Bob Winding and Sharif Nijim – Notre Dame

  • Cloud first – even distributed groups going cloud first
  • VPCs: Shared Services VPC with peering to Central Applications or Departmental VPCs; VPN tunnels over I2 to campus; be wary of implicit peering through campus routers.
  • 80% central IT, 20% distributed
  • Pauses to assess progress are built into the plan, with sprints to address issues. Inviting Mandiant to campus to help establish a five-year security roadmap.
  • Export controlled data
    • 22 projects on campus dealing with this kind of data
    • GovCloud – new initiative to support research
    • NIST 800-171 / DFARS 252.204-7012 – looks a lot like PCI DSS
      • AWS covers 1/3 of security controls in GovCloud
      • Talked to a half-dozen PIs – experiments generate lots of data, then they move data to a local spot for analysis, or design work that happens locally with specific apps.
      • Developed a compliance matrix and quick start template in Cloud Formation
        • Quick start builds shared services and multi-tenant project VPC
      • Want to create an environment in GovCloud that is cloistered for the work until it goes back to the sponsor.
  • VDI – Using graphics intensive applications in the cloud
    • Looked at Frame – delivers the screen from a remote desktop over video streams. Running pilot in US East
  • Look at the RDP gateway as the audit boundary – doesn’t include the end user device
  • Least privileges in IAM
  • Working with Purdue to look at SaaS providers for security monitoring and log analysis
  • AWS Security
    • Flipped IAM from least privilege to explicit deny of dangerous operations
      • Separation of control on IAM policy creation and application
      • Writing Lambda functions to undo changes that aren’t permitted
  • Organizing security groups
    • Setting standards for common functions, like sysadmin access
    • Engineers have a hard time keeping things simple
    • Databases use security groups for access control, which simplifies auditing
  • Data security
    • Using Tripwire tuned precisely on systems with confidential information
    • Encryption at rest and backups
    • Replication of backups/snapshots to a separate account and region. If a credential is compromised can’t destroy both operational data and backup
  • Future
    • Cloudfront WAF
      • Want to fully leverage Amazon’s tools to gain advantage
      • Realize that this increases lock-in with the vendor
    • Host IDS for selected sensitive systems – looking for things that don’t cause choke points
    • Comment from Bruce – “we’re on the verge of a post-firewall world”
      • At AWS have to use IP-address based controls across VPCs and shared services
  • https://oit.nd.edu/cloud-first
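
The "explicit deny of dangerous operations" idea, with Lambda functions undoing changes that aren't permitted, reduces to classifying CloudTrail events against a denylist. A minimal sketch of that classification; the operation names here are illustrative, not Notre Dame's actual policy:

```python
# Hypothetical denylist of "dangerous operations" in service:action form.
# A remediation Lambda would subscribe to CloudTrail events and undo
# anything that matches; only the matching logic is sketched here.
DANGEROUS_CALLS = {
    "iam:DeleteRolePolicy",
    "ec2:DeleteVpc",
    "s3:PutBucketPolicy",
}

def needs_remediation(event: dict) -> bool:
    """Return True if a CloudTrail-style event record is on the denylist.

    eventSource looks like "ec2.amazonaws.com" and eventName like
    "DeleteVpc", so the pair normalizes to "ec2:DeleteVpc"."""
    service = event.get("eventSource", "").split(".")[0]
    call = f'{service}:{event.get("eventName", "")}'
    return call in DANGEROUS_CALLS
```

Separating "decide" from "undo" also supports the separation-of-control bullet: the policy list can be owned by one team while the remediation code is owned by another.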

Bob Turner – Wisconsin

  • Somewhere between cloud experimentation and cloud aware.
  • Trying to not yet deal with sensitive and restricted data in the cloud
  • security requirements for accounts and VPCs
    • Working off script based on risk management framework
    • Using it for onboarding people into cloud environments
    • Working on audits and attestations
  • Enforcing cloud controls (will also use for on campus environments)
    • Provisioning/De-provisioning
    • Going to try to use the FedRAMP checklist as a guide
    • Approval of risk by Executive able to accept on behalf of University
  • Automated Templates (consultancy model)
    • Create a new account or migrate existing account under master
    • Pre-provisioned equipment templates with logging enabled
    • Configured for Shibboleth
    • Moving towards Duo for MFA
    • Activate AWS Config
    • Use (future) cloud security tool for initial verification and continuous monitoring
  • Things to be concerned about
    • Holding on to root accounts and credentials
    • Challenges of CDM
    • Usual tools are not necessarily available
    • AWS tools have charges
    • Challenge of cloud vendors that don’t support SAML or federation
  • Account management
    • Group email per department, including Office of Cybersecurity Rep
  • Researcher accounts
    • must know their expected data (at present no Restricted or Sensitive data)
      • Google as a government service that has been pretty well vetted by US agencies

Sarah Christen – Cornell

  • Cloud first according to IT Strategic plan written in 2013
  • 54 accounts under master contract, hundreds outside
  • Cloudification services has been an opportunity for central IT to partner with campus
  • Requirements for being on master contract
    • Onboarding discussion
      • How billing works; unit responsibilities – how is this different than the data center?; Security and configuration requirements; Benefits; Discussion about joining tech community; central services available – Container service (will containerize and run code for fee), DevOps service
    • Attestation
      • Explicit agreement to policies
    • Shibboleth
    • Duo for MFA for console access
    • Activation for AWS Config and CloudTrail
    • CloudTrail logs sent to Security Office
  • Onboarding – create account, configure Shib and Duo, lockdown root account, standard AD groups (admin, cloud group, security), activate Config and CloudTrail and configure CloudTrail logs to be sent to Security office as well as the VPC owner; activate Cloudcheckr and schedule review of how to use.
  • CloudCheckr – allows those with accounts to see usage data; makes recommendations on how to save money; sends monthly invoices; runs continuous vulnerability scan; gives Security a view into all accounts
  • Standard VPC setup – blogs.cornell.edu/cloudification/2016/04/08
  • What about research accounts?
    • Easy onboarding without a lot of steps or complication
    • No interference with research, no cost or performance overhead
    • Solutions for export controlled data and other compliance requirements
    • Standard network config not always a good fit
    • Consultation and services – Docker, Data Storage, Training, Devops support
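
Cornell's master-contract requirements (Shibboleth, Duo, AWS Config, CloudTrail logs to the Security office) amount to a per-account checklist, so onboarding tooling could verify them mechanically. A toy sketch; the control names are assumptions, not Cornell's actual tooling:

```python
# Illustrative required controls from the onboarding notes above.
REQUIRED_CONTROLS = {
    "shibboleth",
    "duo_mfa",
    "aws_config",
    "cloudtrail_to_security",
}

def missing_controls(account: dict) -> set:
    """Return required onboarding controls the account has not enabled.

    `account` is a hypothetical record like
    {"controls": {"shibboleth": True, "duo_mfa": False, ...}}."""
    enabled = {name for name, on in account.get("controls", {}).items() if on}
    return REQUIRED_CONTROLS - enabled
```

A scheduled job (or CloudCheckr-style scanner) could run this per account and open a ticket for anything missing.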

Mark Debonis – VaTech

  • Cloud Aware -> Moving into Cloud Experimentation
  • One production VPC in AWS, five pre-production
  • Moving towards both AWS and Azure offerings
  • Manual provisioning process
    • Customer contacts CCS via Service Catalog for Cloud brokerage discussion
    • Difference in Azure (upfront) and AWS billing models – In Azure if you don’t use your commitment in a year you lose it
  • Logins to Azure portal with VT AD account, Redirect to VT ADFS, Login and use Duo, Primary contact manages other users through Azure Admin portal with VT AD accounts

Kevin Murphy – UNL Lincoln

  • Cloud first for SaaS
  • Experimentation for PaaS and IaaS: Rackspace, Azure, AWS
  • One VPC in Azure for disaster recovery (domain controllers, ADFS)
  • VPC in progress for AWS
  • Central IT is pushing cloud strategies, very little departmental participation. Research computing run by CS faculty, not interested in cloud computing.
  • Security requirements: Federated logins (ADFS with Duo) for Azure. Shipping everything from IaaS to Splunk on campus
  • Security requirements – manually creating accounts; No PII data in the cloud
  • Been doing Azure StoreSimple device – hybrid solution.
  • Moving PCI environment to the cloud with a managed service provider who will take the liability and run on AWS. “not extremely expensive”
  • Challenges: Moving current architecture to IaaS can be prohibitively expensive – people build for peak loads, need to use elastic capabilities. Exploring PaaS options such as Azure Web Apps and DB services. Billing is a challenge.

Bereket Amdemichael, Daniel Tamiru, Georgetown

  • Based their AWS cloud architecture on the work done at the CSG Cloud Architecture Working Group
  • Added a proxy layer.
  • IPSec VPN – Cisco
  • Users only have access to specific VMs – have to access across the VPN
  • VPC and group architecture is a “spirited discussion”
  • When do they (security) need to be alerted when something isn’t right?
  • Using Equinix for high speed transfer to AWS

CSG Fall 2016 – Next-gen web-based interactive computing environments part 2

NYU Sample Cases – Stratos Efstathiadis

Web-based interactive tools supported by NYU Data Services

Most popular tools include Quantitative, Geospatial, Qualitative, and Visualization. Courses, boot camps, etc.

Some tools are web-based (RStudio, ArcGIS Online, CART, Qualtrics, Tableau, plotly)

Services provided for tools: Training, Consultations; Pedagogy; Data; Accounts

Geospatial example: ArcGIS online used by courses in radicalization & religion and ethnic conflict, art & politics in the street. Initial consultation and needs assessment, account creation, training, data gathering, in depth consultations, initial web publishing, training round 2, technical support, lessons learned.

Certificate class in Big Data. Structured around a textbook developed from a set of similar classes. Three options: for credit; certificate class meeting four times a semester for three days each; tailored for an agency. Includes non-NYU students; students will be able to access and analyze protected/confidential data. Work in teams sharing code and data in project spaces; provide substantial analytic and visualization capabilities where everyone in the class can work simultaneously. User experience is important.

Deployed two PoC environments: on-premise (short-term) and on AWS (long-term).

Built NYU Secure Research Data Environment – serve broad communities of NYU scholars and their collaborators including government agencies and private sector; support a wide spectrum of data; provide access to powerful resources; enable collaboration; offer training; offer data curation and publications.

Part of Digital Repository Service: Band 1 (fast temporary storage); Band 2 (storage for ongoing activities); Band 3 (feature rich publication environment); Band 4 (secure data environment).

ARC Connect
Mark Montague – Michigan

Enable easier access to HPC resources – researchers who had never used the command line. Texas shared code they wrote for the XSEDE portal. Added federated authentication with Shib and mandatory multifactor. ITAR today, HIPAA on the roadmap. Mandatory encryption for VNC sessions (no SSH tunnels needed); a web-enabled VNC viewer brings up a desktop. gsissh (part of the Globus toolkit) enables authn between the ARC Connect web server and the cluster node. Environment has RStudio and Jupyter. Researchers can install web apps in their home directory on the HPC cluster.

To take advantage of the infrastructure, web apps need to be able to run nicely behind a reverse proxy. Hoping to automate the environment more in the future.

Charlie suggests that Galaxy is a piece of software that is worth looking at.

CSG Fall 2016 – Next-gen web-based interactive computing environments

After a Reuben from Zingerman’s, the afternoon workshop is on next gen interactive web environments, coordinated by Mark McCahill from Duke.

Panel includes Tom Lewis from Washington, Eric Fraser from Berkeley

What are they? What problem(s) are they trying to solve? Drive scale, lower costs in teaching. Reach more people with less effort.

What is driving development of these environments? Research by individual faculty drives use of the same platforms to engage in discovery together. Want to get software to students without them having to manage installs on their laptops. Web technology has gotten so much better – fast networks, modern browsers.

Common characteristics – Faculty can roll their own experiences using consumer services for free.

Tom: Tools: RStudio, Jupyter; Shiny; WebGL interactive 3d visualizations; Interactive web front-ends to “big data”. Is it integrated with LMS? Who supports?

What’s up at UW (Washington)?

Four patterns: Roll your own (and then commercialize); roll your own and leverage the cloud; department IT; central IT.

Roll your own: SageMathCloud supports editing of Sage worksheets, LaTeX documents, and IPython notebooks. William Stein (faculty) created it with some one-time funding; now commercialized.

Roll your own and then leverage the cloud – Informatics 498f (Mike Freeman) Technical Foundations of Informatics. Intro to R and Python, build a Shiny app.

Department IT on local hardware: Code hosting and management service for Computer Science.

Central IT “productizes” a research app – SQLShare – Database/Query as a service. Browser-based app that lets you: easily upload large data sets to a managed environment; query data; share data.

Biggest need from faculty was in data management expertise (then storage, backup, security). Most data stored on personal devices. 90% of Researchers polled said they spend too much of their time handling data instead of doing science.

Upload data through browser. No need to design a database. Write SQL with some automated help and some guided help. Build on your own results. Rolling out fall quarter.

JupyterHub for Data Science Education – Eric Fraser, Berkeley

All undergrads will take a foundational Data Science class (CS+Stat+Critical Thinking w/data), then connector courses into the majors. Fall 2015: 100 students; Fall 2016 500 students; Future: 1000-1500 students.

Infrastructure goals – simple to use; rich learning environment; cost effective; ability to scale; portable; environment agnostic;  common platform for foundations and connectors; extend through academic career and beyond. Student wanted notebooks from class to use in job interview after class was done.

What is a notebook? It’s a document with text and math and also an environment with code, data, and results. “Literate Computing” – Narratives anchored in live computation.

Publishing = advertising for research, then people want access to the data. Data and code can be linked in a notebook.

JupyterHub – manages authentication, spawns single-user servers on demand. Each user gets a complete notebook server.

Replicated JupyterHub deployment used for CogSci course. Tested on AWS for a couple of months, ran Fall 2015 on local donated hardware. Migrated to Azure in Spring 2016 – summer and fall 2016. Plan for additional deployment to Google using Kubernetes.

Integration – Learning curve, large ecosystem: ansible, docker, swarm, dockerspawner, swarmspawner, restuser, etc.

How to push notebooks into student accounts? Used GitHub, but not all faculty are conversant. Interact: works with GitHub repos; starting to look at integration with Google Drive. Cloud providers are working on notebooks as a service: cloud.google.com/datalab, notebook.azure.com.

https://try.jupyterhub.org – access sample notebooks.

Mark McCahill – case studies from Duke

Rstudio for statistics classes, Jupyter, then Shiny.

2014: containerized RStudio for intro stats courses. Grown to 600-700 student users/semester. Shib at the cm-manage reservation system web site; users reserve and are mapped to a personal containerized RStudio. Didn't use RStudio Pro – didn't need authn at that level. Inside – NGINX proxy, Docker engine, docker-gen (updates config files for NGINX), config file.
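
The reservation site's core job – mapping each Shib-authenticated user to a personal containerized RStudio behind the NGINX proxy – can be sketched like this. Hypothetical code, not Duke's actual system; host names and port ranges are made up:

```python
class ContainerReservations:
    """Toy sketch of a reservation map: each authenticated user gets one
    containerized RStudio at a stable host:port the proxy can target."""

    def __init__(self, hosts, ports_per_host=140):
        # Pre-compute the pool of (host, port) container slots.
        self.free = [(h, p) for h in hosts
                     for p in range(8000, 8000 + ports_per_host)]
        self.assigned = {}

    def reserve(self, user):
        """Return the proxy target URL for this user, assigning a slot
        on first use and reusing it on every later visit."""
        if user not in self.assigned:
            if not self.free:
                raise RuntimeError("no free containers")
            self.assigned[user] = self.free.pop(0)
        host, port = self.assigned[user]
        return f"http://{host}:{port}/"
```

The stable user-to-slot mapping is what lets a tool like docker-gen regenerate the NGINX config whenever assignments change.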

Faculty want additions/updates to packages about twice a semester. 2 or 3 incidents/semester with students who wedge their container, causing RStudio to save corrupted state. Easy fix: delete the bad saved-state file.

Considering providing faculty with automated workflow to update their container template, push to test, and then push to production.

Jupyter: Biostats courses started using in Fall 2015. By summer 2016 >8500 students using (with MOOC).

Upper-division students use more resources: had to separate one Biostats course away from the other users and resize the VM. Need to have limits on the amount of resources – Docker has cgroups. RStudio quiesces processes unused for two hours; Jupyter doesn't have that.
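
Since Jupyter lacked RStudio's two-hour quiesce behavior at the time, an idle reaper would have to be bolted on. A sketch of the decision logic only (illustrative; real deployments would query the notebook server's activity API and stop the containers):

```python
import time

# Match RStudio's two-hour quiesce window from the notes above.
IDLE_LIMIT_SECONDS = 2 * 60 * 60

def containers_to_stop(last_activity: dict, now=None):
    """Given {container_id: last_activity_epoch_seconds}, return the
    sorted ids that have been idle longer than the limit."""
    now = time.time() if now is None else now
    return sorted(cid for cid, ts in last_activity.items()
                  if now - ts > IDLE_LIMIT_SECONDS)
```

Run periodically, this gives Jupyter roughly the resource-reclaiming behavior RStudio provides out of the box.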


Shiny – reactive programming model to build interactive UI for R.

Making synthetic data from the OPM dataset (which can't be made public), and regression modeling against that data. Wanted to give people a way to compare results to the real data.



CSG Fall 2016 – Large scale research and instructional computing in the Clouds, part 2

What is Harvard Doing for Research Computing?
Tom Vachon – Manager of Cloud Architecture at Harvard

Research Computing & Instructional Computing
Harvard AWS admin usage averages about $150k/month, research computing ~$90k. There’s Azure usage too.

They have a shared research computing facility with MIT and other campuses. Want to use cloud to burst capacity.

Cloud makes instructional computing easier, particularly in CS-centric classes.

How do you save money in the cloud?

  • Spot Instances (if you can architect the workload to survive termination with almost no notice)
  • Auto-turndown (rules can be based on tags)
  • Provide ongoing cost data – if you don't tell people how much they're spending as they spend it, they get surprised
  • Region choice influences cost – the cheapest region may not be the closest, and certain regions may lack features (e.g. high-bandwidth InfiniBand is only available in 2 regions in Azure)
  • Understand cloud-native concepts like AWS Placement Groups to get full bandwidth between instances
  • How you connect to your provider cannot be an afterthought. What speed? Harvard has 40 Gb Direct Connect to AWS. What reliability? (Had issues with Azure VPN appliances which disconnect every six minutes.) Where do you do encryption – network or application? They chose to require application encryption (including database connections), so they don't encrypt the network connections themselves.
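
The tag-based auto-turndown idea mentioned above might look like this in miniature. The `autooff` tag name and its window format are assumptions for illustration, not Harvard's actual convention:

```python
def should_turn_down(tags: dict, hour_utc: int) -> bool:
    """Decide whether an instance should be stopped right now based on an
    off-hours window tag, e.g. autooff=22-06 (stop from 22:00 to 06:00 UTC).

    Tag name and format are illustrative. A scheduled job would evaluate
    this per instance and issue the stop calls."""
    window = tags.get("autooff")
    if not window:
        return False  # untagged instances are left alone
    start, end = (int(h) for h in window.split("-"))
    if start <= end:
        return start <= hour_utc < end
    return hour_utc >= start or hour_utc < end  # window wraps midnight
```

Pairing a rule like this with per-tag cost reporting covers both halves of the advice: turn things off, and show people what they're spending.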

Cloud requires new tools. How will you handle multiple providers? They're making golden images for each provider that have very little in them. Ideally have one config management product (they're consolidating on SaltStack). Using Terraform to run images on multiple vendors – worth buying the enterprise version. Bonus if you can use the same toolset on-premise.

Research Computing in AWS at Michigan
Todd Raeker – Advanced Research Computing Technology Services

What are researchers doing? What kinds of projects?

At Michigan environment is dominated by the Flux cluster: HPC and HTC, 28k computational cores. Researchers aren’t looking to do large-scale compute in the cloud.

In 2015 an AWS program to explore the cloud received 20 x $500 AWS credits. Most were small-scale projects, primarily web applications – easy to learn and use. Researchers used AWS for data storage and compute. Easier to collaborate with colleagues at different institutions – researchers can manage their own collaborations.

Pros and cons: Can be support intensive, need to train staff. Good for self-sufficient researchers (with savvy grad students). User setup can be made hard. Is it really cheaper?

Duke – 60% of research compute loads were using 1 core and 4 GB of RAM – ideal for moving to the cloud.

Asbed Bedrossian – USC, Jeremy Hallum, Michigan

Containerization can help with reproducibility of science results. That's an area where IT can partner to make containers for researchers to distribute. Science is a team sport now, with complexity increasing rapidly. The challenge is to get the attention of researchers to change how they work – need to meet the person in the field (agricultural extension metaphor). Central IT can help facilitate that, but you can't just hang out a shingle and hope they come. Ag Extension agents were about relationships.

Notre Dame starting by helping people with compliance issues in the cloud – preparing for NIST 800-171 using GovCloud.

Getting instructors to use modern tools (like checking content into Git and then doing the builds from there) can help socialize the new environments.

Harvard hopes to let people request provisioning in ServiceNow, then use Terraform to automatically fire off the work.  Georgetown looking at building self-service in Ansible Tower.

Research computing professionals will need to be expert at pricing models.



CSG Fall 2016: Large scale research and instructional computing in the Clouds


We’re at the University of Michigan in Ann Arbor for the fall CSG Meeting in the Michigan League. Fall semester is in full swing here.

Mark McCahill from Duke kicks off the workshop with an introduction on when and why the cloud might be a good fit.

The cloud is good for unpredictable loads due to the capability to elastically expand and shrink. Wisconsin example of spinning up 50-100k Condor cores in AWS. http://research.cs.wisc.edu/htcondor/HTCondorWeek2016/presentations/WedHover_provisioning.pdf

Research-specific, purpose-built clouds like Open Science Grid and XSEDE.

Is there enough demand on campus today to develop in-house expertise managing complex application stacks? e.g. should staff help researchers write hadoop applications?

Technical issues include integration with local resources like storage, monitoring, or authentication. That’s easier if you extend the data center network to the cloud, but what about network latency and bandwidth? There are issues around private IP address space, software licensing models, HPC job scheduling, slow connections, billing. Dynamic provisioning of reproducible compute environments for researchers takes more than VMs. Are research computing staff prepared for a more DevOps mindset?

New green field deployments are easier than enhancing existing resources.

Researchers will need to understand cost optimization in the cloud if they’re doing large scale work. That may be a place where central IT can help consult.

AWS Educate Starter – fewer credits than Educate, but students don't need a credit card.

Case Studies: Duke large scale research & instructional cloud computing

MOOC course (Managing Big Data with MySQL) that wanted to provide 10k students with access to a million row MySQL database. Ended up with over 50k students enrolled.

Architecting for the cloud: Plan to migrate the workload – cloud provider choice will change over time. Incremental scaling with building-block design. Plan for intermittent failures – during provisioning and runtime. Failure of one VM should not affect others.

Wrote a Ruby on Rails app that runs on premise, maps users to their assigned Docker container, and redirects them to the proper container host/port. Docker containers running Jupyter notebooks. Read-only access to MySQL for students. Each VM runs 140 Jupyter notebook containers + 1 MySQL instance. In the worst case scenario only 140 users can be affected by a runaway SQL query. Containers restarted once/day to clear sessions.

At this scale (50-60 servers) – 1-2% failure rates. Be prepared for provisioning anomalies. Putting Jupyter notebooks into git made it easy to distribute new versions as content was revised. Hit a peak of ~7400 concurrent users. Added a policy of reclaiming containers that had not been visited for 90 days.

Spring 2016 – $100k of Azure compute credits expiring June 30. The compute cluster had all the possible research software on all the nodes through NFS mounts in the data center. To extend it to Azure they had to put a VPN tunnel into private address space: provision CentOS Linux VMs, make repeated Puppet runs to get things set up, then mount NFS over the tunnel.

SLURM started seeing nodes fail and then come back to life – they needed deeper monitoring that knows more than just whether nodes are up or down. The default VPN link into Azure maxes out at 100-200 Mbps, so they throttle the Azure VMs at the OS level so each does no more than 10 Mbps. They limit the number of VMs in an Azure subscription to 20 and run workloads that do more compute and less IO.

Provisioned each VM at 16 cores with 112 GB RAM. Started seeing failures because there were no more A11 nodes available in the Azure East data center – unclear if/when there will be more, and other regions add latency. Ended up using $96k in one month: 80 nodes (16 cores and 112 GB RAM) in 4 groups of 20 nodes in several data centers, with a VPN tunnel for each subscription group.
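
The throttling numbers in these notes hang together; a quick arithmetic check of why 10 Mbps per VM and 20 VMs per subscription fit under the VPN ceiling:

```python
# Sanity check on the Azure burst numbers from the notes: with the
# default VPN link topping out around 100-200 Mbps, capping each VM at
# 10 Mbps and each subscription at 20 VMs keeps a subscription's worst
# case exactly at the top of that range.
VPN_MAX_MBPS = 200
PER_VM_MBPS = 10
VMS_PER_SUBSCRIPTION = 20

worst_case = PER_VM_MBPS * VMS_PER_SUBSCRIPTION  # 200 Mbps
assert worst_case <= VPN_MAX_MBPS  # saturates, but never exceeds, the link
```

That ceiling is also why they biased workloads toward more compute and less IO: the tunnel, not the VMs, is the bottleneck.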

(One school putting their Peoplesoft HR system in the cloud.)

Stratos Efstathiadis – NYU

– Experiences from running Big Data and Machine Learning courses on public clouds – grad courses provided by NYU departments and centers. Popular courses with large numbers of students requiring substantial computing resources (GPUs, Hadoop, Spark, etc.).

They have substantial resources on premise. Scheduled tutorials on R, MapReduce, Hive, Spark, etc. Consultations with faculty; work closely with TAs. Why cloud? Timing of resources, ability to separate resources (courses vs. research), access to specific computing architectures, students need to learn the cloud.

Need a systematic approach. Use case: Deep Learning class from the Center for Data Science. 40 student teams needed access to NVIDIA K80 GPU boards, and each team must have access to identical resources to compete. Instructors must be able to assign and control resources. Required 50 AWS g2.2xlarge instances. Issues: discounts/vouchers are stated per student, not per team. Need to enforce usage caps at various levels so instructor-imposed caps are not exceeded. Daily email notifications to instructors, TAs, and teams providing current costs and details. Students were charged for a full hour every time they spun up an instance. AWS costs were estimated at ~$65k per class; an on-prem solution was $200k.
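
With a full hour billed on every spin-up, enforcing the instructor-imposed caps means modeling billed hours per team. A toy sketch of the accounting; the helper names and cap logic are illustrative, not NYU's actual tooling:

```python
import math

def billed_hours(session_hours):
    """Per the notes, each instance spin-up was charged for at least a
    full hour; session_hours lists the length of each session in hours."""
    return sum(max(1, math.ceil(h)) for h in session_hours)

def teams_over_cap(team_costs: dict, cap: float) -> list:
    """Return team names whose accumulated cost exceeds the instructor's
    cap, e.g. to drive the daily notification emails."""
    return sorted(t for t, cost in team_costs.items() if cost > cap)
```

The `max(1, ...)` term is exactly why many short sessions cost more than one long one – the billing detail the daily cost emails were meant to surface.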

Use case: Spatial data repository for searching, discovering, previewing and downloading GIS spatial data.  First generation was locally hosted – difficult to update, not scalable, couldn’t collaborate with other institutions; lack of in-house expertise; no single sign on. Decided to go to the cloud.

Use case: HPC disaster recovery
Datasets were available a few days after Sandy, but where to analyze them? Worked with other institutions to get access to HPC, but challenges included copying large volumes of data and different user environments and configurations. Started using MIT’s Star (Software Tools for Academics and Researchers), could also use AWS cfnCluster. Set up a Globus endpoint on S3 to copy data. Software licensing is a challenge – e.g. Matlab. Worked things out with Mathworks. Currently they’re syncing environments between NY and Abu Dhabi campus, but they’re investigating the cloud – looking at star/cfnCluster approach, but also might do a container based approach with Docker.


CSG Spring 2016: Common higher-ed logical data models and/or APIs


Satwinder Singh, Columbia

  • How do you get from the spaghetti diagram to a better place?
  • Foundation – Understand the systems of record, form working group with subject experts; identify user stories which then get translated into high level data models. Leads to:
  • Canonical model – abstracts systems of record to a higher level: logical data entities, data dictionary, metadata repository, leads to:
  • Generic Institution Models – are there generic Institution models for academia? Leads to:
  • Specifications and Platform: Specifications (schema standards, APIs, data formats, security)
  • Platform – iPaaS/On-Prem
    • API Gateway – how does it connect back to systems of record? Uses:
    • ESB – provides decoupling
  • API and Data Governance – Lifecycle (building hundreds of successful APIs requires process to decide and maintain them), intake process (look first to see if it already exists or can be extended from something that exists), portal; Data (Master Data Management) quality and stewards.
  • Agility – Impart higher velocity (need to decouple the what (“I need this data from …”) from the how (“I want a file”)); Build/use accelerators (e.g. pre-built connectors, or re-use things you’ve built already – the only way you can show ROI is by reuse); Use templates (interface specification; data mapping)
  • CoE@Columbia – Integration Principle (see Columbia’s document)

Current Practices: Data models
Sarah Christen, Cornell – Survey results

  • 54% have created canonical data models at their institution – data warehouse, students, person data, APIs and reporting tools
  • 85% have data governance priorities
  • 77% using data dictionaries or business glossaries for metadata (people looking at Data Cookbook)


  • Went looking to build conceptual person model
  • ERP systems have models, but they are neither consistent nor shared
  • Formed working group which formed subcommittees
  • Some common attributes (name, DOB, etc, known-by)
  • Location (can include online presences)
  • Event participation
  • Demographics (visa status, gender, ethnicity)
  • There’s discussion of the relationship of this person model and the people registries that have been built up in the identity management space. Keith Hazelton notes that if it’s just identity it’s a subset, but if it’s used for access control it needs more of the extended data to make access decisions.
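A canonical person model along the lines of the attributes above might be sketched as follows. All field names and structure here are illustrative, not the working group’s actual model:

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

@dataclass
class Location:
    """A physical or online presence (campus address, email, social handle)."""
    kind: str    # e.g. "campus", "home", "online"
    value: str

@dataclass
class Person:
    """Hypothetical canonical person entity covering the common attributes
    discussed above: names, DOB, locations, events, demographics."""
    legal_name: str
    known_by: List[str] = field(default_factory=list)        # preferred names, nicknames
    date_of_birth: Optional[date] = None
    locations: List[Location] = field(default_factory=list)  # can include online presences
    demographics: dict = field(default_factory=dict)         # visa status, gender, ethnicity
    event_participation: List[str] = field(default_factory=list)

# Example record (all values invented)
p = Person(legal_name="Jane Q. Student",
           known_by=["Jane"],
           locations=[Location(kind="online", value="jane@example.edu")],
           demographics={"visa_status": "F-1"})
```

An identity-management registry would typically carry only a subset of these fields; the extended demographics are the part Hazelton notes you need for access-control decisions.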

Mike Fary, Chicago. Student Model

  • Built up representation of data objects used in Campus and Student Life units – took eight months.
  • Produced a data model that was good enough for the integrator to work with in the student system implementation project.
  • Realized there were some areas that needed further definition
  • Created a second working group to create definitions of FT/PT students, joint/dual degree programs, leave of absence/withdrawals. Took another six months
  • Went back afterwards to get reaction from people who had participated. Overwhelmingly people are positive.
  • In some cases agreed to disagree, but at least that’s now documented.
  • Value of the process is in the conversations and the realizations that different parts of the institution view the same data differently. Will need to continue those conversations as further models are built.

Jonathan Saperia – Harvard – VIVO Data Ontology

  • VIVO is a good example of something we can look at that has already been done.
  • Started out trying to solve a specific problem – wasn’t looking to define an ontology, but one came out of it.
  • VIVO is “an integrated view of the scholarly record of the University” – information about research and researchers.
  • Uses semantic web technology (RDF).
  • Started at Cornell in 2003; received an NIH grant in 2009.
  • Currently a DuraSpace project with members, supporters, and investors
  • More than 28 public implementations, 80 implementations in progress.
  • Use extends beyond research reporting, but it’s not easy.
  • Rather than fleshing out a description of a domain, they were driven by wanting to enable communication and collaboration in scholarship.
  • VIVO take-aways: Start small and build a case; sort out what is public and what is not; stewardship of the project is critical – must be owned at the highest level of the institution; clearly identify where the home of the record will be and who is responsible for it; identify practices of the university; change management and governance are important – practices across the institution vary and matter in different ways
  • Domains don’t exist in isolation – lots of relationships
  • Danger is that models will become shelfware – will they evolve? You want something that’s part of a regular process to keep it alive.
  • Why do we separate the actual data about the mission of the university (research output, learning analytics) from the data about the business (student system, research funding)?

Current Practices: APIs

Standards at Yale

  • encouraging consistency, maintainability, and best practices across applications
  • URIs
  • API description (OAI, RAML, etc)
  • Common language, data dictionary and domain-specific ontologies (Data Cookbook)
  • Connections and Reusability (schema.org, JSON-LD, knowledge sources)
  • API Descriptions – RAML, Open API (formerly known as Swagger), API BluePrint
  • Data Cookbook – Provides a central database that stores admin systems’ data definitions and report specifications – standards for defining your own campus terms
  • schema.org – adopted by major search engines; allows documenting/tagging content on websites. Used in Siri and Knowledge Graph NLQA
  • Ontologies – Linked data world, very domain specific knowledge.
  • Plan for reusability
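To make the API-description formats above concrete, here is a minimal Swagger 2.0 (Open API) description, expressed as a Python dict for brevity. The Directory API name, base path, and NetID lookup endpoint are invented for illustration, not a real Yale API:

```python
# Minimal Swagger 2.0 (Open API) description of a hypothetical campus API.
spec = {
    "swagger": "2.0",
    "info": {"title": "Directory API", "version": "1.0.0"},
    "basePath": "/api/v1",
    "paths": {
        "/people/{netid}": {
            "get": {
                "summary": "Look up a person by NetID",
                "parameters": [{
                    "name": "netid",
                    "in": "path",
                    "required": True,
                    "type": "string",
                }],
                "responses": {"200": {"description": "A person record"}},
            }
        }
    },
}
```

A RAML or API Blueprint description of the same endpoint would carry equivalent information; the value of standardizing on one format is that tooling (portals, explorers, code generators) can consume it directly.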

Aaron Bucher – Data Governance

  • Roles – Data custodians; Data stewards (owners); Subject matter experts; Content developers
  • Enterprise Data Management And Reporting groups: Steering committee (owners); Data governance; Analytics collaborative (superusers sharing best practices); reporting and data specialists.
  • Data governance council – charged with establishing and maintaining data definitions; Meet monthly, vote on what definitions to establish.

Mike Chapple – Notre Dame

  • Have about 500 data definitions
  • Top lesson learned: It’s not about technology. It’s about collaborations and relationships.
  • Put together a process to adopt definitions by unanimous consensus.
    • Identify terms to work on; then identify stakeholders and roles
    • Stakeholders have to attend workshop where definitions are built – start with a draft, not a blank page

University of Michigan Medical School Data Governance

  • Hired a data governance manager 1.5 years ago
  • My data/our data – everyone thinks their data is their own
  • Data metrics on a dashboard to show how the program is proceeding
  • Will be headed into dealing with research data soon – there’s data there that should persist at the institution if a faculty member leaves, and there are compliance issues.

Jenn Stringer – users want agency over what data is collected about them – how do we provide that?


Sarah – 

  • API Specs – 31% RAML, 46% Swagger
  • APIs are built with: REST 50%, both REST and SOAP 50%
  • Who are API consumers: 7%,
  • Do you have an API intake process? Yes 46% No 54%

Lech – Yale developer portal (https://developers.yale.edu/)

  • About six months old, after students scraped a lot of web sites and a New York Times article
  • Portal and architecture built on Layer 7 technology (now part of CA)
    • API Listings – several dozen with documentation
    • Resources –
    • Getting started documentation
    • Tools – API Explorer and source samples
    • Events and training – student developer and mentorship program
    • API Keys & OAUTH2 tokens.
  • Two types of APIs – Public APIs and Secured APIs (requires API key and OAuth login)
  • SOA Gateway – requires a technical person to make changes. Supports SSL/TLS. EULA Terms of Use (generic with a provision for amendment by specific data owners); WADL or RAML documentation; Yale NetID access, looking to add social identities.
  • Metrics: 31 APIs; 120 registered users (mostly students, some IT staff); 80 apps registered (in 8 months); Can see top applications, top orgs, performance, latency.
  • Found that students are using the APIs for senior level projects (important in terms of when things need to be up)
  • API Explorer – moving to use postman (https://www.getpostman.com/ )
  • Engagement – student developer & mentorship program, weekly training, ice cream social, YHack, CS50 lectures, Office hours.
  • They include some vendor APIs on their portal – bus location, dining, ZipCar.
  • One student app is used by 3-4k students – has lots of useful information for students – busses, laundry status, etc.

Kitty – the demand for learning analytics is going to drive the need to understand how to safeguard and treat types of data where governance has not matured. Yale has developed a provostial data governance group with the Library, faculty, etc. There’s a smaller group that deals with identity data.

Standards and Tools

REST, JSON, SCIM, Swagger, RAML – Keith Hazelton, Wisconsin Madison

  • Internet2 TIER and the API / Data Model Question – redrawing functional boundaries, refining the interfaces.
  • The one big gap in the open source identity management space is the entity registry space.
  • Goal is to enable IAM infrastructure mashups – (homegrown, TIER, commercial)
  • In TIER starting from the infrastructure layer – APIs being built for developers building TIER applications.
  • APIs and Data structures – SQL or noSQL? Can we achieve non-breaking extensibility in data objects?
  • Entity representations in event-driven messaging and push/pull REST APIs. How different should data look if it’s a message rather than a response? Should they look any different? When is one appropriate and not the other?
  • REST-ful beats REST-less for loose coupling: implementation language agnostic; better for user-centric design
  • JSON – is XML’s bad rap deserved or not? JSON is adopting more of XML’s functions.
  • SCIM: A standard for cloud-era Provisioning: Design centered for provisioning into the cloud. TIER has settled on use of SCIM where it fits – resource types of user and group. Where it doesn’t, extend using SCIM mechanisms. Think twice before doing RPC-like things.
  • Representing APIs with Swagger 2 (now Open API) – decided it’s normative.
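TIER’s choice of SCIM can be illustrated concretely. The sketch below shows a SCIM 2.0 User resource carrying the standard core schema URN plus a campus extension added via SCIM’s extension mechanism; the extension URN and its `affiliation` attribute are invented for illustration:

```python
# A SCIM 2.0 User resource (RFC 7643 core schema) with a hypothetical
# campus extension, using SCIM's standard schema-extension mechanism.
CORE_USER = "urn:ietf:params:scim:schemas:core:2.0:User"
CAMPUS_EXT = "urn:example:params:scim:schemas:extension:campus:2.0:User"  # illustrative URN

scim_user = {
    "schemas": [CORE_USER, CAMPUS_EXT],
    "id": "2819c223-7f76-453a-919d-413861904646",
    "userName": "bjensen@example.edu",
    "name": {"givenName": "Barbara", "familyName": "Jensen"},
    # Extension attributes live under their schema URN, so core consumers
    # can ignore them -- this is the non-breaking extensibility noted above.
    CAMPUS_EXT: {"affiliation": "student"},
}
```

Because extensions are namespaced by URN, a consumer that only understands the core User schema still parses the resource correctly – which is what makes SCIM attractive where it fits (users and groups), while RPC-like uses remain a poor match.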


CSG Spring 2016: IT Innovation through Student Innovation

Julian Lombardi – Duke

Student life outside the classroom is integrated with lots of IoT technology. There’s an opportunity for universities to support new levels of engagement with technology. Can help lower barriers to innovation – create playgrounds of a sort. Foster innovation that is difficult to predict. Led by the Academic and Media Technologies team.

Student – an autonomous, mobile, eating, and printing unit – Mark McCahill

Increasing emphasis on experiential learning on campuses.

Duke has an Innovation & Entrepreneurship program – foster innovation across the university. Lean heavily on the Entrepreneurship side – year long intensive programs.

Central IT’s role – serving students who want to tinker and explore, or who want to have an effect on IT Services. Innovation Co-Lab – help students create the next wave of technology for the Duke community.

Evan  Levine, Michael Faber –

Co-Lab established in spring of 2013.

Have a lot of students who know what they want to create. With today’s technology students can jump in and start right away.

How do you harness student energy? Pizza and t-shirts – cash is even better.

Inaugural Co-Lab Challenge

  • One project (called Hack Duke) created an API for institutional information (courses, directories, etc) by scraping sites.

Duke Mobile Challenge

  • Wanted people to develop for the DukeMobile app. Was too specific. Learned that students are capable of building to meet their needs if you give them the infrastructure.

Did a 3D printing challenge – a drill-mounted centrifuge was one project. Somebody wrote software that could scan a key that could be printed and worked in a lock.

Co-Lab Innovation Grants are what they use now. About 30 projects have come through since Fall ’14. Timelines fairly lengthy (> 1 semester). Amounts vary widely – some people don’t even need money, but want support and project management. Amounts range from $0-~$10k.

BioMetrix – Ivonna Demanyan, Gabby Levac

  • Developing a system to monitor injury risks via mobile sensors. They have an ACL tear prevention system that performs predictive analytics. Started with a Co-Lab grant; now has a team of 7 full time (she’s a senior). Started with an Arduino strapped to her foot to see how her foot rolled. Been featured in lots of media. Mental Floss named them two of the most influential women; received a Google entrepreneur award.

Space is a challenge. Currently have 33 3d printers. Have a lot of other physical computing equipment.

Co-Lab is going into new OIT engagement space. Will house research computing staff, media studio. Bring innovative students together with faculty and researchers.

FreeSpace – occupancy system for study rooms in the library. Uses IR sensors to detect whether someone is in a room, plus an app to view the data. Needed to incorporate data from the room reservation system. Projects are spanning the gap across IT and innovation. The project didn’t pan out – that particular sensor implementation didn’t work. How to connect students to institutional data? Needed a full stack software development resource –

  • Code Sharing (gitlab.duke.edu);
  • VMs (vm-manager.oit.duke.edu);
  • API service (apidocs.colab.duke.edu) – hub where student devs can come for keys and tokens and calls can be managed. Put together syndication server with node.js;
  • Local enterprise iOS app store – appstore.colab.duke.edu

Students were frustrated at the lack of an iOS print client, so they wrote one. Was popular enough to overload the Raspberry Pi it was running on – drove the vendor to develop a real app.

Mark McCahill

Big mismatch between classic IT organization and student innovation. Students want to get things done within the semester. Can’t take time planning and enterprise level reliability – here you can afford to fail. Thought they’d need a couple of VMs for student servers. Sysadmins hated the idea that students would have root access. Public IPs (in Duke namespace, but totally separate part of network); Ubuntu linux in a “RHEL shop”; patching; overcommitting VMs. Up to 22 or more images – will do almost anything that’s asked for. If it’s hacked it’s the student’s problem. Have about 350 VMs reserved, but not all are student machines – finding that faculty have timelines similar to students and are less concerned than IT about reliability. Ended up with early adopter faculty using it for workshops – really self-sufficient. Let the faculty build the image and then create it as a template.

Impedance matching with the registrar – students want access to data, but registrar is nervous about letting it loose. Ended up putting together OAuth infrastructure – have tokens brokered so students can opt-in to release their information to a single application. Students don’t like trying to Shibbolize their web apps – want something to work this semester. New stuff wants to use OAuth, so they’ve backed into supporting that. Finding that medical center is interested in OAuth support for mobile apps.
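The token-brokered opt-in described above can be sketched roughly. This is not Duke’s implementation – just a minimal illustration, with invented names, of the core idea: a student authorizes release of their information to a single application, and only that application can redeem the resulting token:

```python
import secrets

class OptInBroker:
    """Minimal sketch of a per-application opt-in token broker.

    A student authorizes one app; the broker issues an opaque token bound
    to that (student, app) pair. Any other app presenting the same token
    gets nothing -- the student's data is released only where they opted in.
    """

    def __init__(self):
        self._grants = {}  # token -> (netid, app_id)

    def authorize(self, netid: str, app_id: str) -> str:
        """Student opts in to release their info to app_id; returns a token."""
        token = secrets.token_urlsafe(16)
        self._grants[token] = (netid, app_id)
        return token

    def lookup(self, token: str, app_id: str):
        """Resolve a token for a given app; None if no opt-in is on file."""
        grant = self._grants.get(token)
        if grant is None or grant[1] != app_id:
            return None
        return grant[0]  # the NetID of the student who opted in
```

In a real OAuth deployment the token issuance happens at the authorization server during the student’s consent step; the sketch just shows why the semester-friendly opt-in model keeps the registrar comfortable – scope is one student, one app.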

Approaching this with the idea that everything you know is wrong will allow you to provide support not just to student innovation projects, but to a swath of faculty and researchers who are typically outside your central IT support envelope.

How can we identify and nurture this community?

  • 1:1 – Office hours – staffed by student employees as well as staff.
  • Many:many – Studio Nights. Creating community. Get together once a week, buy pizza.
  • One:many – up to now the focus was on people who knew what they were doing. Roots program – the training arm of the Co-Lab. Taught 32 courses: Linux, HTML/CSS, JavaScript, Rails, iOS, Python, UX Design, Web Accessibility, git, 3D printing and modeling, Rapid Prototyping, Arduino/Photon. Some popular, some not.

John Board

Duke SmartHome – a dorm for 10 students. Solar panels, green roof, LEED Platinum, $2M-ish. 6000 square foot “live in laboratory”.  Was a student senior project. Had an all student project management team.

  • Lesson 1: No coupling to faculty research agendas – was a mistake not to have faculty at least peripherally involved early.
  • Lesson 2: Working with companies is exciting. Students cultivated sponsors. Enormous complexity in legal agreements. One management shakeup at the sponsor and they’re all gone.
  • Lesson 3: Institutional memory in student organizations is fragile. Constant churn in communication tools and sites – no sense of preservation of history.
  • Lesson 4A: “Safety? It’s ok, I’m immortal”
  • Lesson 4B: Dorms are special creatures under law.
  • Lesson 5A: Being a donation magnet is a mixed blessing.
  • Lesson 5B: Free, but with added lawyers!
  • Lesson 6: Predicting student acceptance is a dark art.
  • Lesson 7: Being an on-campus funding agency is great, but…
  • Lesson 8: The semester lasts forever (if you’re an undergrad). After first two weeks students are overcommitted.
  • Lesson 9: Matching freshman exuberance with senior wisdom.
  • Lesson 10: IoT Hell – it’s worse than you imagine. Highest density of device security issues on campus.
  • Lesson 11: Goals are good.