Cloud Forum 2016 – Cornell’s BI move to the cloud

Jeff Christen – Cornell

Source Systems – PeopleSoft, Kuali, Workday, Longview. Dimensional data marts: finance, student, contributor relations, research admin. BI Tools – OBIEE and Tableau.

They do data replication and staging of data for the warehouses. Nightly replication to stage -> ETL -> Data Marts.

Why replication/stage? Consistent view of data for ETL processing, protects production source systems; tuning for ETL performance.
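The stage -> ETL -> data mart flow can be sketched in a few lines. This is only an illustration, not Cornell's implementation: it uses an in-memory SQLite database, and the table names (`stage_ledger`, `dim_account`, `fact_balance`) and the function `nightly_load` are hypothetical.

```python
import sqlite3


def nightly_load(source_rows):
    """Copy source rows into a staging table, then build a tiny dimensional mart.

    source_rows: iterable of (account, period, amount) tuples standing in
    for the replicated source data (a hypothetical shape).
    """
    db = sqlite3.connect(":memory:")

    # Stage: a plain copy of the source, so ETL reads a consistent snapshot
    # and never queries the production system directly.
    db.execute("CREATE TABLE stage_ledger (account TEXT, period TEXT, amount REAL)")
    db.executemany("INSERT INTO stage_ledger VALUES (?, ?, ?)", source_rows)

    # ETL: derive a dimension and a fact table from the staged snapshot.
    db.execute(
        "CREATE TABLE dim_account (account_key INTEGER PRIMARY KEY, account TEXT UNIQUE)"
    )
    db.execute(
        "INSERT INTO dim_account (account) "
        "SELECT DISTINCT account FROM stage_ledger ORDER BY account"
    )
    db.execute("CREATE TABLE fact_balance (account_key INTEGER, period TEXT, total REAL)")
    db.execute(
        """
        INSERT INTO fact_balance
        SELECT d.account_key, s.period, SUM(s.amount)
        FROM stage_ledger s
        JOIN dim_account d ON d.account = s.account
        GROUP BY d.account_key, s.period
        """
    )

    # Data mart query: what a report or dashboard would read.
    return db.execute(
        """
        SELECT d.account, f.period, f.total
        FROM fact_balance f
        JOIN dim_account d USING (account_key)
        ORDER BY d.account, f.period
        """
    ).fetchall()
```

Because the ETL step only ever reads `stage_ledger`, a long-running transform cannot slow down or see a half-updated source system, which is the protection the notes describe.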

Started journey to cloud 2 years ago. Were using Oracle Streams – high maintenance, but met some needs. Oracle purchased a more robust tool and de-supported Streams. ETL tools challenge – were using Cognos Data Manager for 90% of their work, but IBM didn’t continue to support it. Replaced it with WhereScape RED, but that requires rewriting jobs. Apps were already moving off-premise: Workday for HR/Payroll; PeopleSoft to AT&T hosting; Kuali financials moving to AWS. Launched pilot project to answer “what would it take to run data warehouse environment in AWS?”

Small pilot – Kuali warehouse in AWS. Which existing tools will work? Desire to use AWS services such as RDS where possible; Testing of both user query performance and ETL performance.

Why Oracle RDS and not Redshift? Approximately 80% of the Kuali DW is operational reporting. Needs fine-grained security at the database level; A lot of PL/SQL in the current environment; Currently exploring Redshift for non-sensitive high volume data

Some re-architecting: Oracle Streams not supported with Oracle RDS (used Attunity). Oracle Enterprise Manager scheduler not supported with Oracle RDS – using Jenkins (so beautiful and simple); No access to OS on RDS databases – installed Data Manager on separate Linux EC2 instance; Using WhereScape to call Data Manager from the RDS database.

Need to be more efficient. On premise the KDW had two physical servers. Found that inefficiencies in some ETL code and some dashboard queries had been masked by the large servers. Prioritizing ETL code conversion by the longest-running areas helped get AWS within the nightly batch window. Some updates made to dashboards to improve performance or offer better filter options. Hired database tuning consultant (2 weeks) to help with Oracle tuning.
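One way to picture the prioritization is to rank jobs by runtime and check the total against the batch window. A minimal sketch, assuming jobs run one after another; the job names, the minutes, and the function `conversion_order` are made up for illustration.

```python
def conversion_order(job_minutes, window_minutes):
    """Rank ETL jobs longest-first and report how far the batch overruns.

    job_minutes: dict of job name -> observed runtime in minutes.
    window_minutes: length of the nightly batch window.
    Assumes jobs run sequentially, so the batch time is the plain sum.
    """
    ordered = sorted(job_minutes.items(), key=lambda kv: kv[1], reverse=True)
    overrun = max(sum(job_minutes.values()) - window_minutes, 0)
    return [name for name, _ in ordered], overrun


# Hypothetical runtimes: the longest job is the first conversion candidate.
order, overrun = conversion_order(
    {"load_gl": 180, "load_po": 45, "load_budget": 90}, window_minutes=300
)
```

With these made-up numbers the batch totals 315 minutes against a 300-minute window, so `load_gl` is converted first and 15 minutes need to be recovered.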

Testing and User Perception. Started with internal unit testing. Internal query execution time comparisons between on premise and AWS. User testing of dashboards on premise versus AWS. Repoint of production OBIEE financial dashboards to AWS for a day (x3). Some queries came back faster, some slower. Went through optimization and tuning to get it comparable across the board.
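The execution time comparison could look something like the sketch below. It assumes timings have already been collected per query in both environments; the query names, the 10% tolerance, and the function `compare_timings` are illustrative assumptions, not details from the talk.

```python
def compare_timings(on_prem, aws, tolerance=1.10):
    """List queries that run noticeably slower in AWS than on premise.

    on_prem, aws: dicts of query name -> elapsed seconds.
    tolerance: ratio above which a query counts as slower (1.10 = 10% slower).
    Returns (query, ratio) pairs, worst ratio first.
    """
    slower = []
    for name, base in on_prem.items():
        # Skip queries that were not timed in AWS or have no usable baseline.
        if name in aws and base > 0:
            ratio = aws[name] / base
            if ratio > tolerance:
                slower.append((name, round(ratio, 2)))
    return sorted(slower, key=lambda item: item[1], reverse=True)


# Hypothetical timings in seconds.
candidates = compare_timings(
    {"gl_summary": 2.0, "budget_detail": 4.0, "po_lookup": 1.0},
    {"gl_summary": 3.0, "budget_detail": 3.0, "po_lookup": 1.05},
)
```

Here only `gl_summary` is flagged for tuning: `budget_detail` is faster in AWS and `po_lookup` is within tolerance.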

Cutover to AWS. Cutover Sept. 8. Redirected all non-OBIEE ODBC client traffic in October. Agreed to keep the on premise KDW loading in parallel for two month end closings as a fall back.

Next Steps. Parallel Research Admin Mart already in AWS – expect cutover by end of CY. Need more progress on ETL conversion before moving student and contributor marts. Continue Big Data / non-traditional data investigation (Cloudera on AWS). Redshift for large non-sensitive data sets.

Lessons learned: Off premise hosting does not equal Cloud technology. Often hard to get data out of SaaS apps.


CSG Spring 2015 – The Data Driven University, part 2

Tom Lewis, Washington

Who are the traditional players? Institutional Research; Office of Educational Assessment; Data Warehouse Team (do good work, saw their client as being Finance).

Modern players & practices – Sources of Change: From above (President, Provost, VPs, AVPs, Chancellors); From the middle (Deans, chairs, heads of admin units, especially those focused on undergrads); From below (staff doing work, faculty); From the outside (BI and analytics vendors).

Becoming Modern –

Course Demand Dashboards – Notify.uw. Enterprising students screen scraping registration system for notifying about openings in courses, charging other students. So built notify.uw – can notify when openings occur in class via email or SMS. Almost 25k subscribers. What else can be done with the data? Understanding course demand: Notify.UW knows what classes students want; student system knows about course offerings and utilization of capacity. Mashed them up to see where demand exceeded capacity.
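The mashup amounts to joining two data sets on the course and looking for a shortfall. A minimal sketch of that idea, with made-up course names and counts; the function `unmet_demand` and both data shapes are assumptions, not how UW built it.

```python
def unmet_demand(subscriptions, sections):
    """Find courses where students waiting for a seat exceed the open seats.

    subscriptions: dict of course -> number of students subscribed to
        opening notifications (the demand signal).
    sections: dict of course -> (capacity, enrolled) from the student system.
    Returns (course, shortfall) pairs, largest shortfall first.
    """
    rows = []
    for course, (capacity, enrolled) in sections.items():
        open_seats = max(capacity - enrolled, 0)
        waiting = subscriptions.get(course, 0)
        shortfall = waiting - open_seats
        if shortfall > 0:
            rows.append((course, shortfall))
    return sorted(rows, key=lambda r: r[1], reverse=True)


# Hypothetical data: COURSE A is full with 120 students waiting.
hot_spots = unmet_demand(
    {"COURSE A": 120, "COURSE B": 10},
    {"COURSE A": (300, 300), "COURSE B": (200, 150), "COURSE C": (40, 40)},
)
```

A full course with nobody waiting (COURSE C above) does not show up, which is the point: utilization alone cannot distinguish a full course from an oversubscribed one.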

The Cool stuff: Central IT BA’s and engineers pulled in a like minded colleague from the DW to do innovation work with data. Provost, deans, and chairs got excited; built out dashboards using Tableau.

The Great Civitas Pilot – Why Student Success Analytics? People don’t understand much about their students, when to do interventions, longitudinal views of program efficacy and impacts. Tried to use Civitas – take data from student system, LMS, and data warehouse. Illume: Analyze key institution metrics, starting with persistence; view historical results and predictions of future. Inspire for Advisors.

The Cool stuff: Admin heads looked to IT to help solve problem because of success of course dashboard. Faculty, teaching and program support staff are eager to get started.

Show Me the Data!

Assessment folks didn’t understand the value of giving access to data that hasn’t been analyzed. IT team interviewed people for data needs, then involved assessment people in building dashboards with Tableau to realize those needs.

Data Warehouse folks have gotten the religion – look at the UW Data & Analytics page.

Central IT is the instigator and change agent, but needs BAs with deep data analysis skills.

We all need to be hiring data scientists with deep curiosity – can’t keep having technical folks whose answer is that it takes too long to go through the data. Should partner with existing data science centers on campus. If we’re really going to be data-driven universities, IT will be at the center – we touch all the parts of the institution, we have the tools, and we know more about how data interacts.

Mark Chiang – UC Berkeley

Used to have to go to separate offices to get data, mash up into spreadsheets, do pivot tables, for every request.

Data Warehouse: Cal Answers – Students (applicants, curriculum, demographics, financials); Alumni; Finance; Research; HR; Facilities.

Built out high level dashboard for deans and chairs – answer questions about curricula. Enrollments, Offerings, instructor data, etc.  Facilitates discussions between deans and faculty and administrators. Effort was driven by CFO. Makes job much easier. Added substantial additional investment.

Can build out prototypes in a couple of weeks on top of live data to prove concepts before building the real enterprise work.


Will the data warehouse look significantly different in a few years? We don’t do a good job of understanding the way data security needs to change as data ages. There’s a place to incorporate new types of data like sentiment analysis on social media. Instructure is working on making Canvas data available via AWS Redshift. Much of the new thinking and activity about data is not coming from the traditional BI/DW teams, but those folks are more willing to partner now than they used to be.

CSG Spring 2015 – The Data-Driven University – part 1

Kelly Doney – Changing the Conversation at Georgetown

Getting lots of questions around data not collected in traditional ERP – how many times did you visit your advisor? What volunteer opportunities did you do? Who was your favorite professor?

Advancement needs to follow alumni every step of the way.

Provost asking question – process efficiency, quality of instruction, but also outcomes – what happens to graduates in first five years and beyond, relating those data back to experiences on campus.

Vice Provost for education sponsoring an effort – wants to measure cultural impact of Georgetown on students: learning to learn, well-being, empathy, etc. Creating embedded cultural practices to track that.

Using Enterprise BI + CRM for data analysis

Trying to break down silos of data ownership. Workday enabled some of this as shadow system owners realized they weren’t getting feeds from the new system. Went live with Finance and Student data warehouse this year.

Been partnering with Advancement to bring enterprise CRM to campus. Need to think about other sources too. Just finished first part of playbook project with Deloitte and Salesforce to create a playbook for higher ed institutions that want to take a look at CRM at an enterprise level. Talked to 20 different offices, identified 150 use cases for CRM. Have a high level Salesforce object model. Going to take on a pilot.  Needs to be refined by the community.

Phase 1 – Advancement and Requirements. Phase 2: Advancement and CRM Core. Future phases: CRM and larger engagement.

Salesforce licensing model is cost prohibitive for higher education – they’ve agreed to come to the table to discuss this.

User community always asks for lots of control and flexibility in reporting, but doesn’t make time to learn tools.

Debbie Fulton – VA Tech – Role of BI tool at VT

It’s not how you get there… unless you can’t get there. The perfect BI tool is not the goal and will not create a data-driven university. But if you have no viable tool, your goals may be unattainable.

VT’s journey – Any tool will do (almost). Needed to figure out what mattered to VT. They had Brio since the early 2000s, which had a lot of limitations: licensing, required desktop installation, browser problems, etc. Had a lot of standardized reports that required developers to create. Put out an RFP.

Was important that sponsors realized that getting a tool did not create the data-driven university. Brought in EAB to make recommendations on creating the data-driven university which added credibility.

Goals: Replace soon-to-be obsolete technology; leverage data warehouse (didn’t want to rebuild); position VT for future (unstructured data, mobile access, diversity of data sources); Address issues with current environment (inconsistent distribution and management of information; report development cycle is lengthy and process varies; lack of modern presentation and analytical functionality; inadequate licensing of legacy tools and product obsolescence).

RFP Requirements: Pixel Perfect Enterprise Reporting (not just SQR reports); Ad hoc reporting; analytics, visualization, and predictive modeling; scheduling and distribution; dashboards; mobile implementation; common data model (virtual data model, supporting a common data model regardless of reporting tool used).

Two vendors supported the data model concept: Attivio (search based), and Denodo (which actually builds a data model). Both add a layer of complexity that would’ve added to the timeline, and both were expensive. MicroStrategy added ability to build model that other tools could look at. That layer isn’t as robust as the dedicated tools, but was good enough.

Purchased MicroStrategy.

Benefits realized and next steps: Site license for MicroStrategy including admin and academic usage; have a tool with full functionality to support BIT; opportunity to jumpstart BI dialogue – questions have changed beyond complaining about lack of good tools; BI sponsorship and steering committee; data governance – beyond data stewards; BI leadership and evangelism.

Questions for consideration in achieving a data-driven university: How do we progress with all aspects of a BI implementation (data governance, evangelism, analytics, etc.) that need to come together? Where does IT fit? Could we learn from the evolution of learning systems for how we might create data analytics services, partnerships, and direction between IT and the university?

Business Intelligence Pain Points – Todd Hill, Notre Dame

Finding and acquiring BI talent – can’t pay what industry does. Some places use staff who were graduate assistants. Some use offshore resources, but that presents some challenges. 1 excellent BI person is worth 3 mediocre ones – invest wisely. Build your own BI skills internally. Develop BI competency center.

Tools – Notre Dame historically used Business Objects, but now moving towards Microsoft stack + Tableau. Found that over half of what they built didn’t get used, so needed to change the model. Build Personal BI, Team BI, Enterprise BI. Find what works in a less costly way before moving up the maturity level. Can’t go right from zero to enterprise. 1 month personal BI solutions – 1-2 customers, non refreshing data. Then add data governance, build for the team, then after that build in security at an enterprise level.

Assessment Framework: How well do your customers know what they want? How clean is the data? How clearly defined are the data elements? How well understood are data access and security? How technically savvy are your customers?

Create a data steward position; involve constituencies, show a RACI matrix; publish data definitions – BI portal. Notre Dame has a data governance seal of approval for data that’s been defined by the process.

Addressing Organizational Silos – co-locate when possible to promote teaming; have cross-departmental user stories; use sponsors to clear organizational silos. Deans are asking for dashboards that cross those silos – e.g. research, finance, HR.

Sometimes you can take advantage of new ERP implementations to change the model of (for example) data access.

Addressing BI Project Demand – Agile methodologies can help. Partner with app development teams; partner with tech savvy customers; build BI competency center.