Jeff Christen – Cornell
Source Systems – PeopleSoft, Kuali, WOrkday, Longview. Dimensional data marts: finance, student, contributor relations, research admin. BI Tools – OBIEE and Tableau
They do data replication and staging of data for the warehouses. Nightly eplication to stage -> ETL -> Data Marts
Why replication/stage? Consistent view of data for ETL processing, protects production source systems; tuning for ETL performance.
Started journey to cloud 2 years ago. Were using Oracle streams – high maintenance, but met some needs. Oracle purchased a more robust tool and de-supported Streams. ETL tools challenge – were using Cognos Data Manager for 90% of their work, but IBM didn’t continue to support it. Replaced it with WhereScape RED, but requires rewriting jobs. Apps were already moving off-premise. WorkDay for HR/Payroll, PeopleSoft to AT&T hosting; Kuali financials moving to AWS. Launched pilot project to answer “what would it take to run data warehouse environment in AWS?”
Small pilot – Kuali warehouse in AWS. Which existing tools will work? Desire to use AWS services such as RDS where possible; Testing of both user query performance and ETL performance.
Why Oracle RDS and not Redshift? Approximately 80% of the Kuali DW is operational reporting. Needs fine-grained security at the database level; A lot of PL/SQL in the current environment; Currently exploring Redshift for non-sensitive high volume data
Some re-architecting: Oracle Streams not supported with Oracle RDS (used Attunity). Oracle Enterprise Manager scheduler not supported with Oracle RDS – using Jenkins (so beautiful and simple); No access to OS on RDS databases – installed Data Manager on separate Linux EC2 instance; Using WhereScape to call Data Manager from the RDS database.
Need to be more efficient. On premise the KDW had two physical servers. Found some inefficiencies in ETL code and some dashboard queries were masked by large servers. Prioritization of ETL code conversion by long running areas helped get AWS within nightly batch window. Some updates made to dashboards to improve performance or offer better filter options. Hired database tuning consultant (2wk) to help with Oracle tuning.
Testing and User Perception. Started with internal unit testing. Internal query execution time comparisons between on premise and AWS. User testing of dashboards on premise versus AWS. Repoint of production OBIEE financial dashboards to AWS for a day (x3). Some queries came back faster, some slower. Went through optimization and tuning to get it comparable across the board.
Cutover to AWS. Cutover Sept. 8. Redirected all non-OBIEE ODBC client traffic in October. Agreed to keep the on premise KDW loading in parallel for two month end closings as a fall back.
Next Steps. Parallel Research Admin Mart already in AWS – expect cutover by end of CY. Need more progress on ETL conversion before moving student and contributor marts. Continue Big Data / non-traditional data investigation (Cloudera on AWS). Redshift for large non-sensitive data sets.
Lessons learned: Off premise hosting does not equal Cloud technology. Often hard to get data out of SaaS apps.