Case Study
Migration of data warehouse to AWS
For a leading advanced technology provider in the oil and gas industry
PROBLEM
The client faced the following issues:
- High license costs for ETL and data warehouse (DW) platforms
- On-premises infrastructure that did not scale, leading to job overruns and missed SLAs
- A painful process for adding new data sources to the DW ecosystem
SOLUTION
UST implemented the following to address the business issues:
- Data is extracted from multiple sources using Pentaho, an open source ETL tool, and lands in AWS S3 buckets
- The AWS Data Pipeline service triggers AWS EMR jobs that cleanse the data, run the ETL steps and apply business logic
- For sources that do not support Change Data Capture, a full extract is taken and de-duplication logic is applied in the EMR jobs
- At the tail end of the Data Pipeline, the data is loaded into Amazon Redshift
- Data from all source systems is transformed into an intermediary generic data structure before it is loaded into Redshift
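The de-duplication step for sources without Change Data Capture can be illustrated with a minimal sketch. It is written in plain Python for readability and is not the EMR job used in the project; the case study does not describe that implementation. The function name, the field names (well_id, status, updated_at) and the rule of keeping the most recently updated row per business key are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Sequence

Record = Dict[str, str]


def deduplicate_full_extract(
    records: Iterable[Record],
    key_fields: Sequence[str],
    version_field: str,
) -> List[Record]:
    """Keep one row per business key: the one with the highest version value.

    Assumption: version_field holds values that sort correctly as strings,
    for example ISO 8601 timestamps.
    """
    latest: Dict[tuple, Record] = {}
    for record in records:
        key = tuple(record[field] for field in key_fields)
        current = latest.get(key)
        # On a tie, the row seen later in the extract wins.
        if current is None or record[version_field] >= current[version_field]:
            latest[key] = record
    return list(latest.values())


# Hypothetical full extract containing two versions of the same well.
extract = [
    {"well_id": "W-1", "status": "drilling", "updated_at": "2020-01-01T00:00:00"},
    {"well_id": "W-2", "status": "idle", "updated_at": "2020-01-02T00:00:00"},
    {"well_id": "W-1", "status": "producing", "updated_at": "2020-01-05T00:00:00"},
]

result = deduplicate_full_extract(extract, ["well_id"], "updated_at")
for row in result:
    print(row["well_id"], row["status"])
```

On a cluster, the same rule would typically be expressed as a group-by on the business key in whichever engine the EMR jobs run.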
IMPACT
- Enabled the client to start at a small scale and then expand to a multi-node cluster based on demand, using Amazon Redshift's highly scalable Massively Parallel Processing (MPP) architecture.
- Reduced license costs for expensive proprietary software through extensive use of open source tools.
- Enabled concurrent processing of large data volumes using AWS EMR, increasing data load throughput.
- Introduced a generic intermediary data structure (domain model) that lets the organization add new source systems with minimal effort and at lower cost.
- Provided high availability, fault tolerance and resilience using AWS EMR, Redshift and S3.