Introduction to our Client
Our client offers more than just a regular mobile payment app on the App Store and Play Store. It is proudly Malaysia’s homegrown lifestyle e-wallet app, both secure and rewarding. Their mission is to provide everyone with a worry-free cashless mobile payment experience all day, every day.
Our client was keen on building a data lake on AWS S3 to consolidate all their Mongo sources in one central location, which in turn would let them run advanced analytics on their customer data. Note that this project involved ingesting PII data into AWS S3.
1. A study was conducted on their 150+ Mongo documents
2. An ingestion server was set up on AWS
3. A Glue local endpoint was dockerised and deployed on the ingestion server
4. A custom SSL Python JDBC connector was written on the Glue local dev endpoint to connect to Mongo
5. AWS DMS was configured to pull data from the Mongo servers
6. Data from Mongo, in deeply nested JSON form, was pulled into the AWS S3 data lake
7. The data was parsed to CSV using AWS Glue's out-of-the-box Spark code together with custom Glue code containing the parsing logic
8. The data was catalogued using AWS Glue
9. External Hive tables were created using Glue
10. These external tables were then exposed to Athena for ad hoc analysis
11. A data model will be designed, and the data will be ingested into Postgres RDS
12. The ELT’d data will then be connected to SageMaker to perform descriptive and prescriptive analytics (in the pipeline)
13. Lake Formation is used to control column-level access in Athena, as we are ingesting PII data
14. A custom one-way hashing algorithm was built on Glue using Python
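The parsing and hashing steps above can be illustrated with a minimal sketch in plain Python. This is not the project's actual Glue code: the field names, the sample document, the secret key, and the choice of HMAC-SHA256 as the one-way hash are all assumptions made for illustration. It flattens a deeply nested JSON document into a single row, hashes the PII columns, and writes the result as CSV.

```python
import csv
import hashlib
import hmac
import io
import json

# Hypothetical PII column names and key, for illustration only.
PII_FIELDS = {"customer.email", "customer.phone"}
SECRET_KEY = b"replace-with-a-managed-secret"


def flatten(doc, parent_key="", sep="."):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in doc.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        elif isinstance(value, list):
            # Arrays are kept as a JSON string so each document stays one row.
            flat[full_key] = json.dumps(value)
        else:
            flat[full_key] = value
    return flat


def hash_value(value, key=SECRET_KEY):
    """Return a keyed SHA-256 digest (HMAC) of the value as hex."""
    return hmac.new(key, str(value).encode("utf-8"), hashlib.sha256).hexdigest()


def mask_pii(flat_doc, pii_fields=PII_FIELDS):
    """Replace the values of PII columns with their one-way hash."""
    return {
        k: hash_value(v) if k in pii_fields and v is not None else v
        for k, v in flat_doc.items()
    }


def to_csv(rows):
    """Serialise a list of flat dicts to CSV text with a sorted header."""
    fieldnames = sorted({k for row in rows for k in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample document standing in for a Mongo record.
sample = {
    "_id": "abc123",
    "customer": {"email": "user@example.com", "phone": "0123456789"},
    "wallet": {"balance": 42.5, "tags": ["promo", "new"]},
}

print(to_csv([mask_pii(flatten(sample))]))
```

Because the same input always produces the same digest, hashed columns can still be used for joins and distinct counts in Athena without exposing the raw values.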
3. AWS RDS Postgres
4. AWS DMS
AspireNXT helped our client take advantage of Redshift, which makes it simple and cost-effective to run high-performance queries on petabytes of structured data, so that they could build powerful reports and dashboards using their existing business intelligence tools. It also lets them bring together structured data from the data warehouse and semi-structured data, such as application logs, from the S3 data lake to get real-time operational insights into their applications and systems.
As the company continues to grow, it takes advantage of Amazon Redshift and Amazon EMR to run complex queries on large and growing data sets with improved performance.