The Challenge: Time-Consuming Data Analysis
A global telecommunications firm found itself analyzing multiple sets of data across the organization, which was time-consuming and costly. The company needed to consolidate several major databases containing disparate, complex data. These databases resulted from the acquisition of several smaller firms, each of which used a different system.
The Solution: A Unified Data Lake
The solution consisted of standing up a data lake, a storage repository that holds a vast amount of data. This data would be stored in its raw format, meaning it would look exactly as it does in the source, whether that source is a file or a database. A proof-of-concept data lake was stood up in a development cloud environment with sample data to demonstrate how the architecture would function.
Our Process: HCDAgile
1904labs leveraged the Human-Centered Design practice to understand the reality of the client's world. The disparate data, various political verticals and lack of definition around their data science practice yielded many opportunities for defining, aligning and developing a sustainable solution.
Outcomes: Enabling Data-Driven Decision Making
Because the client's cloud adoption was still a few years out, the data lake was built on existing servers. Using NiFi, 1904labs created processors that accept data inputs from many types of sources and automatically land the data within the data lake. These processors run a reusable ingestion flow that standardizes some columns such as dates, enables PII encryption and handles errors as needed. An initial user interface allows a user to upload data from a file with a few clicks and to monitor progress through provenance data in Kibana and a custom-built data profiler. As data started populating the lake, the team of data scientists began analyzing it and identifying what more they needed to become a successful organization within the company. We stayed close to their findings and continually planned for future capabilities. We built an efficient way for the data governance team to provide metadata (the definition of a table and its columns) for the data sets. 1904labs also automated the creation of a derived data set and maintains both a raw data layer and a clean data layer, making it possible to jump back to the raw (source) data at any time.
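As an illustration only, the reusable ingestion flow described above can be sketched in plain Python. This is not the NiFi flow 1904labs built. The function names, the list of date formats and the column names are all hypothetical, and a keyed hash stands in for the PII encryption step, whose actual scheme the case study does not describe. The sketch shows the three behaviors named above: standardizing date columns, protecting PII columns, and routing bad records to an error output while the raw record is always preserved.

```python
import hashlib
import hmac
from datetime import datetime

# Hypothetical date formats that different source systems might use.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def standardize_date(value: str) -> str:
    """Return the date as ISO 8601 (YYYY-MM-DD), or raise ValueError."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def protect_pii(value: str, key: bytes) -> str:
    """Keyed hash used here as a stand-in for real PII encryption."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def ingest(rows, date_columns, pii_columns, key):
    """Return (raw, clean, errors); raw records are never modified."""
    raw, clean, errors = [], [], []
    for row in rows:
        raw.append(dict(row))  # raw layer: exact copy of the source record
        try:
            cleaned = dict(row)
            for col in date_columns:
                cleaned[col] = standardize_date(cleaned[col])
            for col in pii_columns:
                cleaned[col] = protect_pii(cleaned[col], key)
            clean.append(cleaned)  # clean layer: standardized and protected
        except (KeyError, ValueError) as exc:
            # Error handling: keep the bad record and the reason it failed.
            errors.append({"row": dict(row), "error": str(exc)})
    return raw, clean, errors


if __name__ == "__main__":
    sample = [
        {"id": "1", "signup": "03/14/2021", "email": "a@example.com"},
        {"id": "2", "signup": "not a date", "email": "b@example.com"},
    ]
    raw, clean, errors = ingest(sample, ["signup"], ["email"], b"demo-key")
    print(clean[0]["signup"])  # 2021-03-14
    print(len(errors))         # 1
```

Keeping the untouched copy in `raw` alongside the standardized copy in `clean` mirrors the raw and clean layers described above: a downstream user can always compare a cleaned value against what the source system actually sent.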
As the newly founded data science team takes on its first few use cases, 1904labs will continue to support its needs and work towards allowing the team to quickly and efficiently take data, turn it into information, and use those insights to help drive business decisions and actions.
- Helped a large enterprise data science practice mature
- Rapid acquisition and integration of customer-related data across the organization
- Enabled faster delivery of insights to data customers
- Enabled the transition from reactive to proactive issue mitigation
- Efficient ingestion process allowing for rapid data availability from customer-facing functions
- Enabled data scientists to access data quickly and efficiently so they can proactively draw insights that save the business money and time and improve customer satisfaction
“We began with the end in mind: helping our client become a data-driven enterprise. A fully automated, user interactive, transparent framework allows for easy access to data from multiple sources and facilitates critical decision-making.”
Chris Lundeberg, 1904labs