Optimizing Databricks for a Leading Financial Services Company

March 23, 2025 | Mar 24, 2025 | Read time: 5 min

Databricks offers many data engineering benefits: extensive data streaming support, fine-grained access control, data discoverability and reporting features, and more, all of which lead to fewer data silos and improved data governance. The biggest challenge for organizations implementing Databricks is making sure every dollar spent takes full advantage of the feature set without breaking the bank. Achieving this requires deep data engineering expertise combined with extensive communication across all stakeholders.

On a recent project, a leading financial services company enlisted 1904Labs to validate that Databricks could stream, unify, and query data at the scale their operations require. This initiative was crucial to ensure that Databricks could meet their demanding data processing and security needs without exceeding budget, especially when compared to alternatives such as an on-premises Hadoop cluster.

Drawing on extensive Databricks expertise, the team quickly identified the most pressing challenges facing an effective rollout so that the risks could be mitigated early. The top challenges were:

  • Security Requirements: Because of the strict data security imposed by regulations and internal standards, many of the newest Databricks features were off limits. The team had to assemble a set of alternative features that still met the technical requirements (streaming performance, query performance, data discoverability) while adhering to the security rules in place.
  • Technical Effort to Change Technologies: Between the learning curve for internal technical team members, the expansive solutions required to move from the existing infrastructure to a new platform, and the need to balance budget against technical performance, moving to Databricks is a massive initiative.
  • High Costs: A powerful feature set and support come with a high cost to develop in the Databricks ecosystem. In this initiative, 1904Labs quickly determined where inefficiencies were likely to develop and mitigated them early.

The solution was two-pronged:

  1. Identifying Databricks optimizations through an extensive set of automated tests:
      • The test suite runs at a regular interval (every night) to identify common inefficiencies in the Databricks platform:
          • Databricks clusters exceeding the size required for their jobs.
          • Expensive clusters being used for jobs that don’t require their capacity.
          • Spark code inefficiencies that are leading to unnecessary spending.
  2. Performance testing the Databricks platform:
      • Developed a robust data generator that modeled production data with high accuracy. This required a multi-month analysis to fine-tune the data to match the complex structures of the financial services company.
      • Created a performance testing framework that provided reliable coverage for all use cases under scrutiny and built trust by demonstrating both the strengths and weaknesses of the Databricks platform within the technical and business constraints.
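
To make the nightly cluster check more concrete, here is a minimal sketch of how oversized clusters might be flagged. It is an illustration only, not the program built on this project: the `ClusterUsage` schema, the sample cluster names, and the 30% CPU and 40% memory thresholds are all assumptions, and in practice the utilization numbers would come from whatever monitoring source your workspace exposes.

```python
from dataclasses import dataclass

# Hypothetical thresholds; tune them to your own workloads.
CPU_THRESHOLD = 0.30
MEM_THRESHOLD = 0.40


@dataclass
class ClusterUsage:
    """One cluster's utilization over the review window (hypothetical schema)."""
    name: str
    workers: int
    avg_cpu: float   # mean CPU utilization, 0.0 to 1.0
    peak_mem: float  # peak memory utilization, 0.0 to 1.0


def find_oversized(usages, cpu_threshold=CPU_THRESHOLD, mem_threshold=MEM_THRESHOLD):
    """Return names of multi-worker clusters whose CPU and memory both stay under the thresholds."""
    return [
        u.name
        for u in usages
        if u.workers > 1 and u.avg_cpu < cpu_threshold and u.peak_mem < mem_threshold
    ]


# Made-up sample data for illustration.
sample = [
    ClusterUsage("nightly-etl", workers=16, avg_cpu=0.12, peak_mem=0.25),
    ClusterUsage("stream-ingest", workers=8, avg_cpu=0.71, peak_mem=0.80),
    ClusterUsage("adhoc-report", workers=1, avg_cpu=0.05, peak_mem=0.10),
]

print(find_oversized(sample))
```

Single-worker clusters are skipped because they cannot be scaled down further; a flagged cluster is a candidate for review, not an automatic resize.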

After these efforts, stakeholders were confident that they were making decisions with as much information as possible. They knew the specific costs of the services required to hit their performance benchmarks and the potential tradeoffs (such as higher compute costs for faster data ingestion) they would need to make to meet certain business needs. While those tradeoffs are never easy to grapple with, conversations kept moving closer to the north star vision because concrete numbers were provided at every step of the way.
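
The cost-versus-ingestion-speed tradeoff can be put into numbers with a small calculation like the sketch below. Everything in it is assumed for illustration: the `BenchmarkResult` fields, the configuration names, and the cost and throughput figures are invented, not results from this engagement.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    """Outcome of one performance test run (hypothetical schema)."""
    config: str
    hourly_cost_usd: float
    records_per_sec: float


def cost_per_million(result):
    """Compute the dollar cost of ingesting one million records at the measured rate."""
    seconds = 1_000_000 / result.records_per_sec
    return result.hourly_cost_usd * seconds / 3600


def cheapest_meeting(results, min_records_per_sec):
    """Return the lowest cost-per-record config that meets the throughput target, or None."""
    eligible = [r for r in results if r.records_per_sec >= min_records_per_sec]
    return min(eligible, key=cost_per_million, default=None)


# Invented figures for illustration only.
results = [
    BenchmarkResult("small", hourly_cost_usd=10.0, records_per_sec=5_000),
    BenchmarkResult("medium", hourly_cost_usd=18.0, records_per_sec=10_000),
    BenchmarkResult("large", hourly_cost_usd=40.0, records_per_sec=16_000),
]

for r in results:
    print(f"{r.config}: ${cost_per_million(r):.2f} per million records")
```

Calling `cheapest_meeting(results, 8_000)` on these figures selects the medium configuration, while raising the target to 12,000 records per second forces the more expensive large one. That is the kind of concrete tradeoff stakeholders can weigh against a business requirement.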

As you assess data services and frameworks, Databricks is a powerful option that meets the needs of many organizations. Even when it is the right fit, there are many considerations in achieving the metrics and functionality your team needs. If you have any questions or want to talk through the problems that are keeping your data teams up at night, we’d love to share our perspective and see whether our expertise can get you closer to the solution you’re looking for.

Reach out to us if we can help.