How do companies reach a point where their data – once manageable and visualizable – becomes completely chaotic and cumbersome?
It all starts with the best of intentions. As a decision maker, you’ve spent years watching your company grow into its current state, collecting data along the way as part of its natural operations. As the company grew, teams began to diverge on the types of technology they used. Eventually, the company ended up with several different flavors of SQL databases floating around, along with a MongoDB instance that one team swears by and a distributed database like Cassandra that another team pushed for.
You long for the days of old when your data was a manageable size you could export, pop into Excel, and turn into some simple charts. Now the data is too large to even fit on your machine, let alone be in a format Excel can handle.
It’s natural to use the best tools for the job for operational purposes. Often, however, analyzing and visualizing your data are secondary thoughts. If you don’t intentionally consolidate and clean your data sources for reporting purposes, your analysts and data scientists will spend a frustratingly large amount of time simply getting the data into a workable state. This prevents you from getting timely answers to simple questions about your company – or worse, it gives you inaccurate answers.
It’s for these reasons that the Data Engineering practice at 1904labs has created a Data Discovery and Analysis practice area, with direct experience in:
- Consolidating different data sources into a common data lake using Hortonworks and Cloudera Hadoop distributions.
- Accessing and transforming a wide variety of data for insertion into the data lake using Apache NiFi and Spark.
- Developing visualizations using a wide selection of tools including Tableau, QlikView, Spotfire, Apache NiFi, Spark, and Superset.
We’re able to work with your organization to determine exactly what you want to get out of your data, and then to consolidate and reshape that data into the best format for connecting to analytical tools or visualization platforms.
For instance, we’ve helped a client who wanted to build a complicated visualization using Tableau. They knew they had the data to feed the visualization, but it required a substantial amount of reshaping to get it into the appropriate format. Using Apache NiFi, we were able to create specific tables in Hive that the client could connect to directly from Tableau.
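To give a feel for what “reshaping” can mean in practice, here is a minimal sketch in plain Python. It is not the client’s actual pipeline, which ran in NiFi and Hive; the field names (`region`, `month`, `sales`), the sample rows, and the `pivot_wide` helper are all hypothetical and chosen only for illustration. It turns “long” rows (one row per measurement) into “wide” rows (one row per entity), a shape that charting tools often find easier to consume.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per (region, month) measurement.
long_rows = [
    {"region": "east", "month": "2019-01", "sales": 120},
    {"region": "east", "month": "2019-02", "sales": 135},
    {"region": "west", "month": "2019-01", "sales": 90},
]


def pivot_wide(rows, key, column, value):
    """Reshape long rows into one wide row per key.

    Each distinct value found under `column` becomes its own field.
    """
    columns = sorted({row[column] for row in rows})
    grouped = defaultdict(dict)
    for row in rows:
        grouped[row[key]][row[column]] = row[value]
    # Fill missing combinations with None so every output row has the same fields.
    return [
        {key: k, **{c: grouped[k].get(c) for c in columns}}
        for k in sorted(grouped)
    ]


wide_rows = pivot_wide(long_rows, key="region", column="month", value="sales")
for row in wide_rows:
    print(row)
```

In a real data lake the same idea would more likely be expressed as a Spark job or a Hive query, but the shape of the transformation is the same: decide what one row should represent for the visualization, then regroup the source records to match.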
Consolidating your data in one place is the first step toward enabling your company to effectively analyze and visualize your data, regardless of how fractured it currently is. The Data Discovery and Analysis practice area at 1904labs can help make this process easier by making data manageable again.