Waimak launch

Cox Automotive open sources new framework to simplify Apache Spark development

Cox Automotive is delighted to announce the release of Waimak, a new open-source framework that helps data teams specialise more efficiently. The framework makes it easier to build, test and deploy complex data flows in Apache Spark by separating the more complex parts of Spark application development (such as orchestration) from the business logic.

According to Allison Nau, managing director of Cox Automotive Data Solutions, traditional approaches to data engineering tend to process large volumes of data in highly dependent waves, meaning each wave must finish before the next can begin.

“This creates a problem on distributed big data systems as it leaves valuable resources sitting idle but locked,” commented Nau. “The more complex the flows of data, the worse the problem gets. And with an increasing number of interdependent data models and data flows, we were finding it took much longer to change data models, as the complexity had grown exponentially.”

The new open-source framework alleviates this problem by providing Spark functions that allow a complex data flow to be more easily broken up into independent blocks within an application. These blocks are labelled, which makes reuse without repetition easier and deployment to another environment, such as development or production, much simpler.
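To make the idea of labelled, independent blocks concrete, here is a minimal sketch in plain Python. It is not Waimak's API and does not use Spark: the run_flow function, the block labels and the sample records are all hypothetical, invented for illustration. It assumes Python 3.9 or later (for graphlib). Each block declares the labels it depends on, and a block is started as soon as its own inputs are ready rather than waiting for a whole "wave" of unrelated work to finish.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_flow(blocks, max_workers=4):
    """Run labelled blocks, starting each one once its dependencies are done.

    blocks maps a label to (dependency_labels, function). The function
    receives a dict of {dependency_label: result}. Hypothetical helper,
    not part of Waimak or Spark.
    """
    sorter = TopologicalSorter({label: deps for label, (deps, _) in blocks.items()})
    sorter.prepare()
    results = {}
    running = {}  # future -> label of the block it is executing
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every block whose dependencies have all completed.
            for label in sorter.get_ready():
                deps, fn = blocks[label]
                inputs = {dep: results[dep] for dep in deps}
                running[pool.submit(fn, inputs)] = label
            # Wait for at least one running block, then unlock its dependents.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                label = running.pop(future)
                results[label] = future.result()
                sorter.done(label)
    return results


def sold_makes(inputs):
    # Join the two upstream blocks by vehicle id (made-up sample logic).
    sold_ids = {sale["vehicle_id"] for sale in inputs["sales"]}
    return [v["make"] for v in inputs["vehicles"] if v["id"] in sold_ids]


# Made-up sample data: two independent source blocks and one dependent block.
blocks = {
    "vehicles": ((), lambda _: [{"id": 1, "make": "Ford"}, {"id": 2, "make": "Audi"}]),
    "sales": ((), lambda _: [{"vehicle_id": 1, "price": 9500}]),
    "sold_makes": (("vehicles", "sales"), sold_makes),
}

results = run_flow(blocks)
print(results["sold_makes"])
```

Because the "vehicles" and "sales" blocks share no dependencies, they can run at the same time, and the "sold_makes" block starts as soon as both have finished. The labels also mean a block can be reused or swapped without touching the rest of the flow.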

“Waimak has enabled us to automate much of the ‘rinse-and-repeat’ data pipeline and data model implementations, giving us time to focus on more critical data engineering tasks,” added Nau. “This makes collaboration between teams that use data, such as business intelligence and data science, and those that provide data, such as data engineering, less burdensome by encouraging compromise on a common set of Big Data tools.”

Data users give up some of the freedom afforded by pure SQL interfaces to Hadoop, but gain the ability to string together sets of data objects defined by Spark SQL and to use more native Spark over time. Data engineers give up some amount of ‘optimal computation’ but have the business logic (owned by those who understand it) on a platform where optimisation is easier to manage.

Waimak also helps organisations use compute resources to process more data, more cost-effectively, by modularising the steps in a data flow. This approach is particularly powerful when combined with tools that spin up compute clusters dynamically on demand and spin them down once processing has completed.

“Waimak allows us to maximise resource utilisation throughout a job’s lifecycle. Combined with tooling for on-demand clusters, we can free up cloud resources quicker, reducing overall compute hours and therefore costs. We’ve also been able to significantly reduce the time it takes to go from prototype to production plus deliver maintainable production code faster. This therefore makes it easier for us to extract value from data in a cost-efficient manner, and we hope that other data teams will realise the same benefits and contribute to the framework,” concluded Nau.

Waimak is released under the Apache 2.0 open source license. You can find out more, download and contribute here.
