Waimak launch

Cox Automotive open sources new framework to simplify Apache Spark development

Cox Automotive is delighted to announce the release of Waimak, a new open-source framework that helps data teams specialise more efficiently. This new framework makes it easier to build, test and deploy complex data flows in Apache Spark, by abstracting away the more complex parts of Spark application development (such as orchestration) from the business logic.

According to Allison Nau, managing director of Cox Automotive Data Solutions, traditional approaches to data engineering tend to process large volumes of data in highly dependent waves, meaning each wave must finish before the next can begin.

“This creates a problem on distributed big data systems as it leaves valuable resources sitting idle but locked,” commented Nau. “The more complex the flows of data, the worse the problem gets. And with an increasing number of interdependent data models and data flows, we were finding it took much longer to change data models, as the complexity had grown exponentially.”
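To make the idle-resource problem concrete, the toy timing model below compares the two approaches. It is not Waimak code: the step names, durations and wave grouping are invented for illustration, and it assumes there are always enough workers to run every ready step at once. In wave mode each wave waits for its slowest step; in dependency-driven mode each step starts as soon as its own inputs exist.

```python
from functools import lru_cache

# Hypothetical job: step durations (hours) and the steps each one reads from.
DURATIONS = {"a": 1, "b": 4, "c": 3, "d": 1}
DEPENDS_ON = {"a": [], "b": [], "c": ["a"], "d": ["b"]}
WAVES = [["a", "b"], ["c", "d"]]  # wave-based grouping of the same steps


def wave_makespan(waves, durations):
    # Each wave is held until its slowest step finishes, then the next wave starts.
    return sum(max(durations[s] for s in wave) for wave in waves)


def dataflow_makespan(depends_on, durations):
    # Each step starts when its own inputs are done (unlimited workers assumed).
    @lru_cache(maxsize=None)
    def finish(step):
        start = max((finish(d) for d in depends_on[step]), default=0)
        return start + durations[step]

    return max(finish(s) for s in durations)


print(wave_makespan(WAVES, DURATIONS))            # 7
print(dataflow_makespan(DEPENDS_ON, DURATIONS))   # 5
```

With these made-up numbers the wave schedule takes 7 hours against 5 for the dependency-driven one, because step `c` no longer has to wait for the unrelated, slower step `b`.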

The new open-source framework alleviates this problem by providing Spark functions that allow a complex data flow to be broken up more easily into independent blocks within an application. These blocks are labelled, which makes them easier to reuse without repetition and makes deployment to another environment, such as development or production, much simpler.
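The idea of labelled, independent blocks can be sketched in a few lines of Python. This is a hypothetical illustration of the concept only, not Waimak's API (Waimak itself targets Spark): the `Flow` class and the `add` and `execute` functions are invented names, and plain Python lists stand in for Spark DataFrames. Each block declares the labels it reads and the label it produces, and the executor runs whichever block has all of its inputs available.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sketch of labelled data-flow blocks; NOT the Waimak API.
# An action is (input labels, output label, function applied to the inputs).
Action = Tuple[List[str], str, Callable[..., object]]


class Flow:
    def __init__(self) -> None:
        self.actions: List[Action] = []

    def add(self, inputs: List[str], output: str, fn: Callable[..., object]) -> "Flow":
        # Register a block; returning self allows chained definitions.
        self.actions.append((inputs, output, fn))
        return self


def execute(flow: Flow, sources: Dict[str, object]) -> Tuple[Dict[str, object], List[str]]:
    labels = dict(sources)          # label -> materialised data
    pending = list(flow.actions)
    order: List[str] = []           # labels in the order they were produced
    while pending:
        # Pick any block whose inputs are all available; there is no wave barrier.
        action = next((a for a in pending if all(i in labels for i in a[0])), None)
        if action is None:
            raise ValueError("no runnable block; missing inputs for: "
                             + ", ".join(a[1] for a in pending))
        inputs, output, fn = action
        labels[output] = fn(*(labels[i] for i in inputs))
        pending.remove(action)
        order.append(output)
    return labels, order


# Usage: clean the raw rows once, then reuse the "sales_clean" label twice.
flow = (
    Flow()
    .add(["sales"], "sales_clean", lambda rows: [r for r in rows if r["price"] > 0])
    .add(["sales_clean"], "total", lambda rows: sum(r["price"] for r in rows))
    .add(["sales_clean"], "count", lambda rows: len(rows))
)
labels, order = execute(flow, {"sales": [{"price": 10}, {"price": -1}, {"price": 5}]})
print(labels["total"], labels["count"])  # 15 2
print(order)  # ['sales_clean', 'total', 'count']
```

Because blocks refer to each other only by label, the same flow could be run against different source data (for example a development or production copy) without touching the business logic. A Spark-based executor could also submit independent ready blocks concurrently, rather than one at a time as this sketch does.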

“Waimak has enabled us to automate much of the ‘rinse-and-repeat’ data pipeline and data model implementations, giving us time to focus on more critical data engineering tasks,” added Nau. “This makes collaboration between teams that use data, such as business intelligence and data science, and those that provide data, such as data engineering, less burdensome by encouraging compromise on a common set of Big Data tools.”

Data users give up some of the freedom afforded by pure SQL interfaces to Hadoop, but gain the ability to string together sets of data objects defined in Spark SQL and to make more use of native Spark over time. Data engineers give up some degree of ‘optimal computation’, but in return the business logic (owned by those who understand it) sits on a platform where optimisation is easier to manage.

Waimak also helps organisations use their compute resources to process more data more cost-effectively by modularising the steps in a data flow. This approach is particularly powerful when combined with tools that spin up compute clusters dynamically on demand and then spin them down once processing has completed.

“Waimak allows us to maximise resource utilisation throughout a job’s lifecycle. Combined with tooling for on-demand clusters, we can free up cloud resources more quickly, reducing overall compute hours and therefore costs. We’ve also been able to significantly reduce the time it takes to go from prototype to production, and to deliver maintainable production code faster. This makes it easier for us to extract value from data in a cost-efficient manner, and we hope that other data teams will realise the same benefits and contribute to the framework,” concluded Nau.

Waimak is released under the Apache 2.0 open source license. You can find out more about the framework, download it and contribute to it through the project’s public repository.
