UC Berkeley spin-out Databricks can sort 100TB of data faster than anyone else.

Databricks, a big data spin-out of University of California, Berkeley (UC Berkeley), has set a new record for sorting large datasets in a standardised benchmark test.

Databricks broke the record previously held by Yahoo by sorting 100 terabytes – equivalent to one trillion records – in 23 minutes on 206 Amazon EC2 nodes. The result means that the technology, Spark, was about three times faster on roughly ten times fewer machines than Yahoo’s run, which took 72 minutes on 2,100 nodes.

The company then pushed its technology further, sorting one petabyte – equivalent to ten trillion records – on 190 machines in less than four hours. Yahoo’s technology needed 3,800 machines and 16 hours.
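The comparisons above rest on simple arithmetic, which can be checked with a short sketch. It uses only the figures quoted in this article; the helper name `compare` is made up for illustration, the petabyte run is assumed to take the full four hours (the article says "less than"), and a terabyte is assumed to be 10^12 bytes.

```python
# Figures are those quoted in the article; the helper and its name are
# illustrative only and are not part of Spark or any benchmark tooling.
def compare(new_minutes, new_nodes, old_minutes, old_nodes):
    """Return (speed-up, machine reduction, combined per-node gain)."""
    speedup = old_minutes / new_minutes
    fewer_nodes = old_nodes / new_nodes
    return speedup, fewer_nodes, speedup * fewer_nodes

# 100 TB run: 23 minutes on 206 nodes versus 72 minutes on 2,100 nodes.
speedup_100tb, nodes_100tb, per_node_100tb = compare(23, 206, 72, 2100)

# 1 PB run: "less than four hours" on 190 machines versus 16 hours on 3,800.
# Assumption: 240 minutes is used as an upper bound, so the speed-up
# computed here is a lower bound.
speedup_1pb, nodes_1pb, per_node_1pb = compare(4 * 60, 190, 16 * 60, 3800)

# Record size implied by "100 terabytes, equivalent to one trillion records",
# assuming decimal terabytes (10**12 bytes).
bytes_per_record = (100 * 10**12) // 10**12

print(f"100 TB: {speedup_100tb:.1f}x faster on {nodes_100tb:.1f}x fewer nodes")
print(f"1 PB:   at least {speedup_1pb:.1f}x faster on {nodes_1pb:.1f}x fewer nodes")
print(f"Implied record size: {bytes_per_record} bytes")
```

On these figures the 100-terabyte run works out to roughly 3.1 times faster on about 10.2 times fewer nodes, and the petabyte run to at least four times faster on 20 times fewer machines. Both datasets imply records of 100 bytes each.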

Databricks’ investors include New Enterprise Associates and Andreessen Horowitz, following a $14m series A round in September 2013 and a $33m series B round in June 2014.

Ion Stoica, chief executive at Databricks, described his company in June, saying: “We built Databricks Cloud to enable the creation of end-to-end pipelines out of the box while supporting the full spectrum of Spark applications for enhanced and additional functionality. It was designed to appeal to a whole new class of users who will adopt big data now that many of the complexities of using it have been alleviated.”