Databricks offers Spark 1.6 Preview, fulfilling developers’ wish for more efficient memory usage

Spark 1.6 fulfills a long-standing developer wish by making memory usage more efficient. According to Java World, earlier versions of Apache Spark required users to plan in advance how to split the memory, as they used to “subdivide the memory into two partitions, one for data and one for execution.” The new version manages memory more flexibly, since both partitions (data and execution) can change their capacity as needed. “For many applications, this will mean a significant increase in available memory that can be used for operators such as joins and aggregations,” according to a Databricks blog post.

However, according to Java World, the new version still has some drawbacks: “There are still some restrictions in the current implementation; while borrowed execution memory can be released if needed, borrowed storage memory is never released. For backward compatibility, 1.6 also includes a legacy, fixed-partition memory management mode.”
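The borrowing behaviour described above can be sketched with a toy model in plain Python. This is not Spark's implementation or API: the class `UnifiedMemoryPool`, its method names, and the `storage_fraction` parameter are hypothetical. The sketch also assumes one particular reading of the restriction, namely that cached data which has grown past its own share can be evicted to make room for execution, while memory held by execution is never taken back for caching.

```python
class UnifiedMemoryPool:
    """Toy model of one pool shared by cached data (storage) and execution.

    Hypothetical illustration only; names and policy are assumptions,
    not Spark's actual memory manager.
    """

    def __init__(self, total, storage_fraction=0.5):
        self.total = total
        # Share of the pool inside which cached data is safe from eviction.
        self.storage_region = int(total * storage_fraction)
        self.storage_used = 0
        self.execution_used = 0

    def free(self):
        """Memory currently used by neither side."""
        return self.total - self.storage_used - self.execution_used

    def acquire_storage(self, amount):
        """Cache data using any free memory; never evicts execution."""
        if amount <= self.free():
            self.storage_used += amount
            return True
        return False

    def acquire_execution(self, amount):
        """Reserve memory for a join or aggregation; returns the amount granted.

        If free memory is short, cached data that spilled past the
        storage region is evicted first.
        """
        shortfall = amount - self.free()
        if shortfall > 0:
            evictable = max(0, self.storage_used - self.storage_region)
            self.storage_used -= min(shortfall, evictable)
        granted = min(amount, self.free())
        self.execution_used += granted
        return granted

    def release_execution(self, amount):
        """Return execution memory to the pool when a task finishes."""
        self.execution_used -= min(amount, self.execution_used)
```

With a pool of 100 units and a storage share of 50, caching 80 units succeeds because storage borrows from the idle execution side. A later execution request for 40 units then evicts 20 units of cached data, while a further caching request fails because it cannot push execution out.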

Spark Streaming, Spark’s real-time data component, includes a state tracking API that has also been reworked in the new version. According to Databricks, the change can improve the performance of such workloads by up to ten times. “The state tracking feature in Spark Streaming is used for problems like sessionization, where the information for a particular session is updated over time as events stream in. In Spark 1.6, the cost of maintaining this state scales with the number of new updates at any particular time, rather than the total size of state being tracked.”
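The scaling claim in the quote can be illustrated with a small plain-Python sketch of sessionization. It is a toy model, not the Spark Streaming API: the function `update_sessions` and the shape of the session records are hypothetical. The point it shows is that each batch only touches the sessions that actually received events, however many sessions are being tracked in total.

```python
def update_sessions(state, batch):
    """Apply one batch of (session_id, event) pairs to the tracked state.

    Only sessions that appear in the batch are read or written, so the work
    grows with the size of the batch, not with the number of tracked sessions.
    Returns the number of distinct sessions touched.
    """
    touched = set()
    for session_id, event in batch:
        # Create the session record on first sight, otherwise update it in place.
        session = state.setdefault(session_id, {"events": 0, "last": None})
        session["events"] += 1
        session["last"] = event
        touched.add(session_id)
    return len(touched)
```

For example, with a thousand sessions already in `state`, a batch holding three events for two sessions touches only those two records. An approach that rescans every tracked session on each batch would do work proportional to the full thousand.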

Spark 1.6 will also feature a new Dataset API, an extension of the Spark DataFrame API. It uses the same runtime optimizer as DataFrame but offers better performance and supports static typing (compile-time type checking) and user functions that run directly on existing JVM types. Furthermore, Spark 1.6 supports pipeline persistence in Spark ML and offers new machine learning algorithms as well as improvements to the R and Python APIs.

Spark contributor Databricks is now offering a preview of Spark 1.6 so that anyone interested in what’s new can try the latest version. You can also get the Spark 1.6 pre-release code directly from Apache, but Databricks suggests holding off on a full migration until Apache Spark 1.6 is officially released.

