Apache Spark and Java 8 helping developers with machine learning and Big Data applications.

According to a survey by Typesafe, as of October 2014 two-thirds of developers had already switched to Java 8 or were planning to switch soon. Java 8 has continued to gain popularity since then. In fact, Java ranked as the most popular programming language in the TIOBE index for November 2015.

In an article written for datanami.com on December 11, 2014, Joshua T. Fox argued that Apache Spark and Java 8 would be the ‘Big Data team for 2015.’ Hadoop, which was designed over 10 years ago for search-engine indexes, “quickly emerged as the leading computing engine for Big Data more generally” and “set the standard for batch-processing unstructured documents,” Fox wrote. Because Hadoop lacked a few essential services, which developers had to build themselves for every application, it effectively created the need for open-source projects that could complement it. Fox said that all Hadoop services were ‘a cumbersome mess’ and that ‘It’s so hard to set up that developers often download a pre-configured virtual machine that bundles the open-source modules, resulting in a heavyweight and inflexible system.’

Spark, by contrast, is more convenient and more powerful than Hadoop. “Spark moves around data with a convenient abstraction called Resilient Distributed Datasets (RDDs), which are transparently pulled into memory on-demand from a variety of data stores such as the Hadoop Distributed File System (HDFS), NoSQL databases such as Cassandra, and more,” continued Fox. By using RDDs, Spark “can execute sophisticated streaming and other parallel processing tasks behind the scenes on behalf of application developers,” as well as “run multistep algorithms in memory,” which for most developers is far more convenient and efficient than working with Hadoop.

Furthermore, Spark is easier to work with and requires less effort from developers, since it offers “a simple programming API with powerful idioms for common data processing tasks that require less coding.” Hadoop, by contrast, requires a lot of work, several Java classes, and repetitive boilerplate code for each step, even for basic tasks. “In Spark, developers simply chain functions to filter, sort, transform the data, abstracting away many of the low-level details and allowing the underlying infrastructure to optimize behind the scenes. Spark’s abstractions reduce code size in an average Java application by about 30% as compared to Hadoop, resulting in shorter development times and more maintainable codebases,” Fox added.
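
To make the idea of chaining functions more concrete, here is a minimal sketch of the filter, sort, and transform style described above. It is an illustration only: it uses the standard java.util.stream library from Java 8 as a stand-in, not Spark's own API, and the class name ChainSketch, the method longWordsUpperSorted, and the sample words are assumptions made up for this example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative only: this uses java.util.stream from the JDK, not Spark.
// ChainSketch and longWordsUpperSorted are hypothetical names.
class ChainSketch {

    // Keep the words longer than minLength, sort them, then upper-case each one.
    static List<String> longWordsUpperSorted(List<String> words, int minLength) {
        return words.stream()
                .filter(w -> w.length() > minLength) // filter
                .sorted()                            // sort in natural (alphabetical) order
                .map(String::toUpperCase)            // transform
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java", "hadoop", "rdd", "scala");
        // prints [HADOOP, SCALA, SPARK]
        System.out.println(longWordsUpperSorted(words, 4));
    }
}
```

Each step is a single chained call, and the code states what should happen to the data rather than how to loop over it, which is the kind of brevity the quoted passage attributes to Spark.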

Since Spark is written in Scala, a functional language, its data processing idioms pass huge amounts of data through functions and combine those functions to “build sophisticated processing flows from basic building blocks.” Moreover, since Java 8 supports the functional style cleanly and directly thanks to the addition of lambdas, the gap between Java and Scala for developing applications on Spark becomes even smaller. As a result, developers with no experience in Big Data can now use Spark and Java 8 to develop machine learning and other Big Data applications.
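
The difference that lambdas make can be shown with a small sketch comparing the older anonymous-class style with the Java 8 lambda style. This is plain Java with no Spark dependency, and everything in it (the Transform interface, the applyAll helper, the class name LambdaSketch) is a hypothetical stand-in for the function types a data-processing API might accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: all names here are hypothetical, not part of Spark.
class LambdaSketch {

    // A single-method interface, the shape of type a lambda can implement.
    interface Transform<T, R> {
        R apply(T input);
    }

    // Applies the given function to every item and collects the results.
    static <T, R> List<R> applyAll(List<T> items, Transform<T, R> f) {
        List<R> results = new ArrayList<>();
        for (T item : items) {
            results.add(f.apply(item));
        }
        return results;
    }

    // Pre-Java 8 style: the function is wrapped in an anonymous inner class.
    static List<Integer> lengthsAnonymous(List<String> words) {
        return applyAll(words, new Transform<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        });
    }

    // Java 8 style: the same function written as a lambda.
    static List<Integer> lengthsLambda(List<String> words) {
        return applyAll(words, s -> s.length());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("spark", "java");
        System.out.println(lengthsAnonymous(words)); // prints [5, 4]
        System.out.println(lengthsLambda(words));    // prints [5, 4]
    }
}
```

Both methods return the same result. The lambda version drops the boilerplate around the one line that matters, which is roughly why Java 8 code can come closer to the conciseness of Scala.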

(Picture Source: LinkedIn)

Silvae Technologies Ruse, Bulgaria

Silvae Technologies Brussels, Belgium