Spark


Apache Spark is an open-source cluster computing framework. It was originally developed at the University of California, Berkeley's AMPLab in 2009 and later donated to the Apache Software Foundation. It is part of a broader ecosystem of tools, along with Apache Hadoop and other open-source projects, that is widely used in today’s analytics community.

Advantages

  • Lightning-fast processing - Spark enables applications in Hadoop clusters to run up to 100x faster than Hadoop MapReduce in memory, and up to 10x faster even when running on disk
  • Support for sophisticated analytics - Spark supports SQL queries, streaming data, machine learning, and complex analytics such as graph algorithms out of the box. Users can combine all these capabilities in a single workflow
  • Real-time stream processing
  • Ability to integrate with Hadoop and existing Hadoop data
  • Active and expanding community

Disadvantages

  • Data arriving out of time order is a problem for batch-based processing - Data is often of poor quality: some records might be missing, and streams can arrive with events out of time order
  • Batch length restricts window-based analytics - Spark Streaming processes a stream as a series of small batches, so window sizes are tied to the batch interval
  • Per-server performance is limited by current stream processing standards - Spark relies on scaling out across large numbers of servers to achieve overall system performance
  • Writing stream processing operations from scratch is not easy - Spark Streaming offers a limited library of stream functions
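
The first two points can be illustrated without Spark itself. The following is a minimal plain-Python sketch, not Spark code: the event tuples, the 5-second batch interval, and the 10-second window are all assumed values chosen for illustration. It sums values per window twice, once by arrival time (what a batch-based window sees) and once by event time (what actually happened), and shows how a single late record shifts the totals.

```python
from collections import defaultdict

# Hypothetical events as (event_time_s, arrival_time_s, value).
# The third event was produced at t=4 but only arrived at t=11.
EVENTS = [
    (1, 1, 10),
    (3, 4, 20),
    (4, 11, 30),   # late, out-of-order arrival
    (12, 12, 40),
]

BATCH_INTERVAL = 5   # seconds per micro-batch (assumed value)
WINDOW_LENGTH = 10   # seconds per window (assumed value)


def sum_by_window(events, time_index):
    """Sum values per window, bucketing on the chosen timestamp column."""
    # With batch-based windows, the window must cover whole batches.
    if WINDOW_LENGTH % BATCH_INTERVAL != 0:
        raise ValueError("window length must be a multiple of the batch interval")
    totals = defaultdict(int)
    for event in events:
        window_start = (event[time_index] // WINDOW_LENGTH) * WINDOW_LENGTH
        totals[window_start] += event[2]
    return dict(totals)


by_arrival = sum_by_window(EVENTS, time_index=1)     # batch-based view
by_event_time = sum_by_window(EVENTS, time_index=0)  # ground truth

print("by arrival time:", by_arrival)
print("by event time:  ", by_event_time)
```

In this sketch the late record is counted in the second window when bucketing by arrival time, so the first window is under-counted and the second is over-counted.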

Components

  • Cluster Manager Types:

      - Standalone: A simple cluster manager included with Spark that makes it easy to set up a cluster
      - Apache Mesos: A general cluster manager that can also run service applications
      - Hadoop YARN: The resource manager in Hadoop 2

  • Shipping code to the cluster – dynamically adding new files to be sent to executors
  • Monitoring – offers information about running executors and tasks
  • Job Scheduling – allows control over resource allocation both across applications and within them
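
As a rough illustration of how these components surface when an application is launched, the sketch below assembles a `spark-submit` command line in plain Python. The host names, `my_app.py`, and `lookup.csv` are placeholders, and the helper `build_submit_command` is hypothetical. The flags `--master`, `--files`, and `--conf`, along with the `spark.scheduler.mode` setting, are standard `spark-submit` options; the plain `yarn` master value assumes Spark 2.x.

```python
import shlex

# Master URL formats for the three cluster manager types.
# Host names are placeholders, not real machines.
MASTER_URLS = {
    "standalone": "spark://master-host:7077",
    "mesos": "mesos://mesos-host:5050",
    "yarn": "yarn",  # Spark 2.x form; cluster location comes from Hadoop config
}


def build_submit_command(manager, app, extra_files=(), fair_scheduling=False):
    """Return a spark-submit argument list (hypothetical helper)."""
    command = ["spark-submit", "--master", MASTER_URLS[manager]]
    if extra_files:
        # --files ships the listed files alongside the application to executors
        command += ["--files", ",".join(extra_files)]
    if fair_scheduling:
        # FAIR mode shares resources between jobs within one application
        command += ["--conf", "spark.scheduler.mode=FAIR"]
    command.append(app)
    return command


command = build_submit_command(
    "yarn", "my_app.py", extra_files=["lookup.csv"], fair_scheduling=True
)
print(" ".join(shlex.quote(part) for part in command))
```

Files can also be added while an application is running through the SparkContext (for example with `addFile`), and a running application exposes its monitoring web UI on port 4040 of the driver by default.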

Development tools

  • IntelliJ
  • Eclipse

Versions

  • 0.x
  • 1.x
  • 2.0

 

