Spark


Apache Spark is an open-source cluster-computing framework. It was originally developed in 2009 at the University of California, Berkeley's AMPLab and was donated to the Apache Software Foundation in 2013. Together with Apache Hadoop and other open-source resources, it is part of a larger set of tools used in today’s analytics community.

Advantages

  • Lightning-Fast Processing - Spark can run applications on Hadoop clusters up to 100x faster than Hadoop MapReduce in memory, and 10x faster even when running on disk
  • Support for Sophisticated Analytics - Spark supports SQL queries, streaming data, machine learning and complex analytics such as graph algorithms out of the box. Users can combine all these capabilities in a single workflow
  • Real-Time Stream Processing
  • Ability to Integrate with Hadoop and Existing Hadoop Data
  • Active and expanding community

Disadvantages

  • Data arriving out of time order is a problem for batch-based processing, because records are grouped into batches as they arrive rather than by when they were produced
  • Batch length restricts window-based analytics - windows have to be built from whole batches, while real-world data is often of poor quality: some records might be missing and streams can arrive out of time order
  • Limited per-server performance by current stream-processing standards - Spark relies on scaling out to large numbers of servers to achieve overall system performance
  • Writing stream processing operations from scratch is not easy - Spark Streaming offers a limited library of built-in stream functions
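The first two points can be illustrated without a cluster. The plain-Python sketch below is not Spark code, and the 10-second batch interval and event tuples are hypothetical. It mimics two rules of the classic Spark Streaming (DStream) model. First, window length and slide interval must be multiples of the batch interval. Second, records are grouped into batches by when they arrive, so a late record is counted in the wrong batch:

```python
from collections import Counter

BATCH_INTERVAL_S = 10  # hypothetical batch interval, in seconds


def validate_window(window_s: int, slide_s: int, batch_s: int = BATCH_INTERVAL_S) -> None:
    """Raise ValueError unless window and slide are positive multiples of the batch interval."""
    for name, value in (("window", window_s), ("slide", slide_s)):
        if value <= 0 or value % batch_s != 0:
            raise ValueError(
                f"{name} of {value}s is not a positive multiple of the {batch_s}s batch interval"
            )


def count_by_batch(events, batch_s: int = BATCH_INTERVAL_S, use_event_time: bool = False) -> Counter:
    """Count events per batch index; events are (event_time_s, arrival_time_s) tuples."""
    key = 0 if use_event_time else 1
    return Counter(event[key] // batch_s for event in events)


# (8, 13) was produced during batch 0 but only arrived during batch 1.
events = [(1, 2), (4, 5), (8, 13), (12, 14)]
by_arrival = count_by_batch(events)                        # how a micro-batch sees the data
by_event_time = count_by_batch(events, use_event_time=True)  # what actually happened
print(dict(by_arrival))     # {0: 2, 1: 2}
print(dict(by_event_time))  # {0: 3, 1: 1}

validate_window(30, 10)  # OK: 3 batches per window, sliding by 1 batch
```

A call such as `validate_window(25, 10)` raises, because a 25-second window cannot be assembled from whole 10-second batches.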

Components

  • Cluster Manager Types:

      - Standalone: a simple cluster manager included with Spark that makes it easy to set up a cluster
      - Apache Mesos: a general cluster manager that can also run Hadoop MapReduce and service applications
      - Hadoop YARN: the resource manager in Hadoop 2

  • Shipping code to the cluster - dynamically adding new files to be sent to executors
  • Monitoring - information about running executors and tasks
  • Job Scheduling - control over resource allocation both across and within applications
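Most of these components are selected when an application is submitted. The helper functions below are hypothetical plain-Python helpers, not part of Spark, and the host and file names are placeholders. They assemble a spark-submit command line as follows:

- The --master URL picks the cluster manager.
- --files and --py-files ship extra files to the executors.
- spark.scheduler.mode=FAIR switches scheduling within an application from the default FIFO to fair sharing.

```python
from typing import Optional, Sequence


def master_url(manager: str, host: str = "", port: Optional[int] = None) -> str:
    """Return a --master URL for the given cluster manager (hypothetical helper)."""
    if manager == "local":
        return "local[*]"  # run locally on all cores, no cluster manager
    if manager == "standalone":
        return f"spark://{host}:{port or 7077}"  # 7077: default standalone master port
    if manager == "mesos":
        return f"mesos://{host}:{port or 5050}"  # 5050: default Mesos master port
    if manager == "yarn":
        # Spark 2.x+: the ResourceManager address is read from HADOOP_CONF_DIR / YARN_CONF_DIR
        return "yarn"
    raise ValueError(f"unknown cluster manager: {manager}")


def spark_submit_args(app: str, manager: str, host: str = "",
                      files: Sequence[str] = (), py_files: Sequence[str] = (),
                      fair: bool = False) -> list:
    """Build a spark-submit argument list (hypothetical helper)."""
    args = ["spark-submit", "--master", master_url(manager, host)]
    if files:
        args += ["--files", ",".join(files)]  # placed in each executor's working directory
    if py_files:
        args += ["--py-files", ",".join(py_files)]  # .py/.zip/.egg files put on the PYTHONPATH
    if fair:
        args += ["--conf", "spark.scheduler.mode=FAIR"]  # fair sharing between jobs in one app
    args.append(app)
    return args


cmd = spark_submit_args("job.py", "standalone", "master.example.com",
                        files=["lookup.csv"], fair=True)
print(" ".join(cmd))
# spark-submit --master spark://master.example.com:7077 --files lookup.csv --conf spark.scheduler.mode=FAIR job.py
```

Files can also be shipped while the application is running with SparkContext.addFile. The driver's web UI (port 4040 by default) provides the executor and task monitoring described above.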

Development tools

  • IntelliJ
  • Eclipse

Versions

  • 0.x
  • 1.x
  • 2.0

 


Have a project in mind?

Get in touch with us for your Spark development needs!

 

 



© 2018 SBP Romania. All rights reserved.