Apache Hadoop

Apache Hadoop is an open-source software framework that supports the distributed processing of large data sets across computer clusters. It is designed to scale from a single server up to many machines, each offering local computation and storage. All Hadoop modules are built on the basic assumption that hardware failures are common and should be handled automatically by the framework.

The core of the Hadoop framework consists of a storage part, the Hadoop Distributed File System (HDFS), and a processing part, MapReduce. Hadoop splits files into large blocks and distributes them across the nodes of a cluster. To process the data, Hadoop ships the packaged code to the nodes that hold the relevant blocks, so each node processes the data it stores, in parallel, taking advantage of data locality.
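The map/shuffle/reduce flow described above can be illustrated with a minimal, single-process sketch. This is a hypothetical word-count example for clarity only: real Hadoop jobs are written against the Java MapReduce API and run distributed across a cluster.

```python
from collections import defaultdict

def map_phase(record):
    # Emit (key, value) pairs: one (word, 1) per word in the input line.
    for word in record.split():
        yield word.lower(), 1

def shuffle(pairs):
    # Group all values by key, as the framework does between the phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts emitted for each word.
    return key, sum(values)

def run_job(lines):
    pairs = [p for line in lines for p in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = run_job(["Hadoop stores data", "Hadoop processes data"])
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

In a real cluster the map and reduce calls run on different machines, and the shuffle step moves data over the network; the programming model, however, is the same.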


Advantages

  • Highly scalable storage platform - can store and distribute large data sets across hundreds of servers operating in parallel
  • Cost effective - its scale-out architecture lets a company affordably store all of its data for later use
  • Flexible - enables businesses to easily access new data sources and tap into different types of data to generate value from them
  • Fast - its distributed file system "maps" data wherever it is located on the cluster, so processing runs close to the data
  • Resilient to failure - data written to an individual node is replicated to other nodes in the cluster, so in the event of a failure another copy is available for use
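The replication behind the resilience point above is configurable per cluster (and even per file). A minimal sketch, assuming the standard `hdfs-site.xml` configuration file:

```xml
<!-- hdfs-site.xml: how many copies HDFS keeps of each block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <!-- the default is 3: each block is stored on three different nodes -->
    <value>3</value>
  </property>
</configuration>
```

A higher value tolerates more simultaneous node failures at the cost of disk space; a value of 1 disables replication entirely and is only suitable for testing.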


Disadvantages

  • Security concerns - Hadoop lacks built-in encryption at the storage and network levels
  • Vulnerable - because it is written almost entirely in Java, it has been a frequent target of cyber-criminals
  • Not fit for small data - its high-capacity design makes it inefficient at randomly reading large numbers of small files
  • General limitations - Hadoop itself offers little support for making data collection, aggregation, and integration more efficient and reliable


Core modules

  • Hadoop Common - the shared libraries and utilities used by the other modules
  • HDFS - the distributed file system that stores data across the cluster
  • MapReduce - the programming model for parallel processing of large data sets
  • YARN - the resource manager that schedules jobs across the cluster

Development tools

  • Hadoop Development Tools (HDT) plugin
  • Eclipse IDE
  • HUE web interface
  • Azkaban
  • Hortonworks Sandbox


Versions

  • Version 0.20
  • Version 0.22
  • Version 1.0.0
  • Version 1.2.0
  • Version 2.0.0
  • Version 2.2.0
  • Version 2.5.0
  • Version 2.6.0
  • Version 2.7.0


Have a project in mind?

Get in touch with us for your Hadoop development needs!



© 2019 SBP Romania. All rights reserved.