Hadoop, or Apache Hadoop, is an open-source software platform designed to handle data-intensive computations distributed across many computer nodes and applications. It relies on commodity computing, in which a large number of simple, readily available nodes process a task in parallel, thereby reducing the latency between the submission of a task and the response from the main node. Hadoop is composed of two component systems:
First, the Hadoop Distributed File System (HDFS) is a distributed file
system that splits files into blocks and stores copies of those blocks
across a multiplicity of servers, which allows simultaneous data
processing, faster computation and backup of data.
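The idea of spreading a file over many servers while keeping backups can be sketched in a few lines of Python. This is only a toy model, not Hadoop's actual code: the function names, the round-robin placement and the tiny block size are assumptions made for illustration. Real HDFS uses much larger blocks (128 MB by default in recent versions), a default replication factor of 3, and a rack-aware placement policy.

```python
def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int = 3) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes.

    Hypothetical round-robin policy, chosen only to keep the sketch short.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement: dict[int, list[str]], failed: set[str]) -> bool:
    """A file stays readable if every block has at least one surviving replica."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    blocks = split_into_blocks(b"x" * 10, block_size=4)
    placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
    print(placement)
    print(readable_after_failure(placement, failed={"n1", "n2"}))
```

With four nodes and three replicas per block, any two nodes can fail and every block still has a surviving copy, which is the sense in which the file system backs up the data.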
Second, MapReduce is a computational framework that divides a task into
smaller work fragments to be executed by distributed computational
processors. The principle at work here is that a mammoth task, divided into
several smaller tasks each assigned to a different computer (termed a worker
node), can be processed accurately and faster. The results produced by the
worker nodes are then combined into a finished job, with the main node
coordinating the work. The first part of the process, where the input is
split and each portion is processed in parallel into intermediate results,
is termed the Map function. The second part, where those intermediate
results are grouped and recombined into one finished job, is termed the
Reduce function.
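The Map and Reduce steps described above can be illustrated with the classic word-count example. The sketch below runs in a single Python process and only imitates the data flow; the function names are hypothetical and it does not use the Hadoop API. In a real cluster the map and reduce calls would run on different worker nodes and the framework would perform the grouping step.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map: turn one line of input into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group all intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(lines: list[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in sequence on a list of input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data big", "data node"]))
```

Each line can be mapped independently of every other line, and each word can be reduced independently of every other word, which is what lets the work be spread over many nodes.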
The benefits of Hadoop are as follows:
· It reduces processing times for large data volumes. Because HDFS and MapReduce spread storage and computation across many nodes, large data sets are processed in parallel and in much less time. This benefit is seen in the response times of internet search engines and the websites of large online retailers, where a query is computed in a short time.
· It provides redundancy for data and applications. By spreading an application and its associated file system across many nodes, it creates an environment where data and application services are backed up. This prevents an error in any one of the nodes from causing the entire system to fail.
· It provides a basis for data analytics, especially on social media, where the volume of content created is enormous. Data analytics is the analysis of generated data to reveal user trends and other information, specifically for business purposes.
· It provides a formal structure for the efficient organization, processing and manipulation of large databases generated in any environment.










