Book recommendation: Hadoop: The Definitive Guide

Hadoop: The Definitive Guide (Fourth Edition)
by Tom White

   Hadoop provides a storage and analysis platform for big data and is widely used in industry. Hadoop: The Definitive Guide is a good reference for understanding Hadoop and learning how to use it. The book explains the concepts behind Hadoop, how to use and operate it, and the other tools that work with it. It describes Hadoop installation and filesystem operations through the Java interface. Because Hadoop is written in Java, its libraries can be used directly when writing Java programs.

   MapReduce and YARN are key technologies in Hadoop. The book walks through MapReduce examples and task scheduling with YARN, and explains how to build and maintain Hadoop clusters.
   Projects built on or related to Hadoop, such as Sqoop, Pig, Hive, Spark, and HBase, are also covered.


Keywords:
Hadoop is a framework for distributed processing of large data sets across clusters of computers.

HDFS (Hadoop Distributed Filesystem) stores very large files and is designed for streaming data access patterns.

MapReduce is a programming model for processing large data sets. Processing is divided into a map phase and a reduce phase.
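
To make the two phases concrete, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API and runs on a single machine. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the shuffle function only stands in for the grouping step that the framework performs between the two phases.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key (a stand-in for the framework's shuffle step)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over a list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data in clusters"]
    print(word_count(sample))
```

In a real cluster the map and reduce functions would run in parallel on many machines, each working on a different part of the input. This sketch only shows how the data flows from one phase to the next.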

YARN (Yet Another Resource Negotiator) is Hadoop's cluster resource management system. It provides APIs for requesting and working with cluster resources.







