
Design and Principles of Hadoop


Hadoop is a distributed computing framework designed to store and process large datasets across a cluster of commodity hardware. It is open-source software developed as a project of the Apache Software Foundation. Here are the design principles of Hadoop:


Distributed File System (HDFS):

Hadoop uses a distributed file system called HDFS that splits files into blocks and stores them across multiple nodes in a cluster. HDFS provides fault tolerance by replicating each block on several nodes (three by default), so the data remains available even if a node fails.
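
The replication idea can be illustrated with a small simulation. This is only a sketch, not HDFS code: the function names, the node names, and the round-robin placement are assumptions made for illustration (real HDFS uses a rack-aware placement policy), and the 128 MB block size and replication factor of 3 are taken as typical defaults.

```python
import math

def place_blocks(file_size_mb, nodes, block_size_mb=128, replication=3):
    """Return {block_index: [nodes holding a replica]} using simple round-robin.

    Hypothetical helper: real HDFS placement is rack-aware, not round-robin.
    """
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        # Put each replica of this block on a different node.
        placement[block] = [nodes[(block + r) % len(nodes)] for r in range(replication)]
    return placement

def readable_after_failure(placement, failed_nodes):
    """True if every block still has at least one replica on a live node."""
    failed = set(failed_nodes)
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )

nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(file_size_mb=500, nodes=nodes)

for block, replicas in placement.items():
    print(f"block {block}: {replicas}")

print("readable with node2 down:", readable_after_failure(placement, ["node2"]))
```

With three replicas per block, the file in this sketch stays readable when any one or two nodes fail, and becomes unreadable only when all three nodes holding some block are down at the same time.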


MapReduce: 

Hadoop uses a programming model called MapReduce to process large datasets in parallel across a cluster of machines. The model is based on two functions: Map and Reduce. The Map function runs in parallel over separate pieces of the input and emits intermediate key-value pairs. The framework then groups these pairs by key (the shuffle step), and the Reduce function combines all the values for each key to produce the final output.
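
A word count is the usual way to show the model. The sketch below runs on a single machine in plain Python and does not use the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and on a real cluster the framework would run many map and reduce tasks on different nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # Each line is mapped independently, so this step could run in parallel.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each call to map_phase depends only on its own line, and each call to reduce_phase depends only on its own key, both phases can be spread over many machines without the tasks needing to talk to each other.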


Commodity Hardware: 

Hadoop is designed to run on commodity hardware, which is relatively inexpensive and widely available. This makes it possible to build large clusters of machines to store and process big data at a lower cost.


Scalability:

 Hadoop is designed to scale horizontally, meaning that you can add more nodes to the cluster as needed to handle larger datasets or increase processing capacity.
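
As a rough back-of-the-envelope sketch of horizontal scaling, usable storage grows linearly with the number of nodes. The function name and the figures below are assumptions for illustration only; the estimate assumes a replication factor of 3 and ignores space reserved for the operating system, temporary data, and other overhead.

```python
def usable_capacity_tb(num_nodes, disk_per_node_tb, replication=3):
    """Estimate usable storage: raw capacity divided by the replication factor."""
    return num_nodes * disk_per_node_tb / replication

# Doubling the number of nodes doubles the usable capacity.
print(usable_capacity_tb(10, 12))
print(usable_capacity_tb(20, 12))
```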


Open-Source: 

Hadoop is open-source software released under the Apache License 2.0, which means that anyone can download, modify, and use the code without paying licensing fees. This has led to a large and active community of developers contributing to the Hadoop ecosystem.


Extensibility: 

Hadoop is designed to be extensible, which means that you can add new data sources, processing frameworks, and other components to the system. This has led to the development of a large ecosystem of tools and frameworks that work with Hadoop, including Apache Spark, Hive, Pig, and HBase.

