
HDFS Features and Goals

The Hadoop Distributed File System (HDFS) is a distributed file system that serves as the core storage layer of Hadoop. It is designed to run on commodity hardware.

Unlike many other distributed file systems, HDFS is highly fault-tolerant and can be deployed on low-cost hardware. It easily handles applications that work with large data sets.

  1. Scalability: HDFS is designed to scale horizontally, which means that it can handle large datasets by adding more commodity hardware to the cluster. HDFS can support petabytes of data and thousands of nodes in a single cluster.

  2. Fault Tolerance: HDFS is designed to provide high availability and fault tolerance. HDFS achieves fault tolerance by replicating data across multiple nodes in the cluster. If a node fails, the data can be accessed from other nodes that have a copy of the data.

  3. Data Locality: HDFS is designed to provide data locality, which means that it tries to store data on the same node where the computation is happening. This reduces network traffic and improves performance.

  4. Streaming Data Access: HDFS is optimized for streaming data access, which means that it is designed for high throughput of large files rather than low-latency access to small files. This makes it well-suited for batch processing of large datasets.

  5. Block-Based Storage: HDFS stores data in fixed-size blocks, typically 128 MB by default (64 MB in older Hadoop versions). This allows for efficient storage and retrieval of large files.

  6. Rack Awareness: HDFS is rack-aware, which means that it is designed to place replicas of data blocks on different racks in the cluster to improve fault tolerance and reduce network traffic.

  7. Command-Line Interface: HDFS provides a command-line interface (CLI) for managing the file system, including commands for creating, deleting, moving, and copying files.
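
For example, a typical session with the HDFS command-line shell might look like the following (the user and path names here are placeholders, not defaults of any real cluster):

```
hdfs dfs -mkdir -p /user/alice/data                    # create a directory
hdfs dfs -put local.txt /user/alice/data/              # upload a local file
hdfs dfs -ls /user/alice/data                          # list the directory
hdfs dfs -cp /user/alice/data/local.txt /tmp/copy.txt  # copy within HDFS
hdfs dfs -mv /tmp/copy.txt /tmp/moved.txt              # move/rename a file
hdfs dfs -rm /tmp/moved.txt                            # delete a file
```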

Let's see some of the important features and goals of HDFS.

Features of HDFS

  • Highly Scalable - HDFS is highly scalable, as a single cluster can scale to hundreds or even thousands of nodes.
  • Replication - Due to unfavorable conditions such as hardware failure, the node containing the data may be lost. To overcome such problems, HDFS always maintains copies of the data on different machines (see the sketch after this list).
  • Fault tolerance - In HDFS, fault tolerance signifies the robustness of the system in the event of failure. HDFS is highly fault-tolerant: if any machine fails, another machine containing a copy of that data automatically becomes active.
  • Distributed data storage - This is one of the most important features of HDFS and part of what makes Hadoop so powerful. Here, data is divided into multiple blocks and stored across the nodes of the cluster.
  • Portable - HDFS is designed in such a way that it can easily be ported from one platform to another.
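
As a minimal sketch of how the replication, block size, and block placement described above can be inspected programmatically, the following example uses Hadoop's Java FileSystem API. The path /user/alice/data/sample.txt and the replication factor of 3 are illustrative assumptions:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationInfo {
    public static void main(String[] args) throws Exception {
        // Picks up cluster settings (core-site.xml, hdfs-site.xml) from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical file path, used only for illustration.
        Path file = new Path("/user/alice/data/sample.txt");

        // Ask HDFS to keep three copies of each of this file's blocks.
        fs.setReplication(file, (short) 3);

        FileStatus status = fs.getFileStatus(file);
        System.out.println("Block size:  " + status.getBlockSize());   // e.g. 134217728 bytes (128 MB)
        System.out.println("Replication: " + status.getReplication());

        // Each block reports the hosts holding a replica; this is the same
        // information HDFS uses for data locality and rack-aware placement.
        for (BlockLocation block : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("Block at offset " + block.getOffset()
                    + " on hosts: " + String.join(", ", block.getHosts()));
        }

        fs.close();
    }
}
```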

Goals of HDFS

  • Handling hardware failure - An HDFS cluster contains many server machines, so hardware failure is common. If any machine fails, the goal of HDFS is to detect the failure and recover from it quickly and automatically.
  • Streaming data access - Applications that run on HDFS are not typical general-purpose applications running on general-purpose file systems. They require streaming access to their data sets and favor high throughput over low latency.
  • Coherence Model - Applications that run on HDFS are expected to follow the write-once-read-many access model. A file, once created and written, need not be changed; however, it can be appended to and truncated (see the sketch after this list).
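
The write-once-read-many model can be illustrated with a short sketch using Hadoop's Java API: the file is created once, appended to (assuming the cluster has append support enabled), and then read back as a stream. The path is a placeholder:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteOnceReadMany {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path log = new Path("/user/alice/events.log"); // hypothetical path

        // Write once: create the file and write its initial contents.
        try (FSDataOutputStream out = fs.create(log)) {
            out.write("event-1\n".getBytes(StandardCharsets.UTF_8));
        }

        // Existing data is not rewritten in place, but the file may be
        // appended to (assumes the cluster supports and enables appends).
        try (FSDataOutputStream out = fs.append(log)) {
            out.write("event-2\n".getBytes(StandardCharsets.UTF_8));
        }

        // Read many: stream through the file sequentially for high throughput.
        try (FSDataInputStream in = fs.open(log);
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }

        fs.close();
    }
}
```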
