Integrating LVM with Hadoop and providing Elasticity to DataNode Storage on AWS

Gaurav Jangid
5 min read · Mar 14, 2021

What is LVM?

Logical volume management (LVM) is a form of storage virtualization that offers system administrators a more flexible approach to managing disk storage space than traditional partitioning. … The goal of LVM is to facilitate managing the sometimes conflicting storage needs of multiple end users.

What is Hadoop?

Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Developed by Doug Cutting and Mike Cafarella, Hadoop uses the MapReduce programming model for faster storage and retrieval of data from its nodes.

Hadoop is a software technology designed for storing and processing large volumes of data distributed across a cluster of commodity servers and commodity storage. It can also consume data from external stores such as MongoDB, blending it with data from other sources to generate sophisticated analytics and machine learning models.

Elasticity:

Elasticity is defined as “the degree to which a system is able to adapt to workload changes by provisioning and de-provisioning resources in an autonomic manner, such that at each point in time the available resources match the current demand as closely as possible”.

About AWS:

Amazon Web Services (AWS) is a subsidiary of Amazon providing on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. These cloud computing web services provide a variety of basic abstract technical infrastructure and distributed computing building blocks and tools.

AWS EC2 (Elastic Compute Cloud):

Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers. Amazon EC2’s simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon’s proven computing environment.

AWS EBS (Elastic Block Store):

Amazon Elastic Block Store (EBS) is an easy-to-use, high-performance block storage service designed for use with Amazon Elastic Compute Cloud (EC2) for both throughput- and transaction-intensive workloads at any scale. A broad range of workloads, such as relational and non-relational databases, enterprise applications, containerized applications, big data analytics engines, file systems, and media workflows are widely deployed on Amazon EBS.
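Before LVM can manage any storage, each DataNode needs raw block devices attached. A minimal sketch of provisioning and attaching an EBS volume with the AWS CLI is shown below; the availability zone, size, volume ID, instance ID, and device name are all placeholders for this setup.

```shell
# Hypothetical example: create a 10 GiB gp2 EBS volume in the same AZ
# as the DataNode instance (AZ, size, and IDs are placeholders).
aws ec2 create-volume \
    --availability-zone ap-south-1a \
    --size 10 \
    --volume-type gp2

# Attach the new volume to the DataNode's EC2 instance as /dev/xvdf
aws ec2 attach-volume \
    --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 \
    --device /dev/xvdf
```

Once attached, the volume appears inside the instance as an ordinary block device ready to be initialized as an LVM physical volume.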

Hadoop Setup Deployed on AWS:

AWS Auto Scaling Groups

An Auto Scaling group contains a collection of Amazon EC2 instances that are treated as a logical grouping for the purposes of automatic scaling and management. An Auto Scaling group also enables you to use Amazon EC2 Auto Scaling features such as health check replacements and scaling policies. Both maintaining the number of instances in an Auto Scaling group and automatic scaling are the core functionality of the Amazon EC2 Auto Scaling service.

Hadoop Cluster:

‣ 1 Master / Hadoop NameNode

‣ 1 Hadoop Client

‣ 3 Slaves / Hadoop DataNodes
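On each DataNode, the storage directory that the LVM logical volume will later be mounted on has to be registered with Hadoop. A minimal sketch of that configuration follows; the directory `/dn` and the config path are assumptions for this cluster, not values from an existing setup.

```shell
# Hypothetical DataNode config: dfs.datanode.data.dir points at the
# directory where the LVM logical volume will be mounted (/dn is an assumption).
cat > /etc/hadoop/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/dn</value>
  </property>
</configuration>
EOF
```

With this in place, whatever capacity the mounted LV provides is exactly the capacity the DataNode contributes to the cluster, which is what makes the LVM resize operations below translate directly into DataNode elasticity.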

Hadoop DataNode Elasticity using LVM:

(LVM architecture diagram, source: redhat.com)

Creating PV (Physical Volume):
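A sketch of this step, assuming two attached EBS disks with the placeholder device names `/dev/xvdf` and `/dev/xvdg`:

```shell
# Initialize the attached EBS disks as LVM physical volumes
# (device names are placeholders; requires root)
pvcreate /dev/xvdf /dev/xvdg

# Verify that the disks are now recognized as PVs
pvdisplay /dev/xvdf
```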

Creating Volume Group (VG):

VolumeGroupDisplay
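The step above can be sketched as follows; the volume group name `hadoop_vg` is an assumption for illustration:

```shell
# Pool the physical volumes into one volume group
vgcreate hadoop_vg /dev/xvdf /dev/xvdg

# Inspect total size, allocated extents, and free space in the VG
vgdisplay hadoop_vg
```

`vgdisplay` is worth running after every allocation: its "Free PE / Size" line is what tells you how much room remains for extending logical volumes.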

Creating Logical Volume (LV) and mounting on the Hadoop DataNode directory:

Logical Volume Display & mount
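A minimal sketch of creating, formatting, and mounting the LV; the LV name, size, filesystem, and mount point `/dn` are assumptions:

```shell
# Carve a 5 GiB logical volume out of the volume group
lvcreate --size 5G --name hadoop_lv hadoop_vg

# Format the LV and mount it on the DataNode storage directory
mkfs.ext4 /dev/hadoop_vg/hadoop_lv
mkdir -p /dn
mount /dev/hadoop_vg/hadoop_lv /dn

# Confirm the LV's size and mapping
lvdisplay hadoop_vg/hadoop_lv
```

From Hadoop's point of view the DataNode now simply sees `/dn`; the LVM layer underneath is what lets us resize that directory's capacity later without touching Hadoop.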

LV (Logical Volume) Extend:

Extending LV Partition Size

An increased flow of data onto the DataNode can be accommodated by extending the LV partition size.
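A sketch of the online extend, assuming the LV and filesystem names from the earlier steps:

```shell
# Grow the LV by 2 GiB while it stays mounted
lvextend --size +2G /dev/hadoop_vg/hadoop_lv

# Grow the ext4 filesystem to fill the enlarged LV (online for ext4)
resize2fs /dev/hadoop_vg/hadoop_lv
```

Both steps complete without unmounting, so the DataNode keeps serving blocks while its capacity grows.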

LV (Logical Volume) Reduce:

Reducing the partition size of LV

Freeing up LV space when it is not in use helps utilize storage efficiently, since the reclaimed space returns to the VG and can be dedicated to other LVs.
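Unlike extending, shrinking an ext4 filesystem cannot be done online, and the filesystem must be shrunk before the LV so no data falls outside the smaller volume. A sketch, with the 4 GiB target size as an assumption:

```shell
# Shrinking ext4 is an offline operation: unmount first
umount /dn

# Check the filesystem, shrink it, then shrink the LV to match
e2fsck -f /dev/hadoop_vg/hadoop_lv
resize2fs /dev/hadoop_vg/hadoop_lv 4G
lvreduce --size 4G /dev/hadoop_vg/hadoop_lv

# Remount the smaller volume for the DataNode
mount /dev/hadoop_vg/hadoop_lv /dn
```

Getting this order wrong (reducing the LV before the filesystem) truncates live data, which is why `lvreduce` warns before proceeding.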

VG (Volume Group) Exhausted:

Free Space in VG Exhausted

If the VG runs out of free space, allocating more storage to LVs wouldn't be possible.

In that case, we can create more PVs and extend the size of the VG with the help of vgextend.
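A sketch of replenishing the VG, assuming a freshly attached EBS disk with the placeholder name `/dev/xvdh`:

```shell
# Turn the new disk into a PV and add it to the existing volume group
pvcreate /dev/xvdh
vgextend hadoop_vg /dev/xvdh

# The "Free PE / Size" line should now show the added capacity
vgdisplay hadoop_vg
```

Because `vgextend` is also an online operation, the DataNode's LV can be extended into this new space immediately, with no downtime anywhere in the loop.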

(Screenshots source: AWS Management Console)
