Traditional Relational Database Management Systems (RDBMS) vs Hadoop (Serial vs Parallel Processing)

Traditional database management systems (DBMS) have been used to manage data for more than two decades. In the past few years, however, datasets have grown so large that a traditional DBMS can no longer manage them effectively. Traditional databases, such as data warehouses and relational databases, use Structured Query Language (SQL) in a centralized database architecture to store and maintain data in fields within a file or another fixed format (Lay, 2022). Apache Hadoop is an open-source framework used to store and process large datasets ranging in size from gigabytes to petabytes. Whereas a traditional database typically relies on one large computer to store and manage data, Hadoop clusters multiple computers to analyze these massive datasets in parallel, which is faster than a single machine working alone (Gilmour et al., 2021).

In traditional serial processing, a processor takes on one task at a time, which is inefficient for big data and large datasets (Gerencer, 2019). Parallel processing, in contrast, is like cloning that single processor three to five times so that many tasks run at the same time. Parallel computing takes multiple computers and attacks several operations at once (Gerencer, 2019). Simply put, parallel computing can process large datasets several times faster than a traditional RDBMS working serially.
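The difference between the two approaches can be illustrated with a minimal Python sketch. It is not taken from any of the cited sources: the function names (count_even, split_into_chunks, serial_count, parallel_count) and the choice of four workers are assumptions made for illustration. Threads are used only to keep the example portable; a real CPU-bound workload would more likely use separate processes or, as in Hadoop, separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def count_even(numbers):
    """Count the even values in one chunk of the data."""
    return sum(1 for n in numbers if n % 2 == 0)


def split_into_chunks(data, parts):
    """Split data into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(data), parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def serial_count(data):
    """Serial processing: one worker walks the whole dataset."""
    return count_even(data)


def parallel_count(data, workers=4):
    """Parallel processing: each worker handles one chunk, then results are combined."""
    chunks = split_into_chunks(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_counts = list(pool.map(count_even, chunks))
    return sum(partial_counts)


data = list(range(1_000))
serial_result = serial_count(data)
parallel_result = parallel_count(data, workers=4)
```

Both calls produce the same answer. The parallel version simply divides the data, lets each worker process its own share, and adds up the partial results, which is the same idea a cluster applies at a much larger scale.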

According to Gilmour et al. (2021), Hadoop has four main modules that enable parallel processing:

1. Hadoop Distributed File System (HDFS) – a distributed file system that runs on standard or low-end hardware and provides better data throughput than traditional file systems, along with high fault tolerance and native support for large datasets

2. Yet Another Resource Negotiator (YARN) – monitors and manages cluster nodes and resource usage, and schedules jobs and tasks

3. MapReduce – helps programs perform parallel computation on data; it takes input data and converts it into a dataset of key-value pairs that can then be computed on

4. Hadoop Common – provides common Java libraries that can be used across all modules
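The key-value idea behind MapReduce can be sketched in plain Python with the classic word-count example. This is a single-machine illustration under stated assumptions, not Hadoop's actual API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and in a real cluster the map and reduce steps would run on many nodes at once.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one input line into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


lines = ["big data needs big clusters", "Hadoop stores big data"]
counts = word_count(lines)
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread across many machines without the workers needing to coordinate with one another.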

The table below, taken from GeeksforGeeks (2022), compares traditional RDBMS and Hadoop:

| Number | RDBMS | Hadoop |
| --- | --- | --- |
| 1 | Traditional row-column based databases, used mainly for data storage, manipulation, and retrieval. | Open-source software used for storing data and running applications or processes concurrently. |
| 2 | Mostly processes structured data. | Processes both structured and unstructured data. |
| 3 | Best suited for OLTP environments. | Best suited for big data. |
| 4 | Less scalable than Hadoop. | Highly scalable. |
| 5 | Data normalization is required. | Data normalization is not required. |
| 6 | Stores transformed and aggregated data. | Stores huge volumes of data. |
| 7 | Low latency in response. | Some latency in response. |
| 8 | The data schema is static. | The data schema is dynamic. |
| 9 | High data integrity. | Lower data integrity than RDBMS. |
| 10 | Licensed software carries a cost. | Free of cost, as it is open-source software. |
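The static versus dynamic schema distinction in the comparison can be made concrete with a short Python sketch. It rests on assumptions of my own rather than on the cited sources: an in-memory SQLite database stands in for a traditional RDBMS, a list of JSON strings stands in for raw files stored in Hadoop, and the users table and its fields are invented for illustration. The dynamic approach is often described as schema-on-read.

```python
import json
import sqlite3

# Static schema (RDBMS style): columns are declared before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "Ada"))

try:
    # The "email" column was never declared, so the database rejects the write.
    conn.execute(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)",
        (2, "Bob", "bob@example.com"),
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# Dynamic schema: raw records are stored as they arrive, even if their fields differ.
raw_records = [
    '{"id": 1, "name": "Ada"}',
    '{"id": 2, "name": "Bob", "email": "bob@example.com"}',
]

# Structure is applied only when the data is read; missing fields become None.
parsed = [json.loads(record) for record in raw_records]
emails = [record.get("email") for record in parsed]
```

The relational table refuses the record that does not fit its declared columns, which protects integrity at the cost of flexibility. The raw records accept any shape, leaving it to the reading program to decide how to interpret them.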

 

References

GeeksforGeeks. (2022, July 11). Difference between RDBMS and Hadoop. Retrieved February 20, 2023, from https://www.geeksforgeeks.org/difference-between-rdbms-and-hadoop/

Gerencer, T. (2019, October 30). Parallel computing and its modern uses. HP Tech Takes. Retrieved February 20, 2023, from https://www.hp.com/us-en/shop/tech-takes/parallel-computing-and-its-modern-uses

Gilmour, J. B., Lui, A. W., & Briggs, D. C. (2021). EMR. Amazon. Retrieved February 20, 2023, from https://aws.amazon.com/emr/details/hadoop/what-is-hadoop/

Lay, B. (2022, January 11). Big data vs traditional data. Aversan. Retrieved February 20, 2023, from https://www.aversan.com/big-data-vs-traditional-data/

