Traditional Relational Database Management Systems (DBMS) vs Hadoop (Serial vs Parallel Processing)
Traditional database management systems (DBMS) have been used for over two decades to manage data. However, in the past few years, datasets have become so large that traditional DBMSs cannot manage them effectively. Traditional databases, such as data warehouses and relational databases, use Structured Query Language (SQL) in a centralized database architecture to store and maintain data in fields in a file or a fixed format (Lay, 2022). Apache Hadoop is an open-source framework used to store and process large datasets ranging in size from gigabytes to petabytes. In contrast to traditional databases, which use one large computer to store and manage data, Hadoop uses multiple clustered computers to analyze these massive datasets in parallel, which is faster than with traditional databases (Gilmour et al., 2021).
In old-school serial processing, a processor takes on one task at a time, which has proven inefficient when dealing with big data and large datasets (Gerencer, 2019). In contrast, parallel processing is like cloning that single processor 3 to 5 times so that many tasks are performed at the same time. Parallel computing takes multiple computers and attacks several operations at once (Gerencer, 2019). Simply put, parallel computing can process data several times faster than a traditional RDBMS, provided the work can be split into independent pieces.
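To make the serial versus parallel contrast concrete, here is a minimal Python sketch. It is not Hadoop code: it assumes a single machine with several CPU cores, and the function names (`count_primes`, `run_serial`, `run_parallel`) and the toy workload are invented for illustration. The same list of independent tasks is processed one at a time and then spread across worker processes.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Toy CPU-bound task (hypothetical): count primes below `limit`."""
    count = 0
    for n in range(2, limit):
        # n is prime if no divisor up to sqrt(n) divides it evenly
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_serial(limits):
    """Serial processing: one task at a time on one processor."""
    return [count_primes(limit) for limit in limits]


def run_parallel(limits, workers=4):
    """Parallel processing: the same tasks spread over worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    tasks = [50_000] * 8  # eight independent tasks of equal size

    start = time.perf_counter()
    serial_results = run_serial(tasks)
    serial_seconds = time.perf_counter() - start

    start = time.perf_counter()
    parallel_results = run_parallel(tasks)
    parallel_seconds = time.perf_counter() - start

    # Both approaches compute the same answers; only the elapsed time differs.
    assert serial_results == parallel_results
    print(f"serial:   {serial_seconds:.2f} s")
    print(f"parallel: {parallel_seconds:.2f} s")
```

The speedup depends on how many cores are available and on the overhead of starting the worker processes, so the timings will vary from machine to machine. Hadoop applies the same idea across many computers instead of across the cores of one.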
According to Gilmour et al. (2021), Hadoop has four main modules that enable parallel processing:
1. Hadoop Distributed File System (HDFS) – a distributed file system that runs on standard or low-end hardware and provides better data throughput than traditional file systems, along with high fault tolerance and native support for large datasets
2. Yet Another Resource Negotiator (YARN) – monitors and manages cluster nodes and resource usage, and schedules jobs and tasks
3. MapReduce – a framework that helps programs perform parallel computation on data; the map task takes input data and converts it into a dataset that can be computed as key-value pairs, and reduce tasks then aggregate the map output into the final result
4. Hadoop Common – provides common Java libraries that can be used across all modules
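The MapReduce idea in the list above can be sketched in a few lines of plain Python. This is only an illustration of the map, group and reduce steps on one machine, not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and in a real cluster the map and reduce tasks would run on different nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: turn one line of input into (key, value) pairs, here (word, 1)."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: aggregate the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big clusters", "Hadoop stores big data"]
    print(word_count(sample))
```

Because each line is mapped independently and each key is reduced independently, both steps can be split across many workers, which is what makes the model suitable for parallel processing.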
The table below, taken from GeeksforGeeks (2022), compares traditional RDBMS and Hadoop:
Number | RDBMS | Hadoop
1 | Traditional row-column based databases, used mainly for data storage, manipulation and retrieval. | Open-source software used for storing data and running applications or processes concurrently.
2 | Mostly structured data is processed. | Both structured and unstructured data are processed.
3 | Best suited for an OLTP environment. | Best suited for big data.
4 | Less scalable than Hadoop. | Highly scalable.
5 | Data normalization is required. | Data normalization is not required.
6 | Stores transformed and aggregated data. | Stores huge volumes of data.
7 | Low latency in response. | Some latency in response.
8 | The data schema is static. | The data schema is dynamic.
9 | High data integrity. | Lower data integrity than RDBMS.
10 | Cost applies for licensed software. | Free of cost, as it is open-source software.
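Rows 2 and 8 of the table (structured versus unstructured data, static versus dynamic schema) can be illustrated with a small Python sketch. The stand-ins are assumptions: an in-memory SQLite table plays the role of the RDBMS, and a list of JSON text lines plays the role of raw files as they might sit in HDFS. The table name, field names and helper functions are invented for the example.

```python
import json
import sqlite3


def load_into_rdbms(rows):
    """Static schema: every row must fit the table definition when written."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer TEXT NOT NULL, total REAL NOT NULL)"
    )
    accepted, rejected = 0, 0
    for row in rows:
        try:
            conn.execute(
                "INSERT INTO orders (id, customer, total) VALUES (?, ?, ?)",
                (row.get("id"), row.get("customer"), row.get("total")),
            )
            accepted += 1
        except sqlite3.IntegrityError:
            # A missing required field violates the NOT NULL constraint.
            rejected += 1
    conn.close()
    return accepted, rejected


def read_raw_records(lines):
    """Dynamic schema: store lines as they are, interpret fields when reading."""
    records = [json.loads(line) for line in lines]
    # Missing fields are handled at read time instead of being rejected.
    return sum(record.get("total", 0.0) for record in records)


RAW_LINES = [
    '{"id": 1, "customer": "Ada", "total": 19.5}',
    '{"id": 2, "customer": "Bob"}',
    '{"id": 3, "note": "free-text log entry"}',
]

if __name__ == "__main__":
    parsed = [json.loads(line) for line in RAW_LINES]
    print("RDBMS accepted/rejected:", load_into_rdbms(parsed))
    print("Sum of totals read from raw lines:", read_raw_records(RAW_LINES))
```

The fixed schema rejects incomplete rows, which protects data integrity (row 9), while the raw-file approach keeps everything and leaves interpretation to whatever program reads it later.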
References
Difference between RDBMS and Hadoop. GeeksforGeeks. (2022, July 11). Retrieved February 20, 2023, from https://www.geeksforgeeks.org/difference-between-rdbms-and-hadoop/
Gerencer, T. (2019, October 30). Parallel computing and its modern uses. HP® Tech Takes. Retrieved February 20, 2023, from https://www.hp.com/us-en/shop/tech-takes/parallel-computing-and-its-modern-uses#:~:text=The%20advantages%20of%20parallel%20computing,more%20resources%20to%20the%20table.
Gilmour, J. B., Lui, A. W., & Briggs, D. C. (2021). EMR. Amazon. Retrieved February 20, 2023, from https://aws.amazon.com/emr/details/hadoop/what-is-hadoop/
Lay, B. (2022, January 11). Big Data vs Traditional Data. Aversan. Retrieved February 20, 2023, from https://www.aversan.com/big-data-vs-traditional-data/#:~:text=In%20traditional%20database%20system%2C%20such,manage%20and%20access%20the%20data.