Big Data as a Weapon
Big data is one of the latest buzzwords in technology. I attended the first annual
Air Force Association Warfare Symposium yesterday in Aurora, Colorado, and was
not surprised that when General Glen D. VanHerck, Commander, U.S. Northern
Command, was asked about protecting our homeland from evolving threats, a
portion of his answer addressed how big data and big data analytics
play an integral role in defeating our enemies. Decision Support Systems (DSS),
artificial intelligence (AI), and the need for automation were additional
buzzwords thrown around by the top brass of the U.S. Air Force and Space
Force speaking at yesterday’s symposium. All of these technologies are made
possible and supported by big data: the collection, organization, and analysis of
enormous amounts of data. Not too many years ago, when we talked about data, we
talked about kilobytes and megabytes, but in today’s world, we talk in terabytes
and petabytes (Software Testing Help, 2023).
Big data and volumes of information are meaningless unless we have the ability to
effectively synthesize very large data sets. Fortunately for us, there are many
tools available that offer just this ability. Some of these tools are better
than others, and it is important that data scientists identify and use the
right tool for their needs so they can find the patterns in the data.
Tool #1 – Apache Hadoop is a software framework used for handling big data and
clustered file systems. It is written in Java, provides cross-platform
support, and processes big data sets using the MapReduce programming model. Hadoop
is the topmost big data tool; over half of the Fortune 50 companies use it, and
its users include Amazon Web Services (AWS), IBM, Microsoft, Facebook, Intel, and
others. Its advantages: the Hadoop Distributed File System (HDFS) can hold all
data types (video, images, XML, JSON, and plain text) on the same file system; it
is highly beneficial for R&D purposes; it offers quick access to data; it is highly
scalable; and it is highly available when run on a cluster of computers. Its
disadvantages include occasional disk space issues and I/O operations that could
be better optimized for performance. Apache Hadoop is free to use under the
Apache license (Software Testing Help, 2023).
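To make the MapReduce idea a little more concrete, here is a minimal sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample records are my own, chosen only for illustration. It shows the three steps the model is built around: map each record to key/value pairs, group the pairs by key, and reduce each group to a single result. On a real cluster those steps would run in parallel across many machines rather than in one process.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values under their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence over a list of text records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, standing in for files stored on a cluster.
    sample = ["big data big tools", "data tools"]
    print(word_count(sample))
```

Because each record is mapped independently and each key is reduced independently, the work can be split across as many machines as are available, which is where the scalability mentioned above comes from.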
Tool #2 – MongoDB is a document-oriented NoSQL database written in the
JavaScript, C, and C++ programming languages. It supports multiple operating
systems, including Windows Vista and newer, Linux, Solaris, FreeBSD, and OS X.
Its major customers include Facebook, eBay, Google, MetLife, and many others. Its
advantages include ease of use, support for multiple platforms and technologies,
trouble-free installation and maintenance, low cost, and reliability. Its
disadvantages include limited analytical abilities and, in certain use cases,
slow performance. MongoDB is a free, open source program, but its SMB and
enterprise versions are paid, with pricing available upon inquiry (Software
Testing Help, 2023).
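As a rough illustration of what "document-oriented" means, the sketch below models a collection as a list of Python dictionaries and filters it with a simple query. This is not MongoDB and does not use its driver; the find helper, the field names, and the sample records are all made up for the example. The point is only that documents in the same collection need not share the same fields, and that a query is itself a small document of field/value pairs to match.

```python
def find(collection, query):
    """Return every document whose fields equal all key/value pairs in query."""
    return [
        doc
        for doc in collection
        if all(field in doc and doc[field] == value for field, value in query.items())
    ]


# Hypothetical sample documents; note that each one carries different fields.
sensors = [
    {"_id": 1, "type": "radar", "region": "north", "range_km": 400},
    {"_id": 2, "type": "satellite", "region": "north", "orbit": "LEO"},
    {"_id": 3, "type": "radar", "region": "south"},
]

if __name__ == "__main__":
    for doc in find(sensors, {"type": "radar"}):
        print(doc)
```

A relational table would force every row into the same set of columns; here the satellite record can carry an orbit field that the radar records simply do not have.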
Tool #3 – R is one of the most comprehensive statistical analysis platforms. It is
written in the Fortran, C, and R programming languages. It is widely used by data
miners, data scientists, and statisticians for data calculation, analysis,
manipulation, and graphical display. Its largest advantages are its vast package
ecosystem and its unparalleled charting and graphics capabilities. R’s
disadvantages include its slower speed, low security, and poor memory
management. Lastly, R is a free, open source program, and the RStudio environment
commonly used with it is also available in a free, open source edition (Software
Testing Help, 2023).
In Data Never Sleeps, a recent series of data reports by DOMO, one solitary
internet minute sees more than 400,000 hours of video streamed on Netflix, 500
hours of video streamed on YouTube, and nearly 42 million messages shared
through WhatsApp, all of this in only one internet minute! The number of internet
users has reached 4.5 billion worldwide, approximately 63% of the total world
population, and is expected to continue to increase. These different types of
structured, semi-structured, and unstructured data are known as big data. Big data
analytics is the science of turning these massive amounts of data into useful
information (Pathak, 2021), information that our military’s top brass are eyeing
today as a potential weapon in defense against our enemies.
References
Pathak, R. (2021, January 26). Top 10 big data analytics tools. Analytics Steps. Retrieved March 7, 2023, from https://www.analyticssteps.com/blogs/top-10-big-data-analytics-tools
Software Testing Help. (2023, February 17). Top 15 big data tools (big data analytics tools) in 2023. Retrieved March 7, 2023, from https://www.softwaretestinghelp.com/big-data-tools/