Big Data as a Weapon


Big data is one of the latest buzzwords in technology. I attended the first annual Air Force Association Warfare Symposium yesterday in Aurora, Colorado, and was not surprised that when General Glen D. VanHerck, Commander, U.S. Northern Command, was asked about protecting our homeland from evolving threats, part of his answer addressed how big data and big data analytics play an integral role in defeating our enemies. Decision Support Systems (DSS), artificial intelligence (AI), and the need for automation were additional buzzwords thrown around by the top brass of the U.S. Air Force and Space Force speaking at yesterday’s symposium. All of these technologies are made possible and supported by big data: the collection, organization, and analysis of enormous volumes of data. Not too many years ago, when we talked about data, we talked about kilobytes and megabytes, but in today’s world we talk in terabytes and petabytes (Software Testing Help, 2023).

Large volumes of information are meaningless unless we can effectively synthesize very large data sets. Fortunately, many tools are available that offer just this ability. Some of these tools are better than others, and it is important that data scientists identify and use the right tool for their needs so they can find the patterns in the data.

Tool #1 – Apache Hadoop is a software framework used for handling big data and clustered file systems. It is written in Java, provides cross-platform support, and processes big data sets using the MapReduce programming model. Hadoop is widely regarded as the leading big data tool; over half of the Fortune 50 companies use it, and its users include Amazon Web Services (AWS), IBM, Microsoft, Facebook, and Intel. Its advantages: the Hadoop Distributed File System (HDFS) can hold all data types (video, images, XML, JSON, and plain text) in the same file system; it is highly beneficial for R&D purposes; it offers quick access to data; it is highly scalable; and it is highly available when run on a cluster of computers. Its disadvantages include occasional disk issues and I/O operations that could be better optimized for performance. Apache Hadoop is free to use under the Apache License (Software Testing Help, 2023).
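
To make the MapReduce idea a little more concrete, here is a minimal sketch in plain Python. It is not Hadoop's API and it runs in a single process rather than across a cluster; the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample records are my own illustrative assumptions. It only shows the general pattern: map each record to key/value pairs, group the pairs by key, then reduce each group to one result.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle_phase(pairs):
    """Group emitted values by key (a framework would do this between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    """Run the three phases in sequence over an in-memory list of records."""
    pairs = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input; a real job would read records from a distributed file system.
sample_records = ["big data big tools", "data tools data"]
counts = word_count(sample_records)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, the map and reduce steps are the parts a framework can spread across many machines.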

Tool #2 – MongoDB is a document-oriented NoSQL database written in JavaScript, C, and C++. It supports multiple operating systems, including Windows Vista and later, Linux, Solaris, FreeBSD, and OS X. Its major customers include Facebook, eBay, Google, MetLife, and many others. Its advantages include user friendliness and ease of use, support for multiple platforms and technologies, trouble-free installation and maintenance, low cost, and reliability. Its disadvantages include limited analytical abilities and slow performance in certain use cases. MongoDB is a free, open source program, but its SMB and enterprise versions are paid, with pricing available upon inquiry (Software Testing Help, 2023).
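
For readers new to the term, "document-oriented" roughly means that each record is a self-describing, JSON-like document, and documents in the same collection do not have to share the same fields. The sketch below illustrates that idea with plain Python dictionaries. It does not use MongoDB or any MongoDB driver; the collection, its field names, and the find helper are hypothetical stand-ins that I made up for illustration.

```python
import json

# A hypothetical "collection": two documents that do not share the same fields.
reports = [
    {"_id": 1, "source": "radar", "track": {"speed_kts": 420, "heading": 90}},
    {"_id": 2, "source": "satellite", "image_ids": ["a1", "b2"], "cloud_cover": 0.3},
]


def find(collection, query):
    """Return documents whose top-level fields equal every key/value pair in query.

    This is a toy query-by-example helper, not a real database API.
    """
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]


radar_docs = find(reports, {"source": "radar"})
print(json.dumps(radar_docs, indent=2))
```

The point of the example is the flexible schema: the second document carries fields the first does not, and nothing had to be declared in advance.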

Tool #3 – R is one of the most comprehensive statistical analysis platforms. It is written in Fortran, C, and R. It is widely used by data miners, data scientists, and statisticians for data calculation, analysis, manipulation, and graphical display. Its largest advantages are its vast package ecosystem and its unparalleled charting and graphics capabilities. R’s disadvantages include slower speed, low security, and poor memory management. Lastly, R and the RStudio IDE are free, open source programs available to all who wish to use them (Software Testing Help, 2023).

According to Data Never Sleeps, a recent series of data reports by DOMO, a single internet minute sees more than 400,000 hours of Netflix video streamed, 500 hours of YouTube video, and nearly 42 million messages shared through WhatsApp. The number of internet users has reached 4.5 billion worldwide, approximately 63% of the total world population, and it is expected to keep growing. Data of these different types, structured, semi-structured, and unstructured, is known as big data. Big data analytics is the science of turning these massive amounts of data into useful information (Pathak, 2021), information that our military’s top brass are now eyeing as a potential weapon in defense against our enemies.
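
To get a feel for the scale, the per-minute figures quoted above can be extrapolated to a full day with simple arithmetic. This is only a back-of-the-envelope sketch: it treats the rounded figures ("more than 400,000", "nearly 42 million") as exact and assumes the rate is constant for all 1,440 minutes of the day, which is my simplifying assumption and not something the report states. The dictionary keys are names I chose for illustration.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes

# Rounded per-minute figures as quoted in the text above.
per_minute = {
    "netflix_hours_streamed": 400_000,
    "youtube_video_hours": 500,
    "whatsapp_messages": 42_000_000,
}

# Assume a constant rate and scale each figure to one day.
per_day = {name: value * MINUTES_PER_DAY for name, value in per_minute.items()}

for name, value in per_day.items():
    print(f"{name}: {value:,} per day")
```

Under those assumptions, the WhatsApp figure alone works out to roughly 60 billion messages per day.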

 

References

Pathak, R. (2021, January 26). Top 10 big data analytics tools. Analytics Steps. Retrieved March 7, 2023, from https://www.analyticssteps.com/blogs/top-10-big-data-analytics-tools

Software Testing Help. (2023, February 17). Top 15 big data tools (big data analytics tools) in 2023. Retrieved March 7, 2023, from https://www.softwaretestinghelp.com/big-data-tools/

