What is Big Data?

 

        What is big data, and how is it different from conventional data? "Big data" is a buzzword in today's technology field. Everyone is using it, but do they really know what it is and what makes it different from everyday data? According to Treehouse Tech Group (2021), the term refers less to the size of the data than to how the data is handled. While traditional data is based on a centralized database architecture, big data uses a distributed architecture. Big data is more scalable than traditional data because its computation is also distributed among several computers in a network. We have been storing and processing data for decades; however, the rate at which we generate data has accelerated greatly in recent years (think about cell phone photos, videos, emails, media content, etc.). We all generate more data every day, and we are almost afraid to delete it. This means it has to be stored and kept accessible for later use.
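
        To make the idea of distributed computation more concrete, here is a minimal sketch in Python. It does not show how any particular big data platform works; it only imitates the pattern on one machine, with threads standing in for the separate computers. The function names (split_into_partitions, count_words, distributed_word_count) and the sample text are made up for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n_partitions):
    """Deal the records round-robin into n_partitions smaller lists."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def count_words(partition):
    """Work done by one 'node': count the words in its own partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def distributed_word_count(records, n_workers=3):
    """Split the data, count each partition in parallel, then merge the results."""
    partitions = split_into_partitions(records, n_workers)
    # Threads are an assumption standing in for separate machines.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, partitions))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # Counter.update adds the counts together
    return total


if __name__ == "__main__":
    # Hypothetical sample data, invented for this example.
    sample = [
        "big data is distributed",
        "traditional data is centralized",
        "big data scales across many computers",
    ]
    print(distributed_word_count(sample).most_common(3))
```

        The point of the sketch is that no single worker ever sees the whole data set. Each one handles its own slice, and only the small partial results are combined at the end, which is why adding more workers lets the same approach handle more data.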

        The term big data can refer to a large, complex data set as well as to the methods we use to process it (Pure Storage, n.d.). Big data has four main characteristics, known as "the four V's":

·        Volume – Size alone does not define big data, but big data sets are often very large.

·        Variety – Big data sets typically contain structured, semi-structured, and unstructured data.

·        Velocity – Big data is generated quickly and is often processed in real time.

·        Veracity – Big data isn't inherently better than traditional data, but its accuracy is extremely important. Anomalies, biases, and noise can greatly reduce the overall quality of big data.
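
        Veracity is easier to picture with a small example. The sketch below flags suspicious values using the modified z-score, which is one common heuristic among many. It was chosen here for illustration and does not come from the sources cited in this post. The function name flag_anomalies, the 3.5 cutoff, and the sample readings are all assumptions.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold.

    The modified z-score uses the median and the median absolute
    deviation (MAD), so a few extreme values do not distort the check.
    The 3.5 cutoff is a conventional rule of thumb, not a fixed rule.
    """
    med = median(values)
    abs_deviations = [abs(v - med) for v in values]
    mad = median(abs_deviations)
    if mad == 0:
        # At least half the values equal the median; the score is undefined.
        return []
    return [
        v
        for v, dev in zip(values, abs_deviations)
        if 0.6745 * dev / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical sensor readings with one obviously bad value.
    readings = [10, 11, 9, 10, 12, 10, 95]
    print(flag_anomalies(readings))
```

        A check like this only finds values that look unusual. Whether a flagged value is a real error, a bias in how the data was collected, or a genuine rare event still takes human judgment.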

        Many companies believe that they have to collect their own data, but that simply is not true. Many datasets are available online for public download (Marr, 2022). Five data sources of global interest are:

1.      Data.gov – The US government pledged to make all government data freely available online. This portal includes information on anything from crime to climate change: http://data.gov

2.      US Census Bureau – This source includes information on the lives of US citizens, including geographic data, population data, and education: http://www.census.gov/data.html

3.      Socrata – This platform also offers government-related data and has some built-in visualizations: https://www.tylertech.com/products/data-insights

4.      European Union Open Data Portal – This portal also provides government data, in this case from European Union institutions: http://open-data.europa.eu/en/data/

5.      Data.gov.uk – This portal provides data from the UK government, including the British National Bibliography, which holds metadata on all UK books and publications since 1950: http://data.gov.uk/


References

Treehouse Tech Group. (2021, May 20). Big Data vs. traditional data: What's the difference? Retrieved January 18, 2023, from https://treehousetechgroup.com/big-data-vs-traditional-data-whats-the-difference/

Marr, B. (2022, October 12). Big data: 33 brilliant and free data sources anyone can use. Forbes. Retrieved January 18, 2023, from https://www.forbes.com/sites/bernardmarr/2016/02/12/big-data-35-brilliant-and-free-data-sources-for-2016/?sh=28483c2cb54d

Pure Storage. (n.d.). Big Data vs. Traditional Data. THE BEGINNERS GUIDE TO BIG DATA. Retrieved January 18, 2023, from https://www.purestorage.com/knowledge/big-data/big-data-vs-traditional-data.html

