Big data has been around for a long time. One of the earliest examples is the Great Library of Alexandria in Egypt, established between 285 and 246 BC and destroyed during the Palmyrene invasion between 270 and 275 AD. In the 21st century, we collect, manage, and analyze data faster, and in more complex ways, than ever before. This is where big data becomes significant.
What is Big Data?
Big data refers to large sets of structured, semi-structured, and unstructured data. It arrives in high volumes, at high speeds, in various formats, and from many sources. The term ‘big data’ was coined in the late 1990s by NASA researchers Michael Cox and David Ellsworth. In their 1997 paper, they described the challenge of processing and visualizing large amounts of data from supercomputers.
In 2001, industry analyst Doug Laney published a paper that established the three primary components of big data: Volume (the size of the data), Velocity (the speed at which data is generated and processed), and Variety (the range of data types and sources). These components, often called the "three Vs", are still used to describe big data today.
The History of Big Data
The history of big data is long and storied. Many of the technological advances made during World War II served military purposes; over time, they found commercial applications and eventually reached the general public, and personal computing became a viable option for everyday consumers.
1940s to 1989 – Data Warehousing and Personal Desktop Computers
The origins of electronic storage trace back to the development of the world’s first programmable, general-purpose electronic computer, the Electronic Numerical Integrator and Computer (ENIAC). Built for the U.S. Army during World War II, it solved numerical problems like calculating the range of artillery fire. In 1954, Bell Labs completed TRADIC, the first transistorized computer, and the transistor-based machines that followed helped data centers move beyond military use to serve general commercial purposes.
One of the first personal desktop computers with a Graphical User Interface (GUI) was the Apple Lisa, released in 1983. Throughout the 1980s, companies like Apple, Microsoft, and IBM released a range of personal desktop computers, leading to a surge in people buying computers and using them at home. Thus, electronic storage became available to the masses.
1989 to 1999 – Emergence of the World Wide Web
Between 1989 and 1993, British computer scientist Sir Tim Berners-Lee created the key technologies for the World Wide Web: HyperText Markup Language (HTML), the Uniform Resource Identifier (URI), and the Hypertext Transfer Protocol (HTTP). In April 1993, CERN made the underlying code for these web technologies available to everyone, royalty-free, forever.
This decision allowed individuals, businesses, and organizations with internet access to go online and share data with other internet-enabled computers. As more devices connected to the internet, there was a massive increase in the amount of information people could access and share.
2000s to 2010s – Controlling Data Volume, Social Media, and Cloud Computing
In the early 2000s, companies like Amazon, eBay, and Google generated large amounts of web traffic and a mix of structured and unstructured data. In 2002, Amazon launched a beta version of Amazon Web Services (AWS), opening its platform to all developers. By 2004, over 100 applications had been built for it.
In 2006, AWS relaunched with cloud infrastructure services such as Simple Storage Service (S3) and Elastic Compute Cloud (EC2). The public launch attracted customers like Dropbox, Netflix, and Reddit, all of which had become cloud-enabled on AWS before 2010.
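To make "storage as a cloud service" concrete, here is a minimal sketch of how an application might write and read an object in S3 using the boto3 library. The bucket and key names are hypothetical placeholders, and the snippet assumes AWS credentials are already configured in the environment.

```python
# Minimal sketch: storing and retrieving an object in Amazon S3 with boto3.
# The bucket name "example-logs-bucket" and the key are hypothetical
# placeholders; real code needs AWS credentials configured beforehand.
import boto3

s3 = boto3.client("s3")

# Upload a small piece of data as an object.
s3.put_object(
    Bucket="example-logs-bucket",
    Key="2006/hello.txt",
    Body=b"Hello, cloud storage!",
)

# Read it back.
response = s3.get_object(Bucket="example-logs-bucket", Key="2006/hello.txt")
print(response["Body"].read().decode())
```

Even in this tiny sketch, the appeal of the model is visible: the application never provisions a disk or a server, only a named bucket behind an API.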
Social media platforms like MySpace, Facebook, and Twitter led to a rise in unstructured data. This included images, audio files, animated GIFs, videos, status posts, and direct messages.
With the rapid generation of unstructured data, platforms needed new ways to collect, organize, and make sense of it. This led to the creation of Hadoop, an open-source framework for managing big data sets, and the adoption of NoSQL databases, which made it possible to manage unstructured data. With these technologies, companies could collect large amounts of disparate data and extract meaningful insights for better decision-making.
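To give a flavour of the programming model Hadoop popularised, here is a minimal, in-process word-count sketch of the MapReduce pattern in Python. It illustrates the idea only; a real Hadoop job runs the same map and reduce steps distributed across a cluster of machines.

```python
# Minimal sketch of the MapReduce idea behind Hadoop: count words across
# "documents" by mapping each record to (key, value) pairs, then reducing
# (summing) the values grouped by key. This runs in a single process; Hadoop
# distributes the same two steps across many machines.
from collections import defaultdict

documents = [
    "big data is big",
    "data beats opinions",
]

# Map step: emit a (word, 1) pair for every word in every document.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle/reduce step: group the pairs by key and sum the values.
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))  # {'big': 2, 'data': 2, 'is': 1, 'beats': 1, 'opinions': 1}
```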
2010s to Now – Optimisation Techniques, Mobile Devices, and IoT
In the 2010s, big data faced new challenges with the rise of mobile devices and the Internet of Things (IoT). Millions of people worldwide had small, internet-enabled devices in their hands, allowing them to access the web, communicate wirelessly, and upload data to the cloud. According to Domo’s 2017 “Data Never Sleeps” report, we were generating 2.5 quintillion bytes of data daily.
The increase in mobile and IoT devices led to the collection of new types of data, including:
- Sensor Data: Collected by internet-enabled sensors, providing real-time insights into machinery (see the sketch below).
- Social Data: Publicly available data from social media platforms like Facebook and Twitter.
- Transactional Data: Data from online stores, including receipts, storage records, and repeat purchases.
- Health-related Data: Data from heart rate monitors, patient records, and medical histories.
With this information, companies could gain deeper insights into areas like customer buying behaviour and machinery maintenance.
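As an illustration of the first category, sensor data, here is a hypothetical sketch of a single reading as an internet-enabled device might upload it to the cloud. The field names are illustrative, not a real schema.

```python
# Hypothetical example of a single IoT sensor reading, serialised as JSON
# the way a device might upload it to the cloud. All field names are
# illustrative placeholders, not any real device's schema.
import json
from datetime import datetime, timezone

reading = {
    "device_id": "pump-07",                          # which machine sent it
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "temperature_c": 71.4,                           # bearing temperature
    "vibration_mm_s": 4.2,                           # vibration velocity
    "status": "ok",
}

print(json.dumps(reading))
```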
The Future of Big Data Solutions
The future of big data is still uncertain, but current trends offer some clues. Among the most prominent of these are artificial intelligence (AI) and automation, which streamline database management and data analysis, making it easier to turn raw data into insights for decision-makers.
Big data analytics tools help companies manage this rapid growth, turning raw, disconnected data into valuable information that supports decision-making and helps predict future outcomes.
Ethical concerns are a major challenge for big data. Legislation like the GDPR (General Data Protection Regulation) sets standards for how data is collected and used, with a strong emphasis on customer privacy; companies must take that privacy seriously to operate legally and avoid fines. Using the latest tools designed to comply with such regulations helps companies protect sensitive customer and employee data.