Table of Contents:
- a. The Beginnings of Quantification
- b. Arabic Numerals and Mathematics
- c. The Rise of Quantification
- d. The Discovery of Statistical Correlation (and Some Theory)
- a. Computers and Data Collection (and Storage)
i. The Variety of Data
ii. The Volume of Data
- b. Data Analysis
i. Statistical Correlations
ii. Messy Data
iii. Mining Correlations for Insights
- a. Google Flu Trends
- b. Google Translate
- c. Google Books
- d. Google Profits
- a. Amazon.com and the Beginnings of Targeted Advertising
- b. Target
- a. In Factories and Refineries
- b. Outside of the Factory
- a. Farecast
- b. Decide.com
- a. Big Data in the Hospital
- b. Big Data in Genomics
- a. Big Data and the CPI
- b. Big Data and Public Safety
i. Infrastructure Maintenance: Manhole Threats in NYC
ii. Fire Threats in NYC
- c. Open Data
- a. The End of Privacy
- b. Big Data Profiling
Statistical information, or data, has long been recognized to be a potentially rich and valuable source of knowledge. Until recently, however, our ability to render phenomena and events in a quantified format, store this information, and analyze it has been severely limited. With the rise of the digital age, though, these limitations are quickly being eroded. To begin with, digital devices that record our movements and communications, and digital sensors that record the behavior of inanimate objects and systems, have become widespread and are proliferating wildly. What’s more, the cost of storing this information on computer servers keeps falling, allowing us to keep much more of it than ever before. Finally, increasingly sophisticated computer algorithms are allowing us to analyze this information more deeply than ever, and are revealing interesting (and often counter-intuitive) relationships that would never have been possible to detect previously. The increasing datafication of the world, and the insights that this is bringing us, may be thought of as one grand phenomenon, and it has a name: Big Data.
The insights that are emerging out of big data are spread out over many areas, and are already impacting several aspects of society. To begin with, big data is helping established businesses to run more efficiently and safely. For example, big data is being used to streamline assembly lines and also to catch quality control problems in the factory. But the benefits of big data go well beyond the factory. For example, the courier company UPS has used big data to help it map out more efficient trucking routes. The resulting improvements have allowed UPS to shave 30 million miles and 3 million gallons of fuel per year from its routes (loc. 1352). The more efficient trucking routes have also led to fewer traffic accidents. Meanwhile, car companies are beginning to use data from sensors in automobiles to understand which parts are causing problems, and also to understand where and why accidents are happening, so that accidents may be reduced.
In addition to helping already established businesses, big data is also allowing for new business opportunities that were never possible before. For example, the business prodigy Oren Etzioni used big data to set up a business called Farecast that predicts the cost of airfare tickets. After his business was bought by Microsoft for $110 million, Etzioni used big data again to set up a related business that predicts the cost of all manner of consumer goods. His very profitable business, Decide.com, saves consumers on average $100 per product (loc. 1867).
Outside of the business world, big data is also being used by governments to help reduce costs and make society safer. For example, in 2009 Google was able to apply big data to search terms to help identify how the H1N1 virus was spreading through communities in real time. This method of tracking disease pandemics holds great promise for allowing public health organizations to know when pandemics are beginning, and also to keep better track of how they are unfolding, in order that they may better contain them. In addition, big data is being used to help identify where potentially dangerous infrastructural problems are occurring, and also to identify trouble spots for fire hazards, in order that they may be addressed.
Big data also has significant potential uses in health care. Indeed, our increasing ability to monitor and record everything from our vital signs to the health of our systems to our individual genomes promises to inaugurate an age of personalized medicine that will allow doctors to more easily diagnose our ailments and tailor treatments to our individual bodies.
While big data may already be bringing us impressive benefits, Viktor Mayer-Schonberger and Kenneth Cukier argue that the bulk of the benefits are yet to come. Indeed, for the authors, businesses and governments are only just now waking up to the incredible potential of big data. And as they direct more attention to recording and analyzing data streams, the potential uses of the information will only multiply.
On the negative side, big data also carries substantial potential dangers. Most notably, as more and more information about us is recorded, kept and used, our privacy is increasingly threatened. For the authors, a good deal of oversight will be needed in order to ensure that the potential abuses of big data are curbed. In their new book Big Data: A Revolution That Will Transform How We Live, Work, and Think, Mayer-Schonberger and Cukier explore both the benefits and the dangers of big data.
What follows is a full executive summary of Big Data: A Revolution That Will Transform How We Live, Work, and Think by Viktor Mayer-Schonberger and Kenneth Cukier.
a. The Beginnings of Quantification
Big data relies on phenomena and events being quantified, and therefore, the roots of big data trace back to the beginning of measurement itself. And as the authors report, measurement is as old as civilization: “Basic counting and measurement of length and weight were among the oldest conceptual tools of early civilizations. By the third millennium B.C. the idea of recorded information had advanced significantly in the Indus Valley, Egypt, and Mesopotamia. Accuracy increased, as did the use of measurement in everyday life. The evolution of script in Mesopotamia provided a precise method of keeping track of production and business transactions… Together, measuring and recording facilitated the creation of data. They are the earliest foundations of datafication” (loc. 1190).
Over the millennia, civilizations would graduate from measuring length and weight to measuring area, volume and time as well (loc. 1196). Still, mining data for deep insights depends largely on applying mathematics to the numbers (loc. 1212), and at this early stage the methods of recording numbers were still somewhat crude, and did not lend themselves easily to mathematical analysis and calculation (loc. 1199).
b. Arabic Numerals and Mathematics
This state of affairs would change in the 1st century A.D., though, when a new and much more sophisticated method of recording numbers was invented in India (loc. 1206). In time this method was modified and improved upon, and it eventually spread all over the world. As the authors explain, “an alternative system of numerals was developed in India around the first century A.D. It traveled to Persia where it was improved, and then passed on to the Arabs, who greatly refined it. It is the basis of the Arabic numerals we use today. The crusades may have brought destruction on the lands the Europeans invaded, but knowledge migrated from East to West, and perhaps the most significant transplant was Arabic numerals… By the twelfth century Arabic texts describing the system were translated into Latin and spread throughout Europe” (loc. 1206).