WHAT IS BIG DATA AND WHY IS IT IMPORTANT?

J. EDUCATIONAL TECHNOLOGY SYSTEMS, Vol. 43(2) 159-171, 2014-2015 WHAT IS BIG DATA AND WHY IS IT IMPORTANT? HARRY E. PENCE State University of New Yo...


J. EDUCATIONAL TECHNOLOGY SYSTEMS, Vol. 43(2) 159-171, 2014-2015

WHAT IS BIG DATA AND WHY IS IT IMPORTANT?

HARRY E. PENCE State University of New York at Oneonta

ABSTRACT

Big Data Analytics is a topic fraught with both positive and negative potential. Big Data is defined not just by the amount of information involved but also its variety and complexity, as well as the speed with which it must be analyzed or delivered. The amount of data being produced is already incredibly great, and current developments suggest that this rate will only increase in the near future. Improved service should result as companies better understand their customers, but it is also possible that this data will create privacy problems. Thus, Big Data is important not only to students who hope to gain employment using these techniques and those who plan to use it for legitimate research, but also for everyone who will be living and working in the 21st Century.

INTRODUCTION

The headline for a recent article in Forbes magazine [1] reads, “What’s the next big thing in big data?” and then answers, “Bigger data.” The author goes on to write that these are still the early days for Big Data. In 2013, only 5% of the potential digital data in this country was analyzed, even though 22% of digital information is a candidate for analysis. Several consulting firms concur, using some variation of the label “Big Data: The Next Frontier for Innovation, Competition, and Productivity” [2]. Bernard Marr has enumerated ten examples of using Big Data that run the gamut from improving healthcare to decreasing urban traffic problems [3]. Patrick Tucker looked at the long-term future for Big Data and wrote that, “In the next two decades, we will be able to predict huge areas of the future with far greater accuracy than ever before in human history, including events long thought to be beyond the realm of human inference” [4]. It seems clear that Big Data has become the latest buzzword in the world of technology, but beyond all the rhetoric, what is Big Data and why is it important?

© 2015, Baywood Publishing Co., Inc. doi: http://dx.doi.org/10.2190/ET.43.2.d http://baywood.com

HOW IS BIG DATA DEFINED?

The name Big Data (sometimes called business analytics) clearly implies that size is important, but this is only one of the defining characteristics of the problem. Most computer users are accustomed to thinking of data in terms of megabytes and gigabytes, but Big Data requires much larger units. Data is stored in a computer in binary locations that can have one of two possible values, one or zero, and a single one of these locations is called a bit. The basic unit of digital storage is a byte, which is usually thought of as consisting of eight bits. A Megabyte is normally defined to be 1,000,000 bytes, although computer makers sometimes define it as 2 raised to the 20th power (1,048,576) bytes. This gives a value slightly different from one million bytes, but the difference is small, and for the purposes of this discussion the more common definition of a megabyte as one million bytes will be used. By the same convention, a Gigabyte is 1000 Megabytes. At one time this was considered to be a large amount of computer memory, but now it is possible to buy a one Gigabyte flash drive for only a few dollars.

As seen in Table 1, the next step up is the data measure called a Terabyte, which is 1000 Gigabytes, followed by the Petabyte, which is 1000 Terabytes. The Petabyte is clearly of a size that can make it appropriate to use Big Data tools. The entire works of humankind written in all languages since the beginning of history are estimated to be about 50 petabytes.
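The gap between the two megabyte conventions mentioned above can be checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are illustrative choices, not taken from the article.

```python
# Decimal convention used in this article versus the power-of-two
# convention that some computer makers apply to the same unit names.
DECIMAL_MEGABYTE = 10**6   # 1,000,000 bytes
BINARY_MEGABYTE = 2**20    # 1,048,576 bytes
DECIMAL_GIGABYTE = 10**9
BINARY_GIGABYTE = 2**30


def relative_gap(binary_size: int, decimal_size: int) -> float:
    """Return the fraction by which the binary unit exceeds the decimal one."""
    return (binary_size - decimal_size) / decimal_size


if __name__ == "__main__":
    print(f"megabyte gap: {relative_gap(BINARY_MEGABYTE, DECIMAL_MEGABYTE):.2%}")
    print(f"gigabyte gap: {relative_gap(BINARY_GIGABYTE, DECIMAL_GIGABYTE):.2%}")
```

The difference is just under 5% for a megabyte and grows with each larger unit, which is why stating the convention matters more as the units get bigger.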
Using a more modern basis for comparison, Google is estimated to process over 20 petabytes of data each day. Information scientists today are talking about Exabytes (1000 Petabytes) and even Zettabytes (1000 Exabytes). At this point, if not earlier, it becomes difficult for the human mind to visualize how much information is being discussed. David Weinberger points out that according to researchers at UC San Diego, Americans consumed about 3.6 Zettabytes of information in 2008 [5].

Weinberger offers a vivid analogy to visualize the size of this much data. He points out that War and Peace is about 1296 pages in print, which would be equivalent to 2 megabytes in digital format. This means that one Zettabyte equals 5 × 10^14 (that is, 5 with 14 zeroes after it) copies of War and Peace. If each physical book is 6 inches thick, it would take a photon of light traveling at 186,000 miles per second 2.9 days to go from the top to the bottom of a stack of this book equivalent to one Zettabyte. This is, indeed, an unimaginable amount of data, but there is already discussion of the need for even larger data measures, like Yottabytes (1000 Zettabytes) and even Brontobytes (1000 Yottabytes). A Brontobyte would be 1 with 27 zeroes after it, or 1 × 10^27 bytes.

Table 1. Definitions of Units for Measuring Big Data

Data unit | Approximate size | Approximate comparison
1 Terabyte (TB) | 1000 Gigabytes | The U.S. Library of Congress holds about 10 Terabytes of written information.
1 Petabyte (PB) | 1000 Terabytes | 1 Petabyte is equivalent to 250,000 DVDs.
1 Exabyte (EB) | 1000 Petabytes | It is estimated that all the words ever spoken by human beings in history would be about 5 Exabytes.
1 Zettabyte (ZB) | 1000 Exabytes | 5 × 10^14 copies of War and Peace.
1 Yottabyte (YB) | 1000 Zettabytes | Storing a Yottabyte of data on terabyte-size hard drives would require one million data centers, each the size of a city block, and would cover an area as large as the states of Delaware and Rhode Island combined.

Although the amount of data is one obvious aspect of Big Data, there are other factors that may require the use of Big Data tools for analysis. IBM summarizes the characteristics of Big Data by saying that there are three Vs. The sheer volume of stored data is exploding; IBM predicts that there will be 35 Zettabytes stored by 2020. This data comes in a bewildering variety of structured and unstructured formats. And the velocity of data depends on not just the speed at which the data is flowing but also the pace at which it must be collected, analyzed, and retrieved.

Beyond these three Vs, Big Data is also about how complicated the computing problem is. The cost of a personal genome is dropping rapidly, and some are predicting a $100 cost soon. Knowing an individual’s genome should allow medical treatment to be customized to the individual. Forrester principal analyst Mike Gualtieri [6] points out that the data from one individual’s sequenced DNA is only about 750 MB, but it would require 222 Petabytes of storage for the entire population of the United States. Even if the goal was just to analyze the genome for one person in order to find disease indicators, the complexity of the interactions within this data set would represent a massive computing problem that would require Big Data tools. Thus, in addition to the three Vs identified by IBM, it would also be necessary to take complexity into account (see Figure 1).
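The size comparisons in this section are straightforward to reproduce. The sketch below reworks Weinberger's War and Peace analogy and the genome storage figure using the decimal units adopted in this article; the function names are illustrative, and the inch-to-mile and seconds-per-day constants are standard conversions rather than values given in the text.

```python
MEGABYTE = 10**6
PETABYTE = 10**15
ZETTABYTE = 10**21

WAR_AND_PEACE_BYTES = 2 * MEGABYTE        # digital size quoted in the text
BOOK_THICKNESS_INCHES = 6                 # thickness assumed in the text
LIGHT_SPEED_MILES_PER_SECOND = 186_000    # speed quoted in the text
INCHES_PER_MILE = 63_360                  # standard conversion
SECONDS_PER_DAY = 86_400                  # standard conversion


def copies_per_zettabyte() -> int:
    """Number of 2 MB copies of War and Peace that fit in one Zettabyte."""
    return ZETTABYTE // WAR_AND_PEACE_BYTES


def light_travel_days(copies: int) -> float:
    """Days a photon needs to traverse a stack of that many 6-inch books."""
    stack_miles = copies * BOOK_THICKNESS_INCHES / INCHES_PER_MILE
    return stack_miles / LIGHT_SPEED_MILES_PER_SECOND / SECONDS_PER_DAY


def genomes_stored(total_bytes: int, bytes_per_genome: int) -> int:
    """Number of genomes of the given size that fit in the total storage."""
    return total_bytes // bytes_per_genome


if __name__ == "__main__":
    copies = copies_per_zettabyte()
    print(f"copies per Zettabyte: {copies:.1e}")
    print(f"light travel time: {light_travel_days(copies):.1f} days")
    print(f"genomes in 222 PB: {genomes_stored(222 * PETABYTE, 750 * MEGABYTE):,}")
```

Working backward from the 222 Petabyte figure at 750 MB per genome gives a population of about 296 million, which suggests the estimate rests on a U.S. population figure of that size.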


Figure 1. Big Data is a combination of volume, velocity, variety, and complexity.

Several factors have contributed to the current interest in Big Data:

• Computer memory has become much cheaper and easier to search. The cost per gigabyte of data storage provided by a USB flash drive has fallen from more than $8,000 when they were first introduced about 10 years ago to 94 cents today [7].
• Social networks collect what formerly had been ephemeral conversations. Twitter generates more than 7 Terabytes (TB) a day and Facebook more than 10 TB, and some enterprises already store data in the petabyte range [8].
• As business has become more globalized, coordinating what is happening at multiple locations has become more complicated but also more important. Big Data tools can help with this problem.

One key factor that may explain much of the current hype is the expectation that Big Data analysis will allow more accurate identification of customer characteristics than traditional marketing methods. An apocryphal quote attributed to John Wanamaker (the founder of the eponymous department store) as early as 1919 says, “I know that half of my advertising doesn’t work. The problem is I don’t know which half” [9]. The persistence of this quote suggests a basic truth about many businesses in this country. Although most businesses collect terabytes of information about customers, employees, and their enterprise, it is a common complaint that leaders can’t access their information fast enough, and even when it is accessed it doesn’t give leaders what they need to know. It is hoped that Big Data will allow companies to understand better what consumers are thinking. This could result in more focused business analytics that will improve the ability to predict what the consumer wants and how to reach her.

Many businesses, like Amazon and Netflix, use correlation analysis to suggest products that might interest current customers, that is, suggesting possible purchases based on similar purchases by other customers. It is critical to remember that correlation does not necessarily imply causation. Just because you buy a book about World War I doesn’t necessarily mean that you also want to buy a book about World War II, even though many people may be buying books in both categories. Even more important, when the correlation does hold, the results may be counterproductive. One of the most famous misuses of this practice was when Target sent coupons for baby products to a teenage female customer who was predicted to be pregnant because she was buying unscented lotion, mineral supplements, and cotton balls [10]. The result was more negative than Target had envisioned: the analysis was accurate, but the young lady had not yet shared the news with her father.

Ellenberg uses the problem at Target as a starting point for a much broader discussion of how computer algorithms can be used to predict human activities [10]. He argues that human behavior is not like problems in Newtonian physics, where the answer can be quite precise if enough information is available, but rather people behave more like the weather, where chaos always makes the results somewhat unpredictable. He goes on to ask who is responsible if a computer algorithm makes a mistake in identifying people who are likely not to pay their debts or to be involved in terrorism.
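The purchase-based suggestions discussed earlier in this section can be illustrated with a simple co-occurrence count. This is a minimal sketch under stated assumptions: the baskets are hypothetical, the function names are invented for illustration, and nothing here reflects how Amazon or Netflix actually compute recommendations.

```python
from collections import Counter
from itertools import permutations


def co_purchase_counts(baskets):
    """Count, for each item, how often every other item shares a basket with it."""
    counts = {}
    for basket in baskets:
        # Each ordered pair (a, b) records that b was bought alongside a.
        for a, b in permutations(set(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
    return counts


def suggest(item, counts, top_n=3):
    """Return up to top_n items most often bought together with the given item."""
    return [other for other, _ in counts.get(item, Counter()).most_common(top_n)]


# Hypothetical purchase histories, one list per customer.
BASKETS = [
    ["WWI history", "WWII history"],
    ["WWI history", "WWII history", "cookbook"],
    ["WWI history", "gardening guide"],
    ["cookbook", "gardening guide"],
]

if __name__ == "__main__":
    print(suggest("WWI history", co_purchase_counts(BASKETS), top_n=1))
```

The suggestion rests only on the fact that the two titles appear together often. It says nothing about whether a particular buyer of one wants the other, which is the caution raised in the text.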
BIG DATA TOOLS

One of the simplest (and cheapest) ways to use Big Data is to track the number of people who are searching a given term each day by using a tool like Google’s BigQuery [11]. This is a free service that allows a user to query terabytes of data in seconds using a Structured Query Language (SQL) interface. (SQL is a special-purpose programming language designed for managing data held in a relational database management system.) These results can be extremely powerful, as demonstrated by the fact that by tracking the search term “flu symptoms” this technique was able to detect regional outbreaks of the flu a week to 10 days before they were reported by the U.S. Centers for Disease Control and Prevention [12].

In the age of social networking, public opinion is as likely to be shaped by popular bloggers or those with many followers on Twitter as it is by traditional experts who write for newspapers or magazines. People who have the power to shape the opinions and decisions of others because of their knowledge, relationship, or authority are called influencers. Marketers wish to identify the influencers in order to communicate with them and change their minds; social scientists view them as early indicators of future public opinion. One of the simplest tools used to identify influencers is a free service called Klout, which measures the size and engagement of a user’s social media network, using data on their activity on Twitter, Facebook, Google+, LinkedIn, Foursquare, and Instagram to arrive at a social influence or Klout score. These scores measure the overall influence of a user and range between 1 and 100, with 40 being the average. The disadvantage is that Klout does not allow you to isolate those who have influence around a specific topic. Marketers and social scientists who wish to search for influencers who rank the highest for a specific topic or product can accomplish this using tools like Little Bird, Inkybee, or Cyfe [13].

Moving to a more sophisticated level of analysis requires the use of the data-processing tools that have been developed to handle the rapid growth in size of the World Wide Web. Search engines, like Yahoo and Google, were the first companies to work with datasets that were too large for conventional methods. In order to power its searches, Google developed a search strategy called MapReduce. The software distributes a task onto a multitude of processors, which process the input. Traditional data warehouses use a relational database (like Excel rows and columns); search engines need to handle non-relational databases, sometimes called NoSQL. The most popular software to handle NoSQL databases is called Hadoop, and several different versions are available as freeware.
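The MapReduce idea described above, splitting a task across processors and then combining the partial results, can be sketched in a few lines. This is a single-machine illustration, not Hadoop's or Google's actual API: worker threads stand in for separate processors, the function names are invented, and the search-log entries are hypothetical. It counts how often each term was searched on each day, echoing the search-tracking example earlier in this section.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Emit a ((day, term), 1) pair for every query in one chunk of the log."""
    return [((day, term), 1) for day, term in chunk]


def shuffle(mapped_chunks):
    """Group the emitted values by key across all chunks."""
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Combine the values for each key into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def count_queries(chunks, workers=2):
    """Map each chunk on its own worker, then shuffle and reduce the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    return reduce_phase(shuffle(mapped))


# Hypothetical search log, already split into two chunks.
LOG_CHUNKS = [
    [("2014-01-06", "flu symptoms"), ("2014-01-06", "weather")],
    [("2014-01-06", "flu symptoms"), ("2014-01-07", "flu symptoms")],
]

if __name__ == "__main__":
    for key, total in sorted(count_queries(LOG_CHUNKS).items()):
        print(key, total)
```

Because each chunk is mapped independently, adding more chunks and more workers does not change the logic, which is the property that lets this style of computation scale.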
Hadoop is designed to collect data even if it doesn’t fit nicely into tables, distribute a query across a large number of separate processors, and then combine the results into a single answer in order to deliver results in almost real time. Hadoop jobs have traditionally been written in Java, but recently interfaces are being developed that make the process easier for less-experienced operators.

BIG DATA AND HIGHER EDUCATION

Big Data is attracting attention among educators for several reasons. In an era when many students attend college with the hopes that this experience will lead to a job after graduation, there are various predictions that there will be a rosy future for graduates who understand how to use Big Data. In a recent report entitled “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” McKinsey & Company, a global management consulting firm, predicts that Big Data will become a key basis of competition and growth for individual firms. They predict that by 2018 the United States could face a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts who can use big data [2]. Not to be outdone, Gartner, a leading information technology research and advisory company, says that 4.4 million IT jobs will be created by 2015 to support Big Data [14]. Even if these estimates are optimistic, it would appear that in the near future there will be a significant demand for people who have these skills. Colleges and universities have an opportunity as well as an obligation to give their students the skills needed to compete for these jobs.

Big Data analytics is also playing an ever-increasing role in science research. As sensors become cheaper and smaller, more scientific experiments depend on collecting so much data that it can only be analyzed by Big Data techniques. For example, the Square Kilometre Array (SKA) under development in Australia and South Africa will consist of 36 small antennas spread over more than 3000 km to simulate a single giant radio telescope [15]. Multiple radio-frequency or optical telescopes can be linked together to create a telescope array known as an interferometer, which can be more powerful than a single unit. The Square Kilometre Array will be one of the world’s largest and most sensitive interferometers. Ultimately, the project is expected to collect one Exabyte of data per day and so will certainly require Big Data analysis. Construction of Phase 1 of the project is expected to begin in 2018.

Another example of a major science project that requires Big Data analytics is the Large Hadron Collider (LHC), a ring 17 miles in circumference at CERN in Switzerland. The LHC Data Centre processes about one petabyte of data every day, which must be stored, processed, and analyzed. The Large Hadron Collider has 150 million sensors delivering data 40 million times per second, and the resulting data is stored on 83,000 physical disks. At peak rates, 10 gigabytes of data may be transferred from its servers every second. Sverre Jarp, Chief Technology Officer at CERN, says, “We are able to get results out as soon as we have the data. This is what big data analysis is all about, extracting the value from data immediately. We have open source tools; anyone can use them, which help us find the nuggets in the tsunami of data we face” [16].

Big Data is also causing a revolution in research in the social sciences. King points out that since 1995 there has been an onslaught of new and more informative social science data produced by the revolution in computer technology, the transformation of many records from analog to digital format, and the competition among governments to share data [17]. He contends that new types of research data about human behavior and society are being made possible by the availability of a wealth of information that previously had to be gathered manually. He also reminds his colleagues that there is a need for better ways to share data and protect privacy than so-called anonymous surveys, since in many cases it is relatively easy to identify the participants. King contends that social scientists need to form alliances with archivists and the legal community to find better ways to share and store data. It is imperative that social scientists begin asking critical questions about what all this data means, who gets access to it, how it is deployed, and to what ends.


FUTURE USES OF BIG DATA

Several factors make it likely that the current rate of data accumulation will continue, if not accelerate, in the future. As microprocessors and sensors have become smaller and cheaper, they are being incorporated into many common devices, such as appliances, cars, and even light bulbs. According to a recent Pew Report, a major contributor to Big Data will be the Internet of Things (IoT), described as “A global, immersive, invisible, ambient networked computing environment built through the continued proliferation of smart sensors, cameras, software, databases, and massive data centers in a world-spanning information fabric known as the Internet of Things” [18]. The Internet of Things is the network of physical objects that contain embedded technology to communicate and sense or interact with their internal states or the external environment. Gartner predicts that the Internet of Things, excluding PCs, tablets, and smartphones, will consist of 26 billion units by 2020, an almost 30-fold increase from 0.9 billion in 2009 [19]. The Internet of Things will soon produce a massive volume and variety of data at unprecedented velocity [20].

It is hoped that the Internet of Things will eventually provide continual information access and personalized content to everyone wherever and whenever he or she wishes. This goal is called ambient intelligence, where a network of low-power, light-weight, and low-cost devices creates an environment that anticipates an individual’s needs and wirelessly creates personalized responses. Ambient intelligence will create a different type of Big Data problem, where it is necessary to sort through a wealth of data that is being supplied, take appropriate action where it can be automated, and prioritize the need for response where a personal action is required.

Another potential source of large amounts of data is Unmanned Aerial Vehicles (UAVs), commonly called drones.
The public has become aware of the military uses for drones, which can stay aloft for days at a time and can include powerful sensors that collect large amounts of data or even carry weapons. According to an article by Ackerman, the U.S. Air Force is planning to create improved software tools that can integrate the video feeds from drones with imagery obtained from satellites, piloted spy planes, and other sources [21]. In addition, the U.S. Navy is developing its own set of drones that will perform underwater in ways that are similar to what the Air Force drones do in the sky. A big difference is that the Navy drones will be self-powered and draw energy from the ocean’s thermocline, a pair of layers of warm water near the surface and chillier water below. This will allow them to move underwater for very long periods of time and be almost undetectable [22].

Use of drones by private parties and companies in the United States has been slowed by the failure to develop regulations about allowed uses, but despite this there is already considerable interest in using drones for public services and private information gathering. There is already a new generation of smaller and cheaper models that are becoming available for civilian applications. The kind of information drones can collect will be very valuable to corporate and commercial interests that deal with video and imagery. As the legal status of drones becomes clearer, their use is expected to expand rapidly. Matt Mikell, from Dell, predicts that, “drones will change the way businesses spend advertising dollars, survey land, harvest crops, speed traffic, reduce mining waste and protect more with less. The potential applications are endless and when matched with data from phones, cars, trucks, planes, trains and machinery, could lead to information evolution and a new industrial revolution” [23]. All these applications will require Big Data tools.

The medical profession’s continuing shift to Electronic Health Record (EHR) systems represents another area that will require Big Data techniques to become truly effective [24]. EHRs may include a range of data on individual patient histories, including personal statistics, medical history, medication records, allergies, test results, immunization status, etc., as well as summary data that provides information about demographic groups. The American Medical Association (AMA) estimated that developing this capability along with improved medical records could save as much as $450 million in health care costs, but Big Data techniques would be necessary since the AMA says that current electronic health record systems lack the sophistication to manage the storage and retrieval of this much information [24].

BIG DATA AND PRIVACY

Some applications of Big Data have already created controversy because of privacy concerns. Monsanto’s FieldScripts program combines a map of every corn field in the United States with climate information on these areas and crop yields for various types of seeds in those locations.
This can be used to create what is called Precision Planting, where a given field is planted with various types of seeds at different depths to produce the maximum possible yield. Despite early indications that yields are, indeed, increased, some farmers are protesting this practice because they feel that they are losing control of their data and because the plan may lock them into a single seed provider [25].

The use of Big Data can be a serious threat to personal privacy, since every phone call, digital search, social network post, and tracked cell phone geolocation can be combined to create an unprecedented window into the private life of each citizen. The recent revelations by Edward Snowden have shown that the U.S. National Security Agency (NSA) and its international partners have been engaged in a massive surveillance program that covered both U.S. and foreign citizens as well as diplomats from other countries [26]. This is a major news story, which continues to unfold as this article is being written. In addition, social network sites, search engines, and most corporations collect vast amounts of information about the public.


Mark Sullivan writes that, “Virtually every piece of personal information that you provide online will end up being bought and sold, segmented, packaged, analyzed, repackaged, and sold again” [27]. In addition, public records, such as birth data, real estate records, criminal records, and political affiliation, have been scanned and digitized and are being combined with our online personal data. Sullivan concludes that, “A child born in 2012 will leave a data footprint detailed enough to assemble a day-by-day, even a minute-by-minute, account of his or her entire life, online and offline, from birth until death.”

This makes it hard to protect individual privacy, and recent experiences suggest that there are now so many public datasets available for cross-referencing that it is difficult to assure that any Big Data records can be kept private. In a number of cases, information from self-proclaimed “anonymous studies” has been tracked back to identify individuals and even their families. In 2008, Zimmer demonstrated that he could identify the “unnamed northeastern college” in a study that examined the Facebook profiles of 1,700 students to measure how their interests changed with time. The next step of identifying the individual students surveyed would have been relatively easy [28]. Similarly, in 2013 researchers reported that they had not only identified five people who were randomly selected from a 1,000-person “anonymous” genetics study but also found their entire families, even though the relatives had no part in the study, identifying nearly 50 people in all [29]. It is questionable if there is any such thing as a truly anonymous study in the age of Big Data.

As companies gain new ways to capture information about the public, it becomes increasingly important to define ethical standards for Big Data.
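The cross-referencing risk described above can be made concrete with a small record-linkage sketch. Everything in it is hypothetical: the field names, the records, and the matching rule are invented for illustration and are not the methods used in the studies cited. The idea is only that a record with no name can still be tied to a person when a few of its attributes match exactly one entry in a public list.

```python
def link_records(anonymous_rows, public_rows, keys):
    """Attach a name to each anonymous record that matches exactly one public row."""
    # Index the public rows by the shared attributes.
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])

    matches = {}
    for row in anonymous_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # a unique match re-identifies the record
            matches[row["record_id"]] = names[0]
    return matches


# Hypothetical "anonymous" study records: no names, only a few attributes.
STUDY_ROWS = [
    {"record_id": "r1", "zip": "12345", "birth_year": 1990, "sex": "F"},
    {"record_id": "r2", "zip": "12345", "birth_year": 1985, "sex": "M"},
]

# Hypothetical public records that carry names alongside the same attributes.
PUBLIC_ROWS = [
    {"name": "Alice Example", "zip": "12345", "birth_year": 1990, "sex": "F"},
    {"name": "Bob Example", "zip": "12345", "birth_year": 1985, "sex": "M"},
    {"name": "Carl Example", "zip": "12345", "birth_year": 1985, "sex": "M"},
]

if __name__ == "__main__":
    print(link_records(STUDY_ROWS, PUBLIC_ROWS, keys=("zip", "birth_year", "sex")))
```

In this toy data, record r1 is re-identified because only one public entry shares its attributes, while r2 remains ambiguous between two people. Each additional public dataset that can be joined in tends to shrink that ambiguity.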
In 2014, the Data & Society Institute and the White House Office of Science & Technology Policy held a meeting to discuss “The Social, Cultural, & Ethical Dimensions of ‘Big Data’” [30]. The participants raised concerns that, “Algorithmic accountability is an uncertain enterprise, forcing people to try to reverse engineer what’s happening when something seems amiss. These dynamics leave many rightfully skittish, especially given that there’s a long history of discrimination in the United States.” In particular, the participants expressed a fear that marginal populations may be subjected to increased surveillance, and if they are found to be “at risk” because of predictive algorithms, the result may be to further marginalize these people. Who is accountable for the negative outcomes resulting from an algorithm: the implementers, the designers, or the organization that uses it?

Despite reports that Facebook CEO Mark Zuckerberg does not believe in privacy [31], there is strong evidence that many of his users are concerned with privacy issues. As an example of these concerns, in 2014 a European high court ruled that Google and other search providers must remove links to material that a person feels infringes on their privacy. This is an issue where European and American traditions are diametrically opposed. In Europe a convicted criminal who has served his time and been rehabilitated can object to the publication of the facts of his conviction, but in America the publication of someone’s criminal history is considered to be protected by the First Amendment [32]. It remains to be seen how these conflicting attitudes might be reconciled.

CONCLUDING THOUGHTS

Big Data is already a part of our lives. Even though Big Data sounds like an esoteric topic only of interest to specialists, most people already use it every day when they do a Google search, plan a trip on Expedia, buy a book from Amazon, or ask their broker about a potential stock purchase. Because these actions are so transparent to the user (and often are free), it is easy to forget that Big Data tools are working in the background. Even less visible, but equally important, are cases where companies or governments are scrutinizing day-to-day operations to identify potential customers, terrorists, possible criminals, or just people with influence. The better that everyone recognizes this intrusion and works to understand the operations of the wizard behind the curtain, the more chance there will be to truly maintain control of our lives.

In closing, it is important to remember that Big Data may also have a positive impact. danah boyd, a Principal Researcher at Microsoft Research, observed that data is becoming like the air we breathe; it can be a source of both pollution and sustenance. She outlined several important questions that must be dealt with before Big Data reaches its full potential [33]. Despite these concerns, she concludes that the application of Big Data tools will create a fundamental change in the way knowledge is created. Ultimately, the value of Big Data comes from the valuable patterns of knowledge that can be derived by making connections between pieces of data about an individual, among groups of individuals, or simply about the structure of information itself. boyd asserts,
boyd asserts, Just as Ford changed the way we make cars—and then transformed work itself—Big Data has emerged as a system of knowledge that is already changing the objects of knowledge, while also having the power to inform how we understand human networks and community.

boyd goes on to say that Big Data will even change the meaning of learning and asks, “. . . what new possibilities and limitations may come from these systems of knowing?” REFERENCES 1. M. Lev-Ram, What’s the Next Big Thing in Big Data? Bigger Data, Forbes, pp 233-238, 2014. 2. Big Data: The Next Frontier for Innovation, Competition, and Productivity. Retrieved June 2, 2013 from http://www.mckinsey.com/insights/business_technology/big_data_ the_next_frontier_for_innovation

170 / PENCE

3. B. Marr, The Awesome Ways Big Data Is Used Today To Change Our World. Retrieved June 3, 2014, from https://www.linkedin.com/today/post/article/2013111306515764875646-the-awesome-ways-big-data-is-used-today
4. P. Tucker, The Naked Future: What Happens in a World That Anticipates Your Every Move? Current (Penguin) Publishing, New York, 2014.
5. D. Weinberger, Too Big to Know: Rethinking Knowledge Now That the Facts Aren't the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room, Basic Books, New York, p. 7, 2012.
6. M. Gualtieri, Is 750 MB Big Data? Retrieved June 4, 2014, from http://blogs.forrester.com/mike_gualtieri/12-12-05-is_750mb_big_data
7. M. J. Perry, Chart of the Day: The Falling Price of Memory. Retrieved June 9, 2014, from http://www.aei-ideas.org/2013/04/chart-of-the-day-the-falling-price-of-memory/
8. C. Eaton, T. Deutsch, D. Deroos, G. Lapis, and P. Zikopoulos, Understanding Big Data, McGraw-Hill, New York, p. 5, 2012.
9. John Wanamaker on advertising. Retrieved June 6, 2014, from http://message.snopes.com/showthread.php?t=46568
10. J. Ellenberg, What’s Even Creepier Than Target Guessing That You’re Pregnant? Retrieved June 9, 2014, from http://www.slate.com/blogs/how_not_to_be_wrong/2014/06/09/big_data_what_s_even_creepier_than_target_guessing_that_you_re_pregnant.html
11. What is BigQuery? Retrieved June 9, 2014, from https://developers.google.com/bigquery/what-is-bigquery
12. M. Helft, Google Uses Searches to Track Flu’s Spread. Retrieved June 9, 2014, from http://www.nytimes.com/2008/11/12/technology/internet/12flu.html?_r=0
13. S. Olenski, Influencer Marketing Tools That Will Power Your Content in the Future. Retrieved June 9, 2014, from http://marketingland.com/influencer-marketing-toolswill-power-content-2014-86014
14. Gartner Says Big Data Creates Big Jobs: 4.4 Million IT Jobs Globally to Support Big Data By 2015. Retrieved June 2, 2014, from http://www.gartner.com/newsroom/id/2207915
15. Square Kilometre Array. Retrieved June 11, 2014, from https://www.skatelescope.org/
16. R. Barnes, Big Data Issues? Try Coping with the Large Hadron Collider. Retrieved June 12, 2014, from http://www.marketingmagazine.co.uk/article/1185012/big-dataissues-try-coping-large-hadron-collider
17. G. King, Ensuring the Data-Rich Future of the Social Sciences, Science, 331, 719-772, 2011.
18. J. Anderson and L. Raine, The Internet of Things Will Thrive by 2025. Retrieved June 14, 2014, from http://www.pewinternet.org/2014/05/14/internet-of-things/
19. J. Rivera and R. van der Meulen, Gartner Says the Internet of Things Installed Base Will Grow to 26 Billion Units By 2020. Retrieved June 11, 2014, from http://www.gartner.com/newsroom/id/2636073
20. M. Walker, The Internet of Things, Data Science and Big Data. Retrieved June 4, 2014, from http://www.datasciencecentral.com/profiles/blogs/the-internet-of-things-datascience-and-big-data
21. S. Ackerman, Welcome to the Age of Big Drone Data. Retrieved June 6, 2014, from http://www.wired.com/2013/04/drone-sensors-big-data/


22. M. Thompson, The Navy’s Amazing Ocean-Powered Underwater Drone. Retrieved June 6, 2014, from http://swampland.time.com/2013/12/22/navy-underwater-drone/
23. M. Mikell, Drones: Big Data App in the Making? Retrieved June 6, 2014, from http://en.community.dell.com/dell-blogs/dellsolves/b/weblog/archive/2014/03/17/drones-big-data-app-in-the-making.aspx
24. J. Starren, M. S. Williams, and E. P. Bottinger, Crossing the Omic Chasm: A Time for Omic Ancillary Systems, The Journal of the American Medical Association, 309(12), 1237-1238, 2013.
25. J. Schumpeter, Digital Disruption on the Farm, The Economist, p. 64, May 24, 2014.
26. Global Surveillance Disclosures (2013-present). Retrieved June 7, 2014, from http://en.wikipedia.org/wiki/Global_surveillance_disclosures_(2013%E2%80%93present)
27. M. Sullivan, Data Snatchers! The Booming Market for Your Online Identity, 2012. Retrieved June 26, 2012, from http://www.pcworld.com/article/258034/data_snatchers_the_booming_market_for_your_online_identity.html
28. M. Zimmer, More on the “Anonymity” of the Facebook Dataset—It’s Harvard College (Updated). Retrieved June 4, 2014, from http://www.michaelzimmer.org/2008/10/03/more-on-the-anonymity-of-the-facebook-dataset-its-harvard-colleg/
29. M. Gymrek, A. L. McGuire, D. Golan, E. Halperin, and Y. Yaniv Erlich, Identifying Personal Genomes by Surname Inference, Science, 339, 321-324, 2013.
30. Event Summary: The Social, Cultural, & Ethical Dimensions of “Big Data.” Retrieved June 8, 2014, from http://www.datasociety.net/pubs/2014-0317/BigDataConferenceSummary.pdf
31. E. Van Buskirk, Report: Facebook CEO Mark Zuckerberg Doesn’t Believe In Privacy. Retrieved June 8, 2014, from http://www.wired.com/2010/04/report-facebook-ceomark-zuckerberg-doesnt-believe-in-privacy/
32. J. Rosen, The Right to Be Forgotten. Retrieved June 14, 2014, from http://www.stanfordlawreview.org/online/privacy-paradox/right-to-be-forgotten?em_x=22
33. d. boyd and K. Crawford, Six Provocations for Big Data. Retrieved June 14, 2014, from http://softwarestudies.com/cultural_analytics/Six_Provocations_for_Big_Data.pdf

Direct reprint requests to:
Dr. Harry Pence
State University of New York at Oneonta
108 Ravine Pkwy.
Oneonta, NY 13820
e-mail: pencehe@oneonta.edu