
Copyright & Intellectual Property Tutorial

Big Data

Because a large number of our day-to-day activities involve the use of technology, it is important to consider the implications of our digital footprints and understand how to protect our data privacy.

This lesson provides an overview of the concept of big data and how it is used across sectors of society.

What is 'big data' and where does it come from?


Watch a 2-minute overview of how big data is created, analyzed, and used.

Any time you interact with a computer system, you create a data trail that records the details of your day-to-day activity.

This data trail is composed of metadata - information about you and your activities as captured by the devices and networks with which you interact. Many of these devices are part of the Internet of Things - networks of personal devices, medical devices, home and travel automation devices, card readers, license plate readers, and cameras linked with facial recognition software.

'Big data' refers to the statistical analysis of this personal data to identify individual and population trends.

Big data and metadata

In 2013, leaders of the Senate Select Committee on Intelligence defended a controversial government surveillance program that collects phone call records by stating, "This is just metadata." The term 'metadata' sounds technical and impersonal, and they made the point that the content of phone conversations was not collected under this program.

However, the large-scale collection and analysis of personal metadata over time -- in other words, big data -- can reveal many details about one's private life. Matthew Harwood, a senior writer and editor for the American Civil Liberties Union, used a data visualization tool to map his Gmail contacts and email timestamps. In his 2013 post to the ACLU's Free Future blog, Harwood writes:

"When visualized and analyzed over time, my data reveals my family members--who are all tightly grouped and linked together--and those people who I am, or was, closest to in each phase of my private and professional life....

The data visualization also shows potential discord over time.... (If someone were to target individuals to gather dirt on me, my guess is they would start with people who had a history of communication with me and then suddenly trailed off or fell off a cliff, visually speaking....) This is why mathematician and former Sun Microsystems engineer Susan Landau told the New Yorker's Jane Mayer that metadata is 'much more intrusive than content.'"
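The kind of analysis Harwood describes can be approximated with very little code. The following is a minimal sketch, not the tool he used: it assumes hypothetical records that contain only a contact address and a message date (no message content), counts messages per contact per year, and lists contacts whose communication stopped. The addresses, the `min_earlier` threshold, and the function names are all illustrative assumptions.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical metadata records: (contact, date of message). No message content.
records = [
    ("alice@example.com", date(2012, 1, 5)),
    ("alice@example.com", date(2012, 2, 9)),
    ("alice@example.com", date(2012, 3, 14)),
    ("bob@example.com", date(2012, 1, 20)),
    ("bob@example.com", date(2013, 1, 11)),
    ("bob@example.com", date(2013, 2, 2)),
    ("bob@example.com", date(2013, 3, 8)),
]

def messages_per_year(records):
    """Count messages per contact for each calendar year."""
    counts = defaultdict(Counter)
    for contact, sent_on in records:
        counts[contact][sent_on.year] += 1
    return counts

def trailed_off(counts, earlier_year, later_year, min_earlier=3):
    """Contacts with at least min_earlier messages in earlier_year and none in later_year."""
    return sorted(
        contact
        for contact, by_year in counts.items()
        if by_year[earlier_year] >= min_earlier and by_year[later_year] == 0
    )

counts = messages_per_year(records)
dropped = trailed_off(counts, 2012, 2013)
print(dropped)  # ['alice@example.com']
```

Even this toy version shows the point of the quotation: timestamps and addresses alone are enough to infer who was close to whom, and when that changed.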

Big Data in the Private Sector

In the private sector, big data can be big business. Information about individuals can be:

Collected (recorded) during transactions and activities

Aggregated with information about other individuals

Analyzed statistically and mined for individual and population trends

Sold as business intelligence, often to marketing and retail companies.

Examples of big data in the private sector include: retail advertising, workplace monitoring, and personalized medicine.

Retail Advertising

Marketing and retail companies use big data about individuals' shopping habits to produce targeted advertising, such as online ads or coupons printed at the register.

Watch this two-minute news clip, which reveals how Target stores discovered that a teenage shopper was pregnant before her own family knew, based on her shopping habits:


In an interview about Target's marketing analytics program, statistician Andrew Pole stated:

"If you use a credit card or a coupon, or fill out a survey, or mail in a refund, or call the customer help line, or open an e-mail we've sent you or visit our Web site, we'll record it and link it to your [Target] Guest ID. We want to know everything we can."

Workplace Monitoring

Increased use of technology, both personal and professional, has created new opportunities for employers to monitor their employees' work attendance, productivity, and activities outside of the office.

"Workforce management" systems allow employers to monitor, collect and analyze employees'

  • attendance
  • activity
  • keystrokes
  • Internet use
  • communications over phone, email, and instant messenger
  • biometric data like tone of voice and blood pressure.

Such systems are often implemented to control costs associated with inefficient staffing or unproductive employees.

Retail point-of-sale systems track cashiers' idle time between transactions, as well as their success in convincing customers to upgrade or supersize orders, or sign up for the store credit card or customer loyalty discount card.
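As a rough illustration of how idle time can be derived from transaction records, the sketch below measures the gaps between consecutive transactions for one cashier. The log format, the timestamps, and the `idle_minutes` function are assumptions made for this example; real point-of-sale systems will differ.

```python
from datetime import datetime

# Hypothetical point-of-sale log for one cashier: (transaction start, transaction end).
transactions = [
    (datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 8, 9, 4)),
    (datetime(2024, 1, 8, 9, 6), datetime(2024, 1, 8, 9, 10)),
    (datetime(2024, 1, 8, 9, 25), datetime(2024, 1, 8, 9, 30)),
]

def idle_minutes(transactions):
    """Return the idle gap, in minutes, between each pair of consecutive transactions."""
    ordered = sorted(transactions)  # sort by start time
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        gap = (next_start - prev_end).total_seconds() / 60
        gaps.append(max(gap, 0.0))  # overlapping records count as zero idle time
    return gaps

gaps = idle_minutes(transactions)
total_idle = sum(gaps)
print(gaps, total_idle)  # [2.0, 15.0] 17.0
```

Note that the numbers say nothing about why a gap occurred. A 15-minute gap could mean a quiet store, a restocking task, or a customer who needed help away from the register.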

This point-of-sale data is used to create optimized staff schedules based on business conditions; however, automated scheduling often fails to take into account employees' availability, family commitments, and need for predictable work schedules and reliable income when shifts are added, dropped, or changed on short notice. While attendance systems are implemented to reduce time theft by employees - arriving late, leaving early, or taking extended breaks - they can also result in wage theft by employers, such as when managers require staff to clock out early but continue working until the end of a shift, or clock employees out for breaks they didn't actually take.

Fulfillment and delivery companies like Amazon and UPS use workforce monitoring services that incorporate GPS location data to monitor performance indicators regarding on-time delivery of packages, the best delivery routes, and fuel consumption.

While workplace monitoring can result in process improvements for greater efficiency, there can be unintended consequences, including occupational injury, lapses in safety practices, and employee stress and dissatisfaction.

Personalized Medicine

Another private sector big data application is personalized medicine. Health information from a patient's electronic health records, family history, insurance claims, and fitness devices can be combined with population health data to identify trends, predict disease, and optimize treatment. IBM has partnered with medical researchers and healthcare providers to develop IBM Watson Health, a cloud-based artificial intelligence system that can proactively monitor an individual's health and suggest medications or other treatment options.

Video on IBM Watson Health

However, the digitization of personal health information and networking of medical devices, from pacemakers to MRI machines, can leave health care systems vulnerable to hacking, data breaches, or misuse.

Big Data in the Public Sector

Big data in the public sector is used for surveillance to support counter-terrorism and law enforcement efforts.


Federal agencies gained new domestic surveillance capabilities with the passage of the USA PATRIOT Act, written in response to the September 11, 2001 terrorist attacks. The USA PATRIOT Act, or Uniting and Strengthening America by Providing Appropriate Tools Required to Intercept and Obstruct Terrorism Act, amended the Foreign Intelligence Surveillance Act (FISA). It enabled federal authorities to monitor communications and to search business records, without obtaining warrants, for counter-terrorism purposes.

Under the USA PATRIOT Act, business records include things many people presume to be private, like medical records, educational records, credit card purchases, and Internet searches. Created by everyday technology-mediated activities, these records are considered "third party records" held by the service providers, such as medical offices, insurance companies, educational institutions, education technology companies, credit card companies, Internet service providers (like Google), and phone service providers (like Verizon).

Critics of the USA PATRIOT Act say that it weakens constitutional protections from warrantless and "unreasonable searches and seizures" under the Fourth Amendment, commonly called the right to privacy.

In June 2015, then-President Obama signed the USA FREEDOM Act of 2015 into law. This act (also known as the Uniting and Strengthening America by Fulfilling Rights and Ensuring Effective Discipline Over Monitoring Act) introduces some constraints on government surveillance. It requires federal intelligence agencies to discontinue bulk record collection; the government can collect data only about a specific person, account, or device, and must demonstrate how that entity is associated with a foreign or terrorist threat. (Phone companies maintain these records according to data retention policies governed by other laws, turning them over to intelligence agencies with a FISA court subpoena.)

In December 2015, President Obama signed the Consolidated Appropriations Act, 2016 into law. This act includes the Cybersecurity Information Sharing Act, or CISA. CISA's purpose is to facilitate information sharing about cybersecurity threats between companies and government agencies. Some critics claim that CISA provides inadequate protections for personal information, increases the government's surveillance capabilities, and provides immunity for companies with lax cybersecurity practices without actually improving cybersecurity.

Predictive Policing - Law Enforcement

Predictive policing applies big data methods, combining information from sources like crime statistics, surveillance camera footage, and social media content, to anticipate and prevent crimes by maintaining a proactive law enforcement presence in high-risk areas at high-risk times.

While predictive policing methods have successfully reduced crime rates in some cities, there is a concern that they can contribute to discriminatory profiling, reduced reasonable suspicion standards, and increasingly intrusive surveillance practices that violate Fourth Amendment privacy protections. This segment from VICE News discusses some of these concerns:


Big Data in Higher Education

Learning Analytics

The application of big data in education is called learning analytics. Learning analytics stakeholders include researchers, institutions, instructors, students, governments, and policy makers. A common application of learning analytics is to identify students at risk of failing or withdrawing from a course and connect them with educational support services, like subject tutoring. Learning analytics can also be used to predict a student's likelihood of success in a course, or to personalize the experience of an online course to a student's strengths, learning styles, and interests. On an institutional level, learning analytics can suggest intervention initiatives to increase student retention or on-time graduation rates.
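To make the idea of identifying at-risk students concrete, here is a minimal rule-based sketch. It is not how any particular learning analytics product works: the `StudentActivity` fields, the thresholds, and the sample roster are all hypothetical, and real systems often rely on statistical models rather than fixed cutoffs.

```python
from dataclasses import dataclass

@dataclass
class StudentActivity:
    student_id: str
    logins_last_14_days: int
    assignments_submitted: int
    assignments_due: int
    current_grade: float  # percent, 0-100

def at_risk_reasons(s, min_logins=3, min_submit_rate=0.7, min_grade=65.0):
    """Return the reasons a student would be flagged; thresholds are illustrative."""
    reasons = []
    if s.logins_last_14_days < min_logins:
        reasons.append("low login activity")
    if s.assignments_due > 0 and s.assignments_submitted / s.assignments_due < min_submit_rate:
        reasons.append("missing assignments")
    if s.current_grade < min_grade:
        reasons.append("low grade")
    return reasons

# Hypothetical course roster.
roster = [
    StudentActivity("s001", 10, 5, 5, 88.0),
    StudentActivity("s002", 1, 2, 5, 58.0),
    StudentActivity("s003", 6, 3, 5, 72.0),
]

# Keep only students with at least one reason, so staff can see why each was flagged.
flagged = {s.student_id: at_risk_reasons(s) for s in roster if at_risk_reasons(s)}
print(flagged)
```

Returning the reasons along with the flag is a deliberate choice in this sketch: it lets an instructor or the student check whether the flag is justified, which matters because a fixed threshold can easily mislabel a student who studies offline or submits work late for legitimate reasons.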

While learning analytics can be applied to improve the teaching and learning environment, there are other technological and ethical implications of the large-scale collection of information about students. There are no established guidelines for students to correct or remove learning analytics records, for educational data to be anonymized, or for researchers to acquire informed consent from students whose data they are using. Inaccurate algorithms used to profile at-risk students can incorrectly stereotype learners. Accessing learning management systems or textbook publishers' online platforms from a personal device can result in the collection of non-educational data, including location tracking and other data. Students should be aware of, and have access to, data collection, management, transfer, and retention policies, and should have the ability to opt out of data collection for analysis purposes.

Colleges can purchase demographic, economic, and performance information about high school students from standardized testing companies to develop admissions recruitment strategies. This can result in discriminatory recruitment practices. The market for student profiles includes test-prep companies, scholarship organizations, and businesses that want to advertise products - including credit cards, fast food, technology, and fashion - to college students. In 2014, a University of Montana student sued the institution over allegations of privacy violations when the university shared personally identifying information, including names, addresses, and Social Security numbers, of students with a company that has been criticized for marketing debit cards with predatory fees to college students. The University of Montana claims that student information disclosures are allowable under U.S. Department of Education regulations, including FERPA (Family Educational Rights and Privacy Act, discussed in the Privacy lesson).