Essay · Data Infrastructure

The Anonymization Myth

Published March 13, 2026
Topics: Data Profiling · Algorithmic Bias · Location Data

Four credit card transactions (the merchant, the amount, the date) are enough to identify nine out of ten people in a dataset stripped of names, account numbers, and every other obvious identifier. This is not a vulnerability. It is how the system was built.

In 2015, researchers at MIT published a study in Science demonstrating that 90% of individuals in an anonymized dataset of 1.1 million credit card records could be re-identified knowing only four of their transactions, each specified by approximate date and location; knowing the transaction amounts as well made re-identification easier still. No name or account number was required. The de Montjoye et al. finding did not produce new legislation. It produced a press release.

The word "anonymized" does substantial rhetorical work in the data economy. It appears in privacy policies where it functions as an assurance, a signal that the company in question has performed due diligence before selling your behavioral record to a downstream broker. What it actually describes is the removal of overt identifiers from a dataset whose remaining structure makes re-identification trivial. Names are noise. The pattern of your behavior is signal.

The mechanics of data profiling are banal enough to describe plainly. A data broker acquires purchase history from retail loyalty programs, location pings from mobile apps that share SDK data, and browsing records from ad-tech intermediaries. None of these sources contains your name. Together they constitute a fingerprint more precise than your signature. The broker then packages this as an "audience segment" and sells it to insurers, employers, and political campaigns. The insurer does not know your name. They know that you visited a rheumatology clinic twice in six months and stopped refilling a prescription.
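The matching step is simple enough to sketch. The toy data and the `matches` helper below are hypothetical illustrations, not the method of any particular broker: a pseudonymous ledger contains no names, yet a handful of externally observable (merchant, date) points narrows the candidate set to one.

```python
# Toy pseudonymous ledger: no names, only opaque user IDs.
# Every value here is invented for illustration.
transactions = {
    "u1": {("pharmacy", "03-02"), ("cafe", "03-03"), ("grocer", "03-05"), ("gym", "03-07")},
    "u2": {("pharmacy", "03-02"), ("cafe", "03-04"), ("grocer", "03-05"), ("gym", "03-08")},
    "u3": {("bookshop", "03-01"), ("cafe", "03-03"), ("grocer", "03-06"), ("gym", "03-07")},
}

def matches(known_points, transactions):
    """Return the pseudonymous IDs whose history contains every
    externally known (merchant, date) point."""
    return [uid for uid, points in transactions.items() if known_points <= points]

# Two observed points leave ambiguity; a third resolves it.
print(matches({("pharmacy", "03-02"), ("grocer", "03-05")}, transactions))
# → ['u1', 'u2']
print(matches({("pharmacy", "03-02"), ("grocer", "03-05"), ("cafe", "03-03")}, transactions))
# → ['u1']
```

The attack needs no names at any stage: the behavioral trace itself is the identifier, which is the de Montjoye finding in miniature.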

In January 2025, Gravy Analytics, one of the larger location data brokers, disclosed a breach that exposed the precise movement history of tens of millions of people. The same month, the FTC issued a final order prohibiting Gravy Analytics and its subsidiary Venntel from selling sensitive location data, defined as visits to medical facilities, religious sites, political gatherings, and similar destinations. This is the regulatory ceiling in 2025: a prohibition on selling the most obviously harmful subcategory, applied to one company, after a breach.

That FTC order arrived four months after the agency published a September 2024 staff report characterizing the data collection practices of nine major platforms, including Meta, TikTok, and YouTube, as "vast surveillance." The report documented the commercial trade in behavioral data, the use of that data to build inferences about mental health and political affiliation, and the absence of meaningful consent mechanisms. It did not trigger enforcement. It is cited here because it confirms, from within the government's own investigative apparatus, the basic picture that researchers have been describing since 2015: the infrastructure is operating exactly as designed.

The Employment Machine

The same profile that sells you a medication advertisement can screen you out of a job. A lawsuit filed in February 2024 against Workday, the human resources software company, alleged that its AI-powered screening tools systematically excluded older applicants and those with disabilities from consideration. In July 2024, Judge Rita Lin of the Northern District of California allowed the plaintiff's "agent" theory to proceed: the argument that Workday functioned not as a neutral tool but as an agent of the employers that used its screening software, exercising its own discretionary judgment over who advanced. The case was certified as a collective action in May 2025.

A parallel question concerns the models upstream of any particular hiring platform. In October 2024, researchers at the University of Washington released a study examining how large language models handled resume screening. The study found that when resumes were identical except for the name at the top, LLMs favored white-associated names 85% of the time and preferred names associated with Black men zero percent of the time in comparisons against white male-associated names, meaning the models never once ranked a Black male applicant's resume above an otherwise identical one bearing a white-associated name.
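The audit design behind results like these can be sketched as a name-substitution test: hold the resume text fixed, vary only the name, and tally how often the scorer prefers one group. The sketch below is an assumption about the general protocol, not the UW study's code; `toy_scorer` is a deliberately biased stand-in for whatever model is under audit.

```python
import random

def preference_rate(resumes, names_a, names_b, score_resume, trials=1000, seed=0):
    """Attach a name from group A and from group B to the same resume
    text, score both versions, and return the fraction of decided
    (non-tied) trials in which group A's version scored higher."""
    rng = random.Random(seed)
    wins_a = 0
    decided = 0
    for _ in range(trials):
        text = rng.choice(resumes)
        a, b = rng.choice(names_a), rng.choice(names_b)
        score_a, score_b = score_resume(a, text), score_resume(b, text)
        if score_a == score_b:
            continue  # a tie carries no signal about name preference
        decided += 1
        wins_a += score_a > score_b
    return wins_a / decided if decided else 0.0

# Hypothetical biased scorer: adds a bonus when the name is "favored".
FAVORED = {"Alice"}
def toy_scorer(name, text):
    return len(text) + (5 if name in FAVORED else 0)

print(preference_rate(["ten years of Python"], ["Alice"], ["Bob"], toy_scorer))
# → 1.0  (the biased scorer prefers the favored name in every decided trial)
```

An unbiased scorer would hover near 0.5 on this metric; the 85% figure reported for white-associated names is the real-world analogue of the toy result above.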

The standard industry response to studies of this kind is that the model is trained on historical data, and historical data reflects historical bias, and this is a problem the industry is actively working to address. This explanation is accurate and irrelevant. The model is already in use. The resume was already filtered. The applicant already did not get a call. The feedback loop operates faster than the remediation cycle.

What Profiling Actually Produces

A profile is not a record of who you are. It is a statistical prediction of what you will do next, assembled from observations of people who resemble you in the dimensions the model was trained to recognize. The distinction matters because the error mode of a profile is not noise but inherited bias, reproduced at scale and applied to decisions you cannot see or contest.

You did not consent to being modeled. You consented to a loyalty card program, a free app, a terms-of-service agreement written to be unread. Each of those transactions fed a supply chain that ends in an algorithm making consequential decisions about your creditworthiness, your insurability, or your employability. The original data was "anonymized." The decision at the end was made about you specifically.

The Gravy Analytics order prohibits the sale of location data showing visits to specific sensitive locations. It does not prohibit the collection of that data, the use of it internally, or the sale of aggregate behavioral models derived from it. The structure that made the breach harmful remains in place. The prohibition is on distribution, not architecture.

We are not unmoved by this. We are simply noting that the MIT finding from 2015 (four transactions, ninety percent re-identification) described a fundamental property of behavioral data. Not a bug. Not a vulnerability that can be patched. A property. The field of "privacy-preserving data analysis" has made genuine technical progress in the decade since. The market for behavioral data has grown faster.

Sources

Yves-Alexandre de Montjoye et al., "Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata," Science, January 2015.

FTC, "FTC Finalizes Order Prohibiting Gravy Analytics, Venntel From Selling Sensitive Location Data," press release, January 14, 2025.

FTC Staff Report, "A Look Behind the Screens: Examining the Data Practices of Social Media and Video Streaming Services," September 2024.

Mobley v. Workday, Inc., No. 3:23-cv-00770 (N.D. Cal.); July 12, 2024 order allowing agent theory; May 2025 collective action certification.

University of Washington News, "AI Resume Screening Shows Racial and Gender Bias, UW Study Finds," October 31, 2024.