Online Profiling

on data doubles and social sorting

Martin Degeling (degeling@cs.cmu.edu)
02/09/2017

My Topics Today

  • theoretical considerations about privacy and profiling as a technique of liquid surveillance
  • an analysis of the data doubles created by online profiling on the web

What is Profiling?

A Common Use Case:

Personalization

Profiling is applying algorithms for segmentation, categorization and classification of individuals and groups.

Profiling - a Definition

Profiling is a technique to automatically process personal and non-personal data, aimed at developing predictive knowledge from the data in the form of constructing profiles that can subsequently be applied as a basis for decision-making.

Ferraris et al. 2013.
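The definition above can be illustrated with a toy example. This is only a sketch of the three steps it names (process data, construct a profile, use the profile for a decision); the domain names, the category mapping and the ad "decision" are invented for illustration and do not come from any real profiling system.

```python
from collections import Counter

# Hypothetical mapping from visited domains to interest categories.
SITE_CATEGORIES = {
    "babyshop.example": "Parenting",
    "diapers.example": "Parenting",
    "news.example": "News",
    "privacy.example": "Privacy Issues",
}


def build_profile(visited, top_n=2):
    """Count the categories of known visited sites and keep the most frequent ones."""
    counts = Counter(SITE_CATEGORIES[d] for d in visited if d in SITE_CATEGORIES)
    return [category for category, _ in counts.most_common(top_n)]


def choose_ad(profile, default="generic ad"):
    """Apply the profile as the basis for a (toy) decision."""
    return f"ad for {profile[0]}" if profile else default


# Hypothetical browsing history; the unknown domain is simply ignored.
history = ["babyshop.example", "news.example", "diapers.example", "unknown.example"]
profile = build_profile(history)
decision = choose_ad(profile)
```

Note that no name or other PII appears anywhere in the sketch, yet the resulting profile still drives a decision about the person behind the browsing history.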

Profiling is great!?

  • personalization simplifies our world in times of information overload: personalized news feeds, shopping suggestions, identification of potential terrorists
  • many profiling systems even work without PII - so no privacy concerns!?

Personalization vs. Filter Bubble

Pariser, Eli. 2011. The Filter Bubble: What The Internet Is Hiding From You. Penguin UK.

The Echo Chamber

Targeting vs. Discrimination

Datta, Tschantz, Datta. 2015. “Automated Experiments on Ad Privacy Settings: A Tale of Opacity, Choice, and Discrimination.”

Reduction of Risks vs. Exclusion

Profiling and Privacy?

There might be issues if personal data is involved, but is profiling in general a privacy problem?

What is Privacy?

Dimensions of Privacy

  • Locational ~ the right to be let alone
  • Informational ~ the right to know what, why, when and by whom
  • Decisional ~ the right to make decisions on your own
Rössler, Beate. 2005. The Value of Privacy. Translated by R. D. V. Glasgow. English ed. Cambridge, UK ; Malden, MA: Polity.

locational privacy

My home is my castle - but rooms can also be virtual

informational privacy

in Germany:

The right to informational self-determination guarantees* the power for each individual to decide on her own what personal data she releases and for what purpose. (Federal Constitutional Court of Germany, 1983)

*limitations apply

decisional privacy

the freedom to live your life and to act and behave as you want requires privacy. It enables you to make decisions without unwanted influences.

Example: the pregnancy prediction score

Privacy and the liberal Hypothesis

Privacy is not an absolute value but one pillar of liberal, democratic societies. It is not just a right, but a necessity for autonomous individuals who are the central element of modern societies.

Profiling and Privacy

profiling violates privacy because it acts on a cybernetic hypothesis rather than a liberal one

the connection can be made through theories on power and surveillance

Theories of Power

a complex system of power relations manages the balance between

  • individual vs. societal interests
  • privacy vs. security

how these power relations manifest is changing

Disciplinary Societies

in modern societies power is no longer exercised by a higher force that executes (arbitrary) punishment, but by hierarchies that are immanent to social and physical structures

Examples are schools, factories or prisons.

Foucault, Michel. 1977. Discipline and Punish: The Birth of the Prison. Vintage Books.

The Panopticon

  • an architecture that relies on the internalization of power
  • symbol for modern, disciplinary society - in contrast to those that rely on punishment
Foucault, Michel. 1977. Discipline and Punish: The Birth of the Prison. Vintage Books.

Next: Societies of Control

the control of populations no longer focuses on confining or soul-training individuals into conformity and obedience, but rather encourages their mobility, consumption, and connectivity

Deleuze, Gilles. 1992. “Postscript on the Societies of Control.” October 59 (January): 3–7.

from disciplinary to control societies

at universities: the move from conformity to diversity. Though we have more freedom in how we can look, the ideals of hard work are still in place

instead of a cane:

  • self-directed learning that requires self-control
  • competition instead of punishment ("be creative!")

From Credit Scoring to Social Scoring

the shift from redlining to "social credit"

is a move from insurmountable discrimination to a dynamic system that (ostensibly) values individuals

Liquid Surveillance

Societies of control require continuous data flows and the creation of data doubles

Surveillance is

  • invisible but implicit
  • evolving

Profiling is one example of a technology that is both.

Bauman, Zygmunt, and David Lyon. 2012. Liquid Surveillance: A Conversation. 1st ed. Cambridge, UK ; Malden, MA: John Wiley & Sons.

Towards a Cybernetic Hypothesis

  • a shift in the conception of individuality from the liberal idea of autonomous individuals to dividuals that can be governed by data
  • from training for obedience to controlling the future
Tiqqun. 2012. The Cybernetic Hypothesis. translation collective.

Example: Cambridge Analytica

Is the age of liberal democracies over?

no

  • we do not have to trust the algorithms
  • the technology needs to be demystified

Online Profiling

Inferring high-level personality profiles from ordinary web tracking

Online Profiling as Liquid Surveillance

various companies create profiles from web tracking leading to data doubles consisting of information about

  • interests
  • demographic information
  • political association
  • habits
  • future behavior

Who is doing it?

Google

assigns interests out of ~1200 in 24 categories, for example:

People & Society - Social Issues & Advocacy - Privacy Issues
People & Society - Family & Relationships - Family - Parenting - Babies & Toddlers - Diapering & Potty Training
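The two interest labels above are paths in a category hierarchy. As a minimal sketch of that data structure, the following parses such paths into a nested dictionary; the " - " separator is taken from the way the labels are written on this slide, and the helper names are hypothetical, not part of any Google API.

```python
def add_path(tree, path, sep=" - "):
    """Insert one 'A - B - C' interest path into a nested dict of categories."""
    node = tree
    for part in path.split(sep):
        node = node.setdefault(part.strip(), {})
    return tree


def depth(tree):
    """Number of levels in the deepest branch of the category tree."""
    return 0 if not tree else 1 + max(depth(child) for child in tree.values())


# The two example paths shown on the slide.
PATHS = [
    "People & Society - Social Issues & Advocacy - Privacy Issues",
    "People & Society - Family & Relationships - Family - Parenting"
    " - Babies & Toddlers - Diapering & Potty Training",
]

taxonomy = {}
for p in PATHS:
    add_path(taxonomy, p)

top_level = list(taxonomy)
second_level = sorted(taxonomy["People & Society"])
```

Both examples share the same top-level category, and the second one is six levels deep, which gives a sense of how fine-grained the assigned interests can be.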

Facebook

a recent crowdsourcing study found ~30,000 different items in 26 categories that can be assigned to a user's profile

Recently controversial: Ethnic Affinity

Angwin, Julia, Terry Parris Jr., and Surya Mattu. 2016. “Facebook Doesn’t Tell Users Everything It Really Knows About Them.” ProPublica. December 27.

Bluekai

user specific; based on opt-out tracking and secondary data

bluekai.com/registry

Alexa

site-specific; based on opt-in tracking and surveys

alexa.com/siteinfo/economist.com

Quantcast

site-specific; based on opt-out tracking and surveys

quantcast.com/economist.com

Are they really able to do it?

How to study the black box

  1. simulate a web session (selenium, generate traffic from reddit)
  2. visit websites and observe the outcomes
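Step 1 can be sketched roughly as follows. This is not the code used in the study: the URLs are placeholders, and the function only assumes an object with a `get(url)` method, which is how Selenium WebDriver instances navigate. A small recording stand-in is used here so the sketch runs without a browser; step 2 (reading back the profile a tracker has built) depends on the service and is not shown.

```python
import time


def run_session(driver, urls, pause_seconds=0.0):
    """Visit each URL in order via driver.get and return the visit log."""
    visited = []
    for url in urls:
        driver.get(url)  # same call a Selenium WebDriver offers for navigation
        visited.append(url)
        time.sleep(pause_seconds)  # optional pause between page loads
    return visited


class RecordingDriver:
    """Stand-in for a browser driver that only records requested URLs."""

    def __init__(self):
        self.requests = []

    def get(self, url):
        self.requests.append(url)


# Hypothetical training URLs, e.g. links collected from a topic-specific forum.
TRAINING_URLS = [
    "https://news.example/politics",
    "https://shop.example/diapers",
]

driver = RecordingDriver()
log = run_session(driver, TRAINING_URLS)
```

With Selenium installed, a real browser instance such as `selenium.webdriver.Firefox()` could be passed in place of `RecordingDriver()`.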

1. Iteration - Google

Who tracks us on the web

What do Profiles by Google look like

2. Iteration - Bluekai

Bluekai

Does Online Profiling work?

  • Profiles are inherently noisy
  • They are also unreliable. Visiting the same websites leads to only 77% overlap between profiles
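The slides do not state how the overlap between two profiles was computed. One common choice, assumed here purely for illustration, is the Jaccard index: shared interests divided by all interests seen in either profile. The interest names below are invented.

```python
def jaccard_overlap(profile_a, profile_b):
    """Share of interests common to both profiles (Jaccard index, 0.0 to 1.0)."""
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical profiles from two runs that visited the same websites.
run_1 = {"News", "Parenting", "Travel", "Sports"}
run_2 = {"News", "Parenting", "Travel", "Cooking"}

overlap = jaccard_overlap(run_1, run_2)
```

In this toy case three of five distinct interests are shared, so the overlap is 0.6.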

3. Iteration - Obfuscation

Can we influence it?

Does obfuscation work?

  • 5% obfuscation traffic adds 40% extra interests
  • comparing profiles before and after obfuscation shows a 55% overlap
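A rough sketch of how obfuscation traffic could be mixed into a browsing session is given below. It assumes that "5%" is measured relative to the number of real visits and that decoys are inserted at random positions; both are assumptions, as are the URLs, and the actual setup of the experiment may differ.

```python
import random


def inject_obfuscation(urls, decoys, rate=0.05, seed=0):
    """Return a copy of urls with decoy visits mixed in at random positions.

    rate is the number of decoy visits relative to the number of real visits.
    """
    rng = random.Random(seed)  # seeded so a run can be repeated
    n_decoys = round(len(urls) * rate)
    mixed = list(urls)
    for _ in range(n_decoys):
        position = rng.randint(0, len(mixed))  # may also append at the end
        mixed.insert(position, rng.choice(decoys))
    return mixed


# Hypothetical real session and decoy pool.
real_urls = [f"https://site{i}.example/" for i in range(100)]
decoy_urls = ["https://decoy-a.example/", "https://decoy-b.example/"]

session = inject_obfuscation(real_urls, decoy_urls, rate=0.05)
decoy_count = sum(1 for url in session if url in decoy_urls)
```

The order of the real visits is preserved, so the resulting session differs from the original only by the few inserted decoy page loads.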

The Takeaways

Profiling Techniques

  • operate under a different hypothesis than the one privacy is based on
  • are not as good as they promise

Embrace your data double!

https://addons.mozilla.org/en-US/firefox/addon/tricktracking/

Thanks