Not so anonymous
17 October 2011 - HARVARD
Latanya Sweeney challenges outdated policies with a quantitative, computational approach to privacy
De-identified prescription data: is it really anonymous? Latanya Sweeney aims to make personal data more secure and to provide recourse for people who are harmed by privacy breaches. Photo courtesy of Flickr user Dan Buczynski.
When you visit the pharmacy to pick up your antidepressants, your cholesterol medication, or your birth control pills, you may expect a certain measure of privacy.
In reality, prescription information is routinely sold to analytics companies for use in research and pharmaceutical marketing. That information might include your doctor’s name and address, your diagnosis, the name and dose of your prescription, the time and place where you picked it up, your age and gender, and an encoded version of your name.
Under federal privacy law, this data sharing is perfectly legal. As a safeguard, part of the Health Insurance Portability and Accountability Act (HIPAA) requires that a person "with appropriate knowledge of and experience with generally accepted statistical and scientific principles and methods" must certify that there is a "very small" risk of re-identification by an "anticipated recipient" of the data.
But Latanya Sweeney (A.L.B. ’95), a Visiting Professor of Computer Science at Harvard’s School of Engineering and Applied Sciences (SEAS), warns that loopholes abound. Even without the patients’ names, she says, it may be quite easy to re-identify the subjects.
Sweeney suspects this to be the case because she herself is an expert at matching "anonymous" data with other public records, exposing security flaws. Keeping data private, she insists, involves far more than just the removal of a name--and she’s eager to prove it.
In 2000, Sweeney analyzed data from the 1990 census and revealed that, surprisingly, 87% of the U.S. population could be uniquely identified by just a Zip code, date of birth, and gender. Given the richness of the secondary health data sold by pharmacies and analytics companies, she says, it should be quite easy to determine patient names and strike upon a treasure trove of personal medical information.
Not that she’s particularly interested in whether you’re taking Lipitor or Crestor.
Instead, Sweeney, the founder and director of Harvard’s Data Privacy Lab, aims to expose the weaknesses in existing privacy laws and security mechanisms in order to improve them. By challenging current policy, she hopes to demonstrate that stronger, more complex algorithmic solutions are necessary for the effective protection of sensitive data.
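The 87% figure cited above comes down to a counting question: how many people share each combination of Zip code, date of birth, and gender? The sketch below shows one way such a count might be made. It is an illustration only, not Sweeney's actual method; the function name, field names, and toy records are all hypothetical.

```python
from collections import Counter


def unique_fraction(records, keys=("zip", "dob", "gender")):
    """Share of records whose combination of `keys` appears exactly once."""
    if not records:
        return 0.0
    # Count how many records share each quasi-identifier combination.
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    # A record is "unique" when nobody else has the same combination.
    unique = sum(1 for r in records if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(records)


# Hypothetical toy data; real census files are far larger.
people = [
    {"zip": "02138", "dob": "1970-01-01", "gender": "F"},
    {"zip": "02138", "dob": "1970-01-01", "gender": "F"},
    {"zip": "02138", "dob": "1982-03-14", "gender": "M"},
    {"zip": "02139", "dob": "1970-01-01", "gender": "F"},
]

print(unique_fraction(people))  # 0.5: two of the four records are one of a kind
```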
A fair trade
A computer scientist at heart, Sweeney has teamed up with colleagues at the Center for Research on Computation and Society at SEAS, investigating a range of new economic models for data sharing and protection.
With SEAS faculty David Parkes (Gordon McKay Professor of Computer Science) and Stephen Chong (Assistant Professor of Computer Science), as well as Alex Pentland at MIT, Sweeney advocates a "privacy-preserving marketplace" in which society can reap the benefits of shared data in (especially) the scientific and medical arenas, while also protecting individuals from economic harm when those data are shared beyond their original intended use.
"We don’t want data to be locked away and never used, because we could be doing so much more if people were able to share data in a way that’s trustworthy and aligned with the intentions of all the participants," says Parkes.
Medical data, genetic data, financial data, location data, purchasing histories: all of these are extremely valuable pieces of information for social science research, epidemiology, strategic marketing, and other behind-the-scenes industries. But if one database can be matched up with another--and, as Sweeney has demonstrated, it often can--then an interested party can easily generate a detailed picture of a specific individual’s life.
This can be both useful and damaging, as when participants in a genomic study help advance science but then find themselves unable to obtain life insurance.
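The matching of one database against another can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: both data sets are assumed to carry the same three fields, the names and records are invented, and a match is accepted only when exactly one person in the public list fits.

```python
def link_records(deidentified, public, keys=("zip", "dob", "gender")):
    """Attach a name to each de-identified row that matches exactly one public record."""
    # Index the public list by its quasi-identifier combination.
    index = {}
    for person in public:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])

    matches = []
    for row in deidentified:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # ambiguous or missing matches are skipped
            matches.append({**row, "name": names[0]})
    return matches


# Hypothetical "anonymous" prescription rows and a hypothetical public list.
prescriptions = [
    {"zip": "02138", "dob": "1970-01-01", "gender": "F", "drug": "drug A"},
    {"zip": "02139", "dob": "1982-03-14", "gender": "M", "drug": "drug B"},
]
voters = [
    {"name": "Alice Example", "zip": "02138", "dob": "1970-01-01", "gender": "F"},
    {"name": "Bob Example", "zip": "02139", "dob": "1982-03-14", "gender": "M"},
    {"name": "Carl Example", "zip": "02139", "dob": "1982-03-14", "gender": "M"},
]

linked = link_records(prescriptions, voters)
print(linked)  # only the first prescription is re-identified
```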
Other harms are easy to imagine, says Sweeney:
"They might know that they have cancer and all of a sudden their credit card debt is going crazy; or they may not get that promotion at work; or they may get fired because all of a sudden now little Johnny has this very expensive heart disease and they’re a big liability."
After-the-fact protections for some of these types of discrimination do exist, but mechanisms to compensate for these harms fairly--or to prevent them entirely--are weak.
Sweeney, Parkes, Chong, and Pentland theorize (in a working paper) that if one were able to accurately quantify the risk of leaky data, a privacy-preserving marketplace could compensate participants at a level according to that risk. In other words, if we put aside the expectation of 100% anonymity and security, a more trustworthy system might take its place.
As it so happens, techniques in computer science and statistics (such as differential privacy, a specialty of Salil Vadhan and others at Harvard) do allow us to quantify the risk of harm.
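For readers curious how differential privacy puts a number on risk, here is a minimal sketch of the standard Laplace mechanism applied to a simple count. It is a textbook illustration, not the system proposed in the working paper; the function names and sample data are hypothetical. The privacy parameter epsilon sets the noise level, and e^epsilon bounds how much any one person's presence can change the odds of a given answer.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of matching records; a count changes by at most 1 per person."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Sensitivity of a count is 1, so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def worst_case_odds_factor(epsilon):
    """Upper bound on how much one person's data can shift the odds of any output."""
    return math.exp(epsilon)


# Hypothetical rows; smaller epsilon means more noise and a tighter bound.
rows = [{"drug": "A"}, {"drug": "B"}, {"drug": "A"}, {"drug": "A"}]
print(private_count(rows, lambda r: r["drug"] == "A", epsilon=0.5))
print(worst_case_odds_factor(0.5))
```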
A remaining question, then, is whether the average individual is capable of understanding a 4% risk versus a 14% risk and acting rationally upon it.
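To make the 4% versus 14% comparison concrete, one simple way to tie compensation to risk is an expected-loss calculation: the probability of re-identification multiplied by the cost of the resulting harm. This is an assumed pricing rule chosen for illustration, not the formula from the working paper, and the dollar figure is invented.

```python
def risk_based_payment(reidentification_risk, estimated_harm):
    """Expected loss: chance of re-identification times the assumed cost of the harm."""
    if not 0.0 <= reidentification_risk <= 1.0:
        raise ValueError("risk must be a probability between 0 and 1")
    return round(reidentification_risk * estimated_harm, 2)


# Hypothetical cost of a breach to one participant; not a figure from the article.
HYPOTHETICAL_HARM = 10_000

low = risk_based_payment(0.04, HYPOTHETICAL_HARM)
high = risk_based_payment(0.14, HYPOTHETICAL_HARM)
print(low, high)
```

Under these made-up numbers, the participant facing a 14% risk would be owed 1,400 rather than 400, a gap that is easier to weigh than the percentages alone.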
Rethinking policy
"The role of privacy policy, in a system where individuals are going to exercise autonomy, is to make sure they don’t shoot themselves in the foot," asserts Sweeney.
"One experiment after another has shown that people will make poor decisions about anything that involves their privacy. They want the new utility, they want the new shiny thing, because we tend to discount that any harm is going to happen to us, even when we’re told that it could."
Sweeney and her colleagues suggest a marketplace where computational and cryptographic techniques guarantee a certain measure of privacy, subjects are compensated according to the level of risk they incur by participating, and government policy backs the whole thing up--perhaps by mandating insurance against major losses.
"Generally, the problem with policy is that it can’t be very nuanced," explains Parkes. "But maybe you can use policy to regulate the way marketplaces work, and then let the market solve the optimization problem."
Most federal privacy regulations, which Sweeney calls "sledgehammer" policies for their lack of finesse, were written in an era without digital records, without the Internet, and without fast computers.
Weakly anonymous data did not pose much of a threat 30 years ago; no one was likely to pore through millions of records by hand to find patterns and anomalies. Now, everything has changed.
"Here we are," observes Sweeney, "right in the middle of a scientific explosion in both social science data and genomic data, and these kinds of notions from HIPAA and 1970s policies are an ill fit for today’s world of what we might call ’big data,’ where so many details about us are captured."
For example, she says, the "fair information practices" spelled out in the 1974 Privacy Act allow you to view your records and challenge the content, but you don’t get to decide who can report to them or who else gets to see them. The hospital makes you sign to affirm that you’re aware of the institution’s data sharing policy, but you can’t really opt out of it if you want medical treatment. And while data released by entities covered by HIPAA are required to be de-identified, Sweeney believes that the standards for anonymity are too vague.
A vocal advocate of change at the national level, Sweeney backs up her assertions with real technological solutions, including original software that identifies risks in data sets. And the authorities are listening. In 2009 she was appointed to the privacy and security seat of the federal Health Information Technology Policy Committee, and in 2011 her work was cited in a high-profile U.S. Supreme Court case.
"She has a very deep knowledge in policy issues and is developing a very interesting network of people in industry that are able to advise her agenda and inform it," notes Parkes. "I think Latanya, more than anyone else in this space, is able to figure out the really important questions to ask because of this network and the expertise she’s built up over the years."
Can computer scientists, policy experts, privacy advocates, and corporations produce a system that simultaneously allows the productive sharing of data while guaranteeing some degree of privacy?
"Right now we’re seeing lawsuits, with courts able to provide only a coarse response to these important questions," says Parkes.
Unless lawmakers and institutions thoroughly rethink privacy protections, Sweeney warns, "either we’re going to have no privacy because they’re ineffective, or we’re going to lose a tremendous resource that these data have the potential to provide."