Research Explores Data Mining, Privacy

[June 17, 2006]

(AP)
By BRIAN BERGSTEIN
CAMBRIDGE, Mass.

As new disclosures mount about government surveillance programs, computer science researchers hope to wade into the fray by enabling data mining that also protects individual privacy.

Largely by employing the head-spinning principles of cryptography, the researchers say they can ensure that law enforcement, intelligence agencies and private companies can sift through huge databases without seeing names and identifying details in the records.



For example, manifests of airplane passengers could be compared with terrorist watch lists -- without airline staff or government agents seeing the actual names on the other side's list. Only if a match were made would a computer alert each side to uncloak the record and probe further.
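The article does not say how such a blind comparison would be built. As a rough illustration only, here is a minimal sketch that assumes both sides' matching software share a secret key (the name `MATCH_KEY` and its value are hypothetical) and turn every name into an opaque HMAC-SHA256 token. A computer then compares tokens, and only the positions of matches are handed back for "uncloaking." Real deployments might instead use private set intersection protocols that need no shared key.

```python
import hashlib
import hmac

# Hypothetical secret key loaded into each side's matching software; staff never see it.
MATCH_KEY = b"hypothetical-shared-matching-key"


def blind(name: str) -> str:
    """Map a name to an opaque token; the same normalized name always gives the same token."""
    normalized = " ".join(name.lower().split())
    return hmac.new(MATCH_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def blind_list(names: list[str]) -> list[str]:
    """Run separately by each side, so only tokens ever leave the building."""
    return [blind(n) for n in names]


def matching_positions(tokens_a: list[str], tokens_b: list[str]) -> list[int]:
    """The comparing computer sees tokens only; it reports which entries of list A appear in B."""
    b = set(tokens_b)
    return [i for i, t in enumerate(tokens_a) if t in b]


manifest = ["Alice Smith", "Bob Jones", "Carol White"]
watch_list = ["bob  JONES", "Dan Brown"]
hits = matching_positions(blind_list(manifest), blind_list(watch_list))
# Only now would the airline "uncloak" the matched record.
print([manifest[i] for i in hits])  # ['Bob Jones']
```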

"If it's possible to anonymize data and produce ... the same results as clear text, why not?" John Bliss, a privacy lawyer in IBM (News - Alert) Corp.'s "entity analytics" unit, told a recent workshop on the subject at Harvard University.



The concept of encrypting or hiding identifying details in sensitive databases is not new. Exploration has gone on for years, and researchers say some government agencies already deploy such technologies -- though protecting classified information rather than individual privacy is a main goal.

Even the data-mining project that perhaps drew more scorn than any other in recent years, the Pentagon's Total Information Awareness research program, funded at least two efforts to anonymize database scans. Those anonymizing systems were dropped when Congress shuttered TIA, even while the data-mining aspects of the project lived on in intelligence agencies.

Still, anonymizing technologies have been endorsed repeatedly by panels appointed to examine the implications of data mining. And intriguing progress appears to have been made at designing information-retrieval systems with record anonymization, user audit logs -- which can confirm that no one looked at records beyond the approved scope of an investigation -- and other privacy mechanisms "baked in."
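The article does not describe how such audit logs work internally. One common design, sketched below purely as an assumption, is a hash-chained log: each access entry's SHA-256 hash covers the previous entry's hash, so quietly editing or removing any entry except the newest breaks the chain. The class and field names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: dict) -> str:
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass
class AuditLog:
    """Hypothetical hash-chained access log: each entry's hash covers the previous hash."""
    entries: list = field(default_factory=list)

    def record(self, user: str, record_id: str) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"user": user, "record": record_id, "prev": prev}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self) -> bool:
        """Recompute the chain; an edited entry, or one removed before the newest, breaks it."""
        prev = GENESIS
        for e in self.entries:
            body = {"user": e["user"], "record": e["record"], "prev": prev}
            if e["prev"] != prev or e["hash"] != _digest(body):
                return False
            prev = e["hash"]
        return True

    def out_of_scope(self, approved: set) -> list:
        """Entries touching records outside the investigation's approved scope."""
        return [e for e in self.entries if e["record"] not in approved]
```

A reviewer holding the approved scope could call `out_of_scope` after first checking `verify`.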

The trick is to do more than simply strip names from records. Latanya Sweeney of Carnegie Mellon University -- a leading privacy technologist who once had a project funded under TIA -- has shown that 87 percent of Americans could be identified by records listing solely their birthdate, gender and ZIP code.
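To see why stripping names is not enough, a small sketch can count how many records are the only one with their combination of birthdate, gender and ZIP code. The records below are invented toy data and say nothing about the 87 percent figure; the point is only that a unique combination can be linked back to a person through any outside list that carries the same three fields.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("birthdate", "gender", "zip")


def unique_fraction(records: list[dict], keys=QUASI_IDENTIFIERS) -> float:
    """Share of records whose (birthdate, gender, ZIP) combination occurs exactly once."""
    combos = [tuple(r[k] for k in keys) for r in records]
    counts = Counter(combos)
    return sum(1 for c in combos if counts[c] == 1) / len(records)


# Invented, name-free toy records.
records = [
    {"birthdate": "1960-01-02", "gender": "F", "zip": "02138", "diagnosis": "flu"},
    {"birthdate": "1960-01-02", "gender": "F", "zip": "02138", "diagnosis": "asthma"},
    {"birthdate": "1971-05-09", "gender": "M", "zip": "02139", "diagnosis": "flu"},
    {"birthdate": "1985-11-30", "gender": "F", "zip": "15213", "diagnosis": "migraine"},
]
print(unique_fraction(records))  # 0.5
```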

Sweeney had this challenge in mind as she developed a way for the U.S. Department of Housing and Urban Development to anonymously track the homeless.

The system became necessary to meet the conflicting demands of two laws -- one that requires homeless shelters to tally the people they take in, and another that prohibits victims of domestic violence from being identified by agencies that help them.

Sweeney's solution deploys a "hash function," which cryptographically converts information into a random-appearing code of numbers and letters. The function is designed so that it can't practically be reversed to reveal the original data.
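The article does not name the hash function Sweeney used. SHA-256 from Python's standard library serves here as a stand-in to show the basic behavior: the same input always yields the same 64-character code, while a one-letter change yields an unrelated-looking one. Note that "can't be reversed" means there is no inverse computation; someone could still hash a list of candidate names and compare the results.

```python
import hashlib


def hash_code(text: str) -> str:
    """SHA-256 digest as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


print(hash_code("Jane Doe"))  # same input -> same 64-character code every time
print(hash_code("Jane Dow"))  # a one-letter change -> an unrelated-looking code
```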

When homeless shelters had to submit their records to regional HUD offices for counting how many people used the facilities, each shelter would send only hashed data.

A key detail here is that each homeless shelter would have its own computational process, known as an algorithm, for hashing data. That way, one person's name wouldn't always translate into the same code, a consistency that a corrupt insider or savvy stalker who gained access to the records could otherwise exploit.
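One common way to give each shelter "its own" hashing process, assumed here rather than taken from the article, is to have every shelter run the same keyed hash (HMAC-SHA256) with a secret key only it holds. The `Shelter` class below is hypothetical. A name then hashes consistently within one shelter but to different codes at different shelters.

```python
import hashlib
import hmac
import secrets


class Shelter:
    """Hypothetical shelter that hashes names with its own secret key (HMAC-SHA256)."""

    def __init__(self, name: str):
        self.name = name
        self._key = secrets.token_bytes(32)  # never leaves this shelter

    def hash_record(self, person: str) -> str:
        return hmac.new(self._key, person.encode("utf-8"), hashlib.sha256).hexdigest()


a, b = Shelter("Shelter A"), Shelter("Shelter B")
print(a.hash_record("Jane Doe") == a.hash_record("Jane Doe"))  # True: consistent within one shelter
print(a.hash_record("Jane Doe") == b.hash_record("Jane Doe"))  # False: codes differ across shelters
```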

However, if the same name generated different codes at different shelters, it would be impossible to tell whether one person had been to two centers and was being double-counted. So Sweeney's system adds a second step: Each shelter's hashed records are sent to all other facilities covered by the HUD regional office, then hashed again and sent back to HUD as a new code.

It might be hard to wrap your mind around this, but it's a fact of the cryptography involved: If one person had been to two different shelters -- and so their anonymized data got hashed twice, once by each of the shelters applying its own formula -- then the codes HUD received in this second phase would indicate as much. That would aid an accurate count.
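The property at work is that the two hashing steps commute: hashing with shelter A's formula and then B's gives the same result as B's and then A's. The article does not say which functions Sweeney's system uses. A classic example with this property, assumed here for illustration, is raising a number to a secret exponent modulo a prime, since (x^a)^b equals (x^b)^a. The sketch below covers the two-shelter case; the modulus is toy-sized and the names are hypothetical.

```python
import hashlib
import math
import secrets

P = 2**127 - 1  # a Mersenne prime; toy-sized modulus for illustration only


def to_group(name: str) -> int:
    """Map a name to a number modulo P (via SHA-256)."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big") % P


class Shelter:
    """Hypothetical shelter whose secret 'formula' is exponentiation by its own key mod P."""

    def __init__(self):
        while True:
            k = secrets.randbelow(P - 3) + 2
            if math.gcd(k, P - 1) == 1:  # keeps the mapping one-to-one
                self._key = k
                break

    def hash_own(self, name: str) -> int:
        return pow(to_group(name), self._key, P)

    def rehash(self, code: int) -> int:
        return pow(code, self._key, P)


def hud_codes(a: Shelter, b: Shelter, roster_a: list[str], roster_b: list[str]):
    """Each shelter hashes its roster; the other shelter hashes those codes again.
    Because (x**ka)**kb == (x**kb)**ka mod P, a person on both rosters gets one code."""
    from_a = {b.rehash(a.hash_own(n)) for n in roster_a}
    from_b = {a.rehash(b.hash_own(n)) for n in roster_b}
    return from_a, from_b


a, b = Shelter(), Shelter()
from_a, from_b = hud_codes(a, b, ["Jane Doe", "John Roe"], ["Jane Doe", "Ann Poe"])
print(len(from_a | from_b), "distinct people;", len(from_a & from_b), "seen at both")
# 3 distinct people; 1 seen at both
```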

Even if HUD decides not to adopt the system, Sweeney hopes it finds use in other settings, such as letting private companies and law enforcement anonymously compare whether customer records and watch lists have names in common.

A University of California, Los Angeles professor, Rafail Ostrovsky, said the CIA and the National Security Agency are evaluating a program of his that would let intelligence analysts search huge batches of intercepted communications for keywords and other criteria, while discarding messages that don't apply.

Ostrovsky and co-creator William Skeith believe the system would keep innocent files away from snoops' eyes while also extending their reach: Because the program would encrypt its search terms and the results, it could be placed on machines all over the Internet, not just computers in classified settings.
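The article does not explain how an encrypted search could run on untrusted machines. One approach from the private-stream-searching literature relies on homomorphic encryption such as Paillier, where multiplying ciphertexts adds the hidden numbers. The sketch below is a heavily simplified, assumed version: the analyst encrypts a 1 for each secret keyword and a 0 for every other word in a public dictionary, and the remote machine folds each document into a single-slot buffer without learning which documents matched. The toy keys are far too weak for real use, and this one-slot buffer recovers a document only when exactly one matches; fuller constructions add more machinery to handle several matches.

```python
import math
import secrets
from typing import Optional

# Toy Paillier keys built from two Mersenne primes; far too weak for real use.
P, Q = 2**107 - 1, 2**127 - 1
N, N2 = P * Q, (P * Q) ** 2
LAM = math.lcm(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # valid because the generator is N + 1


def encrypt(m: int) -> int:
    """Paillier: E(m) = (N+1)^m * r^N mod N^2, with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return pow(N + 1, m, N2) * pow(r, N, N2) % N2


def decrypt(c: int) -> int:
    return (pow(c, LAM, N2) - 1) // N * MU % N


def make_filter(dictionary: list[str], keywords: set[str]) -> dict[str, int]:
    """Analyst: encrypted 1 for each secret keyword, encrypted 0 for every other word."""
    return {w: encrypt(1 if w in keywords else 0) for w in dictionary}


def search_stream(enc_filter: dict[str, int], documents: list[str]) -> tuple[int, int]:
    """Runs on an untrusted machine. For each document it forms E(c), where c counts the
    distinct keywords present, then folds E(c) and E(c*m) into a one-slot buffer."""
    buf_c, buf_cm = encrypt(0), encrypt(0)
    for doc in documents:
        enc_c = encrypt(0)
        for w in set(doc.lower().split()):
            if w in enc_filter:
                enc_c = enc_c * enc_filter[w] % N2  # multiplying ciphertexts adds plaintexts
        m = int.from_bytes(doc.encode("utf-8"), "big")  # toy limit: c*m must stay below N
        buf_c = buf_c * enc_c % N2
        buf_cm = buf_cm * pow(enc_c, m, N2) % N2  # E(c)^m = E(c*m)
    return buf_c, buf_cm


def recover(buf: tuple[int, int]) -> Optional[str]:
    """Analyst: decrypt the buffer; this toy only recovers a document if exactly one matched."""
    c, cm = decrypt(buf[0]), decrypt(buf[1])
    if c == 0:
        return None
    m = cm // c
    return m.to_bytes((m.bit_length() + 7) // 8, "big").decode("utf-8")


enc = make_filter(["weather", "meeting", "shipment", "dawn", "lunch"], {"shipment"})
stream = ["lunch at noon", "shipment leaves at dawn", "weather looks fine"]
print(recover(search_stream(enc, stream)))  # shipment leaves at dawn
```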

"Technologically it is possible" to bolster security and privacy, Ostrovsky said. "You can kind of have your cake and eat it too."

That may be the case, but creating such technologies is just part of the battle.

One problem is getting potential users to change how they deal with information.

Rebecca Wright, a Stevens Institute of Technology professor who is part of a five-year National Science Foundation-funded effort to build privacy protections into data-mining systems, illustrates that issue with the following example.

The Computing Research Association annually analyzes the pay earned by university computer faculty. Some schools provide anonymous lists of salaries; more protective ones send just their minimum, maximum and average pay.

Researchers affiliated with Wright's project, known as Portia, offered a way to calculate the figures with better accuracy and privacy. Instead of having universities send their salary figures for the computer association to crunch, Portia's system can perform calculations on data without ever storing it in unencrypted fashion. With such secrecy, the researchers argued, every school could safely send full salary lists.
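The article does not detail Portia's protocol. A standard technique for computing such aggregates without anyone seeing individual figures is additive secret sharing, swapped in here purely as an illustration. Each school splits its payroll total and headcount into random-looking shares for several hypothetical computing parties; each party adds up only the shares it receives, and only the combined totals are revealed. This sketch simulates everything in one process and computes only the average, not minimum or maximum.

```python
import secrets

MODULUS = 2**64  # must exceed any total being computed


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n random-looking shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_average(salary_lists: list[list[int]], n_parties: int = 3) -> float:
    """Each school shares its payroll total and headcount; each hypothetical party only
    ever sums the shares sent to it (simulated here in one process)."""
    sum_shares = [0] * n_parties
    count_shares = [0] * n_parties
    for salaries in salary_lists:  # one list per school; this step would run at the school
        for i, s in enumerate(share(sum(salaries), n_parties)):
            sum_shares[i] = (sum_shares[i] + s) % MODULUS
        for i, s in enumerate(share(len(salaries), n_parties)):
            count_shares[i] = (count_shares[i] + s) % MODULUS
    total = sum(sum_shares) % MODULUS
    count = sum(count_shares) % MODULUS
    return total / count


schools = [[100000, 120000], [90000], [110000, 130000, 150000]]
print(secure_average(schools))
```

No single party's shares reveal anything about a school's figures; only the recombined sums do.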

But the software remains unadopted. One major reason, Wright said, was that universities questioned whether encryption gave them legal standing to provide full salary lists when they previously could not, even though the new lists would never leave the university in unencrypted form.

Even if data-miners were eager to adopt privacy enhancements, Wright and other researchers worry that the programs' obscure details might be difficult for the public to trust.

Steven Aftergood, who heads the Federation of American Scientists' project on government secrecy, suggested that public confidence could be raised by subjecting government data-mining projects to external privacy reviews.

But that seems somewhat unrealistic, he said, given that intelligence agencies have been slow to share surveillance details with Congress even on a classified basis.

"That part of the problem may be harder to solve than the technical part," Aftergood said. "And in turn, that may mean that the problem may not have a solution."

___

On the Net:

Portia: http://crypto.stanford.edu/portia

Sweeney: http://lab.privacy.cs.cmu.edu/people
