Data Mining and Privacy Issues
by George W. Zobrist
Data mining, a technique
people use to gather information by looking for hidden or
obscure relationships in data, continues to generate
considerable debate, especially about privacy issues. For example,
data mining could take the form of searching through company
orders, purchase amounts or zip codes to determine customer
preferences. A company’s marketing department could then use the
data for new product development or to determine who would most
likely purchase certain products.
Data mining is similar to pattern recognition, or artificial
intelligence, as applied to a database. But while standard
database use is more straightforward, with users generally
searching for information they already know exists (such as the
number of employees near retirement or the number of employees in
a certain salary range), data mining involves searching for
information that is not known to exist on the surface.
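The difference between the two kinds of search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the order records, the product names and the function names (count_orders_with, frequent_pairs) are hypothetical, and counting product pairs is just one simple pattern-finding technique among many.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order records: (customer zip code, products in the order).
ORDERS = [
    ("65401", {"tent", "stove", "lantern"}),
    ("65401", {"tent", "lantern"}),
    ("63101", {"stove", "kettle"}),
    ("65401", {"tent", "lantern", "kettle"}),
    ("63101", {"kettle", "stove"}),
]


def count_orders_with(orders, product):
    """Standard database use: answer a question the user already knows to ask."""
    return sum(1 for _, items in orders if product in items)


def frequent_pairs(orders, min_support):
    """Data mining style: find product pairs that appear together in at
    least min_support orders, without naming any pair in advance."""
    pair_counts = Counter()
    for _, items in orders:
        # Sort so each pair is always counted under the same key.
        for pair in combinations(sorted(items), 2):
            pair_counts[pair] += 1
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


if __name__ == "__main__":
    print(count_orders_with(ORDERS, "tent"))
    print(frequent_pairs(ORDERS, min_support=2))
```

In the first function the user supplies the product of interest. In the second, the relationships (here, that tents and lanterns tend to be bought together) emerge from the data itself.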
Consumer Issues
Where is the problem? In
data mining, the privacy and legal issues that may ensue are key
to the conflict. Over the years, both corporate entities and the
government have collected tremendous amounts of data, storing it
in “data warehouses.” Today’s data mining technology can extract
various patterns and relationships from these data warehouses,
putting consumers’ privacy in jeopardy.
The heart of the matter
is that consumers are aware that collected data is used for
bill payment, for example, and they explicitly agree to that use. They
do not necessarily agree, even implicitly, to let the corporate
entity use the data for data mining, which exceeds the original
intent of the data collection.
Some privacy advocates
believe consumers should be given various levels of “opt-out”
choices: no data mining allowed; data mining for internal use
only; or data mining for both internal and external uses.
Many credit-card companies and others have begun offering their
customers such opt-out choices. In any case, the government, as
well as publicly and privately owned companies, should inform
customers about how they will use any data collected from or
about them.
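One way such opt-out levels might be enforced in software is sketched below. Everything here is an assumption made for illustration: the MiningConsent names, the may_use and minable_records functions and the record layout are hypothetical and do not describe any particular company's system. The sketch also assumes that a customer with no recorded preference is treated as having opted out entirely.

```python
from enum import Enum


class MiningConsent(Enum):
    """Hypothetical encoding of the three opt-out levels."""
    NO_MINING = 0
    INTERNAL_ONLY = 1
    INTERNAL_AND_EXTERNAL = 2


def may_use(consent, purpose):
    """Return True if data held under this consent level may be mined
    for the given purpose ("internal" or "external")."""
    if purpose == "internal":
        return consent in (MiningConsent.INTERNAL_ONLY,
                           MiningConsent.INTERNAL_AND_EXTERNAL)
    if purpose == "external":
        return consent is MiningConsent.INTERNAL_AND_EXTERNAL
    raise ValueError(f"unknown purpose: {purpose!r}")


def minable_records(records, purpose):
    """Keep only the records whose owners allow mining for this purpose.
    Assumption: a missing preference is treated as NO_MINING."""
    return [
        r for r in records
        if may_use(r.get("consent", MiningConsent.NO_MINING), purpose)
    ]


if __name__ == "__main__":
    customers = [
        {"id": 1, "consent": MiningConsent.NO_MINING},
        {"id": 2, "consent": MiningConsent.INTERNAL_ONLY},
        {"id": 3, "consent": MiningConsent.INTERNAL_AND_EXTERNAL},
        {"id": 4},  # no preference recorded
    ]
    print([c["id"] for c in minable_records(customers, "internal")])
    print([c["id"] for c in minable_records(customers, "external")])
```

Filtering records before any analysis runs means the consumer's choice is applied to the data itself, not left to each downstream use.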
Government Action
The Defense Advanced
Research Projects Agency (DARPA) has
come under particularly heavy criticism recently. Its Total
Information Awareness (TIA) program, set up to scour the
Internet and various public and private databases to expose
patterns of suspicious behavior by individuals and track potential
terrorists, has been cited as one that could
potentially violate Americans’ civil liberties. The Bush
administration has denied that such a potential exists.
A 7 February
Department of Defense release noted that two boards (one
internal, the other external) will provide oversight of the
TIA program. These boards would work with DARPA to ensure that the TIA
program is consistent with constitutional and statutory law, and
with American values related to privacy.
DARPA has said that it
does not plan to generate a gigantic database and is not
collecting intelligence information, since that responsibility
rests with U.S. foreign intelligence and counterintelligence
units, operating under congressional oversight. DARPA also
said it has never collected privately held consumer data.
Nevertheless, Sen. Ron
Wyden (D-Ore.) attached an amendment to a recent spending bill
that would block funding for data mining aspects of the TIA
program until the administration details the scope of TIA’s
activities, and their impact on civil liberties, in a report.
The bill is currently in joint Senate-House committee, where
differences are being worked out. Many expect, however, that the
Wyden amendment will stay attached. A 14 July report noted that
congressional funding for the TIA program is all but “dead.” And
as of mid-September, DARPA was lobbying to get funding for TIA,
or parts of it, restored.
Expressing the opposing
viewpoint, the Heritage Foundation argues that the Wyden
amendment has gone too far and would restrict law enforcement
efforts to deter terrorist activities.
Other Pending Legislation and Activities
A Citizens Protection in
Federal Databases Act, submitted 29 July, would require the
Pentagon, the Central Intelligence Agency and the U.S. Departments of the
Treasury and Homeland Security (DHS) to report to Congress within
60 days on their use of commercial databases to track terrorists,
fugitives and “deadbeat” parents.
In addition, according to
Sen. Charles Grassley (R-Iowa), the FBI is working on a Memorandum
of Understanding with DARPA for possible experimentation with TIA
technology. The FBI denies this collaboration, stating only that
the organization seeks to improve its information technology.
Finally, Sen. Russ Feingold
(D-Wis.) has introduced legislation that would freeze all Defense
Department and
DHS data mining programs until Congress can evaluate and authorize
each one. This bill could be instrumental in increasing public
debate on data mining.
The public is just
beginning to address the relationship between data
mining and privacy. Most likely, even more concerns will be raised
in the near future as we become more cognizant of data mining
techniques and implications, especially where privacy is concerned.
Dr. George
W. Zobrist is professor emeritus at the University of
Missouri-Rolla, Department of Computer Science. He is IEEE-USA's
Member Activities editor.