The Homeland Security Act and the proposed DARPA "Total Information Awareness" (TIA) program

Robert E. Gladd,

Under the guise of combating terrorism, our federal government proposes to assemble -- absent probable cause and/or search warrants -- comprehensive investigative data dossiers on ALL American citizens as well as foreigners in the U.S.

From William Safire's recent New York Times column (11/14/2002; registration required):

"...Every purchase you make with a credit card, every magazine subscription you buy and medical prescription you fill, every Web site you visit and e-mail you send or receive, every academic grade you receive, every bank deposit you make, every trip you book and every event you attend -- all these transactions and communications will go into what the Defense Department describes as "a virtual, centralized grand database." 

To this computerized dossier on your private life from commercial sources, add every piece of information that government has about you -- passport application, driver's license and bridge toll records, judicial and divorce records, complaints from nosy neighbors to the FBI, your lifetime paper trail plus the latest hidden camera surveillance -- and you have the supersnoop's dream: a "Total Information Awareness" about every U.S. citizen. 

This is not some far-out Orwellian scenario. It is what will happen to your personal freedom in the next few weeks if John Poindexter gets the unprecedented power he seeks...."

The lead agency driving this effort is DARPA, the Defense Advanced Research Projects Agency -- specifically the DARPA Information Awareness Office (IAO), under the direction of Dr. John Poindexter, a retired Navy rear admiral. DARPA/IAO has already published solicitations (e.g., Broad Agency Announcements) and is awarding contracts. For example, see BAA 02-08, the "Information Awareness Proposer Information Pamphlet" (a .PDF file).

DARPA/IAO is unabashed regarding its aims (click their graphic to view the IAO site):

[ 12/24/02 UPDATE: DARPA/IAO is apparently feeling the heat. The above graphic, which I copied from their website when I first assembled this page, has been toned down on the TIA website, with, among other changes, removal of the phrase "keeping track of individuals." They've also removed the Orwellian "scientia est potentia" logo ("knowledge is power") and bios of TIA principals like Poindexter. Interesting. ]

"Keeping track of individuals" without constitutional justification is something we expect of totalitarian regimes. I fear we are losing track of just what it is we are ostensibly trying to defend and preserve.

The explicit IAO goal is to place all recorded private and public personal transactions and histories within ongoing computerized reach of investigative authorities for more effective suppression of terrorist acts. The recently passed Homeland Security Act of 2002 (H.R. 5710, hereinafter referred to as the HSA), under TITLE II—INFORMATION ANALYSIS AND INFRASTRUCTURE PROTECTION, mandates exactly this sort of initiative, as it directs the government to centrally

"...access, receive, and analyze law enforcement information, intelligence information, and other information from agencies of the Federal Government, State and local government agencies (including law enforcement agencies), and private sector entities (emphasis mine), and to integrate such information..."

"...To integrate relevant information, analyses, and vulnerability assessments (whether such information, analyses, or assessments are provided or produced by the Department or others) in order to identify priorities for protective and support measures by the Department, other agencies of the Federal Government, State and local government agencies and authorities, the private sector, and other entities..." (pages 23 and 24)

Congressman Dick Armey, while stating his strong opposition to the TIA proposal, denies that the HSA authorizes it. As quoted in an MSNBC article:

"...“This bill does not in any way authorize the Department of Defense program known as ‘Total Information Awareness,’ ” Armey said. “It does not authorize, fund or move into the department anything like it. In fact, this bill provides unique statutory protections that will ensure the Department of Homeland Security could never undertake such a program.” Armey also noted that “references in the bill to data-mining are intended solely to authorize the use of advanced techniques to sift through existing intelligence data, not to open a new method of intruding into lawful, everyday transactions of American citizens.”

Well, the relevant sections of the HSA do not make that clear, Mr. Armey; in fact, they seem to contradict the assertion. The HSA speaks of integrating data from sources going well beyond "intelligence data" (see above, or better yet, read the Act; a link is provided below). Moreover, we can be sure that DARPA/IAO will seek to be included in funding allocated under the HSA for its little unconstitutional project. The devil will surely be in the details, and the operational details will consist of the endless HSA amendments, appropriations bills, and the detailed regulations eventually issued for the HSA in the Code of Federal Regulations (CFR). What is clear at this point is that any logical reading of the HSA as it stands tells us that the TIA program falls within the Homeland Security mandate. Confirmation of this last point is seen in remarks made by Under Secretary of Defense for Acquisition, Technology, and Logistics Edward C. "Pete" Aldridge during a November 20th DoD news briefing:

Q: How is this not domestic spying? I don't understand this. You have these vast databases that you're looking for patterns in. Ordinary Americans, who aren't of Middle East origin, are just typical, ordinary Americans, their transactions are going to be perused.

Aldridge: Okay, first of all --

Q: And do you require search warrants? I mean, how does this work?

Aldridge: First of all, we are developing the technology of a system that could be used by the law enforcement officials, if they choose to do so. It is a technology that we're developing. We are not using this for this purpose. It is technology.

Once that technology is transported over to the law enforcement agency, they will use the same process they do today; they protect the individual's identity. We'll have to operate under the same legal conditions as we do today that protects individuals' privacy when this is operated by the law enforcement agency.

Q: So they would need a search warrant, then?

Aldridge: They would have to go through whatever legal proceedings they would go through today to protect the individuals' rights, yes.

Q: As part of this feasibility study, will anybody be looking at legislation, regulation, executive orders that may need to be modified?

Aldridge: I think that's probably an issue that's going to be taken up by the new office of homeland security, who probably will be very much involved in this type -- the use of this type of information.

The link between DARPA's "Total Information Awareness" proposal and the Homeland Security Department (in addition to regular U.S. civilian law enforcement) seems rather clear from those remarks. And one must ask just how such agencies will "go through whatever legal proceedings they would go through today to protect the individuals' rights" after the TIA data horse is already out of the barn.

No one can question the worthiness of the fight against terrorism. However, the means as envisioned by the IAO raise troubling Constitutional and operational questions. Aggregating private personal information for the (sole?) purpose of conducting widespread criminal investigations without probable cause and warrants seems to directly violate the 4th Amendment. Recall from the Bill of Rights: "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no warrants shall issue, but upon probable cause, supported by oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized." (Amendment IV) It is beyond any dispute, for example, that American authorities may not surreptitiously enter your home without cause (validated by a warrant), rifle through your belongings, photocopy your papers, extract the data from your personal computers, intercept your emails, and remove this information for criminal investigatory scrutiny. How the proposed TIA program differs materially escapes me. Worse, this is a Defense Department entity proposing to undertake what would be unconstitutional for domestic civilian law enforcement.

Constitutional questions aside, we ought seriously to question the likely operational utility of such an undertaking. Toward that end I have provided a "what-if" Excel spreadsheet (download it by clicking here) with which to assess hypothetical relative-effectiveness scenarios of a TIA "terrorism detection database" under varied input assumptions. I have entered the following default values: [A] a population of 240,000,000 (~215,000,000 Americans 18+ years of age, plus ~25,000,000 foreigners); [B] 5,000 actual terrorists lurking among them; and [C] & [D] extremely generous (and highly unlikely) 99.9% "accuracy rates" pertaining to "true positives" (terrorists) and "true negatives" (innocent citizens). In such a scenario, counterintuitive though it may be (given a putative "99.9% accuracy rate"), the likelihood that a flagged individual is an actual terrorist is -- at best -- approximately 2% (the proportion of true positives within the test-positive subset of the initial population), and this small group will still have to be separated from the nearly quarter-million "false positives" -- i.e., innocent people wrongly identified as terror suspects by a TIA model.
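For readers without Excel, the spreadsheet's core arithmetic can be sketched in a few lines of Python. The figures below are this page's default assumptions, not measured rates:

```python
# Base-rate arithmetic behind the "what-if" spreadsheet scenario.
# All inputs are this page's assumptions, not real measurements.
population = 240_000_000              # ~215M American adults + ~25M foreigners
terrorists = 5_000                    # assumed actual terrorists among them
innocents = population - terrorists

sensitivity = 0.999                   # P(flagged | terrorist) -- very generous
specificity = 0.999                   # P(not flagged | innocent) -- ditto

true_positives = sensitivity * terrorists          # terrorists correctly flagged
false_positives = (1 - specificity) * innocents    # innocents wrongly flagged

# Of everyone the model flags, what fraction are actually terrorists?
ppv = true_positives / (true_positives + false_positives)
print(f"False positives: {false_positives:,.0f}")  # nearly a quarter million
print(f"P(terrorist | flagged): {ppv:.1%}")        # about 2%
```

Even with both error rates at an implausibly optimistic 0.1%, the sheer number of innocents swamps the handful of true positives.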

While the relative "accuracy" (sensitivity and specificity) levels of many clinical methods that estimate disease probabilities (or of any experimental assay with prior empirical underpinnings amenable to Bayesian statistical methods [see below]) are tolerably well-defined (and uniformly well below 99.9%), those pertaining to a TIA program are wholly speculative at this point, and will not clarify for years (if ever). One daunting limitation will come in the form of pervasively inaccurate and/or incomplete data pouring in from the myriad public and private sources. Another will owe to the relative recency and transience of the phenomenon. As Robert Levy of the Cato Institute observes: "Never mind that Pentagon computer scientists believe that terrorists could easily avoid detection, leaving bureaucrats with about 200 million dossiers on totally innocent Americans — instant access to e-mail, web surfing, and phone records, credit-card and banking transactions, prescription-drug purchases, travel data, and court records." I could not agree more. While the innocent will more or less simply go on with their customary daily transactions, our terrorist enemies will undoubtedly take evasive measures. What shall we do? Outlaw, among other things, all anonymous cash transactions? If we don't (and we cannot), the very utility of a TIA database will be fatally compromised at the outset.

Given that no test is infallible, there are inescapable trade-offs in the relative false-positive/false-negative levels associated with any assessment. Where routine workplace drug tests are concerned, for example, labs seek to limit false positives (and the lawsuits they spawn), while they are far less troubled by false negatives (recreational drug users who slip through the screenings). With respect to terrorism, on the other hand, authorities will necessarily fret principally over false negatives -- actual terrorists who go undetected. Should you wrongly end up on a Homeland Security "No-Fly List" or be uselessly visited by a couple of FBI agents in the wake of a false-positive TIA "hit," you will likely be met with bureaucratic indifference at best should you protest. At worst, you could be wrongly arrested, have your assets seized, lose your job, or otherwise have your reputation ruined.

We have only some 11,400 FBI Special Agents, many of whom are accountants and lawyers working white-collar crime. Do we really want to send criminal investigators en masse off to surveil and interview the overwhelmingly innocent people errantly identified in a TIA computerized fishing expedition? Moreover, given that the proportion of actual terrorists in the general population cannot but be vanishingly low, minor decrements in the Specificity rate will cause huge increases in false positives -- all of whom will have to be investigated. Change the 99.9% Specificity to 99.0% and see what happens.
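The effect of that change can be sketched directly; a minimal Python illustration under the same assumed figures (240 million people, 5,000 actual terrorists):

```python
# How the false-positive pool grows as the assumed specificity slips.
# Population and terrorist counts are this page's assumptions.
population = 240_000_000
terrorists = 5_000
innocents = population - terrorists

false_positives = {}
for specificity in (0.999, 0.99, 0.95):
    false_positives[specificity] = (1 - specificity) * innocents
    print(f"Specificity {specificity:.1%}: "
          f"{false_positives[specificity]:,.0f} innocents flagged")
```

A one-point slip in specificity, from 99.9% to 99.0%, multiplies the number of innocent people flagged tenfold.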

First fiscal year funding for this program is set forth in the HSA at $500,000,000. It is at once unconstitutional and another likely significant waste of public funds, one that may well have little positive impact on the war against terrorism.

While HSA contains privacy and data security language restricting the use of TIA-type data to (ill-defined) Homeland Security purposes, we will likely see relentless pressure by law enforcement for access to the information, on the grounds that more effective overall law enforcement via access to TIA data will free up police resources for Homeland Security duties. One need not read far into the HSA to see blurring of enforcement lines. For example, the new Department is directed to

"...monitor connections between illegal drug trafficking and terrorism, coordinate efforts to sever such connections, and otherwise contribute to efforts to interdict illegal drug trafficking." [TITLE I, Section 101(b)(1)(G), page 14]

Given the federal government's simplistic yet relentless media campaign of the past year arguing that "Drug Use Aids Terrorists," well, you get the point (or should).


Consider also this provision of the Act:

"Except as specifically provided by law with respect to entities transferred to the Department under this Act, primary responsibility for investigating and prosecuting acts of terrorism shall be vested not in the Department, but rather in Federal, State, and local law enforcement agencies with jurisdiction over the acts in question." (pp 14-15)

Well, Federal, State, and local law enforcement agencies may well -- perhaps quietly -- argue that they cannot perform their duties pursuant to HSA without access to the integrated, panoptic TIA data repositories. Conveniently, HSA is replete with broad language granting this or that Director or Assistant Secretary discretion over what constitute "reasonable" administrative measures under the Act.

It gets worse. HSA is also peppered with language mandating two-way coordination of activities, communications, and data-sharing with "the private sector." HSA officials are charged with "...creating and fostering strategic communications with the private sector to enhance the primary mission of the Department to protect the American homeland" (page 17), "...creating and managing private sector advisory councils composed of representatives of industries and associations designated by the Secretary" (page 18), "...promoting existing public-private partnerships and developing new public-private partnerships to provide for collaboration and mutual support to address homeland security challenges" (pp 18-19), and so forth.

In addition to the warrantless law-enforcement implications of the HSA's envisioned data repositories, we must also recognize that a TIA database will constitute a commercial data-miner's wet dream of a scope heretofore unimagined. Vigilance with respect to HSA collaborative "public-private partnerships" had better be tireless.

To download the Act, go to

What is "Bayesian Statistics"?

Basic 2x2 decision matrix:

                           Test says "cancer"     Test says "no cancer"
You have cancer            1. True positive       2. False negative
You do not have cancer     3. False positive      4. True negative

Bayesian methods refine a "posterior" probability estimate by incorporating "prior" probability knowledge. The table above is familiar to anyone who works in health care or epidemiological analysis. For example, we know both the approximate prevalence of a clinical condition (the proportion of people in the population who have it) and the historical false-positive and false-negative rates of a relevant lab test. Using Bayes' formula (below), we can better estimate the probability that you in fact have the disease given that your test comes back positive, or the probability that you are actually disease-free given a negative lab test.

Relax, it's only algebra.

Let p(t|+) = the probability of being a true positive ("t", e.g., for this discussion, a terrorist) given a positive TIA finding (+);

Let p(+|t) = the probability of testing positive (+) given that you are in fact a "t";

Let p(t) = the "prevalence" of true positives, e.g., the proportion of terrorists lurking in the aggregate population;

Let p(+|f) = the TIA probability of testing positive (+) given that you are in fact NOT a "t" (i.e., the false positive rate);

Let p(f) = 1 - p(t), the proportion of innocent people ("non-terrorists") in the population.

Bayes' formula then gives:

    p(t|+) = p(+|t)p(t) / [ p(+|t)p(t) + p(+|f)p(f) ]

Look at the factor p(+|f)p(f) in the denominator of Bayes' formula. Given that the proportion of true negatives (non-terrorists, in this discussion) in the population is indisputably extremely high, p(t|+) -- the likelihood that a positive TIA finding identifies a true positive -- will necessarily be intractably low, given the relative magnitude of p(+|f)p(f) (unless we have perfect, concurrent 100% sensitivity and specificity, which, in the real world, will not be the case). Moreover, p(+|f) is not wholly independent of p(f): the more true negatives in the population, the more chances you have to err. Similarly, the fewer true positives in the population, the fewer chances you have to get it right.
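The Bayesian arithmetic above can be expressed directly in code; a minimal sketch (the input figures are, again, this page's assumptions):

```python
def p_t_given_pos(p_pos_given_t, p_t, p_pos_given_f):
    """Bayes' formula: p(t|+) = p(+|t)p(t) / [p(+|t)p(t) + p(+|f)p(f)]."""
    p_f = 1 - p_t                       # proportion of non-terrorists
    numerator = p_pos_given_t * p_t
    return numerator / (numerator + p_pos_given_f * p_f)

# Prevalence of 5,000 in 240 million, 99.9% sensitivity, 0.1% false-positive
# rate: the huge p(+|f)p(f) term swamps the tiny numerator.
ppv = p_t_given_pos(p_pos_given_t=0.999,
                    p_t=5_000 / 240_000_000,
                    p_pos_given_f=0.001)
print(f"p(t|+) = {ppv:.2%}")            # roughly 2%
```

Because p(f) is so close to 1, even a tiny false-positive rate p(+|f) dominates the denominator, pinning p(t|+) near the floor.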

This is why we don't test everybody for every disease. This is why we don't test every square foot of the nation in search of pollution.

 This is why we have Probable Cause and Warrants codified in the Constitution -- principles apparently lost on the likes of a John Poindexter or a John Ashcroft. In Bayesian terms, "probable cause" serves to ensure that the "prevalence" of guilty individuals in a criminal proceeding is minimally greater than 50%, making us much less likely to wrongly convict someone of a crime (or brand someone as a "terror suspect" behind his or her back after fishing through personal data without constitutional -- rational -- justification).

Basic 2x2 decision matrix:

                           You decide "guilty"     You decide "innocent"
Accused is guilty          1. Just conviction      2. Criminal goes free
Accused is innocent        3. Unjust conviction    4. Just exoneration

Think about it. Use the spreadsheet. Play around with various scenarios to learn just how much more likely you'd be to be falsely accused or convicted than you might think were it not for the probable cause brake on authority.


Those who advocate measures such as this can point out that, given my assumptions (5,000 terrorists lurking amid 240,000,000 adults here), were we to randomly sample people for investigation, the raw probability of collaring a terrorist would be 5,000/240,000,000, or roughly 0.00002083. A TIA model operating at 99.9% "accuracy", consequently, would up the odds of catching a terrorist roughly 1,000-fold. What are we to do? Not avail ourselves of state-of-the-art computer modeling tools in the service of Homeland Security?
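That rejoinder can be made concrete with a quick sketch under the same assumed figures (5,000 terrorists among 240 million people):

```python
# Random sampling vs. a hypothetical 99.9%-"accurate" model, using this
# page's assumed figures rather than any real data.
base_rate = 5_000 / 240_000_000                 # ~0.00002083 by random choice

true_positives = 0.999 * 5_000                  # terrorists correctly flagged
false_positives = 0.001 * (240_000_000 - 5_000) # innocents wrongly flagged
ppv = true_positives / (true_positives + false_positives)   # ~2% if flagged

improvement = ppv / base_rate
print(f"{improvement:,.0f}-fold improvement")   # roughly 1,000-fold
```

The model really does improve the odds about a thousandfold, yet the absolute probability of a flagged person being a terrorist remains only about 2%.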

While the foregoing math is indeed true, I have two questions in reaction:

  1. What will be the consequences of being wrongly identified (a false positive)? If, for example, you trigger a false positive for banned objects (e.g., weapons) at the airport x-ray equipment and/or metal detector, the error is quickly resolved and you are on your way; your identity is not recorded and added to a database. Anyone who has ever falsely tested positive for illicit drug use or been wrongly arrested, however, can give you a bit of insight into the persistent ugliness that can in fact follow errors by those in authority (see "FBI's post-9/11 watch list spreads far, mutates" below).

  2. What if a TIA model only achieves modest (though still technically "significant") "accuracy" and "precision" levels? Simply decrement the Sensitivity and Specificity levels in my scenario to 99.0%, for example, and you have nearly 2.4 million false positives to weed out. At 95% "accuracy" you would have roughly 12,000,000 people to subsequently surveil and/or interrogate. It quickly becomes logistically untenable. Again, you're stuck with the low-prevalence problem, which trumps any level of "true positive" accuracy.

This is methodologically akin to the mendacious idiocy of requiring Granny and Grandpa the Wal-Mart Greeters to submit to employment drug tests. Statisticians understand the problematic nature of this sort of statistical search for a rare (and unevenly distributed) event or object.

So too -- implicitly -- did the framers of our Constitution. Congress  -- and/or the courts -- should strike down this Orwellian undertaking.

Update, November 25th, 2002

Only the guilty need worry, right? If you've nothing to hide, you have nothing to fear from the TIA/HSA and proposed "public-private" information partnerships. Well, you might consider the following, which appeared in my Sunday paper yesterday:

FBI’s post-9/11 watch list spreads far, mutates


LAS VEGAS — When a patron at the New York-New York casino plugged his frequent player card into a slot machine one day this summer, something strange happened: An alert warned the casino’s surveillance officials that an associate of a suspected terrorist might be on the grounds. 

How did a casino’s computer make such a connection? Shortly after Sept. 11, 2001, the FBI had entrusted a quickly developed watch list to scores of corporations around the country. 

Departing from its usual practice of closely guarding such lists, the FBI circulated the names of hundreds of people it wanted to question. Counterterrorism officials gave the list to car-rental companies. Then FBI field agents and other officials circulated it to big banks, travel-reservation systems, firms that collect consumer data, as well as casino operators like MGM Mirage, the owner of New York-New York. Other recipients included businesses thought vulnerable to terrorist intrusion, including truckers, chemical companies and power-plant operators. It was the largest intelligence-sharing experiment the bureau has ever undertaken with the private sector. 

A year later, the list has taken on a life of its own, with multiplying — and error-filled — versions being passed around like bootleg music. Some companies fed a version of the list into their own databases and now use it to screen job applicants and customers. A water-utilities trade association used the list "in lieu of" standard background checks, says the New Jersey group’s executive director. 

The list included many people the FBI didn’t suspect but just wanted to talk to. Yet a version on SeguRed.com, a South American security-oriented Web site that got a copy from a Venezuelan bank’s security officer, is headed: "list of suspected terrorists sent by the FBI to financial institutions." 

Meanwhile, a supermarket trade group used a version of the list to try to check whether terrorists were raising funds through known shoplifting rings. The trade group won’t disclose results. 

The FBI credits the effort, dubbed Project Lookout, with helping it rapidly find some people with relevant information in the crisis atmosphere right after the terror attacks. MGM Mirage says it has tipped off the FBI at least six times since beginning to track hotel and casino guests against the list. 

The FBI and other investigative agencies — which were criticized after Sept. 11, 2001, for not sharing their information enough — are exploring new ways to do so, including mining corporate data to find suspects or spot suspicious activity. The Pentagon is developing technology it can use to sweep up personal data from commercial transactions around the world. "Information sharing" has become a buzzword. 

But one significant step in this direction, Project Lookout, is in many ways a study in how not to share intelligence. 

The watch list shared with companies — one part of the FBI’s massive counterterrorism database — quickly became obsolete as the bureau worked its way through the names. The FBI’s counterterrorism division quietly stopped updating the list more than a year ago. But it never informed most of the companies that had received a copy. FBI headquarters doesn’t know who is still using the list because officials never kept track of who got it. "We have now lost control of that list," says Art Cummings, head of the strategic analysis and warning section of the FBI’s counterterrorism division. "We shouldn’t have had those problems." 

The bureau tried to cut off distribution after less than six weeks, partly from worry that suspects could too easily find out they had been tagged. Another concern has been misidentification, especially as multipart Middle Eastern names are degraded by typos when faxed and are fed into new databases. 

Then there’s the problem of getting off the list. At first the FBI frequently removed names of people it had cleared. But issuing updated lists, which the FBI once did as often as four times a day, didn’t fix the older ones already in circulation. Three brothers in Texas named Atta — long since exonerated, and no relation to the suspected lead hijacker — are still trying to chase their names off copies of the list posted on Internet sites in at least five countries. 

People who’ve asked the FBI for help getting off the bootleg lists say they’ve been told the bureau can’t do anything to correct outdated lists still floating around. The FBI’s Cummings says that "the most we can control is our official dissemination of that list." Once it left the law-enforcement community, "we have no jurisdiction to say, ‘If you disseminate this further, we will prosecute you.’" 


Despite the problems, Cummings and other proponents of information-sharing say the process should be improved, not abandoned. Software companies are rushing to help, trying to make information-sharing easier and more effective. 

Systems Research & Development in Las Vegas is among those working on ways to make exchanging law-enforcement and corporate information a two-way street without compromising privacy. "I believe there’s probably 10 to 50 companies in America that across them touch 80 percent to 90 percent of the entire country," says Jeff Jonas, Systems Research & Development founder, citing credit-card companies, banks, airlines, hotel chains and rental-car companies. "There should be a protocol in place that corporate America could be plugged into that allows them to say, ‘We’d like to help,’" he says. 

But some officials at the U.S. Customs Service, the Office of Homeland Security and the FBI’s own Criminal Justice Information Services Division doubt the wisdom of circulating watch lists widely, and some say they didn’t even know about Project Lookout. 

Civil libertarians worry about enlisting companies to track innocent people for the government. Many companies say they need to be insulated from liability if they’re expected to share data on people with the government. "It’s a tough, tough box to get into. You end up with legitimate concerns about moving into Orwell’s 1984," says Henry Nocella, an official of Professional Security Bureau Ltd. in Nutley, N.J., and a former security director at Bestfoods. "Yet you know there’s a need to collect and analyze information." 


Before Sept. 11, 2001, the government rarely revealed the names of terrorism suspects to companies. The exception was when it had a subpoena for specific information the government believed a company had about a person under investigation. But after the attacks, counterterrorism officials were concerned that members of terrorist cells could have slipped undetected into companies or communities. They feared that by the time they figured out where to direct subpoenas, the suspects could get away or even stage another attack. Holed up in a "strategic information and operations center" in Washington, a small circle of FBI officials decided on Sept. 15, 2001, to put out a broad heads-up to state and local police and to trusted companies. "We’re not playing games here. This was real life. We wanted as many people as possible to know this is who we wanted to talk to," says Steven Berry, an FBI spokesman. 

Agents cast a wide net that, by its nature, included scores of innocent people. 

They started by using record searches and interviews to identify "anybody who had contact" with the 19 hijackers, Cummings recalls. 

Kevin Giblin, chief of the terrorist warning unit, decided that car-rental companies and local police should be the first outside of the airlines to get the list. One firm that received it, Ford Motor Co.’s Hertz unit, says it checked the list against its records and told the FBI of any matches, but then basically let the list lie dormant. Trade groups proved a quick way to spread the word. The FBI gave the list to the Transportation Department. It shared the names with the American Trucking Associations, which promptly e-mailed the list to nearly 3,000 trucking companies. The International Security Management Association, an elite group of executives at 350 companies, put the list on a password-protected part of its Web site, allowing members to scan it in private, members say. 


On their own, FBI field agents shared the list with some chemical, drug, security-guard, gambling and power-plant companies, according to interviews with companies. The FBI’s Giblin says he hadn’t realized how extensively field agents distributed the list. But he says agents have considerable autonomy and are expected to keep close ties to companies in their area. Giblin says the bureau stressed to recipients that the people named weren’t all suspects. "This wasn’t a blacklist," he says. By the time the FBI tried to close out its list, at least 50 versions were floating around, say people who saw numbered ones. Some companies were asking software firms such as Systems Research & Development how to make better use of the lists. The company, which is financed in part by a venture capital arm of the Central Intelligence Agency, has a program called NORA, for Non-Obvious Relationship Awareness. It mines data to detect hard-to-see links between people, such as use of the same residence or phone number. 

Giblin says when he fields tips nowadays from companies that have the watch list, he tells them it’s obsolete. But not all field offices turn down such tips. 

If the government does decide to disseminate watch lists in the future, it won’t face high legal hurdles, says Daniel Ortiz, a law professor at the University of Virginia. He says someone who appears wrongly on a watch list could ask for a correction but couldn’t prevent the list’s circulation or sue the government for damages under current privacy laws. The government just has to be careful not to single people out solely on race or ethnicity. 

Businesses face more jeopardy, however. Many industries, such as cable companies and banks, operate under special privacy laws preventing them from giving customer information to the government without a subpoena. 

Recall the huge flap in Florida during the 2000 Presidential election, wherein a "public-private" data partnership wrongfully excluded thousands of voters as "ineligible ex-felons." See "Florida's flawed 'voter-cleansing' program." Opportunities for political mischief through access to TIA data will, if history is any guide, be legion.

About the author of this web page -

I have been working with analytical data for the past 16 years, in four disparate domains: [1] forensic-level environmental radiation and mixed-waste analysis, [2] industrial “Predictive Maintenance” (PDM) diagnostics, [3] Nevada Medicare hospitalization outcomes investigations, and, for nearly the past three years, [4] credit risk management in a subprime demographic (people who perhaps shouldn’t even be granted credit). My training and experience with both the theory and the practicalities of data logistics and assessment are at once broad and deep.

 My tenure in radioassay was one in which you frequently had to justify every for-the-record digit to the satisfaction of a seemingly endless horde of auditors (many of whom served the potentially legally liable parties eager to discredit your work). Put down “2.7 pCi/kg.” on a report and you could expect to be called upon to demonstrate that your records scientifically verified your bench-level operational ability to distinguish between “2.6” and “2.8”. “Significant figures” rounding was a routine contractual stipulation, one subject to ongoing verification.

 During my PDM tenure, it quickly became obvious that, were one of our digital FFT monitor-analyzers to prove inaccurate and permit, say, a power plant turbine bearing or shaft to fail without warning, huge sums might be lost, and people might die (and we might be subsequently sued out of existence). Our engineers and programmers, consequently, personified the term “fastidious.”

Next: Nevada Medicare, and a rude empirical awakening. The U.S. Health Care Financing Administration (HCFA) quietly acknowledged internally that the hospitalization data we had to work with at the Nevada Peer Review were perhaps only “~80% accurate.” Medical charts were shot through with inaccuracies and omissions owing to realities such as the vagaries of administrative ICD-9 and DRG coding and the chronic inscrutability of clerical and/or physicians’ penmanship. A staple of designing Peer Review statistical evaluation projects was a compensatory “20-25% oversample” for chart abstraction and review.

 Now I work in revolving credit risk assessment (a privately-held issuer of VISA and MasterCard accounts), where our department has the endless and difficult task of trying to statistically separate the “goods” from the “bads” using data mining technology and modeling methods such as factor analysis, cluster analysis, general linear and logistic regression, CART analysis (Classification and Regression Tree) and related techniques.

Curiously, our youngest cardholder is 3.7 years of age (notwithstanding that the minimum contractual age is 18), the oldest 147. We have customers ostensibly earning $100,000 per month—odd, given that the median monthly (unverified self-reported) income is approximately $1,700 in our active portfolio.
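Catching such records is conceptually simple: a plausibility screen over each field. The thresholds, field names, and accounts below are my own illustrative choices, not our production rules:

```python
# Plausibility screen of the kind used to flag implausible records.
# All thresholds, field names, and data here are invented for illustration.
MIN_AGE, MAX_AGE = 18, 110
MAX_MONTHLY_INCOME = 50_000

accounts = [
    {"id": 1, "age": 3.7, "monthly_income": 1_700},    # toddler cardholder
    {"id": 2, "age": 147, "monthly_income": 2_200},    # 147 years old
    {"id": 3, "age": 44,  "monthly_income": 100_000},  # $100K/month claimed
    {"id": 4, "age": 29,  "monthly_income": 1_650},    # plausible
]

def flags(acct):
    """Return a list of reasons this account looks implausible."""
    out = []
    if not (MIN_AGE <= acct["age"] <= MAX_AGE):
        out.append("implausible age")
    if acct["monthly_income"] > MAX_MONTHLY_INCOME:
        out.append("implausible income")
    return out

suspect = {a["id"]: flags(a) for a in accounts if flags(a)}
print(suspect)
```

Accounts 1 through 3 get flagged; account 4 passes. The screen tells you a record is wrong, of course, not what the true value is -- which is the hard part.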

Yeah. Mistakes. We spend a ton of time trying to clean up such exasperating and seemingly intractable errors. Beyond that, for example: we recently undertook a new in-house credit-score modeling study and immediately found that roughly 4% of the account IDs we sent to the credit bureau could not be merged with their data (via Social Security numbers or name/address/phone links).
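That merge step, in miniature -- the SSNs and the failure rate below are fabricated (our real unmatched rate was roughly 4%), but the arithmetic is the same:

```python
# Toy merge check: which of our account keys fail to match bureau records?
# All SSNs are fabricated for illustration.
our_accounts = {"111-11-1111", "222-22-2222", "333-33-3333", "444-44-4444"}
bureau_ssns  = {"111-11-1111", "222-22-2222", "333-33-3333"}

unmatched = our_accounts - bureau_ssns          # keys the bureau can't place
rate = len(unmatched) / len(our_accounts)       # unmatched fraction
print(f"unmatched: {sorted(unmatched)}, rate: {rate:.0%}")
```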

I guess we’re supposed to be comfortable with the remaining data because they matched up -- and for the most part look plausible -- notwithstanding that nearly everyone has a pet story about credit bureau errors that caused them heartburn or worse.

12/26/02 UPDATE: see this link for the latest on the persistent extent, and the actual and potential negative impacts, of credit bureau inaccuracies.

 In addition to credit risk modeling, an ongoing portion of my work involves cardholder transaction analysis and fraud detection. Here again the data quality problems are legion, often going beyond the usual keystroke data processing errors that plague all businesses. Individual point-of-sale events are sometimes posted multiple times, given the holes in the various external and internal data processing systems that fail to block exact dupes. Additionally, all customer purchase and cash advance transactions are tagged by the merchant processing vendor with a 4-digit “SIC code” (Standard Industrial Classification) categorizing the type of sale. These are routinely and persistently miscoded, often laughably. A car rental event might come back to us with a SIC code for “3532- Mining Machinery and Equipment”; booze purchases at state-run liquor stores are sometimes tagged “9311- Taxation and Monetary Policy”; a mundane convenience store purchase in the U.K. is seen as “9711- National Security”, and so forth.
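Blocking exact dupes is, in principle, trivial -- which is what makes the holes in those processing systems so exasperating. A minimal sketch, with invented field names and transactions:

```python
from collections import Counter

# Toy postings: (account, date, SIC code, amount). All invented.
transactions = [
    ("acct1", "2002-11-01", "5541", 25.00),  # gas station purchase
    ("acct1", "2002-11-01", "5541", 25.00),  # exact dupe, posted twice
    ("acct2", "2002-11-02", "3532", 89.50),  # "Mining Machinery" -- a car rental?
]

# Count each exact posting; anything seen more than once is a dupe candidate.
counts = Counter(transactions)
dupes = [txn for txn, n in counts.items() if n > 1]
print(dupes)
```

The catch in practice is that a legitimate repeat purchase (two identical coffees in one day) keys identically to a true duplicate posting, so exact-match counting is only a first pass.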

Interestingly, we recently underwent training regarding our responsibilities pursuant to the Treasury Department’s FinCEN (Financial Crimes Enforcement Network) SAR program (Suspicious Activity Reports). The trainer made repeated soothing references to our blanket indemnification under this system, noting approvingly that we are not even required to substantiate a “good faith effort” in filing a SAR. In other words, we could file egregiously incorrect information that could cause an innocent customer a lot of grief, and we can’t be sued.

 He accepted uncritically that this was a necessary and good idea.

You just watch. The Homeland Security Act and its eventual amendments and CFRs, along with those pertaining to TIA, will almost certainly contain such blanket liability-immunity provisions.

 We know why. 

Robert E. Gladd, MA/EPS, CQE

Las Vegas, NV