Patricia Hassett is Professor of Law at Syracuse University. Formerly a prosecuting attorney and a municipal government attorney, Professor Hassett served with the Lord Chancellor’s Advisory Committee on Legal Education and Conduct in England, advising on the education and professional conduct of persons providing legal services. She has also served as a consultant to the English Home Office on a project to improve the quality of bail decisions. Professor Hassett writes in the field of artificial intelligence and the law and has constructed a prototype of an expert system that makes bail recommendations. She teaches courses in constitutional criminal procedure, artificial intelligence and law, and criminal law.
Linda Roberge is Assistant Research Professor at Syracuse University. Her Ph.D. is in Information Systems with supporting fields of Statistics and Public Administration. Her research interests include use of information technology by the legal and medical professions.
Lawyers, along with other professionals, are looking at the many advantages that information technology holds for their profession. In this paper we discuss a new class of information that goes beyond databases to the realm of data warehouses and data mining. These state-of-the-art technologies and the information they produce promise to redefine some of the best-practice standards of the legal profession.
Transactional Records Access Clearinghouse (TRAC), a research center at Syracuse University, has developed a web-based data warehouse/data mining application that makes it possible to produce useful information from previously inaccessible data. For years, investigative reporters, public interest groups, Congressional committees and others have used TRAC’s application, TRACFed. Now the legal profession has discovered the power of TRAC.
But why should lawyers care about data warehouses and data mining? Some of the many situations in which the information could prove useful are highlighted using examples drawn from TRACFed.
The TRACFed data warehouse includes among its many offerings transactional data from the US Federal government concerning its enforcement and prosecution activities, staffing, federal expenditures, and more. Like many others, TRAC’s data warehouse is extremely large, currently occupying approximately 300 gigabytes of storage space and growing monthly. Analyzing this much data to discover the relationships of interest can be tricky, even for professional statisticians.
Lawyers and legal researchers often are not trained to do even small data analyses. With this audience in mind, TRAC developed a series of “point-and-click” style data mining tools that put powerful analytical capabilities into the hands of non-statisticians.
Successful lawyering in a particular case involves knowledge and understanding of both the relevant legal rules and the workings of the specific court system in which the case will be processed. Lawyers who work regularly in a particular court will rely on their experience with the workings of the legal system to make decisions about how to handle a case and what advice to give to their clients. Attorneys with limited or no previous contact in a particular court may be at a disadvantage because they will not have the same understanding of the system that their more experienced colleagues have.
However, even for the well-honed veterans, experience can be misleading. Before giving advice based upon a perception of how the system works, careful lawyers would like to know whether their personal perceptions are consistent with actual facts.
TRACFed allows lawyers to confirm their impressions with actual data. Do cases really move more slowly through Judge Smith’s court? How frequently does a particular prosecutor decline certain types of cases? What is the likelihood that my client’s tax return will be audited? How often do criminal cases investigated by a particular agency result in a conviction?
With the resources now available in TRACFed, the practicing lawyer needs to be aware that the best-practice standards of the profession are likely to change. No longer will it be acceptable to rely on the impressions, hunches, and anecdotes that have formed the basis for experiential knowledge to date. We have already seen how information sources like Westlaw and Lexis/Nexis have used technology to change best-practice standards regarding the knowledge of legal rules. Now data warehouses of legal system information (coupled with data mining tools) are likely to change the best-practice standards relating to knowledge of legal systems.
Case 1: You are bringing an employment discrimination suit against the government. You feel you have a really strong case but are concerned about a couple of things. First, what is the typical amount of relief awarded? You decide to use TRACFed’s Civil Layer with the Going Deeper Tool. (Please see the Quick Start Guide for more information.) With a few clicks of the mouse, you discover that out of 1,446 employment litigation cases disposed of in 2001 in U.S. District courts, only 327 were granted monetary relief. That’s less than 25%! And for those cases where relief was awarded and recorded, the median amount (meaning half got more and half got less) was only $35,000. You “drill down” by clicking on the link for Year 2001 and discover that in your district, the percentage of cases where relief was awarded was even lower. Again, you drill down to look at the individual records for your district. Here you discover that the news isn’t all bad. The two lowest awards were handled by an Assistant U.S. Attorney who has since left the office!
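For readers who want to check the arithmetic behind these figures, the following is a minimal Python sketch. The two case counts come from the text above; the list of award amounts is purely hypothetical (it is not TRACFed data) and is included only to illustrate what a median means.

```python
from statistics import median

# Figures quoted in the text for 2001 employment litigation cases.
cases_disposed = 1446
cases_with_monetary_relief = 327

share = cases_with_monetary_relief / cases_disposed
print(f"Share granted monetary relief: {share:.1%}")

# Hypothetical award amounts (NOT TRACFed data), chosen only to show
# how a median splits the recorded awards: half above, half below.
hypothetical_awards = [5_000, 20_000, 35_000, 60_000, 250_000]
print(f"Median of hypothetical awards: ${median(hypothetical_awards):,}")
```

Note that a single large award (here $250,000) pulls an average upward but leaves the median unchanged, which is one reason a median is a reasonable summary of "typical" relief.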
Second, your client is very discouraged at this point and wonders how long this whole thing is going to drag on. You know that the judge assigned has been pulled out of retirement, but you don’t know much else. Using the Express Tool on TRACFed’s Judges Layer, you discover that this particular judge has heard many civil cases since his retirement, so you can probably find someone who has been before him. You also discover that the average processing time for the small number of cases he closed in 2001 where the government was the defendant was 2,059 days!
Case 2: Your client went through an IRS audit several years ago; it was an experience she doesn’t want to repeat. Not only did she have to spend a considerable amount of time, but she was also assessed additional taxes which created a considerable burden. She wants to know what the chances are that she will be audited this year.
You use TRACFed’s Administrative Layer area to gather some information for your client. Using the Express Tool you discover that for her type of return (individual with income greater than $100,000), less than 1% (0.38%) of the returns got audited in 2001. However, this is a little over twice the audit rate for all returns. One piece of good news is that the audit rate has been steadily declining over the past five years. Another is that she has moved to a district where the audit rate has been consistently lower than it was in her previous district.
But what is likely to happen if she does get audited? Again TRACFed has some answers. Clicking on …more on audits and focusing on her income class, you find that nationally 80% of the audits result in a tax change and the average amount of taxes and penalties in 2001 was $27,061.
In the unlikely event that she is audited again this year, there is a lot more information you could produce including installment agreements and offers-in-compromise. All you need to do is to select a different Table Topic from the menu of choices.
Case 3: Your client, an influential doctor, has notified you that the FBI has begun an investigation of his practice. Federal agents have been interviewing staff and are seeking a court order to review practice records. He is particularly concerned that he and his partners may not have strictly followed all the Medicare guidelines regarding coding for surgeries. He is afraid that the government will want to make an example of a high-profile practice such as his. He wants to know what has been going on lately in the area of health care fraud.
Although you certainly have some impressions about what has been going on, you decide to test your impressions with TRACFed. Using the Analyzer Tool on the Criminal Layer, you specify the data slice by choosing your district and the program category of Health Care Fraud for the year 2001. You discover that of the 182 referrals disposed of in your district in 2001, 125 were not prosecuted. Of the 57 that were prosecuted, over three quarters (44) were convicted.
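The rates implied by these counts can be worked out directly. The sketch below uses only the four numbers quoted above; the variable names are our own and are not TRACFed terminology.

```python
# Counts quoted in the text for Health Care Fraud referrals in 2001.
referrals_disposed = 182
not_prosecuted = 125
prosecuted = 57
convicted = 44

# Sanity check: the two outcomes account for every referral disposed of.
assert not_prosecuted + prosecuted == referrals_disposed

declination_rate = not_prosecuted / referrals_disposed
conviction_rate_if_prosecuted = convicted / prosecuted
conviction_rate_per_referral = convicted / referrals_disposed

print(f"Not prosecuted:             {declination_rate:.1%}")
print(f"Convicted, if prosecuted:   {conviction_rate_if_prosecuted:.1%}")
print(f"Convicted, of all referrals: {conviction_rate_per_referral:.1%}")
```

The distinction between the last two rates matters when advising a client: the chance of conviction once a prosecution begins is much higher than the chance that a referral ends in conviction at all.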
Wondering what that meant in terms of prison time, you use the Explore and List features and select Convictions as the stage to zero in on. You discover that of those convicted, 28 received no prison time; the others received sentences ranging from 2 to 30 months. This, then, is probably the worst-case scenario.
You probably should look at fines too, but decide to think positively and look more closely at the declinations. You generate a table that shows how many prosecutions were declined for particular reasons. You find that the reason given for over half the declinations was “Lack of evidence of criminal intent”! You generate another table that separates these numbers by investigating agency, and find that the FBI does only a little better than other agencies at finding criminal intent. Raising the “lack of criminal intent” issue seems like a potential strategy if the FBI decides to refer your client’s case to the U.S. Attorney.
One of the things you noticed while looking at individual records was that all were marked ‘N’ for national priority. You wonder whether priorities have changed since 9/11. You create a data slice of records in 2002 and find that many are now marked as both national and district priority. That certainly can’t help!
You also find two new pieces of information: 1) the only new cases to receive immediate declinations due to weak evidence were investigated by the Secret Service rather than by the FBI; and 2) the referrals coming from the FBI seem to be going to one of three prosecutors. One set of prosecutor initials you don’t recognize, so you want to look into the Staffing Layer to find the name and seniority of this particular prosecutor. But that can wait for another session.
There is much more digging you would like to do. Luckily, the work you’ve done is stored in your Web Locker so you can return to it tomorrow.
These three cases provide readers with a small taste of the types of information they can produce using TRACFed. Because the data mining tools are easy to use, lawyers are able to generate sophisticated analyses to help them plan more effective strategies than was previously possible. For attorneys who practice in Federal District Courts, TRACFed is an invaluable resource.
Transactional Records Access Clearinghouse is a not-for-profit research center located at Syracuse University. The center has been supported by the university and by grants from Rockefeller Family Fund, the New York Times Company Foundation, the John S. and James L. Knight Foundation, the Beldon Fund and the Open Society Institute. User subscription fees from TRACFed help to defray costs associated with updating and maintaining the data warehouse and data mining tools.
Database – A collection of information. A transactional database records and tracks the individual activities or transactions of an organization. For example, when a government employee is hired, information about the employee and his/her job is recorded in a transactional database. As information about the employee changes (e.g., salary, work schedule, or grade) the database is updated. A transactional database contains what is often referred to as “live” data that support the operations of an organization.
Data Warehouse – An integrated compilation of data from various sources. Data warehouses differ from transactional databases in several significant ways. First, warehouses consist of one or more transactional databases that are integrated. Second, in addition to transactional records, warehouses may contain summarized data that can also be integrated with the transactional data. And finally, warehouses contain historical data that is updated periodically, often quarterly or yearly, rather than “live” data that is constantly being updated in real-time. Data warehouses are constructed to facilitate decision-making and answer questions.
Data Mining – The process of searching for trends, relationships, and patterns in large amounts of data, often from a data warehouse. Finding these hidden relationships is really the process of data analysis.
Going Deeper Tool – The Going Deeper Tool provides users a powerful yet easy to use “drill-down” capability. This allows users to start with aggregated data and drill down all the way into individual case-by-case information on criminal and civil matters and to individual employees. Using a point and click interface, users generate a query that returns a series of linked tables, each of which relates to an increasingly narrower subset of data. The final drill shows the individual records that make up the subset of data that was selected.
Express Tool – The Express Tool provides a simple means of quickly retrieving information from TRAC data warehouses. Express enables users to examine and compare broad categories of information, focus on particular geographic regions or topics, generate rankings, make comparisons, and find trends. Using pull-down menus, users can build dynamic queries based on the available options. Results can be tailored through the query builder so that the desired information is returned in alphabetical order, in national ranking order, or as graphical or map displays. Express is a great jumping off point for in-depth inquiries.
Analyzer Tool – The Analyzer Tool lets you, with a point and click interface, create your own data slice on a selected subject (e.g. civil rights or the environment), a specific agency, or a particular statute. The data slice is stored in your own individual Web Locker where it is available for further mining and analysis using the Analyzer Tool’s power features. Analyzer has four power features. List enables you to display individual records in their entirety – like the list that can be generated in Going Deeper. Explore lets you examine the makeup of your data slice along a number of user specified relevant categories. Focus allows you to undertake the same close examination of your data slice using capabilities similar to those found in the Express Tool. Rank enables the same ranking analysis found in the Express Tool.
Web Locker – The Web Locker is an online data repository specific to an individual user. It stores the data slices and output of the Power Analysis Features generated by users with the Analyzer Tool.
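For readers curious about what a “drill-down” or a “data slice” amounts to underneath the point and click interface, the following Python sketch illustrates the general idea described in the definitions above. It is not TRAC’s implementation: the records, field names, and the helper functions `count_by` and `drill` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical case records. Field names and values are illustrative
# assumptions, not TRACFed's actual schema or data.
records = [
    {"year": 2001, "district": "District A", "relief": 35_000},
    {"year": 2001, "district": "District A", "relief": 0},
    {"year": 2001, "district": "District B", "relief": 80_000},
    {"year": 2000, "district": "District A", "relief": 12_000},
]

def count_by(rows, key):
    """Aggregate level: count the records for each value of `key`."""
    counts = defaultdict(int)
    for row in rows:
        counts[row[key]] += 1
    return dict(counts)

def drill(rows, **criteria):
    """Narrow the subset to rows matching every key=value pair given."""
    return [r for r in rows if all(r[k] == v for k, v in criteria.items())]

by_year = count_by(records, "year")              # top-level summary table
in_2001 = drill(records, year=2001)              # first drill: one year
by_district = count_by(in_2001, "district")      # linked, narrower table
final = drill(in_2001, district="District A")    # individual records

print(by_year)
print(by_district)
for record in final:
    print(record)
```

Each step works on the subset produced by the previous one, which is what makes the resulting tables “linked”: every summary count can be traced back to the individual records behind it. A data slice is, in the same spirit, simply a saved subset such as `in_2001`.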