E-Discovery Update - by Fios Inc.: The Myth of Search

By Conrad J. Jacoby, Published on September 17, 2006

Finding relevant evidentiary materials has always been a challenge in litigation, administrative, and internal investigation matters. Too much material is dispersed over a wide area, and the people who created potentially relevant documents rarely remember everything that they created or where it might be stored. Widespread use of personal computers has only amplified this problem, though many attorneys believe that it's somewhat easier now to isolate the relatively few computers on which potentially relevant electronically stored information ("ESI") may reside. Once those repositories have been preserved, it's then theoretically possible to apply a variety of procedures to separate truly relevant materials from digital by-catch that can be freed from litigation hold nets.

The problem is that even if it is easier to isolate the computer workstations and servers that contain the universe of potentially relevant ESI, the total amount of data stored on this discrete number of machines dwarfs even the largest hardcopy discovery requests of the past. In light of the enormous amount of ESI that must be searched to find potentially relevant evidence, the legal community has grudgingly accepted that manual review is no longer a viable (much less cost-efficient) way of identifying relevant electronic materials. Instead, a variety of automated filtering and searching paradigms must be used to reduce a vast universe of harvested ESI to a document collection that is small enough to tackle through final manual review. Some search criteria, such as "creation date" and "e-mail author," are well accepted by litigants and courts alike and have achieved a kind of gold-standard status for reliability. Other filters for ESI, such as conceptual and contextual search, are based on newer technologies and are still establishing their credibility through a combination of real-world case studies and scientific examination. Regardless of the specific technology used, though, everyone agrees that some degree of automated search and filtering is an essential part of working with harvested ESI.

Problems arise, however, in how legal professionals interpret search results. People are past the point of challenging the consistency of automated filtering technology, particularly when it is based on highly objective data such as date ranges and key words. The accuracy of the filtering results (that is, the degree to which a search captures the types of materials sought) can nonetheless be appallingly low at times. Unfortunately, many legal teams fail to recognize the limitations of the specific filtering techniques they use and are lulled into believing that they have completed a comprehensive search for relevant electronic documents, even though other relevant materials might be located through additional means. At best, such incomplete searches produce an incomplete record that makes it difficult to establish certain factual points. At worst, an incomplete search can seriously harm a producing party by providing a basis for motions to compel production and motions for spoliation sanctions. Search and search technology must be taken seriously.

Search technology has made incredible advances in recent years, but the results generated by even the most cutting-edge search and filtering technology should never be considered comprehensive and should always be evaluated in light of external factors that can greatly affect their significance.

1. Search Doesn't Necessarily Search Everything

Perhaps the most obvious factor that influences the accuracy of search results is the nature of the ESI that has actually been collected. All too often, one or more sources of potentially relevant information (the laptop carried by a globe-trotting member of the management team, or archival CD-ROMs burned by administrative assistants before they deleted old e-mail messages) slip through the data collection process. As a result, materials stored in those digital repositories are never searched.

Somewhat less obviously, some search and filtering procedures may not search all the data contained in specific documents. For example, many indexing and search engines can be configured to ignore some or all of the metadata associated with documents. Leaving this information out may greatly speed the indexing process and let subsequent searches run much more quickly. Indeed, most document metadata is unlikely to contain information that materially changes the importance of a specific document.

Sometimes, however, information that goes unindexed could be critically important. E-mail attachments may supply valuable context for the messages to which they are attached. Embedded files and file objects may contain key information. Language deleted from a word-processing document may clarify the purpose of the final version's language. Depending on the search technology used by the legal team, each of these information sources may or may not be indexed and searched. It's important to understand whether or not they have been.
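To make the point concrete, the short Python sketch below models an index that can be configured to include or skip metadata and attachment text. The `Document` class, the `include_metadata` and `include_attachments` options, and the sample messages are all hypothetical and do not correspond to any particular review platform; the sketch only shows how the same query returns different results depending on what was indexed.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    body: str
    metadata: dict = field(default_factory=dict)      # e.g. subject line, author
    attachments: list = field(default_factory=list)   # extracted attachment text


def indexable_text(doc, include_metadata=False, include_attachments=False):
    """Assemble the text a (hypothetical) indexer would see for one document."""
    parts = [doc.body]
    if include_metadata:
        parts.extend(str(value) for value in doc.metadata.values())
    if include_attachments:
        parts.extend(doc.attachments)
    return " ".join(parts).lower()


def search(docs, term, **options):
    """Return IDs of documents whose indexed text contains the term."""
    term = term.lower()
    return [d.doc_id for d in docs if term in indexable_text(d, **options)]


docs = [
    Document("MSG-001", "See attached for the numbers.",
             metadata={"subject": "Q3 forecast"},
             attachments=["Revised pricing for Project Falcon"]),
    Document("MSG-002", "Project Falcon kickoff is Monday."),
]

print(search(docs, "falcon"))                            # ['MSG-002']
print(search(docs, "falcon", include_attachments=True))  # ['MSG-001', 'MSG-002']
```

MSG-001 is invisible to the "falcon" query unless attachment text is indexed, and nothing in the first result list signals that anything was left out.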

2. Search Finds Exactly What You Are Looking For

Another limitation of search technology is that queries and filters are limited by the imagination of the people who create them. Key word search is particularly susceptible to this limitation: a person could search on the word "car" and miss relevant documents that discuss "automobile," "vehicle," or "Chevrolet." Even more sophisticated search procedures, however, can omit important information or include unintended results.
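A minimal Python sketch of the "car" example follows. The `SYNONYMS` table and the sample documents are hypothetical. Expanding the query with synonyms recovers more documents, but the expanded query is still only as good as the list someone wrote: document F, which mentions a Camaro, is missed by both searches.

```python
import re

# Hypothetical synonym list; in practice someone has to think of every entry.
SYNONYMS = {"car": ["automobile", "vehicle", "chevrolet"]}


def tokens(text):
    """Split text into lowercase whole words, so "car" does not match "cartons"."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_search(docs, terms):
    """Return IDs of documents containing at least one of the terms."""
    wanted = {t.lower() for t in terms}
    return [doc_id for doc_id, text in docs.items() if tokens(text) & wanted]


def expand(term):
    """Add any listed synonyms to the original term."""
    return [term] + SYNONYMS.get(term, [])


docs = {
    "A": "The car was delivered late.",
    "B": "Fleet automobile leases are up for renewal.",
    "C": "Vehicle inspection report attached.",
    "D": "Test drive of the new Chevrolet scheduled.",
    "E": "Cartons of parts arrived.",
    "F": "The Camaro's transmission failed.",
}

print(keyword_search(docs, ["car"]))        # ['A']
print(keyword_search(docs, expand("car")))  # ['A', 'B', 'C', 'D']
```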

For example, filtering ESI by creation date is a common and uncontroversial procedure. However, if the ESI was harvested incorrectly, file creation dates can be altered, causing otherwise relevant documents to fall outside a date-range query. Even when ESI has been captured accurately, file creation dates are assigned by the computer that created the file. If that machine's system clock was set incorrectly, the files created on it will not reflect their actual creation times. For a computer whose system clock was reset to January 1, 2000 (or some other default date) at some point in its past, creation-date inaccuracies may be significant.
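The following Python sketch contrasts a naive date-range filter with one that sets aside files carrying a suspicious default date for manual review instead of silently excluding them. The `SUSPECT_DATES` set (here only January 1, 2000, the example above) and the file list are assumptions for illustration; a real matter would need its own list of known defaults, and a misset clock can also produce wrong dates that are not defaults at all.

```python
from datetime import date, datetime

# Assumed list of default clock values; January 1, 2000 is the example from the text.
SUSPECT_DATES = {date(2000, 1, 1)}


def naive_filter(files, start, end):
    """Keep only files whose recorded creation date falls within [start, end]."""
    return [name for name, created in files if start <= created.date() <= end]


def careful_filter(files, start, end):
    """Same range test, but set aside files stamped with a suspect default date."""
    in_range, flagged = [], []
    for name, created in files:
        if created.date() in SUSPECT_DATES:
            flagged.append(name)  # date is unreliable: route to manual review
        elif start <= created.date() <= end:
            in_range.append(name)
    return in_range, flagged


files = [
    ("memo_draft.doc", datetime(2005, 3, 14, 9, 30)),
    ("budget.xls", datetime(2000, 1, 1, 0, 5)),       # machine clock had reset
    ("old_policy.doc", datetime(2001, 6, 1, 12, 0)),
]
start, end = date(2005, 1, 1), date(2005, 12, 31)

print(naive_filter(files, start, end))    # ['memo_draft.doc']
print(careful_filter(files, start, end))  # (['memo_draft.doc'], ['budget.xls'])
```

The naive filter drops budget.xls without comment; the careful version at least surfaces it for a human to examine.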

Conceptual and contextual search technologies apply sophisticated algorithms to reduce the impact of incomplete search criteria. Concept search uses thesauri and analytical algorithms to find relevant documents whether or not they contain specific search terms. Contextual search identifies not only documents that meet specific criteria but also related documents, such as e-mail messages or documents created on the same computer at approximately the same time, or documents based on the document in question. As powerful as both approaches may be, they are still driven by a specific vision of a case. Should the case materially shift through new legal claims or an expanded investigation, the rule sets that govern both approaches may need to be revised, sometimes significantly, to continue correctly identifying potentially relevant documents.
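A contextual rule of the kind described above can be sketched in a few lines. In this hypothetical Python example, a document counts as related to a seed document if it was created on the same machine within a two-hour window of it; the `Record` fields, machine names, and window size are all assumptions and do not describe how any commercial product works. The window and the same-machine rule are exactly the sort of case-specific settings that may need revisiting if the scope of a matter changes.

```python
from datetime import datetime, timedelta
from typing import NamedTuple


class Record(NamedTuple):
    doc_id: str
    machine: str        # hypothetical identifier for the source computer
    created: datetime


def related_by_context(seed, collection, window=timedelta(hours=2)):
    """Return IDs of other records from the seed's machine created within +/- window."""
    return [
        r.doc_id
        for r in collection
        if r.doc_id != seed.doc_id
        and r.machine == seed.machine
        and abs(r.created - seed.created) <= window
    ]


seed = Record("DOC-10", "LAPTOP-07", datetime(2006, 3, 2, 14, 0))
collection = [
    seed,
    Record("DOC-11", "LAPTOP-07", datetime(2006, 3, 2, 15, 30)),  # 90 minutes later
    Record("DOC-12", "LAPTOP-07", datetime(2006, 3, 3, 9, 0)),    # next morning
    Record("DOC-13", "SERVER-01", datetime(2006, 3, 2, 14, 10)),  # different machine
]

print(related_by_context(seed, collection))                            # ['DOC-11']
print(related_by_context(seed, collection, window=timedelta(days=1)))  # ['DOC-11', 'DOC-12']
```

Widening the window pulls in DOC-12, while DOC-13 stays out under any window because it came from another machine. Whether that is the right outcome depends entirely on the facts of the case.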

3. "Some Results" Is Not The Same As "Complete Results"

We live in a world where computers are viewed as impartial sources of information. Computers routinely calculate the interest generated by bank accounts, determine insurance premiums, and monitor the quality of the water we drink and the air we breathe. Few of us question the information a computer provides unless it appears patently incorrect, and when it does, the cause is almost always operator error or a problem with the underlying source data. We trust computers.

Unfortunately, in the context of e-discovery, attorneys may trust computers a little too much. E-mail messages and employee work product, the source data searched in fact discovery, contain a huge diversity of language and document formatting. Different authors use different vocabulary and different abbreviations to refer to the same ideas and topics. Document metadata may be compromised if someone revised an older document instead of creating a new one from scratch. These and other inconsistencies in source data limit the accuracy of even the most sophisticated search technology.

As a behavioral matter, though, when a computer retrieves documents based on search criteria, most of us have an inherent desire to believe that the computer has found the documents or database records that match our needs, not just our search criteria. We may realize that something is amiss if a search returns few or no responsive documents, but the difference between 10,000 documents and 12,000 documents is more subtle and much more commonly overlooked. Careful researchers use a variety of searches to uncover relevant documents, but even these techniques are unlikely to achieve the goal of finding every single responsive document. It's simply not possible.


Search technology is a crucial part of the e-discovery process. Without procedures to screen out irrelevant ESI, the legal and corporate communities would quickly bog down in an ocean of uncharted data. By the same token, however, it's important to be educated about the significance of search results, including potential weaknesses in the approach used by a legal team. Some materials will always elude capture, for any number of reasons.

Understanding the limits of search technology also makes it possible to present stronger, more accurate statements regarding the ESI discovery process. As with paper-driven discovery, it remains impossible to certify that every single relevant piece of ESI has been located and processed, and no legal team should feel obligated to make such representations. However, identifying the search strategies used to find relevant material, including proactive discussion as to why the specific technology used is adequate for the needs of the matter, will help demonstrate the reasonableness of a producing party's efforts.