Paul Barron is the Director of the Library and Archives at the George C. Marshall Foundation in Lexington, Virginia. This is a revised version of an article published in the Virginia Lawyer in December 2005.
“The Internet has quite simply become one of the primary tools in a research strategy
that aims to pull data from all relevant sources. The Internet simply can’t be ignored.” *
In a 2001 Duke Law Journal Article, Carl F. Cranor wrote, “[L]ittle is known about the universe of approximately 100,000 chemical substances or their derivatives registered for commerce. Surprisingly, for seventy-five percent of the 3000 top-volume chemicals in commerce, the most basic toxicity results cannot be found in the public record.” This fact might dissuade some researchers from searching publicly accessible Web-based sources for information on the reliability of scientific evidence about toxic substances.
Although the Web is complex, legal professionals know that the Web has authoritative sources and they have integrated the Web into their research. However, they also know that the sheer number of Web documents increases the difficulty of finding relevant information. In August 2005, the Yahoo search blog stated that Yahoo indexed 19.2 billion documents. Sources cited by the Congressional Research Service estimate that each day 7 million documents are added to the webpages of the sites of the more than 60 million registered top level domains. The same CRS report states that the Deep Web, that portion of the Web reachable only by querying a database such as the National Library of Medicine’s PubMed, is much larger. Estimates of its size vary from 150 times to 500 times larger than the Surface Web.
If a law library held as many print resources as the Surface Web holds documents, researchers would use more than simple keywords to locate information. Fortunately, by using the advanced search features of Web search tools, researchers can find relevant information quickly. For instance, to locate the Cranor article, a Google site-limited title search returns one result: the link to the full text article (see Figure 1, immediately below).
The purpose of this article is to review a search process using advanced search query features in Google, Yahoo, and other search tools to find publicly accessible Web-based information on toxic substances and the law and, more specifically, the reliability of scientific evidence about toxic substances. Search tools that perform better with specific topics are searched using queries related to “sick building syndrome.” Although Google and Yahoo are used in Surface Web searches, researchers must understand that no search engine indexes more than 20 percent of the Web. Additional search engines must be used to thoroughly search the Surface Web’s content. The Deep Web is searched using the Scirus science-specific search engine, the OAIster academically-oriented digital resources search engine, and MEDLINE. The information gathered from the two Web sections will supplement more in-depth research conducted in print resources and proprietary databases such as Lexis-Nexis.
The search query syntax is standardized: search terms of two or more words are enclosed in quotes so the search engines search for phrases, not individual words; Boolean operators are in upper case; and multiple Boolean expressions are nested to keep like concepts or synonyms together. These methods work in the Surface and the Deep Web search tools discussed in this article. Using this standard, the mixed syntax search query format is:
“toxic substances” AND “scientific evidence” AND (reliability OR verification)
Although Google’s and Yahoo’s advanced search query features are used, queries are run in the basic search templates to avoid confusion about where to place the search query segments in the advanced search template.
For such a broad topic, the research begins with the Scirus science-only search engine recommended by Levitt and Rosch because the search tool locates scientific, scholarly, technical, and medical data. Unlike the general search engines, Scirus’s advanced search option is recommended because that option allows the researcher to find search terms in the article title and to limit the results by content area, date, information type, and subject area (Figures 2 & 3, immediately below).
Figure 2: Scirus Science-specific Search Engine Advanced Search Template
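The three syntax rules described above (quoted phrases, upper-case Boolean operators, nested synonym groups) can be sketched in a few lines of Python. This is only an illustration: the helper names `phrase`, `any_of`, and `all_of` are hypothetical and belong to no search engine's API, and the output uses plain ASCII quotation marks rather than the typeset quotes shown in this article.

```python
def phrase(term: str) -> str:
    """Quote a multi-word term so it is searched as a phrase, not as separate words."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def any_of(*terms: str) -> str:
    """Nest synonyms or like concepts in one parenthesized OR group."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"


def all_of(*parts: str) -> str:
    """Join already-formatted query segments with an upper-case AND."""
    return " AND ".join(parts)


query = all_of(
    phrase("toxic substances"),
    phrase("scientific evidence"),
    any_of("reliability", "verification"),
)
print(query)
# "toxic substances" AND "scientific evidence" AND (reliability OR verification)
```

Keeping the quoting and grouping in separate helpers means a synonym can be added to the OR group without disturbing the rest of the query.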
Figure 3: Scirus Results
The date-limited (2000-2006) Scirus search: “toxic substances” AND “scientific evidence” AND (reliability OR verification) returned relevant resources such as the current edition of the Federal Judicial Center’s Reference Manual on Scientific Evidence, including the full text of the Reference Guide on Toxicology co-authored by Mary Sue Henifin, J.D., M.P.H., a law firm partner and an Adjunct Professor of Public Health Law at the Robert Wood Johnson Medical School. The “guide focuses on scientific issues that arise most frequently in toxic tort cases … and provides an overview of the basic principles and methodologies of toxicology and offers a scientific context for proffered expert opinion based on toxicological data.”
Locating recent research is accomplished by the OAIster Deep Web search tool, a project of the University of Michigan Digital Library Production Service (Figure 4). OAIster’s mission is to provide links to free, difficult-to-access, academically-oriented digital resources. A search for “toxic substances” AND law returns 11 results. One, an article titled, “Regulating Toxic Substances Through a Glass Darkly: Using Science Without Distorting the Law” is by Cranor and concludes, “legal regulation of toxic substances by the tort (or personal injury) or regulatory law can be addressed by sensitively designing scientific and legal burdens of proof for the legal and public health problem in question.”
Figure 4: OAIster Digital Library
Another specialized database is the National Library of Medicine’s PubMed, which is updated twice a week and contains over 15 million citations from MEDLINE and other life science journals (Figure 5). The database’s content is indexed using the controlled vocabulary Medical Subject Headings (MeSH). Since the MeSH terms are very precise, the controlled vocabulary should be reviewed prior to searching the database. For example, the MeSH term for “toxic substances” is “hazardous substances” and the descriptor for the toxic “black mold” Stachybotrys atra is satratoxin H, described as “a toxic metabolite of Stachybotrys atra.” Using the controlled vocabulary in the search: “satratoxin H” AND buildings returns 6 results from journals and articles studying the adverse health effects to occupants after exposure to satratoxin H in water-damaged buildings. The first result links out to a full text copy of the article.
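The step of reviewing the controlled vocabulary before searching can be pictured as a simple substitution made before the query is assembled. The sketch below is hypothetical: the lookup table holds only the two equivalents mentioned in this article and is not drawn from the MeSH database itself, and the function names are invented for illustration.

```python
# Illustrative lookup containing only the two examples given in this article.
# An actual search should confirm each descriptor in the MeSH vocabulary first.
MESH_EQUIVALENTS = {
    "toxic substances": "hazardous substances",
    "black mold": "satratoxin H",
}


def to_controlled(term: str) -> str:
    """Return the controlled-vocabulary equivalent if one is listed, else the term itself."""
    return MESH_EQUIVALENTS.get(term.strip().lower(), term.strip())


def phrase(term: str) -> str:
    """Quote multi-word terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def pubmed_query(*terms: str) -> str:
    """Translate each term, quote phrases, and join the segments with AND."""
    return " AND ".join(phrase(to_controlled(t)) for t in terms)


print(pubmed_query("black mold", "buildings"))
# "satratoxin H" AND buildings
```

Terms with no listed equivalent pass through unchanged, which mirrors the point made below that PubMed can also be searched with non-MeSH terms.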
Figure 5: PubMed Search Results Display
Researchers can also search for non-MeSH terms in PubMed; a title search for “sick building syndrome” returned 173 results from journals such as the Archives of Environmental Health (“Studies on the Role of Fungi in Sick Building Syndrome”).
Another useful National Library of Medicine database is TOXNET, a cluster of databases on toxicology, hazardous chemicals, environmental health, and toxic releases. Running the “sick building syndrome” search returned 500 results from U.S., Belgian, British, Dutch, German, and Scandinavian journals.
After searching the Deep Web with specialized search tools, the Surface Web is searched using Google and other search engines. An initial search for: “scientific evidence” AND “toxic substances” returns more than 340,000 results from .com, .org, and .gov sites and sites from Canada and the Cocos Islands. While the number of returns may indicate the popularity of an issue, there are too many results to review. However, the 13th result is surprisingly relevant; the website belongs to a board-certified civil trial lawyer with a B.S. in chemical engineering. The site’s subject categories include: Research Sites for Chemical and Toxic Properties, Scientific Evidence – Resources for Daubert/Frye Issues, Resources for Specified or Classes of Toxins, and Litigation Support Resources.
One useful technique to refine a search returning too many results is to find only webpages with a specific title, since a webpage titled “Scientific Evidence” probably focuses on that subject. The title search syntax in Google is intitle: and the revised query is:
intitle:“scientific evidence” AND (reliability OR verification) AND “toxic substances”
Figure 6: Google Title Search Results
All 47 of the results have the phrase “scientific evidence” in the title of the result (Figure 6). The second result, an online version of the peer-reviewed American Journal of Public Health, has an abstract of the full text article, “The Weight of Scientific Evidence in Policy and Law,” which can be purchased for $10. The fifth result from Defending Science.org provides a full text copy of the same article for free.
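The title-limited query above can be assembled the same way as the standard query, with the intitle: operator attached directly to the quoted phrase. This is a minimal sketch with hypothetical helper names; it only builds the query string shown in this article and does not contact Google.

```python
def phrase(term: str) -> str:
    """Quote multi-word terms so they are searched as phrases."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def any_of(*terms: str) -> str:
    """Group synonyms in a parenthesized OR expression."""
    return "(" + " OR ".join(phrase(t) for t in terms) + ")"


def intitle(term: str) -> str:
    """Prefix a term with intitle:, written with no space as in the article's query."""
    return "intitle:" + phrase(term)


title_query = " AND ".join([
    intitle("scientific evidence"),
    any_of("reliability", "verification"),
    phrase("toxic substances"),
])
print(title_query)
# intitle:"scientific evidence" AND (reliability OR verification) AND "toxic substances"
```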
Another effective search technique to reduce the number of results is to limit the results to sites with a specific top level domain such as .edu, .gov, or .org. Educational sites may provide articles by faculty, federal and state government sites will provide full text of laws, while organizational sites may express viewpoints about an issue or law. In the .edu domain-limited search:
“scientific evidence” AND “toxic substances” AND (reliability OR verification) AND site:edu
the second result is the 2001 Duke Law Journal article by Cranor (Figure 7, below). The fourth result in the edu-limited search connects to the Harvard University-hosted site Sound Science in the Courtroom. The homepage mentions the Atlantic Legal Foundation whose mission is to “ensure that whenever science is used in a courtroom that it shall be sound science.”
Figure 7: .edu Top Level Domain Limited Search Results
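Because the domain limit is just one more segment appended to the query, the same base query can be rerun against each top level domain of interest. The sketch below assumes a hypothetical helper, `limit_to_domain`, and produces one query string per domain; the domain list is taken from the paragraph above.

```python
BASE_QUERY = '"scientific evidence" AND "toxic substances" AND (reliability OR verification)'


def limit_to_domain(query: str, domain: str) -> str:
    """Append a site: restriction; a leading dot (as in '.edu') is dropped."""
    domain = domain.strip().lstrip(".").lower()
    return f"{query} AND site:{domain}"


# One query per top level domain mentioned in the text.
variants = {tld: limit_to_domain(BASE_QUERY, tld) for tld in (".edu", ".gov", ".org")}

for tld, limited in variants.items():
    print(tld, "->", limited)
```

Running the three variants separately keeps faculty articles, government texts, and organizational viewpoints in distinct result sets.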
Another search option is to limit the results to a specific site by running a site-limited search. To search only the Atlantic Legal Foundation site for information about toxic substances and scientific evidence, the search query is:
“toxic substances” AND “scientific evidence” AND site:atlanticlegal.org
The second result is a 5300-word article by a lawyer explaining the Daubert standard for the admissibility of scientific testimony (Figure 8). The Daubert standard only applies in federal courts; some states rely on the earlier Frye standard that established a threshold rule for assessing whether scientific testimony had sufficient foundation to be considered by a jury.
Figure 8: Site-limited Search Results
One of the sites linked to by the Florida attorney specializing in toxic tort is the Agency for Toxic Substances and Disease Registry (ATSDR), a federal public health agency of the U.S. Department of Health and Human Services. A review of the site indicated that ATSDR’s mission is to serve “the public by using the best science … to prevent harmful exposures and diseases related to toxic substances.” A site-limited search with the query:
“toxic substances” AND “scientific evidence” AND site:atsdr.cdc.gov
returned 85 results from only the ATSDR site. Along with case studies, one of the results is an extensive study about the social and psychological effects of exposure to toxic substances.
Once a useful website is located, link checks should be run on the site. Quality websites link to other quality websites and may expand the content of the linked-to site. Yahoo or MSN is preferred for link checks because the search engines will run complex Boolean and top level domain-limited searches. Note: The http:// must be included in the link check search query in Yahoo or the search will fail. To find sites that mention scientific evidence and toxic substances and that link to the Atlantic Legal Foundation website, the query is:
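When a site is found by following a link, the address copied from the browser usually includes more than the host name that the site: operator expects. The following sketch, which relies on Python's standard `urllib.parse` module and on a hypothetical `site_limited` helper, reduces a pasted address to the host form used in the two site-limited queries above. Dropping a leading "www." is an assumption made here to match those queries.

```python
from urllib.parse import urlsplit


def site_host(address: str) -> str:
    """Reduce a pasted address to a bare host name suitable for a site: limit."""
    address = address.strip()
    if "://" not in address:
        # urlsplit only recognizes the host when a scheme is present.
        address = "http://" + address
    host = urlsplit(address).hostname or ""
    if not host:
        raise ValueError("no host name found in address")
    # Assumption: drop a leading "www." to match the article's queries.
    return host[4:] if host.startswith("www.") else host


def site_limited(query: str, address: str) -> str:
    """Append a site: restriction for the host named in the address."""
    return f"{query} AND site:{site_host(address)}"


base = '"toxic substances" AND "scientific evidence"'
print(site_limited(base, "atlanticlegal.org"))
print(site_limited(base, "http://www.atsdr.cdc.gov/"))
```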
link:http://www.atlanticlegal.org AND “scientific evidence” AND “toxic substances”
Figure 9: Yahoo Link Check Results
Three results are returned for the search; the first is the site of the board-certified civil trial lawyer located in the first search in Google (Figure 9).
The results from these searches of the Surface and the Deep Web remind us that the Web is a vast and ever-changing information source. By using advanced search features in specialized search tools and general search engines, relevant information can be located that supplements the print and proprietary databases.
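The note about including http:// in a Yahoo link check is easy to overlook, so a small helper can add the prefix when it is missing. This is a sketch only: `link_check` is a hypothetical name, the link: syntax is taken from the query shown in this article, and nothing here confirms how any search engine currently treats that operator.

```python
def phrase(term: str) -> str:
    """Quote multi-word terms so they are searched as phrases."""
    term = term.strip()
    return f'"{term}"' if " " in term else term


def link_check(url: str, *terms: str) -> str:
    """Build a link: query, adding the http:// prefix the article says Yahoo requires."""
    url = url.strip()
    if not url.startswith(("http://", "https://")):
        url = "http://" + url
    segments = ["link:" + url] + [phrase(t) for t in terms]
    return " AND ".join(segments)


check = link_check("www.atlanticlegal.org", "scientific evidence", "toxic substances")
print(check)
# link:http://www.atlanticlegal.org AND "scientific evidence" AND "toxic substances"
```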
Summary of Web Search Strategies
- Determine appropriate search engines to recover information in both the Surface and the Deep Web.
- Structure the search query with punctuation and groups for the maximum effect.
- Use date restrictions to narrow the results.
- Consider narrowing searches by using intitle, domain, or specific site-limited searches.
- Use link checks to “Shepardize” the results.
 Congressional Research Service. (2003). Internet Statistics: Explanation and Sources (Order Code RL31270). Rita Tehan: Author. Red Light District: Plan for Adult Area Sparks a Fight on Control of the Web. Wall Street Journal, 10 May 2006.
 Lawrence, S. & Giles, C. L. (08 July 1999). Accessibility of Information on the Web. Nature 400, 107. To verify the limited overlap in search engine results, run a search query in Thumbshots. Other recommended search engines are: Ask, MSN, Yahoo, and Exalead.
 See the National District Attorneys Association site for a list of states and whether they follow Daubert or Frye.