ResearchWire – Unearthing Statistical Data on Internet: Effective Research Strategies

Genie Tyburski is the Research Librarian for Ballard Spahr Andrews & Ingersoll in Philadelphia, Pennsylvania, and the editor of The Virtual Chase™: A Research Site for Legal Professionals.

(Archived November 1, 1997)


How many intercity passengers rode New Jersey Transit during 1996? 7.4 million. What percentage of rape victims, under the age of twelve, know their offenders? Ninety percent. How many children in the U.S. have blood lead levels that are greater than or equal to 10 micrograms per deciliter? Approximately 1 million. How many incidents of private industry non-fatal work-related injuries and illnesses were reported during 1995? 6.6 million. Since 1986, is this an increase or decrease in the rate of reported cases per one hundred full-time workers? Decrease.

As the reader may surmise, statistical data abounds on Internet. But while its abundance exemplifies one of Internet’s many strengths, its chaotic dispersal underscores an oft-stated weakness.

How then do researchers find statistics on Internet? Entering the query, new jersey transit passengers or new jersey transit statistics, at a mega search service like MetaCrawler yields no relevant hits concerning the number of 1996 intercity riders. The search statement, child rape victim offenders, at Infoseek, on the other hand, finds a statistic, but it differs from the answer above, which comes from a more current and complete source.

The reasons popular search services frequently yield myriad irrelevant or trivial references go beyond the scope of this article.[1] Suffice it to say that better methods for locating statistics on Internet exist.

One strategy entails using a focused search service; that is, a finding tool designed to retrieve statistical data. The Federal Interagency Council on Statistical Policy introduced such a service earlier this year. Called FedStats, the site provides access to statistics produced by 70 federal agencies. Researchers may browse its A to Z subject index or enter a keyword query.

To illustrate how the resource works, let’s first select a straightforward query. What is the current CPI-U (Consumer Price Index for All Urban Consumers) for the Philadelphia region?

After scanning the letter “C” category of the A to Z subject index, follow the path, Consumer Price Index. This leads to a page with several CPI options. Selecting the one that best describes the desired information, we next respond to a series of six questions from the Bureau of Labor Statistics (BLS) server before retrieving the correct data.

Alternatively, we could have performed a keyword search at FedStats. Entering the query, Philadelphia, and checking the option, Bureau of Labor Statistics, we now bypass five of the six BLS questions we encountered with the first strategy. This method, however, assumes that we know BLS produces the CPI.

But, in all likelihood, we would have found the same data by using one of the Web’s more popular search services. To demonstrate the power of FedStats, let’s construct a more complex query.

What information about U.S. cigarette consumption and production exists on Internet? Have smoking restrictions, higher taxes, consumer awareness about smoking-related illness and disease, and higher retail prices curbed the public’s appetite for cigarettes? Has cigarette production increased or decreased in recent years? Which states employ the greatest number of workers in the cigarette manufacturing industry?

Entering the query, cigarette, tobacco, we retrieve 804 documents.[2] Within the first two pages of hits, we find answers to these questions and more.

The Economic Research Service (ERS) of the Department of Agriculture, in its report on tobacco products, estimates 1996 U.S. cigarette consumption to be the same as consumption during 1995. The same source also predicts an increase in 1996 production because of better growing conditions. From the tobacco products portion of the 1992 Census of Manufactures, we learn that Virginia, North Carolina, and Kentucky employ more cigarette workers than other states.

These publications, as well as other references from the FedStats search results list, offer additional relevant statistical data. For example, from the ERS report, “Tobacco Leaf and Products Statistics and Analysis,” discover how U.S. cigarette trade fared during 1996. Or, review the relationship between cigarette smoking and other high risk behaviors among U.S. adolescents in this summary report from the National Center for Health Statistics’ National Health Interview Survey — Youth Risk Behavior Survey (AD #263).

As demonstrated, FedStats presents a better search engine alternative for those seeking statistical data. But performing a keyword search is not the only means by which we may find statistics.

Another useful strategy involves taking a topical approach. To increase the likelihood of success with this method, however, I recommend making use of a statistical research guide. Although several exist on Internet,[3] I found none as well organized, or as exhaustive in coverage, as Statistical Resources on the Web by the University of Michigan Documents Center.

Since this is a Michigan resource, let’s look for current statistics about the U.S. automotive industry. Select the topic, transportation, from the guide. Under the sub-heading, “Motor Vehicles,” follow the first link — Automotive Information Center.

Maintained by the Michigan Electronic Library, this resource offers the category, “industry statistics.” Selecting it, we discover links to numerous publications offering data concerning vehicle production, car and truck sales, trade with Canada, Japan, and other foreign countries, and the state of the market.

Yet a third research strategy entails use of a catalog site for locating known statistical works. Consider popular titles such as Statistical Abstract of the United States, County and City Data Book, and the CIA World Factbook, or popular sources like the U.S. Census Bureau. How do we find out if they appear, in whole or in part, on Internet?

The most popular catalog on the Web — Yahoo! — contains an entry for Government:Statistics. Perusing the list, we find links for various Census Bureau offices and publications as well as other government agencies offering statistical data.

Returning to Yahoo!’s main page, we perform keyword searches for known titles. This yields direct links for Statistical Abstract of the United States, 1996, County and City Data Book, and several editions of the CIA World Factbook.

More helpful than Yahoo! for locating government publications offering statistical data is the Migrating Government Publications catalog offered by the Government Publications Department of the Library at the University of Memphis. Arranged both by title and Superintendent of Documents (SuDoc) number, the site tracks and indexes government publications available electronically.

Had we connected to this site, rather than Yahoo!, in our search for Statistical Abstract of the United States, for example, we would have discovered not only the 1996 edition, but also the 1995 edition and the title, Statistical Briefs, which appears next in the catalog.

In closing, looking for relevant statistics, published in traditional information sources, presents a challenge to the best of researchers. Seeking them from the complex world of cyberspace magnifies this struggle. Yet the sheer volume of statistical data available on Internet warrants the search. As Robert Herrick sagely observed:

Attempt the end, and never stand to doubt;
Nothing’s so hard, but search will find it out.

Robert Herrick, though, never experienced the convolution of cyberspace. He had little reason to add the adjective “proficient” to “search.” We, on the other hand, do.

********************

Footnotes

  1. For possible explanations, see Pollock, Annabel and Andrew Hockley, BT Laboratories. “What’s Wrong with Internet Searching,” D-Lib Magazine, March 1997. Online. Internet. 15 September 1997. Available WWW http://www.dlib.org/dlib/march97/bt/03pollock.html; Notess, Greg R., Montana State University. “Internet Search Techniques and Strategies,” Online, July 1997, pp. 63-66. Online. Internet. 15 September 1997. Available WWW http://www.onlineinc.com/onlinemag/JulOL97/net7.html; and Sullivan, Danny, Califia Consulting. “How Search Engines Work,” Search Engine Watch, [no date]. Online. Internet. 15 September 1997. Available WWW http://searchenginewatch.com/work.htm.
  2. Research performed September 12, 1997. Searches performed at later dates may yield different results.
  3. See, for example, Statistical Resources compiled by Peru State College Library, Statistics by the University of Florida, Department of Statistics, and StatLib by the Carnegie Mellon University Statistics Department.

Resources

Bureau of Labor Statistics

Census Bureau

Federal Interagency Council on Statistical Policy, FedStats

Oregon State University, Government Information Sharing Project

RAND

University of Michigan Documents Center, Statistical Resources on the Web


Recommended Readings

Goehlert, Bob, Indiana University. “Search Strategy for Finding Data,” 1997, revised 1 August 1997. Online. Internet. 16 September 1997. Available WWW http://www.indiana.edu/~libsalc/goehlert/ss_data_ec.html.

Goffe, Bill, University of Southern Mississippi. “Resources for Economists on the Internet,” May 1997. Online. Internet. 16 September 1997. Available WWW http://econwpa.wustl.edu/EconFAQ/EconFAQ.html.

Moody, Marilyn K., University of Buffalo. “Demystifying Documents … on the Internet,” 9 May 1997. Online. Internet. 16 September 1997. Available WWW http://ublib.buffalo.edu/libraries/units/sel/mkm/michigan/docs.html.
