Searching the World Wide Web is an imprecise art for many reasons. Searching just the portion that is the U.S. government webspace is also a challenge, but it just became a whole lot easier. FirstGov, the official federal government portal, has implemented a new and truly improved search engine for finding what you need on U.S. government sites.
FirstGov's search engine for government websites has always been its weakest offering. As a portal, FirstGov excels at providing quick links to federal agency and office sites as well as to state, local, and tribal government websites. It also organizes links by category of federal information, such as government library sites or federal legislative sites. But FirstGov's keyword search against an index of government web content was marred by weak relevancy ranking, an unhelpful display of search results, and other problems. (For more on the topic, see the February 2005 column, Why Google Uncle Sam?.)
The FirstGov team has addressed the flaws with a package of strategies. The old search engine has been replaced by Microsoft's MSN Search and Vivisimo's clustering search engine. (See Vivisimo's press release.) The FirstGov team conducted usability testing on its search results page and made significant improvements there. FirstGov has also refreshed and expanded its index, and no longer includes non-government "think tank" sites such as Rand or the Federation of American Scientists. The overall result is not perfect, but it is much, much better. If you had given up on FirstGov search, go back now and give it another try.
Skipping to the Results
The best features of the new FirstGov Search are on the results screen. Beyond the usual scrolling and clicking and hitting the back button, here is what you can do:
- View a website in a new window, without losing the results screen. (Click on "new window.")
- View a web page within the results screen. (Click on "preview"; see image A, below.)
- View an image of the page as it looked when it was indexed by the MSN Search crawler. (Click on "Cached.")
- See the results of your search within one selected site. (Click on "More from…")
- Depending on your search terms, get quick links to related information from the FirstGov FAQs, federal forms, or other special sections of FirstGov. (See image B, below.)
The left panel of the results screen has more options brought to you by FirstGov's implementation of Vivisimo's clustering engine. Vivisimo arranges a typically long list of search results into groups, or clusters. Think of it as an aerial view of your results, an alternative to paging through screen after screen of individual hits. Clustering can speed your way to that perfect website you might have missed because it was buried far down in the results list. Users of Vivisimo's Clusty search site will be familiar with this approach.
Clusters by topic are the default. Vivisimo creates topic clusters on the fly, trying to group results that have word pattern similarities. Topic clusters can help to give an overview of your results, to disambiguate results when you are searching on words with several meanings (such as Turkey the country and turkey the bird), and to zero in on a subset of results (such as just those dealing specifically with the vaccine aspects of avian flu).
Clusters by agency must be selected by clicking on the second tab. Agency will likely be the most beneficial cluster type for frequent searchers. More on that in a bit.
Clusters by source, the third tab, don't do much for me now; perhaps they will be more useful in the future. The "sources" include the Web, currently the source of most results. Other designated sources include podcasts and special sections of FirstGov, such as FirstGov FAQs.
Image C - Screen shot of Clusters by Topic
A Critical Look at Search
The FirstGov homepage features the usual blank search box in the upper right corner. (FirstGov Search also has a dedicated page at http://firstgovsearch.gov/.) Links lead to advanced search and to search tips. By all means, take a few minutes to review the clear and concise search tips.
Image D - Screen shot of Advanced Search
Use the advanced search screen to take advantage of its easy fill-in-the-blanks approach to constructing a complex search. For example, advanced search lets you limit by file type (e.g., Adobe PDF, MS Excel, MS PowerPoint) and language (English or Spanish), among other criteria. The principal--perhaps only--reason to limit by language is to find government content on a topic in Spanish. A search on "asbesto or asbestos" limited to Spanish content brought back some very useful results. Limits by language do not appear to be watertight--some English-only content crept into many of my test searches--but it's a great way to ferret out the Spanish-language content that is distributed in bits and pieces across federal, state, and local government websites. So, let's call it a "leaky limit."
Another leaky limit is the advanced search option to select a level of government. You can choose to search only federal sites, all state government sites, or individual states and territories. (The default is to search all federal and state sites, which includes territories and tribal government sites.) Like the Spanish-language limit, the government-level limit leaks because webspace, even government webspace, is wildly diverse and lacking in standard metadata. The .gov domain managed by the General Services Administration is not exclusive to federal government sites. It can be used by state, local, tribal, and territorial governments, although it isn't always. Some non-federal governments have moved on to .org, or even .com addresses, or a mix of these. Others have remained at their original designations ending in .us (for example, .ak.us for Alaska government sites and the .nsn.us suffix used by many tribal, or Native Sovereign Nation, sites).
The whole scheme has gotten rather messy. A commercial website slipped into some of my search results because it uses the address congress.nw.dc.us. (This URL was originally used for the CapWiz product; it now redirects to capitoladvantage.com/capwiz.) What all of this means is that a simple limit by domain, such as is available on many standard search engines, will not effectively limit to just federal sites, or just state sites. FirstGov Search does a much better job of filtering by government level than the generic search engines do; it is just that the filter leaks a bit. On the upside, the FirstGov team is working with state and local government sites to improve the government-level filter. While this work goes on, they are erring on the side of a leaky filter rather than an overly strict filter that could block out relevant sites. Look for gradual improvement on this front.
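To see why a plain domain limit leaks, here is a minimal Python sketch of a naive classifier that guesses the level of government from the hostname suffix alone. It is only an illustration of the problem described above, not how FirstGov Search works. The function name and the hostnames other than congress.nw.dc.us are hypothetical.

```python
def naive_level(hostname: str) -> str:
    """Guess a government level from the hostname suffix alone (deliberately naive)."""
    host = hostname.lower().rstrip(".")
    if host.endswith(".nsn.us"):
        return "tribal"
    if host.endswith(".gov"):
        # Wrong whenever a state, local, or tribal government uses .gov.
        return "federal"
    if host.endswith(".us"):
        # Wrong whenever a non-government site holds a .us address.
        return "state or local"
    # Governments on .org or .com addresses land here and are missed entirely.
    return "unknown"


# congress.nw.dc.us is the commercial address mentioned above;
# the remaining hostnames are made up for illustration.
for host in ["congress.nw.dc.us", "agency.example.gov", "tribe.example.nsn.us", "city.example.org"]:
    print(host, "->", naive_level(host))
```

The commercial address is labeled "state or local," and the made-up city site on .org is not recognized as government at all. Any filter that relies on the suffix alone will either leak or block relevant sites.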
Clusters to the Rescue
While you are waiting for those improvements, take a look at clustering by agency. Do a search on "medical malpractice" and limit it to search in "Federal Only." Some state sites will leak into your results. Next, select the "By Agency" tab. (See image E below.) The agency cluster view makes it easy to pick out major federal sources--such as the House and the Government Accountability Office--from the many state sites also concerned with the issue. (Tip: add the word "federal" to this search and you'll cull out more results from other federal sources. This tip will not work with all searches, but it is an example of how one variation on your search statement can bring far different results to the surface. Your first search is always just a starting point.)
Image E - Screen shot of Clusters by Agency
FirstGov Search does not say much about it, but FirstGov can effectively limit searches to the .mil, or military, domain that is exclusive to U.S. military websites. Use the syntax "site:mil" or use advanced search and type "mil" in the box labeled "Limit to these sites." But wait…you won't find some of the recruiting sites, such as GoArmy.com and Marines.com, because they use the .com suffix. For military searching, you will also want to try the military's own DefenseLINK Search, which does include those recruiting sites and has its own array of advanced search features.
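If you build search statements in a script, the "site:mil" limit can be appended to a query as sketched below in Python. The helper names are hypothetical, and the sketch assumes only the "site:" syntax quoted above. It does not call FirstGov or any other search service.

```python
def limit_to_site(query: str, site: str) -> str:
    """Append a site: limit (for example, site:mil) to a search statement."""
    return f"{query.strip()} site:{site.strip().lstrip('.')}"


def matches_site(hostname: str, site: str) -> bool:
    """Return True if the hostname falls under the given site or domain suffix."""
    host = hostname.lower().rstrip(".")
    suffix = site.lower().strip().lstrip(".")
    return host == suffix or host.endswith("." + suffix)


print(limit_to_site("recruiting", "mil"))
# The recruiting sites use .com, so a .mil limit cannot match them.
print(matches_site("GoArmy.com", "mil"))
print(matches_site("Marines.com", "mil"))
```

The second helper shows why the recruiting sites drop out: their hostnames do not end in .mil, so a site limit has no way to include them.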