The Government Domain: Testing the THOMAS Beta

 

THOMAS Beta Test

THOMAS is the primary website through which the U.S. Congress provides public access to legislative information. The site was established in 1994 and is managed by the Library of Congress for the Congress. THOMAS has gained new content and features -- and undergone the occasional facelift -- over the past 12 years. This month, January 2007, the folks at THOMAS rolled out a beta test of new THOMAS features at the URL http://thomas.loc.gov/beta/. While making no change in the core content, the beta presents some of the biggest search, display, and navigation changes to THOMAS in its history.

THOMAS Beta Test Homepage





Mega Search

The biggest change is the addition of a single search box, placed front and center on the test site, for searching all THOMAS content at once. Search results are segregated by database: text of legislation (1989- ); the Congressional Record (1989- ); committee reports (1995- ); presidential nominations status (1987- ); and treaties status (partial coverage, 1967- ; full coverage, 1975- ). The THOMAS mega search can be used to search all databases for the current congress, previous congresses, or both current and previous congresses.

On the THOMAS beta page, the venerable Bill Summary and Status (BSS) database -- covering legislation from 1973 forward -- does not appear as a separate, searchable database as it does on the current THOMAS system. The otherwise helpful beta FAQ is a little fuzzy on this point, but from my testing the CRS bill summary information does not appear to be searched. Instead, data elements from the BSS database, such as the CRS summary, appear as display options in the bill navigation box when the full text of a specific bill is displayed. Some BSS fields, such as "related bills" and the notes that appear in the header for bill text displays, do appear to be indexed.

Search

Behind the new mega search box is a bigger story: THOMAS is using new search software in this beta test. The switch was announced in an Autonomy Corporation press release. Since the beginning, THOMAS had used the InQuery search software. (InQuery is a product of the university-based Center for Intelligent Information Retrieval and is described in this THOMAS help file.) The switch will provide a more flexible platform for Library of Congress developers. Unfortunately for searchers, the THOMAS beta does not have a direct link to search help.

The new search engine brings THOMAS into the modern-day mainstream for basic word searches. Autonomy's software is a great foundation, but I still have a few gripes, such as how a bill number search is implemented. In the beta, searching by a bill number -- probably the most important data element in these databases -- is not distinguished from searching by word or phrase. The existing THOMAS system has an option to specify a bill number search, which is configured to be much more accurate and easier to use. Search for H.R. 6 as a bill number in the 109th Congress full-text legislation database on the current THOMAS, and you will get precisely the six versions of H.R. 6 that were printed. Search for H.R. 6 in the 109th Congress full-text legislation database on the beta THOMAS, and you will get 18 search results, including the six versions of H.R. 6 (helpfully sorted at the top) plus related bills and special rules that mention H.R. 6 in their text or in header information from the BSS database. Many searchers may find this helpful; I prefer to have at least the option of a precise bill number search. In addition, because the beta conducts a bill number search as a word search, it is important to enter the search in the properly punctuated format (H.R. 6). On the existing THOMAS, by contrast, the bill number search option tolerates a sloppy hr6 or hr 6.
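
To make the "sloppy input" point concrete, here is a minimal sketch of how a dedicated bill number search might normalize what the searcher types before matching. It is not how either THOMAS system is actually implemented; the function name, the regular expression, and the two-entry prefix table are all assumptions made for illustration.

```python
import re

# Hypothetical prefix table for illustration only; it covers just two bill types.
PREFIXES = {"hr": "H.R.", "s": "S."}

# Letters, optional whitespace, then digits (applied after periods are stripped).
BILL_RE = re.compile(r"^\s*([a-z]+)\s*(\d+)\s*$")


def normalize_bill_number(query):
    """Return a canonical bill number such as 'H.R. 6', or None if unrecognized."""
    compact = query.lower().replace(".", "")
    match = BILL_RE.match(compact)
    if match is None:
        return None
    prefix, number = match.groups()
    canonical = PREFIXES.get(prefix)
    if canonical is None:
        return None
    return f"{canonical} {int(number)}"


for typed in ["hr6", "hr 6", "H.R. 6", "energy policy"]:
    print(repr(typed), "->", normalize_bill_number(typed))
```

With a step like this in front of the search, hr6, hr 6, and H.R. 6 all resolve to the same bill number, while an ordinary phrase falls through to a word search.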

The beta generally moves away from the precision of field searches like the bill number search. There are few opportunities to combine a search of a controlled field with a word search or with another field-specific search. Want to find which bills referred to the House Agriculture Committee have become public law? Or which bills concerning Vietnam have been reported out of House or Senate committee? Both of these searches can be done in the existing THOMAS system, with the BSS advanced search. If there is a way to do the same searches in the THOMAS beta, I haven't found it yet.
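
For readers less familiar with field searching, the sketch below shows the general idea of combining controlled fields with a word search. The record layout, field names, and the three sample records are entirely hypothetical and are not drawn from THOMAS or the BSS database.

```python
from dataclasses import dataclass, field


@dataclass
class BillRecord:
    # Hypothetical record layout; the real BSS fields are richer than this.
    number: str
    committees: list = field(default_factory=list)
    status_steps: list = field(default_factory=list)
    summary: str = ""


def field_search(records, committee=None, status=None, word=None):
    """Return bill numbers matching every criterion that was supplied."""
    hits = []
    for rec in records:
        if committee is not None and committee not in rec.committees:
            continue
        if status is not None and status not in rec.status_steps:
            continue
        if word is not None and word.lower() not in rec.summary.lower():
            continue
        hits.append(rec.number)
    return hits


# Made-up sample data, labeled "Bill A/B/C" so it cannot be mistaken for real bills.
records = [
    BillRecord("Bill A", ["House Agriculture"], ["Referred", "Became Public Law"], "farm credit"),
    BillRecord("Bill B", ["House Agriculture"], ["Referred"], "crop insurance"),
    BillRecord("Bill C", ["Senate Foreign Relations"], ["Referred", "Reported"], "trade relations with Vietnam"),
]

print(field_search(records, committee="House Agriculture", status="Became Public Law"))
print(field_search(records, status="Reported", word="vietnam"))
```

The first call combines two controlled fields (committee and status step); the second combines a status step with a word search. A single undifferentiated word search cannot express either query precisely.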

Search Results

Result displays on the THOMAS beta are also very different from the existing system. Search results from a mega search can be sorted by content type, relevance, or date. Content type is the default sort; this is the sort that segregates legislation from committee reports, committee reports from the Congressional Record, et cetera. In the relevance and date sorts, results from all databases are integrated. Those familiar with the THOMAS databases will be able to guess the content type by viewing the results list, but individual results are not labeled by database source.

An image of the default content type sort appears below.

Default Sort




Content type is the logical choice for a default sort of these databases. A second available sort is relevance. The relevance algorithm is not described in the documentation, but it certainly can be helpful for viewing the results of word searches in the full-text legislation, Congressional Record, and committee report databases. The third sort option is date (reverse chronological order). These three sort options are available for the results of a mega search. Once you are viewing the results of a single database search, the sort options are specific to that database. For example, the committee reports database has options for sorting results by report title, committee name, chamber (House or Senate), and report number, in addition to date.

Database-specific sort options are a wonderful idea, but it looks like the THOMAS team needs to tinker with a few of these sorts to get them to work to the searcher's best advantage. A multi-congress search for committee reports, for example, offers a sort by report number. The report number option sorts, in ascending order, by the three digits to the right of the hyphen so that, for example, results could come back in this order:

H. Rept. 108-010
H. Rept. 109-019
H. Rept. 108-044

This is because the Congress number to the left of the hyphen is ignored, and 10 comes before 19, which comes before 44. That's OK, I suppose, but not what I would have expected. The sort becomes less helpful once we are sorting committee reports with more than three digits to the right of the hyphen. In that case, only the first three digits to the right of the hyphen are considered, so that reports may sort in this order:

H. Rept. 106-1005
H. Rept. 104-101
H. Rept. 106-1039

Bill number sorts work in a similar fashion with, for example, H.R. 1005 sorting before H.R. 101 in an ascending bill number sort.
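
The behavior can be reproduced with a short sketch. The sort key below is only my guess at what would produce the ordering I observed; the beta's actual implementation is not documented, and the function names are my own. The second key shows the ordering I would have expected: by Congress, then by the full report number.

```python
import re

# Captures the Congress number and the report number at the end of a citation.
REPORT_RE = re.compile(r"(\d+)-(\d+)$")


def observed_key(report):
    """Assumed key: only the first three digits to the right of the hyphen."""
    _, number = REPORT_RE.search(report).groups()
    return int(number[:3])


def expected_key(report):
    """Alternative key: Congress first, then the whole report number."""
    congress, number = REPORT_RE.search(report).groups()
    return (int(congress), int(number))


reports = ["H. Rept. 106-1039", "H. Rept. 104-101", "H. Rept. 106-1005"]

print(sorted(reports, key=observed_key))
# ['H. Rept. 106-1005', 'H. Rept. 104-101', 'H. Rept. 106-1039']
print(sorted(reports, key=expected_key))
# ['H. Rept. 104-101', 'H. Rept. 106-1005', 'H. Rept. 106-1039']
```

Under the assumed key, 1005 is read as 100, 101 as 101, and 1039 as 103, which matches the order shown above. The same truncation would explain H.R. 1005 sorting before H.R. 101.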

Navigation and Other Features

I could go on with observations and suggestions for the sorts, but it gets messy fast and this is, after all, just the beta. Navigation on the THOMAS beta merits more attention. As currently configured, the beta offers three approaches to entering THOMAS:

  • a word search of all THOMAS content for the current congress and/or all available previous congresses;
  • a search for all legislation in the current congress sponsored by a specific member of the House or Senate; or
  • a search for all legislation in the current congress that has been assigned a specific subject indexing term, such as "higher education" or "intelligence activities."

(The beta also has an experimental "guided search" option, but this has not been developed enough to warrant investigation yet.)

So, aside from sponsor or subject searches in the current text of legislation database, the THOMAS beta funnels all searchers through its mega search option before they can get directly to a single database. If I want to see the committee report H. Rept. 109-16, for example, I will have to put 109-16 in the mega search box and select "Previous Congresses." I next scroll to the Committee Reports portion of the results and click on "View all committee report results" (because H. Rept. 109-16 does not show up in the first three results) and browse through the list to find my document.

On the plus side, once you find a document the THOMAS beta presents a handy document view and navigation screen. The navigation box accompanying the text of legislation is the best, with options to display data elements from BSS including the CRS bill summary and congressional actions information.

Bill Navigation Panel





I am looking forward to future developments for THOMAS. At the top of my wish list: restoring BSS fields such as congressional status steps and referral/reporting committee to the legislation search options, and providing a quick way to search a bill number in the current congress and find information relating to just that bill. Also, I read one promise on the beta FAQ that I would love to see fulfilled: "RSS feeds are not yet available for THOMAS. We do plan to add this feature in the future."