Features – Confessions of a Deep Linker: Advanced Techniques for Linking to Government Documents and Databases

Phillip A. McAfee is an attorney with a Masters Degree in Health Law from Loyola University in Chicago. He is also the owner of the Health Hippo Web site, which has provided deep and extensive links to government documents related to health law, policy and regulation since 1996.


This article discusses techniques for deep linking and creating self-updating links to U.S. government documents. It assumes familiarity with Internet browsing and some familiarity with HTML coding. Deep links bypass a Web site’s home page to retrieve information from a specific page deep within the site. Another form of deep link automatically runs a search and retrieves a document or list of documents from another site’s database.

In this article, I will provide specific examples of deep and self-updating links to the U.S. Code, Code of Federal Regulations, Federal Register, GAO Reports, federal legislation, and the Congressional Record. A reader could extrapolate from these examples to create links to documents in other government (and non-government) databases. In the recent past these links have been quite unstable, with government Webmasters seemingly changing the URLs on a whim. That instability leads to the first theme of this article.

I. The Basics: Why Internet Citizens Must Demand Sensible and Stable Uniform Resource Locators (URLs) for All Government Documents.

When you go to the local law library, the books don’t move from shelf to shelf very often. And certainly the call numbers don’t change much, other than to accommodate additional materials. A system in which call numbers changed constantly would make it impossible to conduct legal research with any degree of confidence. If your library burned down the books may indeed be gone or moved to different places, but the call numbers would remain unchanged and you could easily find the books elsewhere.

Over the past five years the U.S. government’s Internet law library has “burned down” with the predictability of El Nino. And, to further the physical library metaphor, not only have the books disappeared — the entire cataloging system has been replaced on a regular basis. The result? Unreliable legal research in almost all of the key government document databases. But, in the past year or so, this has changed dramatically and for the better.

A. The Good

Cornell University’s Legal Information Institute (LII) provides a great example of elegant URL construction. Because the foundation of URLs was initially well thought out, the LII has provided stable links to the U.S. Code for several years. In addition, the LII links are logical, making them easy to modify to link multiple sections of the U.S. Code. A sample link to The Sherman Act demonstrates Cornell’s URL simplicity:

http://www.law.cornell.edu/uscode/15/1.html

Here (to greatly simplify), “http” designates the hypertext transfer protocol, “www.law.cornell.edu” the address of the site containing the desired information, “uscode” the name of the specific database at the site, and “15/1” the exact document location (the “call number,” based on the Sherman Act’s citation, 15 U.S.C. §1). The “html” indicates that the document is coded in hypertext markup language, the formatting language of the World Wide Web.
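The breakdown above can be sketched in a few lines of Python. This is only an illustration of the URL anatomy described in the text; the function name `split_lii_url` and the dictionary keys are my own labels, not anything Cornell publishes, and the sketch assumes the three-part path pattern of the sample link.

```python
from urllib.parse import urlparse


def split_lii_url(url):
    """Break an LII-style U.S. Code URL into the parts named in the text.

    Hypothetical helper: assumes a path of the form /database/title/section.ext,
    as in the sample Sherman Act link.
    """
    parts = urlparse(url)
    # "/uscode/15/1.html" -> database "uscode", title "15", leaf "1.html"
    database, title, leaf = parts.path.strip("/").split("/")
    # Split the leaf into the section number and the file extension.
    section, _, extension = leaf.rpartition(".")
    return {
        "protocol": parts.scheme,
        "site": parts.netloc,
        "database": database,
        "call_number": f"{title}/{section}",
        "format": extension,
    }


if __name__ == "__main__":
    print(split_lii_url("http://www.law.cornell.edu/uscode/15/1.html"))
```

The point of the exercise is that every piece of a sensible URL maps to something a legal researcher already knows: the source, the collection, and the citation.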

http://www.law.cornell.edu/uscode/15/1.shtml

By simply changing “.html” to “.shtml” in the above URL, the user will be taken to Cornell’s equally elegant frames interface with options for: 1) going up subsections or chapters; 2) going to the previous or next section of the Code, and; 3) updating the citation by automatically searching for Public Laws that amended the section since the last published version. This third feature, making use of cgi-bin searching of a government database, is discussed more thoroughly as an advanced deep-linking technique below.
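Because the scheme is this regular, a link to any section can be generated from its citation. The sketch below assumes the URL pattern of the two sample links holds for other titles and sections; `lii_uscode_url` is a hypothetical helper name, and Cornell may change its scheme at any time.

```python
def lii_uscode_url(title, section, frames=False):
    """Build an LII U.S. Code link from a citation such as 15 U.S.C. §1.

    Assumes the pattern of the sample links in the text:
    .html for the plain document, .shtml for the frames interface.
    """
    extension = "shtml" if frames else "html"
    return f"http://www.law.cornell.edu/uscode/{title}/{section}.{extension}"


if __name__ == "__main__":
    print(lii_uscode_url(15, 1))               # plain document
    print(lii_uscode_url(15, 1, frames=True))  # frames interface
```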



The first example above links to the entire Title 15, which contains the Sherman Act. The second example links to the Chapter containing the Sherman Act. Thus, there are at least four ways to link to 15 U.S.C. §1 from LII’s internet site. It should be noted that LII’s framing feature (.shtml) logically works only at the section level.

Cornell is, of course, a non-government site that provides extensive access to government documents. Cornell has claimed copyright in the HTML code generating its documents, but not in the original government works. Web sites across the Internet link heavily into the Cornell server. Cornell also offers Finding, Searching and Linking Aids for sites that “are offered in ways which make it difficult for an author to create links to specific documents or sub-parts within the overall collection,” including a section on creating captive searches to other Internet databases. Of particular note to advanced deep linkers is the excellent advice provided in the hazily-entitled section “rolling your own.”




B. The Bad and the Ugly

Here’s an example of an unartful URL to 42 CFR Sec. 476.101, Peer Review Information — Scope and Definitions, from the House of Representatives server. If you click on this link I am almost sure that it won’t work, though from time to time for reasons unknown to me, it has.

http://orbus.pls.com:8001/cgi-bin/taos_doc.pl?unix+1+cfr+140334

The above URL is not sensible and, as a result, the link was quite unstable. Thankfully this format is no longer used for the Code of Federal Regulations, but has been replaced by a less-flawed system at the Government Printing Office (GPO) site, discussed below. For now, the question to ponder is why government Webmasters chose such an obtuse URL to link to a document location that could have been as simple as “42/476.101”? Unfortunately the answer has not been forthcoming in my inquiries, but perhaps lies in the adage that “if you don’t know where you are going you will probably get there.”

C. Sensible, Stable Links to Government Documents Are Critical to the Nation’s Legal Information Infrastructure

Electronic government documents are primary authorities used by others to create logical metasites, thoughtful analyses, authoritative electronic treatises and other important secondary materials. The ability to hyperlink primary legal authorities to their sources is also in and of itself a legal citation system critical to the creation of memos, articles and other documents placed on a server for internal or external distribution. URLs to primary authorities that change constantly or are otherwise unreliable make the creation of all hyperlinked legal documents cumbersome and impractical.

There is no reason for a government document (or any other internet document) to get a new URL just because it has changed in some fashion. By simply archiving the old material to a new URL and updating the new material to the old URL the law, regulation, case, pamphlet, article or booklet becomes a self-updating link. Using the IRS as an example, why shouldn’t the URL to Publication 559: Survivors, Executors, and Administrators always provide the most recent version of the publication and links to archived historical versions? This is what is meant by providing sensible, stable links. The concept seems simple enough, but if I had a beer nut for every Web site that didn’t archive historical press releases while maintaining a permanent “press.html” location for the current version, I would rival Health Hippo in girth.
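For Webmasters, the archive-then-replace routine might look something like the following. This is a minimal sketch on a local file system, not a description of how any agency actually publishes; the function name `publish`, the archive naming pattern, and the `old_label` parameter are all assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def publish(stable_path, new_text, archive_dir, old_label):
    """Put new_text at stable_path, first copying any existing version to the archive.

    Returns the path of the archived copy, or None if there was nothing to archive.
    """
    stable_path = Path(stable_path)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    archived = None
    if stable_path.exists():
        # The outgoing edition gets the new address; the stable address never changes.
        archived = archive_dir / f"{stable_path.stem}-{old_label}{stable_path.suffix}"
        shutil.copy2(stable_path, archived)
    stable_path.write_text(new_text, encoding="utf-8")
    return archived


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        page = Path(root) / "press.html"
        publish(page, "<p>1997 releases</p>", Path(root) / "archive", "1996")
        old = publish(page, "<p>1998 releases</p>", Path(root) / "archive", "1997")
        print("current:", page.name, "| archived:", old.name)
```

Anyone who linked to the stable location keeps getting the current version, and the historical versions remain reachable at their own addresses.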

II. Deep Linking Government Documents: Specific Examples

Now the focus shifts to the more problematic area of linking directly into government databases. You may have noticed, for example, that when you find a government document and lay in a bookmark on your browser, the link is often gone the next time you try to use it. Is it the work of magical government fairies? This section will show you how to create permanent links to most government documents, pointing you to specific pages on government servers that outline the URLs necessary to achieve permanent electronic government document Nirvana.

But first a question – why is deep linking necessary at all? Why not just download the bill, rule, report or regulation and provide your own independent link from your Web site? The answer lies in three parts: 1) the government is storing these large chunks of information anyway, making additional storage on a separate server redundant; 2) the document could easily be changed (intentionally or unintentionally) on a private server, making the government server document the only reliable and citable electronic version, and; 3) in some cases the government updates the link or document automatically, and you probably don’t want to be responsible for that task.

A. Deep Linking Thomas — Federal Legislation and the Congressional Record

Thomas has some pretty cool Webmasters. They have provided the capability to create permanent links to documents and automatic searches for several years. Here is an example of a self-updating search on all the bills in the 105th Congress relating to health care fraud.


Each time a bill is added that matches the search terms, you will get it by hitting this link. You can modify the search terms of this link by changing the words in the parentheses at the end. A search on a single word uses no parentheses; multiple words are separated by plus signs. Remember that the 105th Congress ends this year and it would be necessary to change “105” to “106” in early 1999.

You can also use Thomas to create links to the Congressional Record in similar fashion. Here is an example of the HTML to get references to fraud and abuse in the 105th Congress.

http://thomas.loc.gov/cgi-bin/query/r?r105:@phrase(fraud+and+abuse)

You have to experiment with the search terms a little to get what you want, but otherwise it couldn’t get much easier to keep an eye on those folks on the Hill. Thomas also provides specific examples of links to the full text of individual bills, all bills from a certain author and other automagic searches at the URL listed above.
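If you maintain many of these searches, generating the URLs saves retyping when a new Congress convenes. The sketch below follows only the Congressional Record pattern shown above; the function name `thomas_record_search` is hypothetical, and because this article does not show the exact single-word form, the sketch declines to guess at it.

```python
def thomas_record_search(congress, words):
    """Build a Congressional Record phrase-search URL in the pattern shown above.

    congress: Congress number, e.g. 105.
    words: list of search words, joined with plus signs inside @phrase(...).
    """
    if len(words) < 2:
        # Single-word searches drop the parentheses, but the exact form is not
        # shown in the article, so it is deliberately left unimplemented.
        raise ValueError("only multi-word phrase searches are sketched here")
    phrase = "+".join(words)
    return f"http://thomas.loc.gov/cgi-bin/query/r?r{congress}:@phrase({phrase})"


if __name__ == "__main__":
    print(thomas_record_search(105, ["fraud", "and", "abuse"]))
    # The same search for the next Congress only changes the number.
    print(thomas_record_search(106, ["fraud", "and", "abuse"]))
```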

B. Deep Linking GAO Reports, the Federal Register and Other GPO Documents

It’s just amazing how many topics the General Accounting Office knows about. You may wish to subscribe to their Daybook mailing list ([email protected]; type “subscribe daybook ” in the body of the message) and then paste documents that relate to your area of expertise into a word processing document or a notepad. Once you have a significant number stored up, link to them using the following format.

http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=gao&docid=f:gg96101.txt

The only thing that changes from report to report is the information “gg96101”, which refers to the report number (in this case, GGD96-101 refers to the GGD section, 1996 report, No. 101). Use only the first two letters of the section code and front fill the report number with zeros if less than 100. Watch out for testimony, which requires a “t” after the report number in the URL, and fact sheets, which require an “f.” Complete instructions are available from GAO, along with specialized instructions for obtaining GAO Report DOCIDs.
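The naming rule above is mechanical enough to script. The following is a sketch based solely on the rules as stated here; the helper names are hypothetical, and the placement of the “t” and “f” markers immediately after the report number is my reading of the rule, so check GAO’s own instructions before relying on it.

```python
def gao_docid(section, year, number, kind="report"):
    """Build a GAO document id such as 'gg96101' for report GGD96-101.

    Rules taken from the text: first two letters of the section code,
    two-digit year, report number zero-filled to three digits.
    The suffix placement for testimony and fact sheets is an assumption.
    """
    suffix = {"report": "", "testimony": "t", "fact sheet": "f"}[kind]
    return f"{section[:2].lower()}{year % 100:02d}{number:03d}{suffix}"


def gao_report_url(section, year, number, kind="report"):
    """Wrap the document id in the GPO getdoc URL format shown above."""
    docid = gao_docid(section, year, number, kind)
    return (
        "http://frwebgate.access.gpo.gov/cgi-bin/getdoc.cgi"
        f"?dbname=gao&docid=f:{docid}.txt"
    )


if __name__ == "__main__":
    print(gao_report_url("GGD", 1996, 101))
```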

http://www.access.gpo.gov/cgi-bin/getdoc.cgi?dbname=1998_register&docid=fr14ap98-6

The complete instructions above also contain protocols for creating links to the Federal Register and other GPO documents listed in its Database List. Most, like the example to the Federal Register above, follow the same URL format replacing the “dbname=” with the appropriate database and the “docid=” with the appropriate document identification number. When creating links to old Federal Register documents make sure to change to the proper year as well. Federal Register documents are available back to 1994. Links to documents in portable document format, though discouraged due to the format’s proprietary nature, can often be created by replacing the “.txt” tag with “.pdf”.
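Since only the database name and document id vary, one small function covers most of these links. This is a sketch under the assumptions stated in the text; the helper names are hypothetical, the year-based database name is inferred from the single 1998 example, and document ids themselves must still be looked up at GPO rather than guessed.

```python
def getdoc_url(dbname, docid, host="frwebgate.access.gpo.gov"):
    """Build a GPO getdoc link from a database name and a document id.

    The two sample links in the text use different hosts, so the host
    is left as a parameter.
    """
    return f"http://{host}/cgi-bin/getdoc.cgi?dbname={dbname}&docid={docid}"


def register_dbname(year):
    """Database name for a Federal Register year, e.g. '1998_register'."""
    if year < 1994:
        # The text says the online Federal Register goes back only to 1994.
        raise ValueError("Federal Register documents are available back to 1994")
    return f"{year}_register"


def as_pdf(url):
    """Swap a trailing .txt for .pdf, which the text says often works."""
    if not url.endswith(".txt"):
        raise ValueError("URL does not end in .txt")
    return url[: -len(".txt")] + ".pdf"


if __name__ == "__main__":
    print(getdoc_url(register_dbname(1998), "fr14ap98-6", host="www.access.gpo.gov"))
    print(as_pdf(getdoc_url("gao", "f:gg96101.txt")))
```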

C. Deep Linking Purgatory — The Code of Federal Regulations

Attempting to link to the CFRs is not recommended for people with short attention spans, small children or tendencies to lash out at inanimate objects. The CFRs are every bit as frustrating online as they are in print and perhaps the poor organization of the print product explains some of the shortcomings of GPO’s electronic offering. Nevertheless, if you must try linking to the CFR, here’s an example.

http://frwebgate.access.gpo.gov/cgi-bin/get-cfr.cgi?TITLE=42&PART=455&SECTION=1&TYPE=TEXT

Well, I didn’t say it would be pretty. Better than the old House of Representatives’ version, above? Marginally. Speaking of the House server’s CFR, Cornell again came to the rescue by providing a middleware tool that allowed “construction of links using a straightforward and ‘head-compatible’ URL scheme.” Since the version of the CFR Cornell’s tool links to is no longer in existence, it is unclear whether the LII will continue to support this service. In this author’s recent experience, it is unfortunately still needed.

As you can see in the example, CFRs are organized by Title, Part and Section. GPO does not currently allow links to anything but the Section level. One problem is that some of the sections reside only in Subparts on the server, and there is no obvious way to create a link to a Subpart (though I suspect it is possible). Another problem involves Sections with parentheses and/or dashes, like 26 CFR Sec. 1.501(a)-1, Exemption from taxation, to which I have never been able to link successfully.
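For plain numeric sections, the link can be assembled from the citation. The sketch below reproduces only the query format of the sample link for 42 CFR 455.1; `cfr_section_url` is a hypothetical name, and nothing here solves the Subpart or parentheses problems just described.

```python
from urllib.parse import urlencode


def cfr_section_url(title, part, section):
    """Build a GPO get-cfr link for a simple Title/Part/Section citation.

    Follows the sample link in the text. Sections containing parentheses
    or dashes are not known to work and are not specially handled here.
    """
    query = urlencode(
        {"TITLE": title, "PART": part, "SECTION": section, "TYPE": "TEXT"}
    )
    return f"http://frwebgate.access.gpo.gov/cgi-bin/get-cfr.cgi?{query}"


if __name__ == "__main__":
    print(cfr_section_url(42, 455, 1))  # 42 CFR 455.1
```

Even with a correctly built URL, expect the occasional error message from the server, so test each link more than once before publishing it.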

GPO has a special page with instructions on linking to the CFRs, but the shortcomings of the database make linking tricky at best. And the links are not particularly reliable, either. I have tried a properly coded link, received an error message and tried the link again with success the same minute. You just never really know with the CFRs.

III. Conclusions

Government documents must be freely available if citizens are to be held accountable to the law. Thinking of the URL as a citation system, rather than a line of “executable computer code” will assist Webmasters in creating sensible, stable links to documents. Input from legal professionals should be sought before building a URL system to ensure that the system is logical. The Cornell URLs provide fine examples of artful URL construction. For the most part, the GPO URL system of databases and document identification numbers is also rationally related to the citations of the legal documents. Regardless of the form of the URL the main point here is that URLs to government documents must stop changing to provide reliable and citable authority that is the foundation of the U.S. legal system.

Although the U.S. Government may eventually obtain copyright protection for some of its electronic works (see 17 U.S.C. Sec.105; Rybaczuk, Selling Government Information: A Comparative Perspective on UK and USA Developments), I see no problem with deep linking government documents or even framing them in a site. It conserves disk space, provides usually reliable and citable links into the primary sources of the law, and in some cases, takes the legal Webmaster out of the business of updating the law. The liability risk may indeed be greater for Webmasters who place downloaded versions of laws on their servers without meticulously checking and updating them.

There is a danger of government Webmasters invoking Java, framing or other schemes that automatically forward all links to the entity’s front page (a growing practice in non-government sites to prevent deep links). If that happens, electronic documents such as this one will be unable to provide hyperlinked citations directly to legal authorities and readers will be left with links to the stunning graphics and marginal search engines on government home pages. I can think of no current law that would preclude government sites from implementing such schemes. But see, The Electronic Freedom of Information Act Amendments of 1996, Public Law 104-231, 104th Congress, for an excellent place to put such language.

Will the day come when the correct citation to U.S. legal authorities is a free and direct link into the databases of the codes, cases and regulations? Perhaps it is not far off, as documents coming directly from the sources of laws should soon provide even higher reliability than those regurgitated from giant legal publishers. For now, it is probably sufficient to follow some uniform method of citation in the text and rely on the URL to identify the particular government databases being accessed. (See e.g., Official Citation Resolution of the American Bar Association, suggesting a paragraph numbering system for pinpoint citations in primary source electronic legal documents – where page numbers are irrelevant; also see Comments to Judicial Conference Regarding Citation Reform which provides hundreds of links to comments about the proposed citation standard.)

Linking to these documents has provided me with good sport over the last several years as I tried to keep up with the ever-changing URLs of the government Webmasters. Hopefully those URLs have stabilized and experts in other areas will feel confident creating metasites and other products using these techniques.
