Features - Selecting Web Sites for "Beyond Google" Resource Discovery

Rita Vine is a professional librarian and co-founder of Workingfaster.com, which helps professionals break through the clutter of the Internet and access information that matters. She teaches web searching to clients across North America, and serves on the selection team of the Search Portfolio, an enterprise product of the 100 top starting points for searching the free web. A “lite” version of the Search Portfolio, with links to about 10% of its resources, is available at http://www.searchportfolio.com/searchlite.html.



Among the millions of web sites that have been created since the origin of the web over a decade ago, many aspire to help users find information. Search engines, meta-search tools, subject directories and a host of other specialized search tools and link lists can be useful in specific information-seeking situations.

As the Internet has grown and matured, so have the business models associated with delivering information to eyeballs. Organizations seek to build awareness of their web sites through cooperative linking, public announcements, and, for those with sufficient budgets, through more conventional print and web-based advertising models. These models have become increasingly sophisticated and, in the case of many corporate web search tools, increasingly deceptive. Easily recognizable banner ads, scorned by busy web surfers, have encouraged web advertisers to develop and deploy less obvious sources of revenue for search sites. Pay-for-placement, pay-for-inclusion, pay-for-spidering and other methods of inserting paid content into search results pervade many of the most popular web search tools, and most users can’t differentiate paid content from “true” search results.

Because the web has grown so quickly, even serious web searchers can’t keep up with the changes to favorite search tools. Tracking, evaluating, selecting and monitoring changes in search tools is time-consuming, and fully deciphering and testing the claims made by search tools requires a thorough understanding of web-based information delivery models. As a result, information searchers tend to rely on Google as their default tool. Users frame their search to enable Google to identify pages with keywords that they think will appear in useful pages.

I spend much of my time identifying, evaluating and selecting resources for a web site selection service that provides an unbiased, peer-reviewed selection of the 100 top general starting points for searching the free web. These starter sites enable serious searchers to “go beyond Google” by using high quality subject catalogues, directories, and specialized search tools. These tools provide links to many high quality resources that, for many reasons, Google can’t be pressed into finding.

What makes a great web starting point?

A good starter should be reasonably easy to understand and use. It should cover enough topics/sources to be used repeatedly for many different types of searches. It should cover many subjects, or, if it is a meta-search tool for a type of resource (for example, a meta-list of phone books), it should be the most comprehensive tool in its class. It should be browsable as well as searchable, and its taxonomy should be easy to understand and use by a typical adult user. It should benefit from some sort of expertise – either from librarians who select the resources or from informed experts who inject an element of selection (and deselection!) into their choices. No one wants a huge list of undifferentiated web sites: we can get that from a search engine!

Librarians’ Index to the Internet (http://www.lii.org) is a very useful all-purpose resource discovery tool. It has a well-designed browsable interface, with a search template that permits simple keyword searching of the site’s entries. A limited number of sites are carefully selected by a team of librarians, and the selection presents the ‘best-of-breed’ within each subject grouping. The advanced search option enables browsing by Library of Congress Subject Headings.

A good starter should link to predominantly free information. There should be evidence that someone took time to search for information-rich links, not just the most popular or well-known sites.

Some starter sites look promising but provide only limited amounts of free information. CEOExpress (http://www.ceoexpress.com), a popular portal of free links to news and business information, designed for the adult business user, has adopted this free/fee business model. The free information pages, composed principally of links to external sites, present information designed to encourage the information seeker to subscribe to the ‘plus’ version.

Although the free resources linked by CEOExpress are popular and well known, they aren’t necessarily the very best of the business web. The not-nearly-as-well-known BizLink.org (http://www.bizlink.org), a business starter site from the Public Library of Charlotte and Mecklenburg County, has excellent links with annotations to many useful business resources. Although it has fewer popular news links than CEOExpress, its business information resources are far richer.

Usability is important

Many potentially great resource discovery tools become unusable because of poor or clumsy navigation. Many otherwise good link lists present topical resources in one very long list, in alphabetical order by title. Long alphabetical lists are particularly unsuited to web browsing because most users will not browse to the end of a long page. An alphabetical list is a practical option only if it is either a) short or b) modified so that the “best of” sources appear at the top of the page. Even a script that automatically re-sorts a list by some algorithmic ranking can make a long list more usable by pushing the more “relevant” titles to the top.
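As a rough illustration of option b), here is a minimal Python sketch of a list that puts a few “best of” entries first and leaves the remainder in alphabetical order. The Link record, its score field and the rank_links function are hypothetical names invented for this example, and the titles and scores are placeholders rather than real evaluations of any site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    title: str
    score: float  # hypothetical editor-assigned relevance; higher is better


def rank_links(links, top_n=3):
    """Put the top_n highest-scoring links first, then the rest alphabetically."""
    # Sort by score (descending); ties fall back to alphabetical order by title.
    by_score = sorted(links, key=lambda link: (-link.score, link.title.lower()))
    best = by_score[:top_n]
    # Everything outside the "best of" group stays in plain alphabetical order.
    rest = sorted(by_score[top_n:], key=lambda link: link.title.lower())
    return best + rest


# Placeholder data, for illustration only.
sample = [
    Link("Alpha Directory", 0.2),
    Link("Bravo Index", 0.9),
    Link("Charlie Catalogue", 0.5),
    Link("Delta Links", 0.1),
]

ranked_titles = [link.title for link in rank_links(sample, top_n=2)]
```

With top_n set to 2, the two highest-scoring placeholder entries move to the top of the page and the other two follow alphabetically, so a user who never scrolls still sees the strongest sources.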

I may reject otherwise solid starters if they are secondary sources of information, or if I can’t easily verify where the information they supply comes from. For example, I reject third-party versions of Medline such as those featured in several commercial health-related sites, preferring instead a direct link to the National Library of Medicine’s own PubMed (http://pubmed.gov). But some secondary sources of information are harder to ignore. I was tempted by a free source of US business data sets in spreadsheet format from Economy.com’s Free Lunch (http://www.economy.com/freelunch/) but troubled by the fact that it was a secondary source to primary federal data. Because Free Lunch aggregated many disparate sources in one location, I felt that the usability features of the site outweighed my reluctance to accept secondary sources.

I don’t automatically reject sites that require registration in order to view free information. If the registration is simple (takes under 3 minutes, one time only) and the source is otherwise a best-of-breed, it may be included. I need to be reasonably confident that the information provider is reputable, that a public privacy policy is in place on the site, and that the information provider will honor it. Even with those safeguards, I still recommend to users who register with free sites that they reserve a Hotmail or Yahoo! mail account exclusively for registration at these sites, so that any subsequent spam is deflected to these email accounts.

Identifying great web sites: a laborious task

I read almost 50 online newsletters, journals and weblogs (“blogs”) that report new and interesting web tools. Be warned: most sites that look interesting prove uninspiring on closer examination, and I estimate that our team ultimately rejects over 90% of suspects. Many newly announced resources may be so underpopulated with links that they aren’t quite “ready for prime time.” However, these starters may hold promise for the future, so I retain many of them in a file which is reviewed every few months to see if there has been any significant enhancement.

Gary Price’s ResourceShelf (http://www.resourceshelf.com) is particularly good at getting the word out about many new resources. Other excellent tools include the “What’s New” list of Librarians’ Index to the Internet (available by email subscription at http://lii.org/search/file/mailinglist or online at http://lii.org/search/ntw); the New and Notable Web Sites section in each issue of the Internet Resources Newsletter (http://www.hw.ac.uk/libWWW/irn/irn.html); Genie Tyburski’s TVC Alert (http://www.virtualchase.com/tvcalert/) for legal sources; and Bob Berkman’s Information Advisor (http://www.findsvp.com/insights/ia/ia.cfm), which catches many useful business sites. I also subscribe to the Informed Librarian Online (http://www.infosourcespub.com/ilofreesubscribe.cfm), a handy free monthly alert with dozens of links to online library journals and weblogs.

When good sites go bad

It’s always hard to watch formerly good sites fall into disarray or disappear completely, but such disappointments occur regularly in the web world. Excite.com, which started as a search engine some years ago, has become a commercial portal with a meta-search interface that leads to results from a high proportion of pay-for-placement tools. AssociationsCanada.com, established in the heady days of Internet entrepreneurism, had a good directory of Canadian associations until the founders abandoned the project and the domain name was redirected to a porn site. I still mourn the loss of DeskRef, a superb project of the Ramapo Catskill Public Library. DeskRef linked over 1,000 free quick-reference lookup tools using a taxonomy that was instantly understandable and usable – until it was removed from the web without notice in March 2003. Nothing comparable to DeskRef exists on the free web (although you can still see old snapshots of DeskRef through the Wayback Machine at http://web.archive.org/web/*/http://www.rcls.org/deskref), and our team reluctantly selected some useful – though not nearly as elegant – replacement sources.

Profit is not a four-letter word

Approximately 50% of our best-of-the-web sites are public sites created and supported by for-profit companies. The other 50% consists of sites funded and supported mainly by libraries and government agencies. Our split selection suggests that there is nothing intrinsically wrong with a commercial or sponsored site. Take Kapitol/Infobel’s Teldir.com (http://www.infobel.com/teldir), a portal site to over 400 free phone books from around the world. Kapitol is a Belgian provider of wireless, Internet and CD-ROM based telephone and professional directory solutions. The free information site is used to brand the company to the user for future reference and to draw users to the company’s products and services.

Even some flagrantly commercial starters can be good for some searches. About.com (http://www.about.com) is chock-a-block with pop-ups, pop-unders, banners and sponsored links, but it still points to some excellent information links for entrepreneurs, small business, travel and hobbies.

Of all the resources selected for our service, only About.com (http://www.about.com), Yahoo! (http://www.yahoo.com) and Google earn a significant portion of their revenue from web-based advertising. Perhaps this provides an unintentional rule of thumb for seekers of quality information resources: avoid sites populated heavily by advertising.

There is no one single all-purpose starter, and there never will be

Serious searchers need to understand -- and accept -- that there is not now and there will never be JUST ONE starter that will be suitable for finding everything anywhere. Popping a few keywords into Google is easy: research is hard. Serious searchers need to have a “search toolbox” – a list of starter sites that they can return to over and over again when they don’t already know the best starting points for their information search. Methodical seeking of quality information sources from excellent starter sites helps searchers feel confident that they have fully explored the web, gone beyond Google, and – perhaps most importantly – know when to stop and move on to other fee and print-based information tools.