Features - Update to Search Engines Compared

Diana Botluk is a reference librarian at the Judge Kathryn J. DuFour Law Library at the Catholic University of America in Washington, D.C., and is the author of The Legal List: Research on the Internet. She teaches legal research at CAPCON, Catholic University Law School, and the University of Maryland.



General Web search engines are getting smarter. Yes, I know, only living creatures get smarter. Computers don't think, they only do math. But in the case of search engines, this year they're doing it better. Many of the general Web search engines have added a new feature that attempts to anticipate what the researcher wants, much the way librarians sometimes have to guess at what their patrons really want rather than what they ask for. Have you seen the Web site Ask Jeeves (http://www.askjeeves.com)? It exemplifies the concept. The researcher types in a search request, and Ask Jeeves replies with several possible questions that might represent what the researcher is actually looking for. For example, I typed in campbell soup and asked Jeeves. Before Jeeves listed my Web search results, he told me he had found answers to several questions, such as "what is the company Web site?" or "where can I find stock quotes?" Clicking on a question retrieves its answer.

So what does that have to do with general Web search engines? It seems this is the hot new feature for the year 2000, and more of the general Web search engines are incorporating anticipated questions into their Web results. Of the six search engines I included in the features comparison, AltaVista and Excite suggested questions I might want answered. While Go.com did not suggest questions to be answered, it did suggest possible new searches I might want to try. And both Lycos and HotBot showed me other searches run by people who had done the same search I did, sort of like Amazon's "people who read this book also read...."

Although general Web search engines are getting easier to use, sometimes locating relevant information is still like looking for a needle in a haystack. General search engines are not always the proper tool to use when trying to locate specific information. It is often better to locate a database likely to contain the desired information and search for it within that database.* However, sometimes that isn't possible. And sometimes we want to be more general, casting a wider net to see what we come up with. At these times, we may choose to use a general search engine. There are so many general Web search engines that often we don't know which one to choose. The chart that follows will compare the features of six popular search engines. In this way, researchers can see at a glance which search engine offers which features, and be more informed as to their choices. Ultimately, a researcher armed with this knowledge will be able to pick the best tool for the particular job.

AltaVista

AltaVista provides a lot of search construction options for sophisticated searchers, and therefore has long enjoyed favor among information professionals. Besides traditional Boolean search options, AltaVista has many field restrictors, and also has an interesting forced phrase searching feature. In a basic AltaVista search, the search engine treats two adjacent terms as a phrase, even without quotes, when they are statistically likely to be one. This changes in AltaVista's advanced search, where it depends on which search box you use. In the Boolean search box, all terms are treated as a phrase unless there is a connector between them. In the Sort by search box, terms are treated individually, and a phrase must be designated by quotes. Note that AltaVista supports not only truncation but also internal wildcards when an asterisk (*) is used.
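
The comparison chart later in this article states the asterisk rule more exactly: it must appear after at least 3 characters and can replace 0 to 5 characters. The Python sketch below is only one way to picture that rule as a pattern match. The function names are hypothetical, treating "characters" as letters, digits, or underscores is an assumption, and nothing here describes how AltaVista works internally.

```python
import re

def wildcard_to_regex(term):
    """Turn a search term containing * into a compiled pattern.

    Follows the chart's stated rule: the first asterisk must come after
    at least 3 characters, and each asterisk stands for 0 to 5 characters.
    """
    first_star = term.find("*")
    if 0 <= first_star < 3:
        raise ValueError("asterisk must appear after at least 3 characters")
    # Escape the literal pieces, then join them with the wildcard pattern.
    # Assumption: a replaceable "character" is any word character (\w).
    pieces = [re.escape(piece) for piece in term.split("*")]
    return re.compile(r"\w{0,5}".join(pieces), re.IGNORECASE)

def matches(term, word):
    """Return True if the whole word fits the wildcard term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

print(matches("col*r", "colour"))        # internal wildcard
print(matches("libr*", "libraries"))     # truncation, 5 extra characters
print(matches("libr*", "librarianship")) # more than 5 extra characters
```

Under these assumptions, col*r fits both color and colour, while libr* fits library and libraries but not librarianship, which needs more than five extra characters.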

AltaVista translates its results into several different languages, which can come in handy when you run across a page in a language you don't understand. The translation is not precise, but it can be good enough to get the gist of what is on the page. AltaVista also has a special photo and media finder. The photo finder returns a results list complete with photo thumbnails. Like Google, AltaVista uses RealName Internet keywords to link to the home pages of companies or brand names. Additionally, on its results display AltaVista has a feature which prompts questions that might be in the minds of researchers, and provides links with the answers to these questions.

Excite

Excite provides a few more options than Google for creating a detailed search, but the beauty of Excite is its concept searching. Excite is a good place to cast a wide net over the Internet ocean. If you type a word in the search box, not only do you search for that word itself, but also forms of the word, synonyms of the word, and other words that are highly statistically related to that word. Thus, Excite is a good search engine to give yourself a little leeway, when you don't necessarily want the search terms to be as precise as they can be. However, if you use a Boolean operator in the search string, remember that it turns the search into an exact keyword search, eliminating any concept searching for word variations, synonyms, etc.
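
To make the difference between concept searching and exact keyword searching concrete, here is a small Python sketch. It is an illustration only: the RELATED table is invented for the example, the function name is hypothetical, and Excite's actual statistical method is not described in this article.

```python
# Hypothetical table of word forms and related terms. A real concept
# search would derive these statistically rather than from a fixed list.
RELATED = {
    "attorney": {"attorneys", "lawyer", "counsel"},
}

BOOLEAN_OPERATORS = {"AND", "OR", "NOT"}

def expand_query(query):
    """Return one set of acceptable words per search term.

    Without Boolean operators, each term is widened with related words.
    With any Boolean operator present, terms are kept exactly as typed.
    """
    words = query.split()
    exact = any(word in BOOLEAN_OPERATORS for word in words)
    groups = []
    for word in words:
        if word in BOOLEAN_OPERATORS:
            continue  # operators are not search terms themselves
        term = word.lower()
        group = {term}
        if not exact:
            group |= RELATED.get(term, set())
        groups.append(group)
    return groups

print(expand_query("attorney"))           # widened with related words
print(expand_query("attorney AND fees"))  # exact keywords only
```

The first call casts the wide net; the second shows how adding AND shuts the concept expansion off.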

Go

When Infoseek and Disney joined forces, the portal Go.com was created. Thus the search engine at Go.com is the old Infoseek search engine, and still employs many of the same features. Go.com supports Boolean searching and allows searching with many restrictors. Additionally, the advanced search provides a variety of specialized search engines for finding specific information. A popular option here is the Companies search, which results in company capsules and home page urls, with options for stock and press information. A Go.com search results page provides many options besides a list of relevant search results. It offers possible related searches as well as the option of allowing the search engine to create a new search automatically based on a specific page. It also provides the opportunity to search in levels, by performing one broad search first, then narrowing it down by searching again within the given results, rather than searching the Web all over again. Its translation option allows the translation of any page.

Google

This is the simplest search engine to use, but in some ways it is dangerously simple. Google provides very few options for searchers to construct a detailed search. In fact, just about the only options are a minus sign to exclude terms and quotation marks to force terms to be searched as a phrase. BEWARE! Google has only inclusive searching, with no option for alternative searching besides starting another search. What does that mean?

Let's say I wanted to search for pages that say either attorney or lawyer. With most search engines you would have an option of searching for either term at the same time. They would search attorney OR lawyer automatically, making alternative searching the default. Or you may have to physically insert the Boolean connector OR between the words to force the alternative search.

With Google, typing attorney lawyer in the search box will only search for pages that say both attorney and lawyer. Google does not support the Boolean connector OR. The only way to search for pages containing either attorney or lawyer is to perform two searches.
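
The difference between inclusive and alternative searching, and the two-search workaround, can be sketched in a few lines of Python. The three sample pages and the function names are made up for illustration; no search engine's actual code is implied.

```python
# Three invented pages standing in for the Web.
PAGES = {
    "page1": "find an attorney in washington",
    "page2": "ask a lawyer online",
    "page3": "attorney or lawyer what is the difference",
}

def search_inclusive(terms, pages):
    """Return ids of pages containing every term (an AND search)."""
    return {page_id for page_id, text in pages.items()
            if all(term in text.split() for term in terms)}

def search_alternative(terms, pages):
    """Return ids of pages containing at least one term (an OR search)."""
    return {page_id for page_id, text in pages.items()
            if any(term in text.split() for term in terms)}

both = search_inclusive(["attorney", "lawyer"], PAGES)
either = search_alternative(["attorney", "lawyer"], PAGES)

# Workaround when OR is not supported: run two searches and merge them.
merged = search_inclusive(["attorney"], PAGES) | search_inclusive(["lawyer"], PAGES)

print(sorted(both))
print(sorted(either))
print(merged == either)
```

The inclusive search finds only the page that uses both words, while the two merged single-term searches recover the same pages an OR search would have found.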

Google's results list provides a similar pages feature that allows the computer to construct automatically a new search for pages similar to a specified page - a good option if you find one good page and want more like it. It provides an option for highlighting search terms on the resulting page, something not often seen in Web search engines. Finally, Google makes use of RealNames or Internet keywords, which means results on the list with that mark will link directly to the company or brand name home page. This can be extremely useful. Suppose you want to search for The Walt Disney Company home page. Simply typing disney in any search engine's search box is likely to result in hundreds of thousands of hits, listing every page where the word Disney appears. This is not an efficient way to search for the company's home page, unless the search engine uses the RealName mark to distinguish that hit as the one that will deliver the home page. On Google, these results are listed first.

HotBot

HotBot's form interface still makes it a favorite among many researchers. It allows you to build a fairly sophisticated search without having to remember connectors or restrictors, since they all appear right there on the form and you can choose and click on what you want. The best way to use HotBot is to skip the first page and go directly to the Advanced Search page, which provides many more search options.

HotBot results employ Direct Hit technology, which provides a list of the top ten most popular links for any given search. The theory behind Direct Hit is that the sites people visit most for a given search are also likely to be the most relevant sites for that search. Another interesting item on the search results page is a list of possible related searches; choosing one has the search engine automatically perform another search.
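
The popularity idea can be pictured with a short Python sketch. This is only a guess at the general principle described above, not the actual technology: the click log format and the function name are assumptions made for the example.

```python
from collections import Counter

def top_links(click_log, query, limit=10):
    """Return the most-clicked urls for a query, most popular first.

    click_log is assumed to be a list of (query, url) pairs, one pair
    for each time a searcher clicked a result.
    """
    counts = Counter(url for logged_query, url in click_log
                     if logged_query == query)
    return [url for url, _ in counts.most_common(limit)]

# Invented click data for illustration.
log = [
    ("disney", "site-a"),
    ("disney", "site-b"),
    ("disney", "site-a"),
    ("soup", "site-c"),
]
print(top_links(log, "disney"))
```

Here site-a is listed ahead of site-b for the disney search because more searchers clicked on it.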

Lycos

Lycos also supports Boolean searching, and has by far the most extensive options for proximity searching of any search engine on the Web. Lycos, like Go.com, provides the option of searching by levels, where you can search within a previous set of results. It will also offer suggested searches following your initial search. Lycos' advanced search provides a variety of specialized search engines for locating specific information. An automated tracking feature at Lycos allows users to register and have their searches updated automatically. Lycos' results page offers a popular links region where the most popular links for certain searches will be distinguished from regular results.
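
The chart below lists the Lycos proximity operators, including NEAR (terms within 25 words of each other) and ONEAR (the same, with the words in the specified order). The Python sketch below shows one way such a test could work. The function name is hypothetical, and counting distance as the difference between word positions is an assumption, since the chart does not say how Lycos counts.

```python
def near(text, first, second, distance=25, ordered=False):
    """Return True if the two terms occur within `distance` words.

    With ordered=True, `first` must also come before `second`,
    in the manner of ONEAR.
    """
    words = text.lower().split()
    first_positions = [i for i, word in enumerate(words) if word == first]
    second_positions = [i for i, word in enumerate(words) if word == second]
    for i in first_positions:
        for j in second_positions:
            gap = j - i
            if ordered and gap < 0:
                continue  # second term appears before the first
            if 0 < abs(gap) <= distance:
                return True
    return False

sentence = "the attorney filed a motion on behalf of the client"
print(near(sentence, "attorney", "client", distance=10))
print(near(sentence, "attorney", "client", distance=5))
print(near(sentence, "client", "attorney", distance=10, ordered=True))
```

In the sample sentence the two terms are eight positions apart, so the test passes at a distance of 10, fails at 5, and fails when the order is reversed and ordered matching is requested.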

**For permission to distribute this search engine comparison chart for educational purposes, please contact Diana Botluk.**


Comparison of Major Web Search Engines

AltaVista: www.altavista.com
Excite: www.excite.com
Go: go.com
Google: www.google.com
HotBot: www.hotbot.com
Lycos: www.lycos.com

Search Language

default searching
AltaVista: alternative
Excite: alternative
Go: alternative
Google: inclusive
HotBot: inclusive
Lycos: inclusive

inclusion of words
AltaVista: plus sign (+) (simple search); AND or plus sign (+) (advanced search)
Excite: AND or plus sign (+); "must have" on the advanced search form
Go: plus sign (+) or use must on the advanced form
Google: automatic
HotBot: choose "all the words" on the search form, the "must contain" option on the search form, or use AND or the plus sign (+) with the Boolean expression option
Lycos: AND, plus sign (+) or choose "all the words" on the advanced search form

alternative words
AltaVista: OR in an advanced search
Excite: OR
Go: all words are searched alternatively unless otherwise specified
Google: not supported
HotBot: choose "any of the words" on the search form or use OR with the Boolean expression option
Lycos: OR or choose "any of the words" on the advanced search form

proximity
AltaVista: NEAR finds terms occurring within 10 words of each other in an advanced search
Excite, Go, Google, HotBot: none listed
Lycos:
  NEAR: terms within 25 words of each other
  NEAR/#: terms within specified number of words of each other
  ONEAR: same as NEAR, but with words in specified order
  ONEAR/#: same as NEAR/#, but with words in specified order
  FAR: terms appear at least 25 words apart
  FAR/#: terms appear at least specified number of words apart
  OFAR: same as FAR, but with words in specified order
  OFAR/#: same as FAR/#, but with words in specified order
  BEFORE: finds pages where the first term appears before the second

exclusion of words
AltaVista: minus sign (-) (simple search); NOT or minus sign (-) (advanced search)
Excite: NOT or minus sign (-)
Go: minus sign (-) or should not on the advanced form
Google: minus sign (-)
HotBot: choose "must not contain" on the search form; NOT or the minus sign (-) with the Boolean expression option
Lycos: NOT or minus sign (-)

phrases
AltaVista:
  quotation marks ("") or hyphens (-) between words
  FORCED PHRASE SEARCHING:
  basic search: 2 terms together that are statistically most likely a phrase will be searched as a phrase, even without quotes
  advanced search: terms in the Boolean box are treated as a phrase unless there is a Boolean connector; terms in the sort by box are treated separately unless enclosed in quotes
Excite:
  quotation marks ("")
  In Advanced Search, multiple words in a single keyword box are automatically treated as a phrase.
Go:
  quotation marks ("") or phrase on the advanced form
  proper names with initial capital letters are searched as a phrase even without quotes; commas between capitalized names force them to be searched separately
Google: quotation marks ("")
HotBot: quotation marks (""), or choose "exact phrase" on the search form
Lycos:
  quotation marks ("")
  ADJ: words must appear next to each other
  ADJ/#: terms must appear exactly specified number of words apart
  OADJ: same as ADJ, but words must be in specified order
  OADJ/#: same as ADJ/#, but words must be in specified order

case sensitivity
AltaVista: lower case searches are case insensitive; upper case searches force case sensitivity
Excite: not case sensitive
Go: lower case searches are case insensitive; capitals force case sensitivity
Google: not case sensitive
HotBot: lower case searches are case insensitive; mixed upper/lower case searches are case sensitive
Lycos: none listed

truncation/wildcard
AltaVista: asterisk (*) for truncation or internal wildcard; must appear after at least 3 characters and can replace 0 to 5 characters
Excite: not necessary with concept searching
Go, Google: none listed
HotBot:
  truncate words with an asterisk (*) or choose "enable word stemming" on the form
  asterisk (*) is a wildcard that can replace any number of characters; question mark (?) is a wildcard that can replace only one character; either can be used anywhere in a word
Lycos: none listed

nesting
AltaVista: parentheses ( )
Excite: parentheses ( )
Go, Google: none listed
HotBot: parentheses ( ) with the Boolean expression option
Lycos: none listed

Search Restrictors

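date
AltaVista: form based date restriction available with advanced search
Excite, Go, Google: none listed
HotBot: restrict by date on the search form
Lycos: none listed

language
AltaVista: yes
Excite: yes
Go: none listed
Google: yes
HotBot: none listed
Lycos: yes

location
AltaVista: none listed
Excite: yes (advanced search)
Go: yes (advanced search)
Google: none listed
HotBot: restrict by location on the search form
Lycos: none listed

media type
AltaVista: special media finder; image finder returns thumbnails in results
Excite, Go, Google: none listed
HotBot: restrict by media type on the search form, or by using feature:
Lycos: none listed

title (searches specific word or phrase in a page's title field)
AltaVista: title:
Excite: none listed
Go: title:
Google: none listed
HotBot: title:
Lycos: use the form option

url (searches specific word or phrase in a page's url)
AltaVista: url:
Excite: none listed
Go: url:
Google, HotBot: none listed
Lycos: use the form option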

other search restrictors
AltaVista:
  anchor: specified word in the text of a hyperlink
  applet: specified Java applet
  domain: pages within the specified domain
  host: pages on a specified computer
  image: pages with images having a specified file name
  like: finds pages similar to or related to a specified url
  link: pages with a link to another specified url
  text: specified text in any part of the page other than an image tag, link or url
  Example: to search for web pages with llrx in the url, add url:llrx to the search statement
Excite: domain type (advanced search)
Go:
  link:
  site:
  some restrictions can be accomplished automatically through the advanced form
Google: link:
HotBot:
  domain: pages within the specified domain
  depth: designates exactly how many subdirectories should appear in the url (also available as a check off box on the search form)
Lycos: none listed

Other Search Features

AltaVista:
  translates into several different languages
  special photo and media finder
  related searches can be performed by clicking on related pages on search results page
  basic search ranks by relevancy; advanced search users MUST use the sort by box to control ranking preferences
  punctuation other than mentioned in search language and restrictors is read as a word separator, or blank space
Excite:
  default searching is inclusive/alternative using concept searching, now called intelligent search. Concept searching will search for other word forms, synonyms, and related terms in addition to the keyword typed into the search statement
  use of Boolean operators (AND, OR, NOT) forces keyword (exact term) searching rather than concept searching
Go:
  allows searching within previous set of results by using the pipe symbol (|), or clicking the search within results box
  allows a new search to find pages similar to a chosen result
  offers possible related searches by clicking on a generated list of similar searches
Google: plus sign (+) forces stop words to be searched
HotBot:
  sophisticated form based interface makes advanced searching easy and eliminates the need to investigate appropriate search language
  Direct Hit technology displays top ten most popular pages from given results at the top of the results display (not available with all searches)
  a list of possible related searches from which to choose appears on the results page
  allows a second level of searching within a given set of results
  searching for the person on the search form searches automatically for address and e-mail information
Lycos:
  allows searching within a previous set of results
  offers suggested searches after an initial search

Results Display

AltaVista:
  display reveals short summary, url, file size, page date and language
  word count reveals the number of times each search term appears
  translation option allows translation of any page
  site compression in basic search shows only one page per web site; site compression is not automatic in advanced search, but can be turned on by a check box
  image finder results are thumbnail images
  results are limited to 200, but advanced search users can change the url to go beyond 200
  ASK ALTA VISTA: in basic search, when a common English sentence, phrase or question is entered, AV prompts that it knows the answer to the question. Users choose from a list of questions and AV finds the answer in a special database
  AV may also prompt a better way of entering a query if many others have used that query before
  a list of RealName (RN) internet keywords can be clicked to
Excite:
  display reveals url, relevancy score, and summary
  results show hits in directory first, then web, then news
  user can sort results by site or relevancy
  similar search feature allows a related search
  "Quick Results" provides fast answers to the most popular questions
Go:
  display reveals relevancy score, url, date, summary and file size
  first shows matching directory topics, then web pages
  translation option allows translation of any page
  users can choose to hide summaries
  users can sort by date
  results clustering prevents all top hits from being from same site; results clustering can be turned off with ungroup results
Google:
  display reveals url, summary and file size
  highlights search terms on results pages
  similar pages feature allows a related search
  users can set number of results per page
  RealNames (RN) mark designates the result will link directly to company or brand name home page
HotBot:
  full descriptions include summary, url, file size and page date
  search form allows users to set results to full descriptions, brief descriptions, or urls only
  results clustering prevents all top hits from being from same site
  search results also offer results from the HotBot directory
  HotBot displays the top ten most popular sites first; if you change the default search settings, the top ten results may appear as a link rather than being immediately listed
  a refine option on the search page suggests new or alternative terms to add to the search
Lycos:
  results include summary & url
  results are organized into web sites, news articles, shopping, and the most popular links for given search terms
  results also match directory categories
  suggests refine terms

Subject Directory

AltaVista, Excite, Go, Google, HotBot, Lycos: yes

Other Special Features

AltaVista: when result is from a web site of an organization in AV's company/organization database, AV allows users to jump to a company fact sheet
Excite: provides a variety of specialized search engines for specific information
Go: advanced search provides a variety of specialized search engines for specific information
Google: none listed
HotBot: results page offers a variety of options to locate information in alternative ways
Lycos: automatic tracking feature lets users register and have their searches updated automatically; advanced search provides a variety of specialized search engines for specific information

***************

Footnote

* See Strategies for Online Legal Research: Determining the Best Way to Get What You Need, LLRX, April 3, 2000; Exposing the Invisible Web, LLRX, October 1, 1999.