Search Engines Comparison 2001
By Diana Botluk, published August 1, 2001
Diana Botluk is a lawyering skills instructor at the Catholic University of America School of Law in Washington, D.C., and is the author of The Legal List: Research on the Internet. She teaches legal research at CAPCON, Catholic University Law School, and the University of Maryland.
At first glance, using a general search engine to locate information on the web seems easy. But getting a search engine to work with precision is another story. General search engines come packed with features that are often underutilized but can help increase search precision. The features differ from engine to engine, and skilled researchers will adjust their search strategy to take advantage of these differences depending on the type of results sought. This article will explain the differences in some of the available features, then examine a few major search engines in light of these features.
When you type two words into a search engine box without any connectors, how does the engine put them together? Will it find only those pages where both words appear, or will it find pages where either word appears? Search engines with an inclusive default treat two separately typed words as if there were an AND between the words, while search engines with an alternative default treat the same two words as if there were an OR between the words. Thus, the results for the same search typed into two different search engines can be enormously different because one is inclusive, and the other alternative.
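To make the difference concrete, here is a minimal Python sketch of the two defaults. It is a toy model under stated assumptions, not how any real engine works: the three pages and their text are hypothetical, and a page "contains" a word only if that exact lowercase word appears in it.

```python
# Hypothetical pages for illustration; real engines search a large index.
PAGES = {
    "page1": "apple pie recipes",
    "page2": "blueberry muffins",
    "page3": "apple and blueberry crumble",
}


def words(text: str) -> set:
    """Split text into a set of lowercase words."""
    return set(text.lower().split())


def search(query: str, default: str) -> list:
    """Match pages using an inclusive (AND) or alternative (OR) default."""
    terms = words(query)
    hits = []
    for name, text in PAGES.items():
        page_words = words(text)
        if default == "inclusive":
            matched = terms <= page_words       # every typed word must appear
        else:
            matched = bool(terms & page_words)  # any typed word is enough
        if matched:
            hits.append(name)
    return hits


print(search("apple blueberry", "inclusive"))    # ['page3']
print(search("apple blueberry", "alternative"))  # ['page1', 'page2', 'page3']
```

The same two typed words return one page under the inclusive default and all three under the alternative default.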
Inclusive Default Search Engines
Google, HotBot, Lycos
Alternative Default Search Engines
AltaVista (Basic Search), Excite
Many search engines let a researcher designate an alternative or inclusive search explicitly with the connectors OR and AND. Inclusion can also be designated by using a plus sign as a word modifier:
apple OR blueberry
apple AND blueberry
+apple +blueberry
Some search engines use automatic concept searching as a default. Many advanced online researchers are accustomed to keyword searching, where the exact string of characters typed in is searched. Thus, an advanced researcher who unwittingly uses a search engine with a concept searching default can become frustrated. Concept searching occurs when the engine not only searches for the exact character string, but also for word forms, and even synonyms and other words that statistically appear with the typed word.
Keyword Search Default Search Engines
AltaVista, HotBot, Lycos
Concept Search Default Search Engines
AltaVista (for some searches), Excite
Most search engines allow exclusion of search results that contain certain terms. Many engines implement this feature with a minus sign or the word NOT placed in front of the term to be excluded. Use this feature sparingly to avoid eliminating relevant results that happen to mention the excluded term in passing. Note that a minus sign modifies a single word, while NOT is a connector between words:
pie -apple
pie NOT apple
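Here is a minimal Python sketch of the two exclusion styles, assuming a simple inclusive engine and a hypothetical set of pages. The `parse` function (a name chosen only for this illustration) treats a minus sign as a modifier on one word and NOT as a connector that applies to the word after it. Real engines parse queries in their own ways.

```python
# Hypothetical pages for illustration.
PAGES = {
    "page1": "apple pie recipes",
    "page2": "cherry pie recipes",
    "page3": "pie charts explained",
}


def parse(query: str):
    """Split a query into required and excluded words.

    A leading minus sign excludes the single word it is attached to;
    NOT excludes the word that follows it.
    """
    required, excluded = set(), set()
    tokens = query.split()
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token == "NOT" and i + 1 < len(tokens):
            excluded.add(tokens[i + 1].lower())
            i += 2
            continue
        if token.startswith("-") and len(token) > 1:
            excluded.add(token[1:].lower())
        else:
            required.add(token.lstrip("+").lower())
        i += 1
    return required, excluded


def search(query: str) -> list:
    """Return pages that contain every required word and no excluded word."""
    required, excluded = parse(query)
    hits = []
    for name, text in PAGES.items():
        page_words = set(text.split())
        if required <= page_words and not (excluded & page_words):
            hits.append(name)
    return hits


print(search("pie -apple"))     # ['page2', 'page3']
print(search("pie NOT apple"))  # ['page2', 'page3']
```

Both forms drop page1 even though it is about pie, which is exactly the kind of relevant result that aggressive exclusion can lose.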
When using keyword (exact match) searching, it can be helpful to tell the search engine to locate pages containing various forms of the word being sought. Typing the root of a word and adding a truncation symbol at the end accomplishes this. Search engines that support truncation, such as AltaVista and HotBot, recognize an asterisk as the truncation symbol. For example, to find pages with various forms of the word independence, I would type independen* and the results would include pages that contain independence, independent, and independently.
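The effect of the asterisk can be sketched with a regular expression. This assumes, as in the example above, that * appears only at the end of the root and stands for any run of letters or digits; engines differ in where they allow the symbol and how much it matches. The helper names are hypothetical.

```python
import re


def truncation_pattern(term: str) -> re.Pattern:
    """Compile a pattern for a search term; a trailing * matches any word ending."""
    if term.endswith("*"):
        return re.compile(re.escape(term[:-1]) + r"\w*", re.IGNORECASE)
    return re.compile(re.escape(term), re.IGNORECASE)


def matching_words(term: str, text: str) -> list:
    """List the words in text that the (possibly truncated) term matches."""
    pattern = truncation_pattern(term)
    return [w for w in re.findall(r"\w+", text) if pattern.fullmatch(w)]


text = "Independence Day honors an independent nation that acted independently."
print(matching_words("independen*", text))
# ['Independence', 'independent', 'independently']
```

Without the asterisk, `matching_words("independence", text)` finds only the exact word.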
Search restrictors in web search engines are similar to search fields in Westlaw. They allow a search for terms or values contained only in certain portions of a page, rather than anywhere in the entire page. A simple example is a search restricted to a type of domain, like .com or .edu. If a domain restriction is used, the search engine returns only results whose URL matches the designated domain type. Search restrictions are implemented in different ways on different search engines, usually through an engine's advanced search option. Serious researchers have long applauded HotBot's search form, which makes restricted searching easy.
Title restrictions are often available. Use these with caution, perhaps as a first step to see what pops up. A title restriction searches the title of the web page as designated by the web author, which does not necessarily correspond to the title of the document appearing on the page. For example, I might be looking for a copy of the Declaration of Independence. That document may appear on a web page the author has entitled Historic Documents. If I restrict my search for "declaration of independence" to the title portion of pages, I will miss this page because it is actually called Historic Documents.
Searches can often be restricted by date. Additionally, dates often appear on the list of search results. However, like page titles, page dates can be somewhat misleading. The dates that are searched or shown in results lists are the dates of the web page, not necessarily the date of the document on the page. A search with a date restriction of July 4, 1776, will yield no results, since no web pages were created or changed on that date. Thus, if I am searching for the Declaration of Independence, adding a date restriction to my query won't help. However, date restrictions can be useful for locating newly created or recently updated web pages and weeding out older results.
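The three restrictors discussed above (domain, title, and date) can be sketched as filters over page records. This is a minimal illustration under loud assumptions: the `Page` record, the `restrict` function, and both URLs are hypothetical. The sample data recreates the Declaration of Independence pitfalls: the page holding the document is titled Historic Documents, and no page carries a 1776 date.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional
from urllib.parse import urlparse


@dataclass
class Page:
    url: str
    title: str      # title chosen by the web author, not the document's own title
    text: str
    modified: date  # when the web page last changed, not when the document was written


# Hypothetical pages for illustration.
PAGES = [
    Page("http://history.example.edu/founding.html", "Historic Documents",
         "The Declaration of Independence: In Congress, July 4, 1776", date(2001, 3, 1)),
    Page("http://www.example.com/posters.html", "Declaration of Independence Posters",
         "Framed copies of the Declaration of Independence", date(1999, 6, 15)),
]


def restrict(pages, domain: Optional[str] = None, title: Optional[str] = None,
             modified_on: Optional[date] = None,
             modified_since: Optional[date] = None) -> list:
    """Return URLs of pages that satisfy every restriction that was given."""
    results = []
    for page in pages:
        host = urlparse(page.url).hostname or ""
        if domain and not host.endswith("." + domain.lstrip(".")):
            continue
        if title and title.lower() not in page.title.lower():
            continue
        if modified_on and page.modified != modified_on:
            continue
        if modified_since and page.modified < modified_since:
            continue
        results.append(page.url)
    return results


print(restrict(PAGES, domain=".edu"))                        # the .edu page only
print(restrict(PAGES, title="declaration of independence"))  # misses "Historic Documents"
print(restrict(PAGES, modified_on=date(1776, 7, 4)))         # []: no web page has that date
print(restrict(PAGES, modified_since=date(2000, 1, 1)))      # recently updated pages only
```

The title restriction returns only the poster shop, while the page that actually holds the text is lost; the date restriction is useful only in the "recently updated" direction.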
Most search engines recognize quotation marks around two or more terms as the designation of a phrase. Additionally, this can sometimes be accomplished by placing the Boolean connector ADJ between the terms. Thus, "apple pie" or apple ADJ pie will search for the phrase apple pie, and not search the two terms separately.
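A minimal sketch of the difference between a phrase search and searching the two terms separately, assuming pages are compared as lowercase words split on spaces (punctuation handling is ignored for simplicity). The function names are hypothetical.

```python
def contains_phrase(text: str, phrase: str) -> bool:
    """True if the phrase's words appear consecutively and in order in text."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


def contains_all(text: str, terms: str) -> bool:
    """True if every term appears somewhere in text (terms searched separately)."""
    words = set(text.lower().split())
    return all(t in words for t in terms.lower().split())


page = "an apple a day and pie on sundays"
print(contains_all(page, "apple pie"))                      # True: both words appear
print(contains_phrase(page, "apple pie"))                   # False: they are not adjacent
print(contains_phrase("grandma's apple pie", "apple pie"))  # True
```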
Many search engines support the use of parentheses to nest various parts of a search query. For example, a search for apple or blueberry pie can be accomplished by nesting:
(apple or blueberry) ADJ pie
It can also be accomplished by searching two alternative phrases:
"apple pie" OR "blueberry pie"
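The equivalence between the nested query and the two alternative phrases can be checked mechanically. This Python sketch uses a hypothetical `expand_nested` helper and a list-of-lists representation chosen only for illustration: each inner list holds the OR-ed alternatives for one word position, and the helper multiplies them out into the phrases the nested query covers.

```python
from itertools import product


def expand_nested(positions: list) -> list:
    """Expand a nested adjacency query into the phrases it is equivalent to.

    [["apple", "blueberry"], ["pie"]] stands for (apple OR blueberry) ADJ pie.
    """
    return [" ".join(words) for words in product(*positions)]


print(expand_nested([["apple", "blueberry"], ["pie"]]))
# ['apple pie', 'blueberry pie']
```

Nesting pays off as the query grows: two alternatives in each of two positions already expand to four phrases.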
It is often useful to perform a multi-level search, first casting a wide net, then narrowing by searching only within that set of results. This feature is offered by AltaVista, Google, HotBot and Lycos.
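Searching by levels amounts to running the second query only over the first query's result set. A minimal sketch, assuming hypothetical pages and a hypothetical `search` function that optionally takes an earlier result list:

```python
from typing import Optional

# Hypothetical pages for illustration.
PAGES = {
    "page1": "declaration of independence full text",
    "page2": "independence day fireworks",
    "page3": "financial independence tips",
}


def search(term: str, within: Optional[list] = None) -> list:
    """Return pages containing term, searching only `within` if a prior result set is given."""
    candidates = within if within is not None else list(PAGES)
    return [name for name in candidates if term in PAGES[name].split()]


wide = search("independence")                # first level: cast a wide net
narrow = search("declaration", within=wide)  # second level: search only those results
print(wide)    # ['page1', 'page2', 'page3']
print(narrow)  # ['page1']
```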
When comparing search engines, search language is only half the story. Search results are also important. Search engines use various mathematical formulas to match terms from the search query to web pages containing those terms. These formulas weigh various factors to present lists of results, often ranked by relevancy, or at least by relevancy according to the formulas used. Some of the factors that go into the determination of relevancy are how close together the terms appear, how many times they appear on the page, how close to the top of the page they appear, and how unique they are.
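The four factors named above can be combined into a toy score. This is emphatically not any engine's formula (those are not published here); the weights are arbitrary assumptions, the pages are hypothetical, and "uniqueness" is modeled as the log of how rare a term is across the pages.

```python
import math

# Hypothetical pages, already lowercased and split into words.
PAGES = {
    "page1": "the declaration of independence was signed in 1776".split(),
    "page2": "a history of the war with a note on independence and a declaration".split(),
    "page3": "fireworks for independence day".split(),
}


def doc_freq(term: str) -> int:
    """Number of pages that contain the term at least once."""
    return sum(term in words for words in PAGES.values())


def score(query: str, words: list) -> float:
    """Toy relevancy score; the factors follow the text, the weights are arbitrary."""
    terms = query.lower().split()
    positions = {t: [i for i, w in enumerate(words) if w == t] for t in terms}
    positions = {t: hits for t, hits in positions.items() if hits}  # present terms only
    total = 0.0
    for term, hits in positions.items():
        uniqueness = math.log(len(PAGES) / doc_freq(term))  # rarer terms count more
        total += len(hits) * (1 + uniqueness)               # how many times it appears
        total += 1 / (1 + hits[0])                          # how close to the top
    if len(positions) > 1:
        gap = min(abs(i - j)
                  for a in positions for b in positions if a != b
                  for i in positions[a] for j in positions[b])
        total += 1 / gap                                    # how close together
    return total


def rank(query: str) -> list:
    """Order page names from highest to lowest score."""
    return sorted(PAGES, key=lambda name: score(query, PAGES[name]), reverse=True)


print(rank("declaration independence"))  # ['page1', 'page2', 'page3']
```

page1 wins because both terms appear near the top and close together; page2 has both terms but far apart and late on the page; page3 lacks one term entirely. Changing the weights can reorder results, which is one reason different engines rank the same pages differently.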
Beyond pure relevancy rankings, however, many options are available to achieve a variety of results. Search engines present results quite differently, often without clearly explaining how the results are calculated or displayed. A serious researcher will seek to understand these differences and use them to her advantage.
Several years ago, before sophisticated portal sites were developed, there were two major ways to search for information on the web: directories and search engines. A directory is a collection of links to web sites, classified into subject categories and subcategories.
As directories and search engines developed into overall portals, directories incorporated search engines and search engines incorporated directories. Portals have attempted to make these two entities appear seamless; however, they are two distinct finding tools. Understanding this concept allows the researcher to take more control over her searching.
Consider, for example, the classic directory, Yahoo! In a search for the Declaration of Independence, I can click through subject categories to locate it, or I can type "declaration of independence" in the search box. When I search, Yahoo! first searches its classified directory for subcategories entitled Declaration of Independence. If none are present, it then searches the directory for listed web sites entitled Declaration of Independence. If there are none, Yahoo! then uses the Google search engine to search for web pages that contain the phrase Declaration of Independence. Yahoo! presents the first set of results it finds, even if that happens to come from the third step, web page results from Google. I do not have to prompt Yahoo! to move to the next step if the first step found nothing; it happens automatically. This is why different searches on Yahoo! may produce results pages that look quite distinct.
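Yahoo!'s automatic fallback described above is a cascade: try each finding tool in order and show the first non-empty result set. A minimal sketch, with the three steps replaced by hypothetical stand-in functions (the real directory and Google lookups are not modeled) and a hypothetical result URL:

```python
from typing import Callable, List, Optional, Tuple

# A step takes the query and returns a (possibly empty) list of hits.
Step = Callable[[str], List[str]]


def cascade(query: str, steps: List[Tuple[str, Step]]) -> Tuple[Optional[str], List[str]]:
    """Try each step in order and return the first one that finds anything."""
    for label, step in steps:
        hits = step(query)
        if hits:
            return label, hits
    return None, []


# Hypothetical stand-ins for the three steps described above.
steps = [
    ("directory categories", lambda q: []),  # no matching subcategory
    ("directory sites", lambda q: []),       # no matching listed site
    ("web pages", lambda q: ["http://www.example.com/declaration.html"]),
]

print(cascade("declaration of independence", steps))
# ('web pages', ['http://www.example.com/declaration.html'])
```

Because the label of the winning step changes from query to query, so does the look of the results page.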
Besides Yahoo!, there are two other major subject directories that have linked themselves with major search engines. The Open Directory Project provides directory results to Google, HotBot and Lycos, while LookSmart provides directory results to AltaVista and Excite.
Most Popular Results
As researchers began to realize that mathematical relevancy ranking didn't always equal researchers' intuitive relevancy ranking, tools were developed to put a more human factor back into relevancy determinations. Search engines can now measure what the most popular sites are, given certain search terms, and list the popular sites as results options. This is the driving force behind Direct Hit, which is used at HotBot and Lycos. Google and AltaVista include popularity as a factor in their formulas to determine relevancy rankings.
Most search engines allow the look of the results page to be changed, especially with regard to the number of hits per page. Additionally, they may offer the option of listing only titles or sorting by date or site, rather than relevancy.
Some searches produce many individual page hits from the same overall web site, making it seem like the results all come from the same place. When a search engine uses results compression, or clustering, it shows only one page per web site, while offering an option to view the other results from that site. This feature can be found at AltaVista, Excite, Google and HotBot.
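Results compression can be sketched as grouping a ranked list by host and showing one page per site. This is a minimal illustration: the URLs are hypothetical, and using the URL's host as "the site" is an assumption, since engines may define a site differently.

```python
from urllib.parse import urlparse


def compress_by_site(ranked_urls: list) -> dict:
    """Group ranked results by host, preserving the order in which sites first appear."""
    groups = {}
    for url in ranked_urls:
        host = urlparse(url).netloc.lower()
        groups.setdefault(host, []).append(url)
    return groups


# Hypothetical ranked results for illustration.
results = [
    "http://a.example.com/1",
    "http://a.example.com/2",
    "http://b.example.org/x",
    "http://a.example.com/3",
]

for host, pages in compress_by_site(results).items():
    line = pages[0]                     # show one page per site
    if len(pages) > 1:                  # offer the rest behind a link
        line += f"  (more results from {host}: {len(pages) - 1})"
    print(line)
```

The b.example.org page moves up from third to second place once the extra a.example.com pages are folded away.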
Suggestions for further searching based on the initial search are provided by many search engines. These suggestions can be simple, such as synonyms or alternative search terms. They can be more sophisticated, such as suggestions for searching in different, specialized databases. Ask Jeeves is built entirely around suggested searches. If I type a question into Ask Jeeves' search box, it returns a list of suggested specialized databases that might contain the answer to that question.
For example, I asked Jeeves "Where can I find the Declaration of Independence?" Jeeves returned several suggested sources for the text of the Declaration of Independence, as well as historical background on it.
Suggested searches can also be found at AltaVista, Excite, HotBot and Lycos.
If I locate a web page that is highly relevant to my research issue, I might be interested in finding more pages that are very similar. Some search engines will perform a search for other similar pages at the click of a button. I simply choose a page from my results list and ask the engine to perform a second search to find similar pages. This feature can be found at Google (Similar Pages) and AltaVista (Related Pages).
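The engines do not explain how they decide that pages are similar. One common, simple technique is cosine similarity between word-count vectors; the sketch below uses it purely as an illustration, with hypothetical pages and helper names, and should not be read as how Google's Similar Pages or AltaVista's Related Pages actually work.

```python
import math
from collections import Counter

# Hypothetical pages for illustration.
PAGES = {
    "p1": "declaration of independence text and history",
    "p2": "text of the declaration of independence",
    "p3": "apple pie recipe",
}


def cosine(a: str, b: str) -> float:
    """Cosine similarity of the word counts of two texts (0 means nothing shared)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def similar_pages(chosen: str, k: int = 2) -> list:
    """Rank the other pages by how similar their text is to the chosen page."""
    others = [name for name in PAGES if name != chosen]
    others.sort(key=lambda name: cosine(PAGES[chosen], PAGES[name]), reverse=True)
    return others[:k]


print(similar_pages("p1", k=1))  # ['p2']
```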
A few years ago, AltaVista began offering a tool to translate a given results page from one language to another. The translations aren't the greatest, but they're better than nothing when confronted with results in an unfamiliar language. Google and Lycos also offer translation.
AltaVista
Default Searching (connectors): alternative in Basic Search; phrase in Advanced Search
Default Searching (keyword/concept): keyword, but other concepts are also automatically searched in some situations
Inclusion: + (plus sign) in Basic Search; AND in Basic or Advanced Search
Alternative: OR in Basic or Advanced Search
Exclusion: - (minus sign) in Basic Search; AND NOT in Basic or Advanced Search
Phrases: "" (quotation marks); in Basic Search, two terms that usually appear as a phrase are treated as a phrase even without quotes
Proximity: NEAR locates terms within ten words of each other
Case Sensitivity: lower case is insensitive; capitalization forces case sensitivity
Truncation/Wildcard: * (asterisk) can be used in the middle of a word as well as at the end
Nesting: parentheses
Restrictors: host:, url:, link:, domain:, text:, title:, applet:, object:, anchor:, image:
Restrictors on Search Assistant Form: text, title, link, date, region, domain, host
Searching by Levels: yes
Other Search Features: special searches for images, video, and MP3/audio
Automatic Directory Results: no (Help screens say yes, but I couldn't find one instance where they appeared as search results); a separate directory can be browsed from the main page
Popular: not a separate list, but popularity is built into AltaVista's relevancy formula
Clustered: called site compression; automatic in Basic Search and can be turned on in Advanced Search
Suggestions: yes
Similar: yes, called Related Pages
Translated: yes
Other Features: While Basic Search presents results ranked by relevancy, Advanced Search results appear in random order unless the sort by box is used. Sort by allows users to place greater weight on certain terms.
Excite
Default Searching (connectors): alternative
Default Searching (keyword/concept): concept; use of Booleans forces keyword searching
Inclusion: + (plus sign) or AND
Alternative: OR
Exclusion: - (minus sign) or NOT
Phrases: "" (quotation marks)
Proximity: no
Case Sensitivity: no
Truncation/Wildcard: no
Nesting: parentheses
Restrictors: language and country/domain on Advanced Search form
Searching by Levels: no
Other Search Features: very popular search topics offer relevant quick results in the left margin
Automatic Directory Results: from LookSmart; results sites that appear in the directory will list category and subcategories on the results list; also, click on Web Directory from the results page
Popular: no
Clustered: yes, choose View by URL
Suggestions: yes, use Zoom In feature
Similar: no
Translated: no
Default Searching (connectors): inclusive
Default Searching (keyword/concept): keyword
Inclusion: + (plus sign); choose all the words from the pull-down box
Alternative: parentheses around alternative words; choose any of the words from the pull-down box
Exclusion: - (minus sign)
Phrases: "" (quotation marks); choose the exact phrase from the pull-down box
Proximity: no
Case Sensitivity: no
Truncation/Wildcard: no
Nesting: no
Restrictors: on Advanced Search: language, text, title, URL, domain
Searching by Levels: no
Other Search Features: easy refinement of original search from any results page
Automatic Directory Results: no
Popular: no
Clustered: no
Suggestions: no
Similar: no
Translated: no
Other Features: the basic or advanced search form, depending on which was used, remains at the bottom of each search results page and recalls the previous search
Google
Default Searching (connectors): inclusive
Default Searching (keyword/concept): keyword
Inclusion: automatic; use + (plus sign) to include stopwords
Alternative: OR
Exclusion: - (minus sign)
Phrases: "" (quotation marks)
Proximity: no
Case Sensitivity: no
Truncation/Wildcard: no
Nesting: no
Restrictors: cache:, link:, related:, info:, spell:, stocks:, site:, allintitle:, intitle:, allinurl:, inurl:; on Advanced Search form: language, title, URL, domain, link
Searching by Levels: yes
Other Search Features: several specialty search engines, including one for government pages
Automatic Directory Results: from Open Directory; relevant subject categories and subcategories appear at the top of results
Popular: not a separate list, but built into Google's formula
Clustered: yes
Suggestions: for individual terms from the search query; click on a term from the box on the results page to see definitions and search suggestions for that term
Similar: yes; can be chosen from the results list or run directly from the Advanced Search form without performing an initial search
Translated: yes; pages published in Italian, French, Spanish, German, and Portuguese can be translated into English
Other Features: offers the option of viewing the index's cached page (what was actually searched) rather than the live page on the Internet; results list shows highlighted search terms in context
HotBot
Default Searching (connectors): inclusive
Default Searching (keyword/concept): keyword
Inclusion: automatic; AND with Boolean phrase option; all the words from the pull-down menu; + (plus sign)
Alternative: OR with Boolean phrase option; any of the words from the pull-down menu
Exclusion: NOT with Boolean phrase option; must not contain from the pull-down menu; - (minus sign)
Phrases: "" (quotation marks); exact phrase from the pull-down menu
Proximity: no
Case Sensitivity: lower case not sensitive; capitalization forces sensitivity
Truncation/Wildcard: * (asterisk) matches 0 or more characters; ? (question mark) matches exactly one character; both can be placed anywhere in the term
Nesting: yes, with Boolean phrase option
Restrictors: date, language, domain, depth, feature; search form also allows searches for different types of files
Searching by Levels: yes
Other Search Features: search form makes advanced searching easy to use
Automatic Directory Results: yes, from Open Directory
Popular: yes
Clustered: yes
Suggestions: yes
Similar: yes
Translated: no
Other Features: will automatically run the same search in Lycos at the click of a button
Lycos
Default Searching (connectors): inclusive
Default Searching (keyword/concept): keyword
Inclusion: + (plus sign); all words on Advanced Search form
Alternative: any words on Advanced Search form
Exclusion: - (minus sign)
Phrases: "" (quotation marks); exact phrase on Advanced Search form
Proximity: no
Case Sensitivity: no
Truncation/Wildcard: no
Nesting: no
Restrictors: title, URL, host, domain, language on Advanced Search form
Searching by Levels: yes
Other Search Features: has special content-based searches for multimedia, recipes, and more
Automatic Directory Results: yes, from Open Directory
Popular: yes
Clustered: no
Suggestions: yes
Similar: no
Translated: yes
Other Features: will automatically run the same search in HotBot at the click of a button