
Clustering With Search Engines, Part 2

By Tara Calishain, Published on June 16, 2002

Tara Calishain has authored or co-authored several books on using the Internet, including The Lawyer's Guide to Internet Research. She is the editor of ResearchBuzz, a free weekly newsletter on Internet search offerings and search engine news. Tara is also the author of LLRX Buzz, a weekly column on new web sites and services focused on the legal community.

Editor's Note (SP): For Part 1 of this article, please use this link: http://www.llrx.com/features/clusteringsearch.htm


Introduction

In Part 1 of this article we looked at general search engines that offer clustering features. In this installment we'll look at one more general search engine that offers clustering, AltaVista, although it isn't yet available to the general public. Then we'll look at a few meta-search engines that cluster their results, and at one specialty search engine that does the same.

AltaVista -- http://www.altavista.com

You may remember that several weeks ago AltaVista was testing its clustering technology with a small percentage of its users. It's still testing it, but I was able to take a second look.

AltaVista's feature looks a little like AlltheWeb's recommended-terms results: once you run a search, AltaVista's suggestions for narrowing down the results appear at the top of the page. A search for "neurosurgery" brings up about a dozen suggestions, including brain, functional results, and Johns Hopkins. Clicking one of these suggestions narrows the search and leads to another set of recommended narrowing terms (clicking Johns Hopkins, for example, leads to suggestions that include pediatric neurosurgery, Johns Hopkins Hospital, and Johns Hopkins University), and so on.

As I mentioned, this feature is not yet publicly available, but I like the suggestions it makes. If you use AltaVista, keep an eye out for it.

In addition to AltaVista and many other general search engines, some meta-search engines cluster their results. Vivisimo is probably the best known, but there are others.

Vivisimo -- http://www.vivisimo.com

Vivisimo has a very simple front page, and its search results are organized into groups. A search for neurosurgery returns 163 results. The groups appear on the left side of the screen; in this case they include Neurosurgeons, Programs, and Nervous System. Click the + beside a group to get narrower and narrower groups, until you reach the actual page listings. Click a page title and the page opens on the right side of the screen. This design makes it easy to explore several categories without "losing your place."

Don't forget to check out Vivisimo's advanced search, which lets you choose the search engines you want to use and how many results you want (the more results you ask for, the more interesting the categories get; that's what my experimenting showed, anyway). You can also specify the language of your search results and how you want pages to display (in a frame, in a window, or in a new window). There's even a filter for removing offensive content, though using it limits the number of search engines available.

While Vivisimo is fairly well known, Query Server is more of a demo site. But it's a demo site worth looking at: it offers clustering search for several different categories of Web search.

Query Server -- http://www.queryserver.com/
 
Query Server offers several different types of search on the left side of its front page: Web, News, Health, Money, and Government. Each of these searches clusters its results, and they all have much the same interface, but each draws on different resources.

Search results are presented in a frame on the right side of the page. At the top of the frame is a query box. Below that is a list of the search engines queried, then a list of the groups the results were clustered into, and finally the results themselves. Results are divided by cluster and assigned scores based on their relevance. A search for "neurosurgery" produced several different clusters, including Cyber Museum of Neurosurgery, UCLA Neurosurgery, and Harvard Medical School.

The other searches present results in much the same way, but I encourage you to try each one, and especially the small customize link at the lower right of each query box. The customize link lets you specify which engines are used, whether results must contain ALL or ANY of the terms given, how many results you want in total, and how long you want Query Server to search.

Surfwax -- http://www.surfwax.com

Before you start playing with Surfwax, I have to tell you something: I have never been able to get Surfwax to work except with Internet Explorer.

Surfwax offers both subscription-based and free service. The subscription service gives you access to more search engines and more features, but you can do some searching for free.

After you've done a search, you'll see a "focus" link in the upper-left corner. Click the little box beside the word and you'll get "focus words" that you can add to your search. Focus words are divided into narrower and broader terms, and the big difference between this list and the others you've seen is that it contains generic words rather than links to specific people or places like Johns Hopkins or Harvard Medical School. This produces a different set of search results than the other services I've mentioned in this article.

Surfwax has been around for a while, but it hasn't been around nearly as long as the old reliable Northern Light. And while Northern Light no longer offers Web search, it still uses its clustering technology for news search.

Northern Light News Search -- http://www.northernlight.com/news.html

I can't use "neurosurgery" for this example, since a search has to return a certain number of results before they are sorted into folders.

"George Bush" works well as a search, though. Its results are divided into several different folders, including stock markets, macroeconomics, terrorism, and Pakistan. Pick a folder and you'll get the results filed in it. Unfortunately, the folder listing doesn't describe what's in each folder, but subfolders are provided if the topic is broad enough. The results also appear to be listed in date order, which is handy if you're looking for recent material.

You can't always come up with a search query specific enough to return only a few results. In that case, clustering search engines can break several hundred results into manageable groups, or offer suggestions that reduce the ocean of information to a reasonable level. Enjoy!