Features – Clustering With Search Engines

Tara Calishain has authored or co-authored several books on using the Internet, including The Lawyer’s Guide to Internet Research. She is the editor of ResearchBuzz, a free weekly newsletter on Internet search offerings and search engine news. Tara is also the author of LLRX Buzz, a weekly column on new web sites and services focused on the legal community.

Editor’s Note (SP): For Part 2 of this article, please use this link: //www.llrx.com/features/clusteringsearch2.htm


Search engines still aren’t as smart as we’d like ’em to be. Sure, Google’s great, and Yahoo comes in real handy sometimes, but sometimes your search terms just aren’t finding what you’re looking for.

Enter clustering. With clustering, search engines gather results into groups around a certain theme or, in some cases, just provide you with related keywords that perhaps you wouldn’t have thought of yourself, helping you zero in on your goal.
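If it helps to see the idea in miniature, here is a toy sketch of result clustering in Python. It is not how any of the engines in this article actually work; the result titles are made up, and grouping by shared words is a deliberately simple stand-in for whatever the real engines do behind the scenes.

```python
import re
from collections import defaultdict

# Made-up result titles for a "neurosurgery" search (illustration only).
RESULTS = [
    "Pediatric neurosurgery at a children's hospital",
    "University department of neurosurgery",
    "Pediatric neurosurgery fellowship programs",
    "Neurosurgery residency at a state university",
    "Journal of neurosurgery archives",
]

# Small, hand-picked list of words too common to be useful as themes.
STOPWORDS = {"a", "at", "of", "the", "s"}


def cluster_by_term(results, query, min_size=2):
    """Group titles under each word they share, ignoring the query itself."""
    clusters = defaultdict(list)
    for title in results:
        words = set(re.findall(r"[a-z]+", title.lower()))
        for word in words - STOPWORDS - {query.lower()}:
            clusters[word].append(title)
    # Keep only themes that at least `min_size` results have in common.
    return {w: t for w, t in clusters.items() if len(t) >= min_size}


if __name__ == "__main__":
    for theme, titles in sorted(cluster_by_term(RESULTS, "neurosurgery").items()):
        print(f"{theme} ({len(titles)})")
        for title in titles:
            print("   ", title)
```

With these sample titles, only "pediatric" and "university" end up as clusters, each holding two results. That is the same flavor of grouping you’ll see in the related topics the engines below offer.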

In Part I of Clustering With Search Engines, we’ll look at regular search engines that cluster, and boy, are there a lot of ’em! In Part II of this article, we’ll look at meta-search engines that cluster as well as specialty clustering search engines and a search engine that is still offering clustering on a limited basis.

We’ll start with one you might not have heard of yet: Google Labs’ clustering agent, Google Sets.

Google Sets – http://labs.google.com/sets

Google Sets doesn’t provide search results. Instead, it helps you find similar terms to the ones you’ve already entered, letting you create more complex queries in one area.

Enter a couple of words – Tamoxifen and Arimidex work; they’re drugs used to treat breast cancer. You’ll get a small set of results, but it’ll include items you might not have heard of. Be sure to click on them to get Google search results to see how they’re related to your original search terms.

Let’s do a more general example — say dog breeds. Enter collie, chihuahua, and german shepherd in the set boxes. You’ll get back an enormous list of dog breeds. You don’t want to use all of these, of course, but it’ll give you an idea of how to narrow your search.
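One way to put a handful of those terms to work is to join them into a single query with Google’s OR operator. The little Python helper below is just a sketch: the function name and the picked terms are mine, not anything Google Sets provides, and you’d still paste the resulting string into the search box yourself.

```python
def build_or_query(terms):
    """Join terms with OR, quoting any multi-word term as a phrase."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return " OR ".join(quoted)


# A few breeds picked by hand from a (hypothetical) Google Sets list.
picked = ["collie", "chihuahua", "german shepherd"]
print(build_or_query(picked))
# prints: collie OR chihuahua OR "german shepherd"
```

Quoting the multi-word breed keeps it together as a phrase, so the search doesn’t match "german" and "shepherd" separately.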

Use Google Sets to build queries when you’re looking for similar items or brainstorming how to put a search together. The other search engines in this article cluster in a more traditional way; we’ll start with Wisenut.

Wisenut — http://www.wisenut.com

Wisenut is a full-text search engine that was recently bought by LookSmart. Enter a search in it — we’ll use “neurosurgery” as the primary example for the rest of the article — and you’ll see that the search results include a black area at the top of the page which has related topics (neurosurgery university, pediatric neurosurgery, etc.) and a number of results. WiseNut calls this the WiseGuide. Some results have a + beside them; click on the + for subtopics. The subtopics will show up in a gray area underneath the clustered results.

There’s also a [search this] link next to each of the clustered results, which runs another search with those keywords. Those keywords take you to a different set of clustered results in addition to Web page results, and so on and so on.

Teoma — http://www.teoma.com

Teoma was recently purchased by Ask Jeeves, and has gotten a lot of press as a potential “Google Killer.” While I don’t think I’d go that far, it does have interesting clustering technology.

Run the neurosurgery search and you’ll get four sets of results. Top left are sponsored results. Bottom left are Web site (non-sponsored) results. Top right are the suggestions for refining the result (that’s what we’ll focus on), and bottom right are the “Link Collections from Experts and Enthusiasts,” as Teoma calls them. If you’re just looking for general information, use the link collections. If you’re interested in narrowing your search, though, use the suggestions.

Just click on one and your search will be run again with the suggested term included. You’ll get a different set of site results, suggestions, and expert link collections, too.

Infonetware.com — http://www.infonetware.com/

This site isn’t a search engine per se but is rather a demonstration of Infonetware’s “RealTerm Technology.”

Enter a search term at the top of the page. The results page is framed. The area on the left provides you with topics related to your search term, while the frame on the right shows the Web page search results. The topics have a number in parens beside them that shows how many results are in that particular topic.

Click on a topic and the results for that topic will appear in the right frame. With some of the terms, you’ll see subtopics that allow you to narrow your search results even more.

While Infonetware works with full-text searching, the Oingo engine uses the Open Directory Project and offers suggestions for searching.

Oingo — http://www.oingo.com/

Since Oingo uses the Open Directory Project as its search source, it’s already clustered in a way. (ODP is a searchable subject index like Yahoo.) When you do a search, the search results page will first give you a drop-down list of potential meanings for your search, if any. Beneath that is a list of categories that relate to your search (listed in order of relevance). Finally, you get site results from the directory itself.

Unfortunately, the suggestions are limited; searching for neurosurgery provides very few suggestions. Only when you search for a more general term does Oingo’s usefulness come through. Searching for Rose, for example, provides several suggestions (plant life, pink wine, several different American towns, etc.) and a manageable list of categories.

If you pick a suggested definition, Oingo will run a search again using the definition you specified. All the definitions I looked at for “rose” provided just category results, not results of individual sites. This is a good one to try if you’re searching for something that’s in a pretty broad category, like flowers, trees, animals, etc.

AlltheWeb — http://www.alltheweb.com

Now that the Northern Light Web search is no longer publicly available (supposedly), my favorite search engine that nobody remembers is AlltheWeb. AlltheWeb provides two ways to narrow search results. They’re both on the right side of the results screen.

The first way is FAST Topics, which apparently uses both ODP topics and dynamically generated topics. Click on a topic and you’ll get a list of Web sites related to that topic.

There’s also a “Narrow Your Search” option that lists search terms related to your search. Click on one of those and your search will be run again with the term you clicked. Not all search terms have both Topics and Narrow Your Search terms, but all the ones I looked at had either one or the other.

That’s it. Next week we’ll look at meta-clusterers, and a full-text search engine that’s still testing its clustering.
