Searching a Site When There's No Search Engine
Last week, while I was out partying and eating unturkey and vegetarian stuffing, an e-mail came in asking about ways to search sites that don't have search engines. My friend Kathy Biehl has already addressed the question in her own column this week (read it at http://www.llrx.com/columns/webcritic5.htm; that'll teach me to take Thanksgiving off!), but I wanted to take a second look and offer a few thoughts of my own on searching a site that has no search engine. There are several steps you can try.
1) Make sure there's no search engine. It sounds dumb, but sometimes the search engine is hidden away somewhere on the site. Look in the help section or the site map. If you're really desperate, force a 404 (page not found) error: well-designed 404 pages sometimes point to the search engine to help the befuddled surfer who's lost her way. For example, say I'm at the Michigan home page, which gave Kathy so much trouble. It's at http://www.state.mi.us. To force a 404 error, I'll type in http://www.state.mi.us/peanutbuttercookies/. I use that because a) it's distinct, and b) it's 99 percent guaranteed not to pull up an actual page on a governmental site (and if it does, I want to move to your state). In this case, it doesn't work. To see a case where it does work, check out http://www.researchbuzz.com/peanutbuttercookies/.
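If you find yourself doing the peanut-butter-cookie trick a lot, it can be scripted. What follows is a rough Python sketch, not a tool I'm vouching for: it assumes the site answers plain HTTP requests, and the function names (`build_probe_url`, `mentions_search`, `probe_404`) are made up for illustration. It requests the nonsense path at the root of the site and reports the status code plus whether the error page mentions "search" at all, which is only a crude hint that the page points to a search engine.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin
from urllib.request import urlopen

# The made-up path from the article; any distinctive nonsense path would do.
PROBE_PATH = "/peanutbuttercookies/"


def build_probe_url(site: str) -> str:
    """Point at the nonsense path on the site's root, whatever page we start from."""
    return urljoin(site, PROBE_PATH)


def mentions_search(page: str) -> bool:
    """Crude check: does the page text contain the word 'search' anywhere?"""
    return "search" in page.lower()


def probe_404(site: str, timeout: float = 10.0) -> tuple[int, bool]:
    """Request the nonsense page; return (HTTP status, whether the page mentions search)."""
    url = build_probe_url(site)
    try:
        with urlopen(url, timeout=timeout) as resp:
            # Some servers answer missing pages with 200 (a "soft 404").
            status = resp.status
            body = resp.read()
    except HTTPError as err:
        # urlopen raises on 4xx/5xx, but the error object still carries the page.
        status = err.code
        body = err.read()
    return status, mentions_search(body.decode("utf-8", errors="replace"))
```

Calling `probe_404("http://www.researchbuzz.com")` would request http://www.researchbuzz.com/peanutbuttercookies/; a 404 status together with `True` is the case you're hoping for. Either way, look at the page yourself before concluding anything.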
Of course (and pardon me for the tangential rant), there's absolutely no excuse for a site not to have a search engine. If you've got more than about a dozen pages on your site, or you change and archive pages often, you need one. If you're not technically adept, or you don't have enough access to your server to install a search package, consider a third-party engine like Atomz.com. Atomz.com will put a search engine on your site for free and won't even show ads. Other search providers are available as well; some show ads, and some charge.
Customizing your 404 message is a little trickier, but it's well worth it if you want to have the friendliest site possible. Plinko.net offers pointers for Webmasters who want to configure their 404 messages; check 'em out at http://www.plinko.net/404/custom.asp.
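Here's what "friendly" can mean in practice, sketched with Python's built-in `http.server` purely for illustration; your real Web server will have its own way of configuring error pages. The page table, the `/search` address, and the function names are hypothetical. Two design points matter: the handler still sends a genuine 404 status, so search engines and scripts can tell the page is missing, and the missing path is HTML-escaped before it is echoed back to the visitor.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical site: a home page plus a search page at /search.
SEARCH_URL = "/search"
PAGES = {
    "/": "<html><body><h1>Welcome</h1></body></html>",
    SEARCH_URL: "<html><body><h1>Search</h1><!-- search form goes here --></body></html>",
}


def render_404(path: str) -> str:
    """Build a not-found page that points the lost visitor toward site search."""
    return (
        "<html><body>"
        f"<h1>Sorry, we couldn't find {escape(path)}</h1>"
        f'<p>Try our <a href="{SEARCH_URL}">site search</a> '
        'or start over at the <a href="/">home page</a>.</p>'
        "</body></html>"
    )


class FriendlyHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = PAGES.get(self.path)
        status = 200
        if body is None:
            # Keep the real 404 status; only the page content is friendlier.
            status = 404
            body = render_404(self.path)
        data = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port: int = 8000) -> None:
    """Run the sketch locally until interrupted."""
    HTTPServer(("127.0.0.1", port), FriendlyHandler).serve_forever()
```

After calling `serve()`, visiting http://127.0.0.1:8000/peanutbuttercookies/ should show the friendly page along with a 404 status.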
2) Use a search engine to find a search engine. With both AltaVista and Google, you can restrict searches to a domain.
a) In Google, you use the site: special syntax to find pages on a particular domain. For example, if you wanted to find "trademark" on the state of Michigan site, your search on Google would look like this: trademark site:www.state.mi.us
b) In AltaVista, you use the host: special syntax. If you wanted to find "trademark" on the state of Michigan site, the search on AltaVista would look like this: +trademark +host:www.state.mi.us
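Because the two syntaxes differ only in their operators, they're easy to generate. A small Python sketch: the function names are hypothetical, the `site:` and `host:` forms are the ones described above, and the Google results address assumes the familiar `https://www.google.com/search?q=` form. Only the Google URL is built here; the AltaVista query is produced as text for you to paste in.

```python
from urllib.parse import urlencode


def google_site_query(term: str, domain: str) -> str:
    """Google style: the term plus the site: operator."""
    return f"{term} site:{domain}"


def altavista_host_query(term: str, domain: str) -> str:
    """AltaVista style: '+' marks both the term and the host: restriction as required."""
    return f"+{term} +host:{domain}"


def google_search_url(query: str) -> str:
    """URL-encode a query into a Google results address (spaces become '+', ':' becomes %3A)."""
    return "https://www.google.com/search?" + urlencode({"q": query})
```

For the Michigan example, `google_site_query("trademark", "www.state.mi.us")` yields the same query shown in a) above, and `altavista_host_query` yields the one in b).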
Now, if you run both of those searches, you'll run into the first of two problems with this method: the search engines are not guaranteed to index ALL of a particular site! (In this example, AltaVista provides seven results while Google provides only five.) The second problem is content freshness. A search engine's index almost always lags behind the site itself, so it would be difficult to use this method to search for recent events.
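One way to see the coverage problem for yourself is to paste the result URLs from each engine into two lists and compare them. This Python sketch assumes you've collected the URLs by hand (it doesn't query either engine), the function names are hypothetical, and the sample URLs are made up. Trailing slashes are stripped so that `/b/` and `/b` count as the same page; any further normalization would be a judgment call.

```python
def normalize(url: str) -> str:
    """Treat 'http://host/page/' and 'http://host/page' as the same hit."""
    return url.strip().rstrip("/")


def compare_coverage(results_a: list[str], results_b: list[str]) -> dict[str, set[str]]:
    """Split two engines' result lists into shared hits and hits unique to each."""
    a = {normalize(u) for u in results_a}
    b = {normalize(u) for u in results_b}
    return {
        "both": a & b,
        "only_a": a - b,
        "only_b": b - a,
        "all": a | b,  # the combined list is what you actually want to read
    }


# Hypothetical result lists pasted from each engine.
altavista_hits = ["http://www.state.mi.us/a.html", "http://www.state.mi.us/b/"]
google_hits = ["http://www.state.mi.us/b", "http://www.state.mi.us/c.html"]
coverage = compare_coverage(altavista_hits, google_hits)
print(sorted(coverage["all"]))
```

Anything in `only_a` or `only_b` is a page you would have missed by sticking to one engine.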
Because of these limitations, I suggest that you apply this method in two stages. In the first stage, use it to try to find a search engine on the site. Do this by searching for "search engine," then changing the search to phrases that tend to appear on search forms, like "any words" and "all words." For example, if I did this search:
"search engine" site:www.state.nc.us
It would give me only four results, one of which confirms that there is a search engine on the site. If that doesn't work (and don't sit there trying it for five hours; just take a few minutes, and if it doesn't work, move on), go to the second stage: try your actual search term using these syntaxes, like the trademark searches shown above. And use both search engines; as the trademark example shows, what's in one isn't necessarily in the other.
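The two-stage routine boils down to a short checklist of queries, which this Python sketch writes out for you. It's only a query generator built on my own assumptions: the `site:` and `host:` forms are the ones described above, multi-word phrases are wrapped in quotes, and `search_plan` and its helpers are hypothetical names. Run the stage 1 queries first; only if none of them turns up a search engine do you move on to stage 2.

```python
# Stage 1 phrases from the article: wording that often shows up on a site's search page.
STAGE_ONE_PHRASES = ["search engine", "any words", "all words"]
ENGINES = ("google", "altavista")


def quote_phrase(text: str) -> str:
    """Wrap multi-word text in quotes so it is searched as a phrase."""
    return f'"{text}"' if " " in text else text


def site_query(engine: str, text: str, domain: str) -> str:
    """Restrict a query to one domain using the engine's own syntax."""
    q = quote_phrase(text)
    if engine == "google":
        return f"{q} site:{domain}"
    if engine == "altavista":
        return f"+{q} +host:{domain}"
    raise ValueError(f"unknown engine: {engine}")


def search_plan(term: str, domain: str) -> list[tuple[int, str, str]]:
    """Return (stage, engine, query) tuples: look for a search engine first, then the term."""
    plan = [(1, e, site_query(e, p, domain)) for p in STAGE_ONE_PHRASES for e in ENGINES]
    # Stage 2: the actual search term, on both engines.
    plan += [(2, e, site_query(e, term, domain)) for e in ENGINES]
    return plan


for stage, engine, query in search_plan("trademark", "www.state.nc.us"):
    print(stage, engine, query)
```

The first line printed is the North Carolina search from the example above; the last two are the stage 2 trademark searches for each engine.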
3) If that doesn't work, you're down to last-ditch strategies. Sometimes sites aren't indexed by their domain names but by their IP addresses, and unless you want to go dig out the IP address, that can crimp your efforts with method #2 above. In that case, go back to the site you're trying to search. Look for a copyright notice at the bottom of the screen, something like "copyright 2000 state of Michigan," and use it in addition to your search term, like this:
+trademark +"copyright 2000 state of michigan"
Long copyright notices work well for this, as do people's names (look for something like "Contact Webmaster SoAndSo at SoAndSo@SoAnd.So"). Try to find something distinctive enough that it won't pick up pages from other sites. (You'll know it hasn't worked by the kind of results you get!)
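If the site's pages are heavy with markup, a few lines of Python can pull a candidate copyright phrase out of a page's source and build the combined query for you. This is a sketch under loose assumptions: the footer spells out the word "copyright" (a bare © symbol won't be caught), the phrase ends at the first period or line break, and the names `find_copyright_phrase`, `fingerprint_query`, and `lookup_ip` are hypothetical. `lookup_ip` wraps the standard `socket.gethostbyname` call, which is one way to dig out the IP address mentioned above if you'd rather try it with the method #2 syntax.

```python
import re
import socket
from typing import Optional

TAG_RE = re.compile(r"<[^>]+>")
# "copyright", an optional four-digit year, then words up to the first period or line break.
COPYRIGHT_RE = re.compile(r"copyright\s+(?:\d{4}\s+)?[A-Za-z][A-Za-z ,&-]*", re.IGNORECASE)


def find_copyright_phrase(page: str) -> Optional[str]:
    """Strip tags, then return the first copyright notice found, or None."""
    text = TAG_RE.sub(" ", page)
    match = COPYRIGHT_RE.search(text)
    if match is None:
        return None
    # Collapse runs of whitespace and drop trailing separators.
    return " ".join(match.group(0).split()).rstrip(" ,&-")


def fingerprint_query(term: str, phrase: str) -> str:
    """Require both the term and the exact phrase, in the style shown above."""
    return f'+{term} +"{phrase.lower()}"'


def lookup_ip(host: str) -> str:
    """Resolve a host name to an IPv4 address (needs a working DNS lookup)."""
    return socket.gethostbyname(host)
```

Fed a footer like `Copyright 2000 State of Michigan. All rights reserved.`, `find_copyright_phrase` returns "Copyright 2000 State of Michigan", and `fingerprint_query("trademark", ...)` turns that into the query shown above.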
4) If all else fails, send a polite e-mail to the Webmaster letting them know a) you wanted to use their site, and b) you couldn't because it lacks a search engine.
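Finding the address is often the hardest part. This Python sketch pulls `mailto:` addresses out of a page's source and drafts (but does not send) a short, polite note with the standard library's `email.message.EmailMessage`. The wording of the note, the function names, and the sample addresses are placeholders of my own; edit them to suit, and send the message however you normally send mail.

```python
import re
from email.message import EmailMessage

# Matches href="mailto:..." (either quote style), stopping before any ?subject=... part.
MAILTO_RE = re.compile(r"""href\s*=\s*["']mailto:([^"'?]+)""", re.IGNORECASE)


def find_contact_addresses(page: str) -> list[str]:
    """Return mailto: addresses in page order, without duplicates."""
    found: list[str] = []
    for addr in MAILTO_RE.findall(page):
        addr = addr.strip()
        if addr and addr not in found:
            found.append(addr)
    return found


def draft_note(to_addr: str, site: str, sender: str) -> EmailMessage:
    """Build a polite note asking for a site search engine; nothing is sent."""
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = sender
    msg["Subject"] = f"A search engine for {site}?"
    msg.set_content(
        "Hello,\n\n"
        f"I wanted to use {site} to look something up, but I couldn't\n"
        "find a search engine on the site. A site search would make it much\n"
        "easier for visitors like me to find what we need.\n\n"
        "Thanks for all your work on the site!\n"
    )
    return msg
```

Keeping the draft separate from sending leaves you room to reread it and make sure it comes across as friendly.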
Being a Webwrangler is a lonely business; sometimes it's easy to forget that you're building for an audience. Drop a note and let them know (nicely) how the site could work better for you. Point them toward Atomz.com if you want. Maybe if enough people do that, we'll see more easily accessible site search engines, and I won't have to write articles like this. Happy searching!