Marcus P. Zillman, M.S., A.M.H.A., is Executive Director of the Virtual Private Library and Founder/Creator of BotSpot®. He is the author of nine different Internet MiniGuides, the Internet Sources Manual and the eCurrent Awareness Resources 2004 Report. His Subject Tracer™ Information Blogs are freely available from the Virtual Private Library and include the latest resources on Deep Web Research and Bot Research. All of the current links that he has created are available at http://www.LinksByMarcus.com/. His free monthly newsletter is titled AwarenessWatch™, and his monthly Internet column has been archived since 1996.
Searching the World Wide Web Using Your Brain and Bots is a keynote presentation that I have been delivering over the last year, and much of my information comes from the extensive research that I have completed over the years into the “invisible” web, or what I like to call the “deep” web. The Deep Web covers somewhere in the vicinity of 600 billion pages of information located throughout the World Wide Web in various files and formats that current search engines either cannot find or have difficulty accessing. At the time of this writing, current search engines find about 3.3 billion pages. [Editor's Note: After this article was submitted, Google "announced it expanded the breadth of its web index to more than 6 billion items...4.28 billion web pages." [Link]]
In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, and .ppt. These files are predominantly used by businesses to communicate information within their organization, or to disseminate information to the world outside it. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained by searching and accessing the “properties” information stored in these files. I wrote up this research and posted it on my blog a few months ago.
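The file-type searching described above can be sketched in a few lines of Python. This is only an illustration, not a tool from this guide: the function names are hypothetical, the `filetype:` operator is the syntax Google accepts (other engines may use a different operator), and the base URL is an assumed endpoint.

```python
from urllib.parse import urlencode

# File extensions named in the text above.
DOCUMENT_TYPES = ("pdf", "doc", "xls", "ppt")


def filetype_queries(topic, extensions=DOCUMENT_TYPES):
    """Return one query string per file type (hypothetical helper)."""
    return [f"{topic} filetype:{ext}" for ext in extensions]


def search_url(query, base="https://www.google.com/search"):
    """Encode a query as a search URL; `base` is an assumed endpoint."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Print one search URL for each document type.
    for query in filetype_queries("annual report"):
        print(search_url(query))
```

Running one query per file type, rather than a single combined query, keeps each result list focused on one document format.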
This guide is designed to provide you with resources to better understand the history of deep web research, as well as various classified resources that allow you to search the currently available web for those key nuggets of information that can only be found by understanding how to search the "deep web".
Articles, Papers, Audios and Videos (Current and Historical)
Cross Database Articles
Cross Database Search Services
Cross Database Search Tools
Resources
Bot Research Resources and Sites
Articles, Papers, Audios and Videos (Current and Historical)
Annotation for the Deep Web
http://csdl.computer.org/comp/mags/ex/2003/05/x5042abs.htm
Current Awareness Discovery Tools on the Internet
http://zillman.blogspot.com/2003_10_01_zillman_archive.html#106648219377744380
Deep Web - Exploring the Secrets of the Hidden Internet by Marcus P. Zillman, M.S., A.M.H.A. - 23 minutes - Internet/Technology Channel
http://www.planetearthradio.com/
Easy Topic Maps
http://easytopicmaps.com/
Econtent: Invisible Web Catalog
http://www.findarticles.com/cf_0/m0BLB/4_22/55280144/p1/article.jhtml?term=invisible+web
Finding the Invisible Web
http://websearch.about.com/library/weekly/aa061903a.htm
Graph Structure in the Web
http://www.almaden.ibm.com/cs/k53/www9.final/
Guardian Unlimited: Search for the Invisible Web
http://www.guardian.co.uk/online/story/0,3605,547140,00.html
Invisible Web Gets Deeper
http://www.searchenginewatch.com/sereport/article.php/2162871
Invisible Web Revealed
http://www.searchenginewatch.com/sereport/article.php/2167321
JEP: The Deep Web
http://www.press.umich.edu/jep/07-01/bergman.html
Kent State University: Searching the Invisible Web
http://www.library.kent.edu/internet/invisible_web/
Library Journal: Breaking Through the Invisible Web
http://libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA266430&publication=libraryjournal
LLRX: Book Review: The Invisible Web
http://www.llrx.com/features/invisibleweb.htm
LLRX: Mining Deeper Into the Invisible Web
http://www.llrx.com/features/mining.htm
LLRX: ResearchWire: Exposing the Invisible Web
http://www.llrx.com/columns/exposing.htm
Mining the Invisible Web
http://www.miningtheinvisibleweb.com/
Mining the Deep Web With Specialized Drills
http://www.nytimes.com/2001/01/25/technology/25SEAR.html?ex=1064894400&en=31212059088afb68&ei=5070
New Profusion Site Offers Better View of Invisible Web
http://www.searchenginewatch.com/sereport/article.php/2163591
Online or Invisible?
http://www.neci.nec.com/~lawrence/papers/online-nature01/
PhysicsWeb: The Physics of the Web
http://physicsweb.org/article/world/14/7/09
Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries, Machine Learning, Neural Networks [Steve Lawrence, NEC Research Institute]
http://www.neci.nec.com/~lawrence/papers.html
Researchers Map of the Web
http://www.almaden.ibm.com/almaden/webmap_press.html
Scientific American: Featured Article: The Semantic Web
http://www.sciam.com/article.cfm?articleID=00048144-10D2-1C70-84A9809EC588EF21&catID=2
Search Engine Hunts for Gold Beneath the Surface of the Web
http://www.nytimes.com/2001/02/08/technology/08GEE3.html?ex=1064894400&en=edb8d4dda8b88e9a&ei=5070
Searching the Deep Web
http://www.dlib.org/dlib/january01/warnick/01warnick.html
Searching the Deep Web - Video
http://www.osti.gov/media/DeepWebVideo.html
Seeing the Invisible Web
http://lib.berkeley.edu/TeachingLib/Guides/Internet/InvWebPowerpoint/index.htm
Seeing through the 'invisible' Web
http://www.usatoday.com/tech/2001/10/15/invisible-web-search.htm
Spidering Hacks
http://www.oreilly.com/catalog/spiderhks/
Technology Review: A Smarter Web
http://www.technologyreview.com/articles/frauenfelder1101.asp
The Deep Web: Surfacing Hidden Value
http://www.brightplanet.com/technology/deepweb.asp
The Invisible Web by Chris Sherman
http://www.freepint.com/issues/080600.htm#feature
The Invisible Web for Educators
http://www3.dist214.k12.il.us/invisible/article/invisiblearticle.html
The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent Ambiguity
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
The Invisible Web: Where Search Engines Fear To Go
http://www.powerhomebiz.com/vol25/invisible.htm
Web Characterization Project
http://wcp.oclc.org/
What is the Invisible Web?
http://websearch.about.com/library/weekly/aa061203a.htm
The World Wide Web as a DeFacto Database: Using Technology to Find, Maintain and Update Current Company Information
http://www.intelliseek.com/whitepapers.asp
Using the Internet As a Dynamic Resource Tool for Knowledge Discovery
http://zillman.blogspot.com/2003_08_01_zillman_archive.html#106198657492603187
ZDNet: I've Discovered the 'invisible Web'--Have You? Here's How!
http://reviews-zdnet.com.com/4520-6033_16-4206148.html
The list above includes both historical and recent articles and papers presenting various views, opinions and research on the deep web. Combining current and historical information gives you a better understanding of how the dynamics of the deep web have developed over time. Comparing historical perspectives with current thinking lets you appreciate the true “deep web”, how it is changing the way the web is searched, and the many new ways to discover, retrieve and extract information from it.
Searching several databases at the same time to retrieve information is another important facet of deep web research. The following is a selection of articles, search services and search tools for cross database searching of the deep web:
Cross Database Articles
Digital Libraries - Cross-Database Search: One-Stop Shopping
http://libraryjournal.reviewsnews.com/index.asp?layout=articlePrint&articleID=CA170458&publication=libraryjournal
Search Tools Reports: Searching for Text Information in Databases
http://www.searchtools.com/info/database-search.html
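Cross database searching, as discussed in the articles above, comes down to sending one query to several sources at once and merging the answers. The Python sketch below shows that idea under loud assumptions: the two in-memory "databases", their records, the example.org URLs and the function names are all hypothetical stand-ins, and a real service would query remote databases through whatever interface each one offers.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real databases: each maps a record title to a URL.
SOURCES = {
    "library_catalog": {"Deep Web Primer": "http://example.org/catalog/1"},
    "journal_index": {
        "Deep Web Primer": "http://example.org/journal/7",
        "Web Crawling Basics": "http://example.org/journal/9",
    },
}


def search_source(name, records, term):
    """Case-insensitive title match against a single source."""
    term = term.lower()
    return [(title, url, name)
            for title, url in records.items()
            if term in title.lower()]


def cross_database_search(term, sources=SOURCES):
    """Query every source in parallel, then merge the hits sorted by title."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_source, name, records, term)
                   for name, records in sources.items()]
        hits = [hit for future in futures for hit in future.result()]
    return sorted(hits)


if __name__ == "__main__":
    for title, url, source in cross_database_search("deep web"):
        print(f"{title} [{source}] {url}")
```

Each hit keeps the name of the source it came from, so the merged list still shows which database supplied each record.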
Cross Database Search Services
FlashPoint
http://flashpoint.lanl.gov
GPO Access - Search Across Multiple Databases
http://www.gpoaccess.gov/multidb.html
Hermes
http://www.ibt.unam.mx/biblioteca/
King County Library System
http://www.kcls.org/
NLM Gateway Search
http://gateway.nlm.nih.gov/gw/Cmd
SearchLight
http://searchlight.cdlib.org/cgi-bin/searchlight
SUMSearch
http://sumsearch.uthscsa.edu/
Cross Database Search Tools
Apple - Mac - Sherlock
http://www.asia.apple.com/sherlock/
askOnce
http://www.askonce.com/
Blue Angel Technologies
http://www.blueangeltech.com/
Bright Planet
http://brightplanet.com/
Copernic
http://www.copernic.com/en/index.html
ENCompass Solutions
http://encompass.endinfosys.com/
Intelliseek
http://www.intelliseek.com/
MetaLib
http://www.exlibris-usa.com/metalib/
MetaSearch Initiative
http://www.niso.org/committees/MetaSearch-info.html
MuseGlobal
http://www.museglobal.com/
Peter's PolySearch Engines
http://www2.hawaii.edu/~jacso/extra/poly-page.html
Profusion
http://beta.profusion.com/
Registry of Library Knowledge Bases
http://www.public.iastate.edu/~CYBERSTACKS/KBL.htm
VIAF: The Virtual International Authority File
http://www.oclc.org/research/projects/viaf/default.htm
WebFeat
http://www.webfeat.org/
There are a few online tutorials and presentations on the Deep Web; the following is a representative selection:
Presentations
Gumshoe Librarian
http://www.virtualchase.com/gumshoe/
Quick Introduction to OWL Web Ontology Language
http://www.xfront.com/owl-quick-intro/sld001.htm
Seeing the Invisible Web
http://lib.berkeley.edu/TeachingLib/Guides/Internet/InvWebPowerpoint/index.htm
To properly search the Deep Web, one must be armed with the appropriate resources and tools to search, retrieve and extract information from areas of the web that have never been searched before, or that search engines cannot actively extract from. Some of these current resources are:
Resources
AIRS Oxygen
http://www.airsdirectory.com/products/technologies/oxygen/
Beaucoup
http://www.beaucoup.com/
CiteLine Professional
http://www.citeline.com/pro_info.html
Comet Way
http://www.cometway.com/content.agent?page_name=Home
CompletePlanet
http://www.completeplanet.com/
Cybermetrics - First Generation Tools - Invisible Web
http://www.cindoc.csic.es/cybermetrics/search13.html
DART™
http://www.dynago.com/dart/
Deep Web
http://www.deepweb.com/
Deep Web Technologies
http://www.deepwebtech.com/
Direct Search
http://www.freepint.com/gary/direct.htm
Find Articles
http://www.findarticles.com/
Fossick
http://www.fossick.com/
iBoogie™
http://www.iboogie.tv/
INFOMINE
http://infomine.ucr.edu/
Inquirus
http://inquirus.nj.nec.com/
Instant Information Systems
http://www.docdel.com/
Intelligence Center
http://www.intelligence-center.com/
Intelliseek
http://www.intelliseek.com/
Intellisonar™
http://www.quigo.com/intellisonar.htm
Invisible Library
http://www.invisiblelibrary.com/
Invisible Web - Hidden Pages and Websites
http://websearch.about.com/cs/invisibleweb/
Invisible-Web - Searchable Databases and Specialized Search Engines
http://invisible-web.net/
Kapow Web Collector
http://www.automated-info-solutions.com/products_main.html
KDnuggets: Data Mining, Web Mining, and Knowledge Discovery Guide
http://www.kdnuggets.com/
KeepMedia
http://www.keepmedia.com/
Khadoma
http://www.khadoma.com/
Knowledge Discovery
http://www.KnowledgeDiscovery.info/
Librarians' Index to the Internet
http://lii.org/
MagPortal
http://www.magportal.com/
Mappa.Mundi Magazine
http://mappa.mundi.net/
Medical Databases Online
http://www.medic8.com/MedicalDatabases.htm
Microsoft Web Search Research and Patents
http://www.webmasterworld.com/forum34/481.htm
Mining the Invisible Web
http://www.MiningTheInvisibleWeb.com/
New Zealand Digital Library
http://www.nzdl.org/fast-cgi-bin/library
OAIster
http://oaister.umdl.umich.edu/o/oaister/
OneLook Dictionary Search
http://www.onelook.com/
Quigo Technologies
http://www.quigo.com/
Profusion
http://www.profusion.com/
Science and Technology Sources on the Internet
http://www.library.ucsb.edu/istl/01-winter/internet.html
Scientific and Technical Information Network (STINET)
http://stinet.dtic.mil/
Search Adobe PDF Online
http://searchpdf.adobe.com/
The Internet Sleuth
http://www.isleuth.com/
The Deep Web
http://library.albany.edu/internet/deepweb.html
The Deep Web: Surfacing Hidden Value
http://wfps.k12.mt.us/wfhs/library/deep_web.htm
The InvisibleWeb
http://www.invisibleweb.com/
The Invisible Web
http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html
The Invisible Web WebLog
http://ciquest.shef.ac.uk/invisible/
Those Dark Hiding Places: The Invisible Web Revealed
http://library.rider.edu/scholarly/rlackie/Invisible/Inv_Web.html
Turbo10
http://turbo10.com/
UNESCO Information Services - Databases
http://www.unesco.org/unesdi/
Universal Data Element Framework (UDEF)
http://www.udef.org/
Wall Street Executive Library
http://www.executivelibrary.com/
Web Farming
http://webfarming.com/
Web Fountain by IBM
http://www-1.ibm.com/mediumbusiness/venture_development/emerging/wf.html
Web Intelligence Consortium
http://wi-consortium.org/
Web IR & IE
http://www.webir.org/
As I mentioned at the beginning of this guide, my current keynote presentation, Searching the World Wide Web Using Your Brain and Bots, was created from these various resources along with my resources for Bot Research. Identifying competent and usable resources from the deep web, as well as competent and usable tools to search it, is extremely important, and the following resources for Bot Research will start you on your discovery of knowledge for deep web research:
Bot Research Resources and Sites
Bot Research
http://www.BotResearch.info
BotTechnology.com
http://www.bottechnology.com/
BotSpot®
http://www.botspot.com/
UMBC AgentWeb
http://agents.umbc.edu/
Agent Construction Tools
http://www.agentbuilder.com/AgentTools/index.html
Spidering Hacks
http://www.oreilly.com/catalog/spiderhks/
BotLaw - The Place for Legal Research on Intelligent Agents/Bots
http://www.botlaw.com/
Fantomas Spider Spy™ The BotBase
http://fantomaster.com/fasvsspy01.html
1st Spot
http://1st-spot.net/topic_agents.html
AgentLand
http://www.agentland.com/
The Mobile Agent List
http://www.informatik.uni-stuttgart.de/ipvr/vs/projekte/mole/mal/preview/preview.html
The Web Robots Pages
http://www.robotstxt.org/wc/robots.html
Internet Agents - CWS Apps
http://cws.internet.com/32agents.html
The CGI Resource Index: Programs and Scripts: Perl: Searching
http://cgi.resourceindex.com/Programs_and_Scripts/Perl/Searching/
Tucows SearchBots for Windows 95/98
http://tucows.icm.edu.pl/searchbot95.html
GeneSys Middleware
http://sourceforge.net/projects/genesys-mw/
MultiAgent Systems
http://www.multiagent.com
Web IR & IE
http://www.webir.org/
Indexing Robot Crawler Checklist
http://www.searchtools.com/info/robots/robot-checklist.html
Search Engine Watch News
http://www.searchenginewatch.com/
Search Tools - Information Guides and News
http://www.searchtools.com/
Web Intelligence Consortium
http://wi-consortium.org/
Spider Hunter
http://www.spiderhunter.com/
Webbot - the W3C libwww Robot
http://www.w3.org/Robot/
AntWorld
http://aplab.rutgers.edu/ant/
TrademarkBots®
http://www.trademarkbots.com
Six Questions: Super Searcher Marcus P. Zillman, M.S., A.M.H.A.
http://www.bottechnology.com/latest_articles/article21.htm
Eliza - The Original
http://www-ai.ijs.si/eliza/eliza.html
ALICEBot
http://www.alicebot.org/
KiwiLogic
http://www.kiwilogic.com/
LifeFX
http://www.lifefx.com/
NativeMinds
http://www.nativeminds.com/
Botizen
http://www.botizen.com/
ChatterBots
http://www.ChatterBots.info/
ChatterBots at BotSpot®
http://botspot.com/search/s-chat.htm
The Simon Laven Page
http://www.simonlaven.com/
Deep Web Research
http://www.DeepWebResearch.info
Knowledge Discovery
http://www.KnowledgeDiscovery.info
The “Deep Web” is a very exciting place to search and to do research. New tools are constantly being created, and more databases and unique files are constantly being added. This all adds up to a phenomenal growth area of the World Wide Web, one that deserves your constant attention through searching and current awareness tools that alert you to the latest happenings and sources available on the Internet.