Deep Web Research 2010

Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering for the last several years. Much of its material comes from my extensive research into the “invisible” web, or what I like to call the “deep” web. The Deep Web covers somewhere in the vicinity of 1 trillion pages of information, located throughout the World Wide Web in various files and formats that current Internet search engines either cannot find or have difficulty accessing. By comparison, current search engines find about 200 billion pages as of this writing.

In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps and others. These files are predominantly used by businesses to communicate information within their organization or to disseminate information to the outside world. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained by searching and accessing the “properties” (metadata) information of these files!
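As a rough illustration of searching by file type, the sketch below builds one query string per document format named above. It assumes the search engine you use understands a `filetype:` operator; check the engine's own help pages, since operator support varies. The function name `build_filetype_queries` and the sample topic are hypothetical and exist only for this example.

```python
# File extensions named in the paragraph above.
DOCUMENT_TYPES = ["pdf", "doc", "xls", "ppt", "ps"]


def build_filetype_queries(topic, file_types=DOCUMENT_TYPES):
    """Return one query string per file type, e.g. 'annual report filetype:pdf'.

    Assumption: the target search engine supports a `filetype:` operator.
    """
    topic = " ".join(topic.split())  # collapse stray whitespace
    if not topic:
        raise ValueError("topic must not be empty")
    # Normalize each extension: drop a leading dot and lowercase it.
    return [f"{topic} filetype:{ext.lstrip('.').lower()}" for ext in file_types]


if __name__ == "__main__":
    # Hypothetical topic; paste each printed line into a search box.
    for query in build_filetype_queries("corporate sustainability report"):
        print(query)
```

Running one query per format, rather than a single combined query, keeps the results for each document type separate and easier to review.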

This report and guide is designed to give you the resources you need to better understand the history of deep web research, as well as various categorized resources that allow you to search through the currently available web to find those key nuggets of information that can only be found by understanding how to search the “deep web”.

This Deep Web Research 2010 article is divided into the following sections:

Articles, Papers, Forums, Audios and Videos
Cross Database Articles
Cross Database Search Services
Cross Database Search Tools
Peer to Peer, File Sharing, Grid/Matrix Search Engines
Presentations
Resources – Deep Web Research
Resources – Semantic Web Research
Bot Research Resources and Sites
Subject Tracer Information Blogs

Articles, Papers, Forums, Audios and Videos


99 Resources to Research & Mine the Invisible Web by Jessica Hupp

Academic and Scholar Search Engines and Sources

All of OCLC’s WorldCat Heading Toward the Open Web by Barbara Quint

An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web by W. Wu, C. Yu, A. Doan, W. Meng

Annotation for the Deep Web

Automatic Extraction of Web Search Interfaces for Interface Schema Integration by H. He, W. Meng, C. Yu, Z. Wu

Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery

Automatic Meaning Discovery Using Google by Rudi Cilibrasi and Paul M. B. Vitanyi

Beyond Google: The Invisible Web – Tools for Teaching the Invisible Web

Bibliomining Bibliography

Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works by Dr. Scott Nicholson

Bot Research

Client-Side Deep Web Data Extraction

Clustering E-Commerce Search Engines by Q. Peng, W. Meng, H. He, C. Yu

Common Information Environment Seeks To Reveal the Hidden Web

Crawling the Hidden Web by Sriram Raghavan and Hector Garcia-Molina

Current Awareness Discovery Tools on the Internet

Data Extraction and Label Assignment for Web Databases

Deep Web – Exploring the Secrets of the Hidden Internet by Marcus P. Zillman, M.S., A.M.H.A. – 23 minutes – Internet/Technology Channel

Deep Web Navigation in Web Data Extraction

Desperately seeking Web Search 2.0

DigiCULT Thematic Issue 6: Resource Discovery Technologies for the Heritage Sector, June 2004 (HiRes .pdf, 4.9 MB)

Efficient and Effective Metasearch Project

Experiences In Crawling Deep Web In The Context Of Local Search by Dheerendranath Mundluru and Xiongwu Xia

Graph Structure in the Web

Grey Literature

Grey Literature Network Service (GreyNet)

Gray Literature: Resources for Locating Unpublished Research by Brian S. Mathews

Gray Literature Subject Guide

Information Retrieval and the Semantic Web by Tim Finin, James Mayfield, Clay Fink, Anupam Joshi, and R. Scott Cost

In Search of the Deep Web

Invisible Web Gets Deeper

Invisible Web Revealed

IR and IE on the Web – PhD and MSc Dissertations

JEP: The Deep Web

LLRX: Book Review: The Invisible Web

LLRX: Deep Web Research

LLRX: Deep Web Research 2005

LLRX: Deep Web Research 2006

LLRX: Deep Web Research 2007

LLRX: Deep Web Research 2008

LLRX: Deep Web Research 2009

LLRX: Mining Deeper Into the Invisible Web

LLRX: ResearchWire: Exposing the Invisible Web

Metadata? Thesauri? Taxonomies? Topic Maps! by Lars Marius Garshol

Mining Newsgroups Using Networks Arising From Social Behavior

Mining the Deep Web: Search Strategies That Work by Lee Ratzan

Mining the Deep Web With Specialized Drills

Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews

Mining Topic-Specific Concepts and Definitions on the Web

Modelling and Mining of Network Information Systems Publications

Net Plan Builds in Search by Kimberly Patch

Online or Invisible?

OntoMiner: Bootstrapping and Populating Ontologies From Domain Specific Web Sites

OpenIndex – Creating a Public Internet Index

Out-googling Google: Federated Searching and the Single Search Box

PhysicsWeb: The Physics of the Web

Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries, Machine Learning, Neural Networks [Steve Lawrence, Google Labs]

QProber: Classifying and Searching “Hidden-Web” Text Databases

Research Beyond Google: 119 Authoritative, Invisible, and Comprehensive Resources

Researchers Map of the Web

Scientific American: Featured Article: The Semantic Web

Search Engine Meeting 2005 Boston, Massachusetts – White Papers and Presentations

Search Engine Meeting 2006 Boston, Massachusetts – White Papers and Presentations

Search Engine Meeting 2007 Boston, Massachusetts – White Papers and Presentations

Search Engine Meeting 2008 Boston, Massachusetts – White Papers and Presentations

Search Engine Meeting 2009 Boston, Massachusetts – White Papers and Presentations

Search Engine Technology and Digital Libraries

Searching the Deep Web by Alex Wright

Searching the Deep Web

Searching the Deep Web – Video

Searching the Internet (White Paper, Audio and Video)

Search Interfaces on the Web: Querying and Characterizing by Denis Shestakov

Seeing through the ‘invisible’ Web

SemaForm – Semantic Wrapper Generation for Querying Deep Web Data Sources

Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) by A. Lopatenko

Structured Databases on the Web: Observations and Implications

Testbed for Information Extraction from Deep Web

The Deep Web: Surfacing Hidden Value by Michael K. Bergman

The Future Of News: The Digital Information Librarian

The Hidden Potential of the Web

The Invisible Web by Chris Sherman

The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent Ambiguity
The Invisible Web: Where Search Engines Fear To Go

The Ultimate Guide to the Invisible Web

The Virtual Private Library(TM) and The Deep Web Video by Melissa Barker

Timeline of Events Related to the Deep Web

Topological Measures and Maps Of the Web

Toward the Semantic Deep Web by James Geller, Soon Ae Chun, and Yoo Jung An

Towards Automatic Incorporation of Search Engines Into A Large-Scale Metasearch Engine

Traffic-Based Feedback on the Web by Jonathan Aizen, Daniel Huttenlocher, Jon Kleinberg, and Antal Novak

Travel Industry and Deep Web: Exclusive Interview with Marcus P. Zillman

UMBC – AgentNews

Understanding Metadata

Using the Internet As a Dynamic Resource Tool for Knowledge Discovery

Web Characterization Activity

Web Data Extractors White Paper Link Compilation

Web Pages Search Engine Based on DNS by Wang Liang, Guo Yi-Ping, and Fang Ming

WebScales: Towards a Highly Scalable Metasearch Engine

What Is the Deep Web? A WhatIs Podcast 15 Minute Interview with Marcus P. Zillman

What is the Invisible Web? A Crawler Perspective by Natalia Arroyo, Laboratorio de Internet

Why the Deep Web Needs the Semantic Web by Jennifer Zaino

WISE-Cluster: Clustering E-Commerce Search Engines Automatically by Q. Peng, W. Meng, H. He, C. Yu

Yahoo and the Deep Web

Cross Database Articles


Basic Functional Requirements for Cross Search Service

Digital Libraries- Cross-Database Search: One-Stop Shopping

Search Tools Reports: Searching for Text Information in Databases

The Right Solution: Federated Search Tools by Roy Tennant

UK Web Archiving Consortium

Cross Database Search Services


Entrez – The Life Sciences Cross-Database Search Engine

EnergyFiles – Subject Pathways

GPO Access – Search Across Multiple Databases

King County Library System

NLM Gateway Search


Scitopia – Deep Federated Search

The Metasearch Infrastructure Project

Cross Database Search Tools


Bright Planet


Cross Database Search Tools Summary

Dieselpoint Java Search and Navigation Software

DbVisualizer – The Universal Database Tool

Dublin Core Metadata Initiative (DCMI)

EEVL Xtra – Cross Database Search


Gold Rush – Database Search Tool


MetaSearch Initiative

mod_oai Project – Getting OAI-PMH For Free


Peter’s PolySearch Engines

PBCore – The Public Broadcasting Metadata Dictionary

Registry of Library Knowledge Bases

Search Federal Research and Development

SRU – Search/Retrieve via URL

STINET Multisearch

The Flamenco Search Interface Project

VIAF: The Virtual International Authority File

Peer to Peer, File Sharing, Grid/Matrix Search Engines



ALPINE Network – SourceForge: Project

An Efficient Scheme for Query Processing on Peer-to-Peer Networks

Azureus – Vuze Java Bittorrent Client


Between Rhizomes and Trees: P2P Information Systems by Bryn Loban



BitTorrent FAQ and Guide

Bit Torrent Official Site and Search Engine

Bitzi – The Free Universal Media Catalog


BotSpot(R): File-sharing Bots
Coral – The Coral P2P Content Distribution Network

Capn’s PHP Gnutella Search

Crackle – Stream On

Current P2P Search Implementations – P2P Networks

Deepnet Explorer – P2P/RSS-ATOM Web Browser

Distributed Search Engines

Distributed Search in P2P Networks

FAROO – P2P Web Search

FilesOverMiles – Browser to Browser File Sharing (P2P)


Free Haven Project

Frost Project – Freenet Messaging and File Sharing Client

FuzzBox: Tangent Research Artificial Intelligence and Robotics

GNUnet – GNU Project – Free Software Foundation (FSF)

GRACE – GRid seArch and Categorization Engine

Grid, Distributed and Cloud Computing Resources – Open Source, Distributed Internet Crawler!

HyperCuP – Shaping Up Peer-to-Peer Networks

Ian Clarke’s Blog

IM and P2P Threat Center


International Workshop on Peer-to-Peer Knowledge Management (P2PKM)

Internet Movie Database (IMDb)

isoHunt – IRC and Bit Torrent Search Engine

JXTA Project

Kademlia: A Peer-to-peer Information System Based on the XOR Metric

Kazaa Media Desktop



Lphant – The Full P2P Solution

MoleSter – A Tiny File-Sharing Application



MysterNetworks – The Evolution of Peer-to-Peer

NeuroGrid – P2P Search

Open Directory – File Sharing

Open Directory – MP3 Search Engines

OpenNap: Open Source Napster Server

Oyster – Managing, Searching and Sharing Ontology Metadata in a Peer-to-Peer Network.

P2P and the Future of Private Copying by Peter K. Yu, Michigan State University College of Law

P2PNet – Updated P2P News

P2P News from Topex

PeerCast P2P Radio

PeerMind – P2P Monitor

Port Knocking

PowerFolder – P2P Whole Folder Synchronization

Rodi – Tiny P2P Client/Host



Slyck – File Sharing News and Info


Speckly – Torrent Search Simplified

Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-to-Peer Networks

Swarm – A Transparently Scalable Distributed Programming Language

SwarmStream(TM) SDK

The Anthill Project

The Pirate Bay – BitTorrent Tracker

The Chord Project

The Freenet Project

The Peer-to-Peer Weblog

The Role of Peer to Peer File Sharing in Law Firm Marketing by Andy Havens


Torrent Finder

Torrent Reactor

Tranche Project – Secure P2P for the Scientific Community

Tribler – A Social Community That Facilitates Filesharing Through P2P


Understanding BitTorrent: An Experimental Perspective by Arnaud Legout, Guillaume Urvoy-Keller, and Pietro Michiardi

Videora – Personal Video Using P2P and RSS


WiPeer – Serverless Peer to Peer Collaboration

YaCy – Distributed P2P Based Web Indexing and Anonymous Search Engine

Yahoo! Directory Peer-to-Peer File Sharing

YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology

YouServ – A P2P (peer-to-peer) Web Hosting/File Sharing System


Zilok – Peer To Peer Rental Marketplace

Presentations


From Theory To Practice – Bielefeld Academic Search Engine

Gumshoe Librarian

Quick Introduction to OWL Web Ontology Language

Searching the Internet and the Invisible Web Video

The Virtual Private Library(TM) and The Deep Web Video by Melissa Barker

RESOURCES – Deep Web Research


AnkaSearch – Meta Search and Deep Web Search Desktop Tool

A Roadmap for Web Mining: From Web to Semantic Web

Biznar – Innovative Business Research Search Engine


Bot Research

BrainBoost – Question Answering Search Engine


Cazoodle – Search, Integrate, and Organize — The Real World

COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material

Comet Way

CompletePlanet – 70,000 Databases and Speciality Search Engines

Creative Commons RDF-Enhanced Search

Cuil Search – Search 127 Billion Web Pages

Cyber Cemetery


Cybermetrics – First Generation Tools – Invisible Web

Data Fountains: Open Source Internet Resource Discovery and Metadata/Full-Text Generation Service
Data Mining Resources

DeepDyve – Deep Web Search Engine

DeepPeep – Discover the Hidden Web

Deep Web Research

Deep Web Technologies

DigiCULT Resources – Resource Discovery & Information Retrieval


Directory Resources

Direct Search

eFinancial Bot Deep Meta Search Engine

eGreenBot – Green Resources Search Engine

eHealthcare Bot Deep Meta Search Engine

eMarketing Bot Deep Meta Search Engine


Engineering Village 2

Hakia – Search For Meaning

Find Articles

FindThatFile – Comprehensive Internet File Search

Freely Accessible Databases for the Public

Ghostscript, Ghostview and GSview

GlobalSpec – Engineering Search Engine

Google Labs

Google Scholar

HighWire Press – Largest Repository of Free Full-Text Life Science Articles in the World


IncyWincy – The Invisible Web Search Engine


Instant Information Systems

Institutional Archives Registry

Intelligence Center

Intelligence Competence Center – ICCrawler

Internet Archive

Internet Search Environment Number (ISEN)


Invisible Library

Kapow Web Collector

KDnuggets: Data Mining, Web Mining, and Knowledge Discovery Guide


Knowledge Discovery

Kosmix – The Web Searched and Organized For You

Large-Scale Deep Web Integration: Incomplete Bibliography

Librarians’ Index to the Internet


Mamma – Deep Web Search Engine

Mappa.Mundi Magazine

Mednar – Innovative Medical Search

Microsoft Web Search Research and Patents

Mining the Deep Web for Economic Data

Mooter Search

MSN Sandbox

MyFeedMe – Always On, Always Looking, Always Learning

News Group Search

New Zealand Digital Library

OAI-PMH Implementation Guidelines – Conveying rights expressions about metadata in the OAI-PMH framework


OneLook Dictionary Search

Open Archives Initiative

OpenIndex – Creating a Public Internet Index

Open Source Intelligence

QProber: Classifying and Searching “Hidden-Web” Text Databases – PERSIVAL Project
Plagium – Plagiarism Tracker and Checker

Powerset – Natural Language Semantic Based Web Search Engine

Pretrieve Search – Free Public Record Search Engine

Recommended Gateway Sites for the Deep Web

Science Accelerator – Search Key Resources from DOE OSTI


Science and Technology Sources on the Internet

Scientific and Technical Information Network (STINET)

Science Commons – FirstGov for Science – Government Science Portal – Deep Web Search Engine

Scirus – Search Engine for Scientific Information

SDARTS – A Protocol and Toolkit for Metasearching

Search Adobe PDF Online

Site Update Notification Project – Web Crawler and Deep Web Research

Social Buzz Bot

STN International – Databases in Science and Technology

Swoogle – Semantic Bot

TechDeepWeb – How-To Guide to the Deep Web for IT Professionals

TechXtra – Indepth Academic and Scholar Search

Testbed for Information Extraction from Deep Web

The Invisible Web

THOR: Deep Web Data Extraction

Those Dark Hiding Places: The Invisible Web Revealed


UNESCO Information Services – Databases

Wall Street Executive Library

Web Data Extractors

Web Farming


Web Intelligence Consortium

Web IR & IE

WebScales: Towards a Highly Scalable Metasearch Engine

Web-Searching Agents

Zakta – Personal and Social Deep Web Search Engine

RESOURCES – Semantic Web Research

4Store – An Efficient, Scalable and Stable RDF Database

AIS SIGSEMIS – SIGSEMIS: Semantic Web and Information Systems

Analyzing Social Networks on the Semantic Web


Combining RDF and OWL with SOAP for Semantic Web

DARPA Agent Markup Language

DBin Project – Semantic Web P2P and/or Semantic Newsgroup Client.

DERI International – Digital Enterprise Research Institute

Digital Object Identifier (DOI)

Fabl – A Native Programming Language for the Semantic Web

FOAF Project – A Semantic Web Application

Foundation for Intelligent Physical Agents (FIPA)

GistWeb – Gist of Any Web Page Actual Content

Go3R – Knowledge Based Semantic Search Engine To Avoid Animal Experiments

GoodRelations Vocabulary – Semantic Web Based eCommerce

Great Summary – End Information Overload

hakia – Search for Meaning

HP Labs Semantic Web Research

Infomesh’s Semantic Web Introduction

International Journal of Metadata, Semantics and Ontologies (IJMSO)

International Journal on Semantic Web and Information Systems (IJSWIS)

Jena – A Semantic Web Framework for Java

Journal of Biomedical Semantics

Journal of Web Semantics

Journal of Web Semantics: Preprint Server

Knowledge Discovery


Knowledge Search

Language Engineering for the Semantic Web: A Digital Library for Endangered Languages

Linked Open Data from the New York Times

Magpie – The Semantic Filter and Tool For the Semantic Web

MetaData at W3C

MindRaider – Semantic Web Outliner



OASIS – Advancing eBusiness Standards

Ontologies for Education (O4E)

Ontology Matching

Ontology Metadata Vocabulary (OMV)


O’Reilly’s Semantic Web Primer

Potential Advantages Of Semantic Web For Internet Commerce by Yuxiao Zhao and Kristian Sandahl

Powerset – Natural Language Semantic Based Web Search Engine

pOWL – Semantic Web Development Platform

Practical Semantic Analysis of Web Sites and Documents

RDF Context Tools

RDF – Resource Description Framework

Rules and Rule Markup Languages for the Semantic Web – RuleML-2003

Science and the Semantic Web

Semantic Desktop Environment – gnowsis

Semantic Email by Luke McDowell, Oren Etzioni, Alon Halevy, and Henry Levy

Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE)

Semantic Knowledge Technologies and Language Computation
Semantic Markup Deconstructed Example

Semantic Routing BOF

Semantic Translator for Enhanced Retrieval by the Bremen University (BUSTER) – The Semantic Web Community Portal

Semantic Web Activity Statement

Semantic Web Application Platform – SWAP

Semantic Web for AURIS-MM

Semantic Web Laboratory

Semantic Web Primer for Object-Oriented Software Developers

Semantic Web Publications

Semantic Web Roadmap

Semantic Web Services Challenge

Semantic Web – The Voice of Semantic Web Technology

Semantic Web W3C

SenseBot – Semantic Search Engine That Finds Sense On the Web

SIG SEMIS Semantic Web and Information Systems

SIMAC – Foafing the Music – Semantic Interaction with Music Audio Contents

SIMILE Project – Semantic Interoperability of Metadata and Information in unLike Environments

Sindice – The Semantic Web Index

SOAPAgent – An Open SOAP Directory Project Info – OWL API

Swoogle – Semantic Bot

SWRL: A Semantic Web Rule Language Combining OWL and RuleML

Technology Review: Sir Tim Berners-Lee – The Semantic Web

The Cover Pages

The Memetic Web

The ontoprise(R) GmbH

The RDF Query Language (RQL)

The Semantic Grid

The Semantic Web: An Introduction

The Semantic Web By Tim Berners-Lee, James Hendler and Ora Lassila

The Semantic Web In Breadth

The Semantic Indexing Project – Creating Tools To Identify the Latent Knowledge Found in Text

The Semantic Web Is Your Friend

Transforming and Enriching Documents for the Semantic Web by Dietmar Roesner, Manuela Kunze, Sylke Kroetzsch

Twine – A Semantic Web Application That Allows You To Share, Organize, and Find Information

uClassify – Free Text Classified Web Service

UDDI – Universal Description, Discovery, and Integration

Web Semantics: Science, Services and Agents on the World Wide Web

Web Service Modeling Ontology

Wilbur Toolkit for Semantic Web Programming

World Wide Web Reference Semantic Web

Yahoo Groups – SemanticWeb

Bot Research Resources and Sites


1st Spot

80legs – Powerful and Economical Service Platform for Crawling and Processing Web Content

Agent Construction Tools



Agent Model Yields Leadership

Agent Portal AI


AgentSheets – Authoring Tool to Create Agents

Alarm Growing Over Bot Software by Robert Lemos


Android World

Applied Soft Computing
Article Search API – New York Times Articles 1981 to Present

B.4.1 Search Robots – The Robots.txt File

Bookmach – Track Your Favorite Subject Using Sticky Zine and Blog Search

Bot A Blog

BotHunter – Passive Network Monitoring Tool

Bots, Blogs and News Aggregators


Build a Web Spider on Linux – A Simple Spider and Scraper Collects Internet Content

Cetus Links – Mobile Agents


Connotate – Intelligent Agent Technology and Competitive Intelligence Tools

Data Mining Resources

DataparkSearch Engine – Full-Featured Open Source Web-Based Search Engine

Deep Web Research

Design of a Parallel and Distributed Web Search Engine by Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri
Dictionary of Algorithms and Data Structures

Eliza – The Original ChatterBot

FAME (Facilitating Agents in Multiculture Exchange) Project

Fantomas Spider Spy(TM) The BotBase

File Information Tool Set (FITS)

Foundation for Intelligent Physical Agents


GeneSys Middleware

Google Guide

Google Wave – Communications and Collaboration Tool

IEI’s Graphical Programming Toolbox

iMacros(TM) – Browser Based Macro Recorder and Intelligent Agent

Imagination Engines

Indexing Robot Crawler Checklist

Information Retrieval Intelligence

Institute for Human and Machine Cognition (IHMC)

Intellexer – Custom Built Search Engines, Knowledge Management Tools, Natural Language Processing

Intelligent Information Systems Research Laboratory

International Journal of Agent-Oriented Software Engineering (IJAOSE)

Internet Mathematics

iRobis – Institute of Robotics in Scandinavia AB


Kngine – Semantic Search and Answer Engine

Knowledge Discovery

Koders – Source Code Search Engine

LAIR – Research Projects of the Laboratory of Applied Informatics Research

List of User-Agents (Spiders, Robots, Crawler, Browser)

Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments by Dave Cliff and Janet Bruten

MIT Media Lab: Software Agents

Modelling and Mining of Network Information Systems
Mozenda Web Agent Builder – Web Data Extraction



OpenKapow – Serving Mashups For the Long Tail of the Web

Open Source Web Information Retrieval (OSWIR05)

Oxyus Search Engine

ParsCit Project – Reference String Parsing – Web Spider and Search Engine

Robots.Txt Checker – Validator for Robots.txt Files

Robots.Txt – Robots Exclusion Standards

Searchbots – Uniquely Searching the Internet

Search Engine Robots

Search Engine Watch News

Search Tools – Information Guides and News

SeerSuite – CiteSeerX Toolkit

Semantic Indexing and Search

Semantic Web


Site Update Notification Project – Web Crawler and Deep Web Research

Smarter Bots

SocSciBot – Social Sciences Link Analysis Research

Spidering Hacks

Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler APIs

Structure and Interpretation of Computer Programs – Video Lectures by Hal Abelson and Gerald Jay Sussman

Supybot, A Superb Python IRC Bot

Swoogle – Semantic Bot

TBot – Windows Live Messenger Translation Bot

TextRunner Search – Searches Hundreds of Millions of Assertions Extracted from 500 Million High-Quality Web Pages

The Intelligent Software Agents Lab

The Lemur Toolkit – Language Modeling and Information Retrieval Research

The Search Engine Project (TSEP)

The Simon Lavern Page

The Web Robots Pages

TSEP – The Search Engine Project

UMBC AgentWeb

UMBC eBiquity

Webbot – the W3C libwww Robot

Web Curator Tool (WCT)

Web Data Extractors – White Paper Link Compilation

Web Information Retrieval/Natural Language Processing Group (WING)

Web Intelligence Consortium

Web IR & IE

WolframAlpha Computational Knowledge Engine – Trillions of Pieces of Curated Data and Millions of Lines of Algorithms

Words, Extended – Internet Text Information Retrieval, Extraction and Display Bot

Zakta – Personal and Social Deep Web Search Engine

Subject Tracer(TM) Information Blogs

Subject Tracer(TM) Information Blogs, created and developed by the Virtual Private Library(TM), combine the best of the latest tools on the Internet. Using bots, blogs and news aggregators, they generate RSS feeds of the latest resources, creating a current flow of information through niche subject tracers. I am proud to be the creator of the Internet’s first Subject Tracer(TM) Information Blogs:

Virtual Private Library(TM)

Accessibility Resources

Agriculture Resources

Artificial Intelligence Resources

Astronomy Resources

Auction Resources

Biological Informatics

Biotechnology Resources

Bot Research

Business Intelligence Resources


Data Mining Resources

Deep Web Research

Directory Resources

eCommerce Resources

Elder Resources

Employment Resources

Entrepreneurial Resources

Financial Sources

Finding People

Games Resources

Genealogy Resources

Grant Resources

Green Files

Grid, Distributed and Cloud Computing Resources

Healthcare Resources

Information Futures Markets

Information Quality Resources

International Trade Resources

Internet Alerts

Internet Demographics

Internet Experts

Internet Hoaxes

Intrapreneurial Resources

Journalism Resources

Knowledge Discovery

Military Resources

New Economy Analytics, Resources and Alerts

Outsourcing/Offshoring Information and Resources

Prediction Markets

Privacy Resources

Reference Resources

Research Resources


Script Resources


Social Informatics

Statistics Resources

Student Research

Theology Resources

Tutorial Resources

World Wide Web Reference
