Deep Web Research 2009

Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last several years, and much of my information comes from the extensive research that I have completed into the “invisible” or what I like to call the “deep” web. The Deep Web covers somewhere in the vicinity of 1 trillion pages of information located throughout the World Wide Web in various files and formats that current search engines either cannot find or have difficulty accessing. By comparison, search engines find about 20 billion pages as of this publication.

In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps, and others. These files are predominantly used by businesses to communicate information within their organization or to disseminate information to external communities. Searching for this information using deeper search techniques and the latest algorithms allows researchers to obtain a vast amount of corporate information that was previously unavailable or inaccessible. Research has also shown that even deeper information can be obtained by searching and accessing the “properties” information (embedded metadata such as author and title) on these files.
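As a rough illustration of searching by file type, the sketch below builds one query per document format listed above. It assumes the search engine accepts a `filetype:` operator, which several major engines support but which is not guaranteed everywhere. The function names and the `https://www.example.com/search` address are hypothetical placeholders, not a real service.

```python
from urllib.parse import urlencode

# File extensions named in the paragraph above.
DOCUMENT_TYPES = ["pdf", "doc", "xls", "ppt", "ps"]


def build_filetype_queries(terms, file_types=DOCUMENT_TYPES):
    """Return one query string per file type, e.g. 'annual report filetype:pdf'.

    Assumes the target engine understands the 'filetype:' operator.
    """
    return [f"{terms} filetype:{ext}" for ext in file_types]


def build_search_url(query, base_url="https://www.example.com/search"):
    """Encode a query into a URL. base_url is a placeholder, not a real endpoint."""
    return f"{base_url}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Print one search URL per document format for a sample topic.
    for query in build_filetype_queries("annual report"):
        print(build_search_url(query))
```

Running one query per format, rather than a single combined query, keeps each result set small enough to review and makes it obvious which formats return material for a given topic.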

This guide provides a wide range of resources for understanding the history of deep web research. It also includes resources, organized by category, that help you search the currently available web and locate key sources of information once you understand how to search the “deep web”.

This Deep Web Research 2009 article is divided into the following sections:

  • Articles, Papers, Forums, Audios and Videos
  • Cross Database Articles
  • Cross Database Search Services
  • Cross Database Search Tools
  • Peer to Peer, File Sharing, Grid/Matrix Search Engines
  • Presentations
  • Resources – Deep Web Research
  • Resources – Semantic Web Research
  • Bot Research Resources and Sites
  • Subject Tracer Information Blogs

ARTICLES, PAPERS, FORUMS, AUDIOS AND VIDEOS (Current and Historical)

99 Resources to Research & Mine the Invisible Web by Jessica Hupp http://www.collegedegree.com/library/college-life/99-resources-to/

Academic and Scholar Search Engines and Sources http://www.ScholarSearchEngines.com/

All of OCLC’s WorldCat Heading Toward the Open Web by Barbara Quint http://www.infotoday.com/newsbreaks/nb041011-2.shtml

An Interactive Clustering-based Approach to Integrating Source Query interfaces on the Deep Web by W. Wu, C. Yu, A. Doan, W. Meng http://www.cs.binghamton.edu/~meng/pub.d/sigmod04-final.pdf

Annotation for the Deep Web http://csdl.computer.org/comp/mags/ex/2003/05/x5042abs.htm

Automatic Extraction of Web Search Interfaces for Interface Schema Integration by H. He, W. Meng, C. Yu, Z. Wu http://www.cs.binghamton.edu/~meng/pub.d/WWWposterhe.pdf

Automatic Information Extraction From Semi-Structured Web Pages By Pattern Discovery http://portal.acm.org/citation.cfm?id=640423&dl=ACM&coll=portal

Automatic Meaning Discovery Using Google by Rudi Cilibrasi and Paul M. B. Vitanyi http://arxiv.org/abs/cs.CL/0412098

Benevolent “Virus” Helps Reveal the Hidden Web http://www.syllabus.com/article.asp?id=9680

Beyond Google: The Invisible Web – Tools for Teaching the Invisible Web http://www.lagcc.cuny.edu/library/invisibleweb/teachingtools.htm

Bibliomining Bibliography http://www.bibliomining.com/

Bibliomining for Automated Collection Development in a Digital Library Setting: Using Data Mining to Discover Web-Based Scholarly Research Works by Dr. Scott Nicholson http://dlist.sir.arizona.edu/archive/00000625/

Bot Research http://www.BotResearch.info/

Client-Side Deep Web Data Extraction http://doi.ieeecomputersociety.org/10.1109/CEC-EAST.2004.30

Clustering E-Commerce Search Engines by Q. Peng, W. Meng, H. He, C. Yu http://www.cs.binghamton.edu/~meng/pub.d/WWWposterPeng.pdf

Common Information Environment Seeks To Reveal the Hidden Web http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html

Crawling the Hidden Web by Sriram Raghavan and Hector Garcia-Molina http://citeseer.ist.psu.edu/461253.html

Current Awareness Discovery Tools on the Internet http://zillman.blogspot.com/2004/09/current-awareness-discovery-tools-on.html

Data Extraction and Label Assignment for Web Databases http://www2003.org/cdrom/papers/refereed/p470/p470-wang.htm

Deep Content – Guide To Effective Searching of the Internet http://www.brightplanet.com/deepcontent/tutorials/search/index.asp

Deep Web – Exploring the Secrets of the Hidden Internet by Marcus P. Zillman, M.S., A.M.H.A. – 23 minutes – Internet/Technology Channel http://www.planetearthradio.com/technology.htm

Deep Web Navigation in Web Data Extraction http://snipurl.com/13xdm

Desperately seeking Web Search 2.0 http://snipurl.com/64im

DigiCULT Thematic Issue 6: Resource Discovery Technologies for the Heritage Sector, June 2004 (HiRes .pdf, 4.9 MB) http://snipurl.com/7v46

Diving in the Deep End of the Web by Suzanne Ross http://research.microsoft.com/displayArticle.aspx?id=1052

Efficient and Effective Metasearch Project http://www.cs.binghamton.edu/~meng/metasearch.html

Google Teams Up with 17 Colleges to Test Searches of Scholarly Materials By Jeffrey R. Young http://chronicle.com/free/2004/04/2004040901n.htm

Graph Structure in the Web http://www9.org/w9cdrom/160/160.html

Grey Literature http://en.wikipedia.org/wiki/Gray_literature

Grey Literature Network Service (GreyNet) http://www.greynet.org/

Gray Literature: Resources for Locating Unpublished Research by Brian S. Mathews http://www.pla.org/ala/mgrps/divs/acrl/publications/crlnews/2004/mar/graylit.cfm

Gray Literature Subject Guide http://www.csulb.edu/library/subj/gray_literature/

Information Retrieval and the Semantic Web by Tim Finin, James Mayfield, Clay Fink, Anupam Joshi, and R. Scott Cost http://ebiquity.umbc.edu/v2.1/paper/html/id/185/

In Search of the Deep Web http://archive.salon.com/tech/feature/2004/03/09/deep_web/index_np.html

Invisible Web Gets Deeper http://www.searchenginewatch.com/sereport/article.php/2162871

Invisible Web Revealed http://www.searchenginewatch.com/sereport/article.php/2167321

IR and IE on the Web – PhD and MSc Dissertations http://www.webir.org/phd.html

JEP: The Deep Web http://hdl.handle.net/2027/spo.3336451.0007.104

LLRX: Book Review: The Invisible Web //www.llrx.com/features/invisibleweb.htm

LLRX: Deep Web Research //www.llrx.com/features/deepweb.htm

LLRX: Deep Web Research 2005 //www.llrx.com/features/deepweb2005.htm

LLRX: Deep Web Research 2006 //www.llrx.com/features/deepweb2006.htm

LLRX: Deep Web Research 2007 //www.llrx.com/features/deepweb2007.htm

LLRX: Deep Web Research 2008 //www.llrx.com/features/deepweb2008.htm

LLRX: Mining Deeper Into the Invisible Web //www.llrx.com/features/mining.htm

LLRX: ResearchWire: Exposing the Invisible Web //www.llrx.com/columns/exposing.htm

Metadata? Thesauri? Taxonomies? Topic Maps! by Lars Marius Garshol http://www.ontopia.net/topicmaps/materials/tm-vs-thesauri.html

Mining Newsgroups Using Networks Arising From Social Behavior http://www.almaden.ibm.com/cs/projects/iis/hdb/Publications/papers/www03_social.pdf

Mining the Deep Web: Search Strategies That Work by Lee Ratzan http://www.computerworld.com/action/article.do?command=viewArticleBasic&articleId=9005757&pageNumber=1

Mining the Deep Web With Specialized Drills http://lists.webjunction.org/wjlists/web4lib/2001-January/034742.html

Mining the Peanut Gallery: Opinion Extraction and Semantic Classification of Product Reviews http://www.kushaldave.com/p451-dave.pdf

Mining Topic-Specific Concepts and Definitions on the Web http://www.cs.uic.edu/~liub/publications/WWW-2003.pdf

Modelling and Mining of Network Information Systems Publications http://www.mathstat.dal.ca/~mominis/Publications.htm

Net Plan Builds in Search by Kimberly Patch http://snipurl.com/5kn0

Online or Invisible? http://citeseer.ist.psu.edu/online-nature01/

OntoMiner: Bootstrapping and Populating Ontologies From Domain Specific Web Sites http://www.public.asu.edu/~hdavulcu/VLDB-WS03.pdf

OpenIndex – Creating a Public Internet Index http://www.openindex.org/index.php

Out-googling Google: Federated Searching and the Single Search Box http://library.marist.edu/ACRL/Foxhunt_demo.html

PhysicsWeb: The Physics of the Web http://physicsweb.org/article/world/14/7/09

Publications about Web Analysis, Web Search, Citation Indexing, Digital Libraries, Machine Learning, Neural Networks [Steve Lawrence, Google Labs] http://labs.google.com/people/lawrence/

QProber: Classifying and Searching “Hidden-Web” Text Databases http://qprober.cs.columbia.edu/

Research Beyond Google: 119 Authoritative, Invisible, and Comprehensive Resources http://oedb.org/library/college-basics/research-beyond-google

Researchers Map of the Web http://www.almaden.ibm.com/almaden/webmap_press.html

Scientific American: Featured Article: The Semantic Web http://www.sciam.com/article.cfm?id=the-semantic-web

Search Engine Meeting 2005 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh05/05pro.html

Search Engine Meeting 2006 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh06/06pro.html

Search Engine Meeting 2007 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh07/07pro.html

Search Engine Meeting 2008 Boston, Massachusetts – White Papers and Presentations http://www.infonortics.com/searchengines/sh08/08pro.html

Search Engine Technology and Digital Libraries http://www.dlib.org/dlib/june04/lossau/06lossau.html

Searching the Deep Web by Alex Wright http://mags.acm.org/communications/200810/?pg=16

Searching the Deep Web http://www.dlib.org/dlib/january01/warnick/01warnick.html

Searching the Deep Web – Video http://www.osti.gov/media/DeepWebVideo.html

Searching the Deep Web Online Streaming Tutorial http://www.InformationDetective.com/

Searching the Internet (White Paper, Audio and Video) http://www.SearchingTheInternet.info/

Seeing through the ‘invisible’ Web http://www.usatoday.com/tech/2001/10/15/invisible-web-search.htm

SemaForm – Semantic Wrapper Generation for Querying Deep Web Data Sources http://www.ucalgary.ca/~jkwalny/502/finalreport.pdf

Semantic Web Content Accessibility Guidelines for Current Research Information Systems (CRIS) by A. Lopatenko http://derpi.tuwien.ac.at/~andrei/AURIS_DE.htm

Smart Search – Advanced Search Engines Link Many Data Sources http://gcn.com/23_24/tech-report/26999-1.html

Structured Databases on the Web: Observations and Implications http://eagle.cs.uiuc.edu/pubs/2004/dwsurvey-sigmodrecord-chlpz-aug04.pdf

Testbed for Information Extraction from Deep Web http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf

The Deep Web http://www.internettutorials.net/deepweb.html

The Deep Web: Surfacing Hidden Value by Michael K. Bergman http://hdl.handle.net/2027/spo.3336451.0007.104

The Future Of News: The Digital Information Librarian http://www.masternewmedia.org/2004/03/24/the_future_of_news_the.htm

The Hidden Potential of the Web http://society.guardian.co.uk/e-public/story/0,13927,1195901,00.html

The Invisible Web by Chris Sherman http://www.freepint.com/issues/080600.htm#feature

The Invisible Web: What it is, Why it exists, How to find it, and Its Inherent Ambiguity http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

The Invisible Web: Where Search Engines Fear To Go http://www.powerhomebiz.com/vol25/invisible.htm

The Mechanics of Deep Net Meta Search http://turbo10.com/papers/deepnet.pdf

The Ultimate Guide to the Invisible Web http://oedb.org/library/college-basics/invisible-web

Timeline of Events Related to the Deep Web http://papergirls.wordpress.com/2008/10/07/timeline-deep-web/

Topological Measures and Maps Of the Web http://informatics.indiana.edu/fil/Web/

Toward the Semantic Deep Web by James Geller, Soon Ae Chun, and Yoo Jung An http://www.computer.org/portal/cms_docs_computer/computer/homepage/Sep08/r9itsys.pdf

Towards Automatic Incorporation of Search Engines Into A Large-Scale Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/wi2003.pdf

Traffic-Based Feedback on the Web by Jonathan Aizen, Daniel Huttenlocher, Jon Kleinberg, and Antal Novak http://www.pnas.org/cgi/content/abstract/0307539100v1

Travel Industry and Deep Web: Exclusive Interview with Marcus P. Zillman http://blog.relactions.com/2007/08/travel-industry-and-deep-web-exclusive.html

UMBC – AgentNews http://agents.umbc.edu/agentnews/

Understanding Metadata http://www.niso.org/standards/resources/UnderstandingMetadata.pdf

Using the Internet As a Dynamic Resource Tool for Knowledge Discovery http://zillman.blogspot.com/2004/09/using-internet-as-dynamic-resource.html

Web Characterization Project http://wcp.oclc.org/

Web Data Extractors White Paper Link Compilation http://www.WebDataExtractors.com/

Web Pages Search Engine Based on DNS by Wang Liang, Guo Yi-Ping, and Fang Ming http://arxiv.org/pdf/cs.NI/0403035

WebScales: Towards a Highly Scalable Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html

What Is the Deep Web? A WhatIs Podcast 15 Minute Interview with Marcus P. Zillman http://zillman.blogspot.com/2006/10/what-is-deep-web.html

What is the Invisible Web? A Crawler Perspective by Natalia Arroyo, Laboratorio de Internet http://cybermetrics.wlv.ac.uk/AoIRASIST/arroyo.html

WISE-Cluster: Clustering E-Commerce Search Engines Automatically by Q. Peng, W. Meng, H. He, C. Yu http://www.cs.binghamton.edu/~meng/pub.d/PengWIDM04.pdf

Yahoo and the Deep Web http://news.com.com/2100-1024-5167931.html

CROSS DATABASE ARTICLES

Basic Functional Requirements for Cross Search Service http://www.icbl.hw.ac.uk/perx/basicfunctionalrequirements.htm

Digital Libraries- Cross-Database Search: One-Stop Shopping http://www.libraryjournal.com/article/CA170458.html

Search Tools Reports: Searching for Text Information in Databases http://www.searchtools.com/info/database-search.html

The Right Solution: Federated Search Tools by Roy Tennant http://snipurl.com/5zxp

UK Web Archiving Consortium http://www.webarchive.org.uk/

CROSS DATABASE SEARCH SERVICES

ARC – A Cross Archive Search Service http://arc.cs.odu.edu/

Entrez – The Life Sciences Cross-Database Search Engine http://www.ncbi.nlm.nih.gov/Entrez/index.html

EnergyFiles – Subject Pathways http://energyfiles.osti.gov/

GPO Access – Search Across Multiple Databases http://www.gpoaccess.gov/multidb.html

King County Library System http://www.kcls.org/

NLM Gateway Search http://gateway.nlm.nih.gov/gw/Cmd

SUMSearch http://sumsearch.uthscsa.edu/

Scitopia – Deep Federated Search http://www.scitopia.org/scitopia/

The Metasearch Infrastructure Project http://www.cdlib.org/inside/projects/metasearch/

CROSS DATABASE SEARCH TOOLS

Bright Planet http://brightplanet.com/

Copernic http://www.copernic.com/en/index.html

Cross Database Search Tools Summary http://lists.webjunction.org/wjlists/web4lib/2001-September/027669.html

Dieselpoint Java Search and Navigation Software http://www.dieselpoint.com/

DbVisualizer – The Universal Database Tool http://www.dbvis.com/products/dbvis/

Dublin Core Metadata Initiative (DCMI) http://www.dublincore.org/

EEVL Xtra – Cross Database Search http://www.ariadne.ac.uk/issue44/eevl/

EMC http://software.emc.com/

Gold Rush – Database Search Tool http://goldrush.coalliance.org/

MetaLib http://www.exlibrisgroup.com/metalib.htm

MetaSearch Initiative http://www.niso.org/workrooms/mi

mod_oai Project – Getting OAI-PMH For Free http://www.modoai.org/

MuseGlobal http://www.museglobal.com/

Peter’s PolySearch Engines http://www2.hawaii.edu/~jacso/extra/poly-page.html

PBCore – The Public Broadcasting Metadata Dictionary http://www.utah.edu/cpbmetadata/

Registry of Library Knowledge Bases http://www.public.iastate.edu/~CYBERSTACKS/KBL.htm

Search Federal Research and Development http://fedrnd.osti.gov/

SRU – Search/Retrieve via URL http://www.loc.gov/standards/sru

STINET Multisearch http://multisearch.dtic.mil/

The Flamenco Search Interface Project http://bailando.sims.berkeley.edu/flamenco.html

VIAF: The Virtual International Authority File http://www.oclc.org/research/projects/viaf/default.htm

WebFeat http://www.webfeat.org/

PEER TO PEER (P2P), FILE SHARING, GRID AND MATRIX SEARCH ENGINES

ALPINE Network – SourceForge: Project http://sourceforge.net/projects/alpine/

An Efficient Scheme for Query Processing on Peer-to-Peer Networks http://aeolusres.homestead.com/files/index.html

angrycoffee.com http://www.AngryCoffee.com/

Azureus – Vuze Java Bittorrent Client http://azureus.sourceforge.net/

BadBlue http://badblue.com/

Between Rhizomes and Trees: P2P Information Systems by Bryn Loban http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/1182

Bibster http://bibster.semanticweb.org/index.htm

BigChampagne http://www.bigchampagne.com/

BitTorrent FAQ and Guide http://www.dessent.net/btfaq/

Bit Torrent Official Site and Search Engine http://www.BitTorrent.com/

Bitzi – The Free Universal Media Catalog http://www.bitzi.com/

Blubster http://www.blubster.com/

BotSpot®: File-sharing Bots http://www.botspot.com/BOTSPOT/Windows/Download_Bots/File-sharing_Bots/

BTjunkie – Bittorrent Search Engine http://www.btjunkie.org/

Coral – The Coral P2P Content Distribution Network http://www.coralcdn.org/

Capn’s PHP Gnutella Search http://capnbry.net/gnutella/gs.php

Crackle – Stream On http://www.crackle.com/

Current P2P Search Implementations – P2P Networks http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p8.html#CurrentP2PSearchImplementations

Deepnet Explorer – P2P/RSS-ATOM Web Browser http://www.deepnetexplorer.com/

Distributed Search Engines http://www.openp2p.com/pub/t/74

Distributed Search in P2P Networks http://csdl.computer.org/comp/mags/ic/2002/01/w1068abs.htm

FAROO – P2P Web Search http://www.faroo.com/

Filetopia http://www.filetopia.org/

Free Haven Project http://www.freehaven.net/index.html

Frost Project – Freenet Messaging and File Sharing Client http://jtcfrost.sourceforge.net/

FuzzBox: Tangent Research Artificial Intelligence and Robotics http://tangentresearch.com/news/07252001_p2p_ai.html

GNUnet – GNU Project – Free Software Foundation (FSF) http://www.gnu.org/software/GNUnet/gnunet.html

GRACE IST Project http://www.grace-ist.org/

GRACE – GRid seArch and Categorization Engine http://www.ub.uni-stuttgart.de/grace/

Grid Resources http://www.GridResources.info/

Grokster3G http://www.grokster3g.com/

grub.org – Open Source, Distributed Internet Crawler! http://grub.org/

HyperCuP – Shaping Up Peer-to-Peer Networks http://www-db.stanford.edu/~schloss/hypercup/

Ian Clarke’s Blog http://blog.locut.us/

IM and P2P Threat Center http://www.symantec.com/business/security_response/

iMesh http://www.iMesh.com/

International Workshop on Peer-to-Peer Knowledge Management (P2PKM) http://www.p2pkm.org/

Internet Movie Database (IMDb) http://www.imdb.com/

isoHunt – IRC and Bit Torrent Search Engine http://isohunt.com/

JXTA Project https://jxta.dev.java.net/

Kademlia: A Peer-to-peer Information System Based on the XOR Metric http://citeseer.ist.psu.edu/529075.html

Kazaa Media Desktop http://www.kazaa.com/us/index.htm

LegalTorrents http://www.legaltorrents.com/

Limewire http://www.limewire.com/

LionShare P2P Project – Legitimate File-Sharing Among Individuals and Educational Institutions http://lionshare.its.psu.edu

Lphant – The Full P2P Solution http://www.lphant.com/

MoleSter – A Tiny File-Sharing Application http://ansuz.sooke.bc.ca/software/molester/

Mnet http://mnet.sourceforge.net/

MusicBrainZ http://www.MusicBrainZ.org/

MysterNetworks – The Evolution of Peer-to-Peer http://www.mysternetworks.com/

NeuroGrid – P2P Search http://www.neurogrid.net/

Open Directory – File Sharing http://dmoz.org/Computers/Software/Internet/Clients/File_Sharing/

Open Directory – MP3 Search Engines http://dmoz.org/Arts/Music/Sound_Files/MP3/Search_Engines/

OpenNap: Open Source Napster Server http://opennap.sourceforge.net/

OpenP2P.com http://www.openp2p.com/

Oyster – Managing, Searching and Sharing Ontology Metadata in a Peer-to-Peer Network. http://oyster.ontoware.org/

P2P and the Future of Private Copying by Peter K. Yu, Michigan State University College of Law http://papers.ssrn.com/sol3/papers.cfm?abstract_id=578568

P2PNet – Updated P2P News http://p2pnet.net/index.php

P2P News from Topix http://www.topix.net/tech/p2p

PeerCast P2P Radio http://www.peercast.org/

PeerMind – P2P Monitor http://www.PeerMind.com/

Piolet http://www.piolet.com/

Port Knocking http://www.portknocking.org/

PowerFolder – P2P Whole Folder Synchronization http://www.powerfolder.com/

Rodi – Tiny P2P Client/Host http://larytet.sourceforge.net/btRat.shtml

ScrapeTorrent http://www.ScrapeTorrent.com/

Skype http://www.skype.com/

Slyck – File Sharing News and Info http://www.slyck.com/index.php

Snoopstar http://www.snoopstar.com/

Speckly – Torrent Search Simplified http://speckly.com/

Super-Peer-Based Routing and Clustering Strategies for RDF-Based Peer-to-Peer Networks http://citeseer.ist.psu.edu/nejdl02superpeerbased.html

SwarmStream™ SDK http://onionnetworks.com/products/swarmstream/

The Anthill Project http://www.cs.unibo.it/projects/anthill/

The Pirate Bay – BitTorrent Tracker http://thepiratebay.org/

The Chord Project http://pdos.csail.mit.edu/chord/

The Freenet Project http://freenetproject.org/

The Peer-to-Peer Weblog http://p2p.weblogsinc.com/

The Role of Peer to Peer File Sharing in Law Firm Marketing by Andy Havens //www.llrx.com/columns/marketing7.htm

ToPeer http://www.topeer.com/

Torrent Finder http://ts.kurtubba.com/

Torrent Reactor http://www.torrentreactor.net/

Torrent Typhoon (TT) http://www.torrenttyphoon.com/

Tranche Project – Secure P2P for the Scientific Community http://tranche.proteomecommons.org/

Tribler – A Social Community That Facilitates Filesharing Through P2P http://www.tribler.org/

TrustyFiles http://www.trustyfiles.com/

Understanding BitTorrent: An Experimental Perspective by Arnaud Legout, Guillaume Urvoy-Keller, and Pietro Michiardi http://hal.inria.fr/inria-00000156/en

URLBlaze: URL Sharing Network http://www.urlblaze.com/

Videora – Personal Video Using P2P and RSS http://www.videora.com/

WASTE http://slackerbitch.free.fr/waste/

WiPeer – Serverless Peer to Peer Collaboration http://www.wipeer.com/

YaCy – Distributed P2P Based Web Indexing and Anonymous Search Engine http://www.yacy.net/

Yahoo! Directory Peer-to-Peer File Sharing http://dir.yahoo.com/Computers_and_Internet/Internet/Peer_to_Peer_File_Sharing/

YAPPERS: A Peer-to-Peer Lookup Service over Arbitrary Topology http://citeseer.ist.psu.edu/ganesan03yappers.html

YouServ – A P2P (peer-to-peer) Web Hosting/File Sharing System http://www.bayardo.org/youserv/

Zebra http://indexdata.dk/zebra/

PRESENTATIONS

From Theory To Practice – Bielefeld Academic Search Engine http://www.diglib.org/forums/spring2004/presentations/summann-2004-04.pdf

Gumshoe Librarian //www.llrx.com/features/gumshoe.htm

Quick Introduction to OWL Web Ontology Language http://www.iro.umontreal.ca/~lapalme/ift6281/OWL/CostelloQuickIntroOwl.pdf

Searching the Internet and the Invisible Web http://www.InformationDetective.com/

The Future of the Internet: Bots, Blogs and News Aggregators http://www.zillman.tv/

RESOURCES – Deep Web Research

A Roadmap for Web Mining: From Web to Semantic Web http://eprints.pascal-network.org/archive/00000841/01/roadmap.pdf

Beaucoup http://www.beaucoup.com/

BlogPulse http://www.BlogPulse.com/

Bot Research http://www.BotResearch.info/

BrainBoost – Question Answering Search Engine http://www.BrainBoost.com/

BrightPlanet’s Deep Federation Portal™ (DFP) http://www.brightplanet.com/products/dfportal.asp

Can’t Find On Google http://www.cantfindongoogle.com

COLLATE – Collaboratory for Annotation, Indexing and Retrieval of Digitized Historical Archive Material http://www.collate.de/

Comet Way http://www.cometway.com/content.agent?page_name=Home

CompletePlanet – 70,000 Databases and Speciality Search Engines http://www.completeplanet.com/

Creative Commons RDF-Enhanced Search http://search.creativecommons.org/

Cuil Search – Search 121,617,892,992 Web Pages http://www.cuil.com/

Cyber Cemetery http://govinfo.library.unt.edu/

CyberFiber http://www.cyberfiber.com

Cybermetrics – First Generation Tools – Invisible Web http://www.cindoc.csic.es/cybermetrics/search13.html

Data Fountains: Open Source Internet Resource Discovery and Metadata/Full-Text Generation Service http://infomine.ucr.edu/Data_Fountains/

Data Mining Resources http://www.DataMiningResources.info/

DeepDyve – Deep Web Search Engine http://www.deepdyve.com/

Deep Web Research http://www.DeepWebResearch.info/

Deep Web Technologies http://www.deepwebtech.com/

DigiCULT Resources – Resource Discovery & Information Retrieval http://www.digicult.info/pages/resources.php?t=21

digitalAGORA http://aut.edu/agora/

Directory Resources http://www.DirectoryResources.info/

Direct Search http://www.freepint.com/gary/direct.htm

eFinancial Bot Deep Meta Search Engine http://www.eFinancialBot.com/

eHealthcare Bot Deep Meta Search Engine http://www.eHealthcareBot.com/

eMarketing Bot Deep Meta Search Engine http://www.eMarketingBot.com/

ENDECA http://www.endeca.com/

Engineering Village 2 http://www.engineeringvillage2.org/

Hakia – Search For Meaning http://www.hakia.com/

Find Articles http://www.findarticles.com/PI/index.jhtml

Freely Accessible Databases for the Public http://www.istl.org/01-winter/internet.html

Ghostscript, Ghostview and GSview http://www.cs.wisc.edu/~ghost/

GlobalSpec – Engineering Search Engine http://search.globalspec.com/Search/WebSearch

Google Labs http://labs.google.com/

Google Scholar http://scholar.google.com/

HighWire Press – Largest Repository of Free Full-Text Life Science Articles in the World http://highwire.stanford.edu/

iBoogie™ http://www.iboogie.tv/

IncyWincy – The Invisible Web Search Engine http://www.incywincy.com/

INFOMINE http://infomine.ucr.edu/

Instant Information Systems http://www.docdel.com/

Institutional Archives Registry http://archives.eprints.org/eprints.php?action=browse

Intelligence Center http://www.intelligence-center.com/

Intellisonar™ http://www.quigo.com/intellisonar.htm

Internet Archive http://www.archive.org/

Internet Search Environment Number (ISEN) http://www.isen.org/

Intute http://www.intute.ac.uk/

Invisible Library http://sanchezkisser.com/blog/

Kapow Web Collector http://www.automated-info-solutions.com/

KDnuggets: Data Mining, Web Mining, and Knowledge Discovery Guide http://www.kdnuggets.com/

KeepMedia http://www.keepmedia.com/

Knowledge Discovery http://www.KnowledgeDiscovery.info/

Large-Scale Deep Web Integration: Incomplete Bibliography http://metaquerier.cs.uiuc.edu/webibib.html

Librarians’ Index to the Internet http://lii.org/

MagPortal http://www.magportal.com/

Mamma – Deep Web Search Engine http://www.mamma.com/

Mappa.Mundi Magazine http://mappa.mundi.net/

Microsoft Web Search Research and Patents http://www.webmasterworld.com/forum97/5.htm

Mining the Deep Web for Economic Data http://www.citris-uc.org/research/projects/mining_the_deep_web_for_economic_data

Mooter Search http://www.mooter.com/

MSN Sandbox http://sandbox.msn.com/

News Group Search http://newsgroups.langenberg.com/

New Zealand Digital Library http://www.nzdl.org/

OAI-PMH Implementation Guidelines – Conveying rights expressions about metadata in the OAI-PMH framework http://www.openarchives.org/OAI/2.0/guidelines-rights.htm

OAIster http://oaister.umdl.umich.edu/o/oaister/

OneLook Dictionary Search http://www.onelook.com/

Open Archives Initiative http://www.openarchives.org/

OpenIndex – Creating a Public Internet Index http://www.openindex.org/index.php

QProber: Classifying and Searching “Hidden-Web” Text Databases – PERSIVAL Project http://qprober.cs.columbia.edu/

Quigo Technologies http://www.quigo.com/

Powerset – Natural Language Semantic Based Web Search Engine http://www.powerset.com/

Pretrieve Search – Free Public Record Search Engine http://www.pretrieve.com/

Recommended Gateway Sites for the Deep Web http://people.hws.edu/hunter/deepwebgate03.htm

Science Accelerator – Search Key Resources from DOE OSTI http://www.scienceaccelerator.gov/

reSearcher http://researcher.sfu.ca/

Science and Technology Sources on the Internet http://www.library.ucsb.edu/istl/01-winter/internet.html

Scientific and Technical Information Network (STINET) http://stinet.dtic.mil/

Science Commons http://sciencecommons.org/

Science.gov – FirstGov for Science – Government Science Portal http://www.science.gov/

Scirus – Search Engine for Scientific Information http://www.scirus.com/srsapp/

SDARTS – A Protocol and Toolkit for Metasearching http://sdarts.cs.columbia.edu/

Search Adobe PDF Online http://www.SearchPDF.com/

STN International – Databases in Science and Technology http://www.stn-international.de/

Swoogle – Semantic Bot http://swoogle.umbc.edu/

TechDeepWeb – How-To Guide to the Deep Web for IT Professionals http://www.TechDeepWeb.com/

TechXtra – Indepth Academic and Scholar Search http://www.techxtra.ac.uk/

Testbed for Information Extraction from Deep Web http://research.microsoft.com/users/nickcr/pubs/yamada_www2004poster.pdf

The Internet Sleuth http://www.isleuth.com/

The Deep Web http://www.internettutorials.net/deepweb.html

The Invisible Web http://www.lib.berkeley.edu/TeachingLib/Guides/Internet/InvisibleWeb.html

THOR: Deep Web Data Extraction http://www.cc.gatech.edu/projects/disl/THOR/

Those Dark Hiding Places: The Invisible Web Revealed http://www.robertlackie.com/invisible/index.html

Turbo10 http://turbo10.com/

UNESCO Information Services – Databases http://www.unesco.org/unesdi/

Wall Street Executive Library http://www.executivelibrary.com/

Web Data Extractors http://www.WebDataExtractors.com/

Web Farming http://webfarming.com/

WebFountain™ http://www.research.ibm.com/journal/sj/431/gruhl.html

Web Intelligence Consortium http://wi-consortium.org/

Web IR & IE http://www.webir.org/

WebScales: Towards a Highly Scalable Metasearch Engine http://www.cs.binghamton.edu/~meng/pub.d/PIreport04.html

Web-Searching Agents http://www.aaai.org/AITopics/html/webagent.html

RESOURCES – Semantic Web Research

AIS SIGSEMIS – SIGSEMIS: Semantic Web and Information Systems http://www.sigsemis.org/

Analyzing Social Networks on the Semantic Web http://snipurl.com/cbdq

Bibster http://bibster.semanticweb.org/index.htm

Combining RDF and OWL with SOAP for Semantic Web http://www.ida.liu.se/~yuxzh/doc/ncws-041002.pdf

DARPA Agent Markup Language http://www.daml.org/

DBin Project – Semantic Web P2P and/or Semantic Newsgroup Client. http://www.dbin.org/

DERI International – Digital Enterprise Research Institute http://www.deri.org/

Digital Object Identifier (DOI) http://www.doi.org/

Fabl – A Native Programming Language for the Semantic Web http://fabl.net/

FOAF Project – A Semantic Web Application http://www.foaf-project.org/

Foundation for Intelligent Physical Agents (FIPA) http://www.fipa.org/

Go3R – Knowledge Based Semantic Search Engine To Avoid Animal Experiments http://www.go3r.org/

hakia – Search for Meaning http://www.hakia.com/

HP Labs Semantic Web Research http://www.hpl.hp.com/semweb/index.html

Infomesh’s Semantic Web Introduction http://infomesh.net/2001/swintro/

International Journal of Metadata, Semantics and Ontologies (IJMSO) http://www.inderscience.com/browse/index.php?journalCODE=ijmso

International Journal on Semantic Web and Information Systems (IJSWIS) http://www.ijswis.org/

Jena – A Semantic Web Framework for Java http://jena.sourceforge.net/

Journal of Web Semantics http://snipurl.com/15sdr

Journal of Web Semantics: Preprint Server http://www.websemanticsjournal.org/

Knowledge Discovery http://www.KnowledgeDiscovery.info/

KnowledgeNets http://www.inf.fu-berlin.de/inst/ag-nbi/research/wissensnetze/

Knowledge Search http://www.KnowledgeSearch.org/

Language Engineering for the Semantic Web: A Digital Library for Endangered Languages http://informationr.net/ir/9-3/paper176.html

Magpie – The Semantic Filter and Tool for the Semantic Web http://kmi.open.ac.uk/projects/magpie/main.html

MetaData at W3C http://www.w3.org/Metadata/

Metadata FAQ – Metadata for Education http://www.cetis.ac.uk/metadatafaq/FrontPage

MindRaider – Semantic Web Outliner http://mindraider.sourceforge.net/

MindSwap http://www.MindSwap.org/

MuseoSuomi http://www.museosuomi.fi/

OASIS – Advancing eBusiness Standards http://www.oasis-open.org/home/index.php

OIL – Ontology Inference Layer http://www.ontoknowledge.org/oil/index.shtml

Ontologies for Education (O4E) http://o4e.iiscs.wssu.edu/xwiki/bin/view/Blog/About

Ontology Matching http://www.ontologymatching.org/

Ontology Metadata Vocabulary (OMV) http://omv.ontoware.org/

OntoWare http://ontoware.org/

O’Reilly’s Semantic Web Primer http://www.xml.com/pub/a/2000/11/01/semanticweb/

Potential Advantages Of Semantic Web For Internet Commerce by Yuxiao Zhao and Kristian Sandahl http://www.ida.liu.se/~yuxzh/doc/iceis-030120.pdf

Powerset – Natural Language Semantic Based Web Search Engine http://www.powerset.com/

pOWL – Semantic Web Development Plattform http://powl.sourceforge.net/

Practical Semantic Analysis of Web Sites and Documents http://citeseer.ist.psu.edu/despeyroux04practical.html

RDF Context Tools http://www.dbin.org/RDFContextTools.php

RDF – Resource Description Framework http://www.w3.org/RDF/

Rules and Rule Markup Languages for the Semantic Web – RuleML-2003 http://www.informatik.uni-trier.de/~ley/db/conf/semweb/ruleml2003.html

Science and the Semantic Web http://www.mindswap.org/Science/

Semantic Blogging: Spreading the Semantic Web Meme http://jena.hpl.hp.com/~stecay/papers/xmleurope2004/040420_semblog_draft10.html

Semantic Desktop Environment – gnowsis http://www.gnowsis.org/

Semantic Email by Luke McDowell, Oren Etzioni, Alon Halevy, and Henry Levy http://www.cs.usna.edu/~lmcdowel/

Semantic Interoperability of Metadata and Information in unLike Environments (SIMILE) http://simile.mit.edu/

Semantic Knowledge Technologies and Language Computation http://gate.ac.uk/projects/sekt/

Semantic Markup Deconstructed Example http://www.cs.umd.edu/users/hendler/sciam/walkthru.html

Semantic Routing BOF http://www.neurogrid.net/SemanticRouting/SemanticRoutingBOF.htm

Semantic Translator for Enhanced Retrieval by the Bremen University (BUSTER) http://www.informatik.uni-bremen.de/agki/www/buster/new/application.html

SemanticWeb.org – The Semantic Web Community Portal http://www.semanticweb.org/

Semantic Web Activity Statement http://www.w3.org/2001/sw/Activity.html

Semantic Web Application Platform – SWAP http://www.w3.org/2000/10/swap/

Semantic Web Feeds http://semanticwebfeeds.com/

Semantic Web for AURIS-MM http://derpi.tuwien.ac.at/~andrei/AURIS-MM-plan.html

Semantic Web Laboratory http://iit-iti.nrc-cnrc.gc.ca/business-affaire/sem-web-lab_e.html

Semantic Web Primer for Object-Oriented Software Developers http://www.w3.org/TR/2006/NOTE-sw-oosd-primer-20060309/

Semantic Web Publications http://www.w3.org/2001/sw/#pub

Semantic Web Roadmap http://www.w3.org/DesignIssues/Semantic.html

Semantic Web Services Challenge http://www.sws-challenge.org/

Semantic Web W3C http://www.w3.org/2001/sw/

SemText – Semantic Hypertext – Making Latent Semantics Blatant http://semtext.org/mambo/index.php

SIG SEMIS Semantic Web and Information Systems http://www.sigsemis.org/

SIMAC – Foafing the Music – Semantic Interaction with Music Audio Contents http://foafing-the-music.iua.upf.edu/

SIMILE Project – Semantic Interoperability of Metadata and Information in unLike Environments http://simile.mit.edu/

Sindice – The Semantic Web Index http://sindice.com/

SOAPAgent – An Open SOAP Directory http://soapagent.com/

SourceForge.net: Project Info – OWL API http://sourceforge.net/projects/owlapi

Swoogle – Semantic Bot http://swoogle.umbc.edu/

SWRL: A Semantic Web Rule Language Combining OWL and RuleML http://www.daml.org/2003/11/swrl/

Technology Review: Sir Tim Berners-Lee – The Semantic Web http://www.technologyreview.com/articles/04/10/frauenfelder1004.asp

The Cover Pages http://xml.coverpages.org/

The Memetic Web http://www.memeticweb.org/

The ontoprise® GmbH http://www.ontoprise.de/

The RDF Query Language (RQL) http://139.91.183.30:9090/RDF/RQL/

The Semantic Grid http://www.semanticgrid.org/

The Semantic Social Network by Stephen Downes http://www.downes.ca/cgi-bin/website/view.cgi?dbs=Article&key=1076791198

The Semantic Web: An Introduction http://infomesh.net/2001/swintro/

The Semantic Web By Tim Berners-Lee, James Hendler and Ora Lassila http://snipurl.com/297g

The Semantic Web In Breadth http://logicerror.com/semanticWeb-long

The Semantic Indexing Project – Creating Tools To Identify the Latent Knowledge Found in Text http://www.knowledgesearch.org/

The Semantic Web Is Your Friend http://www.freepint.com/issues/270504.htm#feature

Transforming and Enriching Documents for the Semantic Web by Dietmar Roesner, Manuela Kunze, Sylke Kroetzsch http://arxiv.org/abs/cs.AI/0501096

Twine – A Semantic Web Application That Allows You To Share, Organize, and Find Information http://www.twine.com/

UDDI – Universal Description, Discovery, and Integration http://uddi.xml.org/

Web Semantics: Science, Services and Agents on the World Wide Web http://www.sciencedirect.com/science/journal/15708268

Web Service Modeling Ontology http://www.wsmo.org/

Wilbur Toolkit for Semantic Web Programming http://wilbur-rdf.sourceforge.net/

World Wide Web Reference http://www.WWWReference.info/

XML.com: Semantic Web http://www.xml.com/pub/rg/Semantic_Web

XML.org http://www.xml.org/

Yahoo Groups – SemanticWeb http://groups.yahoo.com/group/semanticweb/

BOT RESEARCH RESOURCES AND SITES

1st Spot http://1st-spot.net/topic_agents.html

Agent Construction Tools http://www.agentbuilder.com/

AgentLand http://www.agentland.com/

AgentLink http://www.AgentLink.org/

Agent Model Yields Leadership http://snipurl.com/99mh

Agent Portal AI http://www.agent.ai/

Agents http://www.aaai.org/AITopics/html/agents.html

AgentSheets – Authoring Tool to Create Agents http://www.agentsheets.com/

Alarm Growing Over Bot Software by Robert Lemos http://news.com.com/2100-7349_3-5202236.html?tag=nefd.lede

ALICEBot http://www.alicebot.org/

Android World http://www.androidworld.com/index.htm

Applied Soft Computing http://www.sciencedirect.com/science/journal/15684946

B.4.1 Search Robots – The Robots.txt File http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.4.1

Bookmach – Track Your Favorite Subject Using Sticky Zine and Blog Search http://www.Bookmach.com/

Bot A Blog http://www.BotABlog.com/

Bots, Blogs and News Aggregators http://www.BotsBlogs.com

BotSpot® http://www.botspot.com/

BrowseEngine – Real-Time Meta-Data Search Engine http://www.browseengine.com/

Build a Web Spider on Linux – A Simple Spider and Scraper Collects Internet Content http://snipurl.com/128e6

Cetus Links – Mobile Agents http://www.cetus-links.org/oo_mobile_agents.html

ChatterBots http://www.ChatterBots.info/

Connotate – Intelligent Agent Technology and Competitive Intelligence Tools http://www.connotate.com/intelligent_software_agents.aspx

Data Mining Resources http://www.DataMiningResources.info/

DataparkSearch Engine – Full-Featured Open Source Web-Based Search Engine http://www.dataparksearch.org/

DataStructures http://www.DataStructures.info/

Deep Web Research http://www.deepwebresearch.info/

Design of a Parallel and Distributed Web Search Engine by Salvatore Orlando, Raffaele Perego, and Fabrizio Silvestri http://arxiv.org/abs/cs.IR/0407053

Dictionary of Algorithms and Data Structures http://www.nist.gov/dads/

Eliza – The Original ChatterBot http://www-ai.ijs.si/eliza/eliza.html

FAME (Facilitating Agents in Multiculture Exchange) Project http://cordis.europa.eu/fetch?ACTION=D&CALLER=PROJ_IST&RCN=58337

Fantomas Spider Spy™ The BotBase http://fantomaster.com/fasvsspy01.html

Foundation for Intelligent Physical Agents http://www.fipa.org/

FyberSearch http://www.fybersearch.com/

GeneSys Middleware http://sourceforge.net/projects/genesys-mw/

Google Guide http://www.googleguide.com/

IEI’s Graphical Programming Toolbox http://www.imagination-engines.com/gpt.htm

iMacros™ – Browser Based Macro Recorder and Intelligent Agent http://wiki.imacros.net/Main_Page

Imagination Engines http://www.imagination-engines.com/

Indexing Robot Crawler Checklist http://www.searchtools.com/robots/robot-checklist.html

Institute for Human and Machine Cognition (IHMC) http://www.ihmc.us/

Intellexer – Custom Built Search Engines, Knowledge Management Tools, Natural Language Processing http://www.intellexer.com/

International Journal of Agent-Oriented Software Engineering (IJAOSE) http://www.inderscience.com/ijaose

Internet Mathematics http://www.InternetMathematics.org/

KiwiLogic http://www.kiwilogic.com/

Knowledge Discovery http://www.knowledgediscovery.info/

Koders – Source Code Search Engine http://koders.com/

LAIR – Research Projects of the Laboratory of Applied Informatics Research http://lair.indiana.edu/research/

List of User-Agents (Spiders, Robots, Crawler, Browser) http://www.psychedelix.com/agents/index.shtml

Minimal-Intelligence Agents for Bargaining Behaviors in Market-Based Environments by Dave Cliff and Janet Bruten http://www.hpl.hp.com/techreports/97/HPL-97-91.html

MIT Media Lab: Software Agents http://agents.media.mit.edu/index.html

Modelling and Mining of Network Information Systems http://www.mathstat.dal.ca/~mominis/index.html

MultiAgent http://www.MultiAgent.com/

MySpiders http://myspiders.informatics.indiana.edu/

OpenKapow – Serving Mashups For the Long Tail of the Web http://www.openkapow.com/

Open Source Web Information Retrieval (OSWIR05) http://www.emse.fr/OSWIR05/

Oxyus Search Engine http://sourceforge.net/projects/oxyus/

ParsCit Project – Reference String Parsing http://wing.comp.nus.edu.sg/parsCit/

PhpDig.net – Web Spider and Search Engine http://www.phpdig.net/

Robots.Txt Checker – Validator for Robots.txt Files http://tool.motoricerca.info/robots-checker.phtml

RobotsTxt.org http://www.robotstxt.org/

Searchbots – Uniquely Searching the Internet http://www.Searchbots.net/

Search Engine Robots http://www.jafsoft.com/searchengines/webbots.html

Search Engine Watch News http://www.searchenginewatch.com/

Search Tools – Information Guides and News http://www.searchtools.com/

Semantic Indexing and Search http://www.knowledgesearch.org/

Semantic Web http://www.semanticweb.org/

ShoppingBots http://www.ShoppingBots.info/

SiteMaps.org http://www.SiteMaps.org/

Smarter Bots http://www.SmarterBots.com/

SocSciBot 3 and SocSciBot 4 http://socscibot.wlv.ac.uk/

Spider Hunter http://www.spiderhunter.com/

Spidering Hacks http://www.oreilly.com/catalog/spiderhks/

Spinn3r: RSS Content, News Feeds, News Content, News Crawler and Web Crawler APIs http://spinn3r.com/

Structure and Interpretation of Computer Programs – Video Lectures by Hal Abelson and Gerald Jay Sussman http://www.swiss.ai.mit.edu/classes/6.001/abelson-sussman-lectures/

Supybot, A Superb Python IRC Bot http://freshmeat.net/projects/supybot/?branch_id=31808&release_id=181322

Swoogle – Semantic Bot http://swoogle.umbc.edu/

The Intelligent Software Agents Lab http://www-2.cs.cmu.edu/~softagents/

The Lemur Toolkit – Language Modeling and Information Retrieval Research http://www.lemurproject.org/

The Search Engine Project (TSEP) http://freshmeat.net/projects/tsep/

The Simon Laven Page http://www.simonlaven.com/

The Web Robots Pages http://www.robotstxt.org/wc/robots.html

TSEP – The Search Engine Project http://www.tsep.info/

UMBC AgentWeb http://agents.umbc.edu/

UMBC eBiquity http://ebiquity.umbc.edu/

Webbot – the W3C libwww Robot http://www.w3.org/Robot/

Web Curator Tool (WCT) http://webcurator.sourceforge.net/

Web Data Extractors – White Paper Link Compilation http://www.WebDataExtractors.com/

Web Information Retrieval/Natural Language Processing Group (WING) http://wing.comp.nus.edu.sg/portal/

Web Intelligence Consortium http://wi-consortium.org/

Web IR & IE http://www.webir.org/

Words, Extended – Internet Text Information Retrieval, Extraction and Display Bot http://home.earthlink.net/~glenn_scheper/
