SpiderMonkey - WebBot Search Engine - MS SQL Server 7

Our search robot is registered among the top search engines of the world as a sophisticated software "engine" running from our primary internet server to reach out across the web and fetch content for our database (SQL) index.

It is programmed to systematically traverse the World Wide Web's hypertext structure: it retrieves a document and then recursively retrieves all documents that are linked from within that initial target document. "Recursive" here doesn't limit the definition to any specific traversal algorithm. Even if a robot is programmed to apply some heuristic rule to the selection and order of the documents it visits, and spaces out its requests over a long span of time, it is still a "robot". Normal Web browsers, by contrast, are not robots, because they are operated by a person and don't automatically retrieve referenced documents. Robots are sometimes referred to as Wanderers, Crawlers, or Spiders. Although arguably apt, these names are a little misleading for the lay person if they give the impression that the software itself moves between sites like a virus; this is not the case. The robot is software, permanently resident on its own computer, that sends requests from that computer to the other computers (the document servers) on which the target site's documents reside.
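The traversal described above can be sketched in a few lines of Python. This is only an illustration, not Spider Monkey's actual code: the `LINKS` table is a hypothetical in-memory stand-in for the Web, the function names are invented for the example, and a breadth-first order is just one of the many traversal rules a robot might apply.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs it links to.
# A real robot would fetch each document over HTTP and parse out its links.
LINKS = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def crawl(start, get_links, max_documents=100):
    """Return the URLs reachable from `start`, in the order they were visited."""
    visited = []
    seen = {start}          # URLs already queued, so no document is fetched twice
    queue = deque([start])  # first-in, first-out queue gives a breadth-first order
    while queue and len(visited) < max_documents:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


order = crawl("http://example.com/", lambda url: LINKS.get(url, []))
print(order)
```

Note that the robot never "moves": the loop runs on one computer and only the requests (here, the `get_links` calls) go out to other sites.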

Fig. 1: Searching For the Right Words

A search engine is a software programme resident on a computer that searches through a (usually massive) database. In the context of the World Wide Web, the term "search engine" is most often used for search forms that search through databases of HTML documents gathered by a robot.
Like most search engine service providers, we store URLs submitted directly by our visitors in a temporary database, for both quality and security reasons, before they are finally crawled and entered into the main search engine's index. We allow interested visitors to view the temporary database. Use the "Pre-Index" engine here by entering key words or your site name, or leave the search field blank and press "Pre-Index" to see the entire list of recent submissions. You can see how others describe their sites and get some ideas for your own. If you have submitted your site using our Add Url form, you can check here and see how it looks. If you don't like it, remember that the final index entry will be derived from your web page, so spend your time working on your web page and its meta tags instead of resubmitting.

Our Crawler (Spider Monkey) visits and checks URLs during server off-peak load times and feeds the results to the index. All realms of the main database are refreshed at least every 30 days. The temporary database is crawled at least twice monthly. Although each URL is fetched from the actual site, its entry here remains for roughly 60 days as a record of when and how it was submitted. Note: URLs submitted to our own Site Submit Service or submitted remotely by other authorized servers do not appear in the temporary database but can be found using the Mouse House Search Engine.

Spider Monkey abides by the Robots Exclusion Standard (RES). Specifically, it adheres to the 1994 standard; where the 1996 proposed standard supersedes the 1994 standard, the proposed standard is followed.

Spider Monkey will obey the first record in the robots.txt file with a User-Agent containing "Spider_ Monkey". If there is no such record, it will obey the first entry with a User-Agent of "*".
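The record-selection rule above can be sketched as follows. This is a simplified illustration, not Spider Monkey's own parser: it handles only the User-agent and Disallow fields of the 1994 format, the function names are invented for the example, and "ExampleBot" is a hypothetical robot token used in place of the real one.

```python
def parse_records(robots_txt):
    """Split robots.txt text into records of (user_agents, disallow_paths)."""
    records = []
    agents, disallows = [], []
    for raw in robots_txt.splitlines():
        if not raw.strip():
            # A blank line ends the current record.
            if agents:
                records.append((agents, disallows))
            agents, disallows = [], []
            continue
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # comment-only or unrecognised line
        field, value = line.split(":", 1)
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            agents.append(value)
        elif field == "disallow":
            disallows.append(value)
    if agents:
        records.append((agents, disallows))
    return records


def select_record(records, token):
    """First record naming the robot; otherwise the first '*' record."""
    token = token.lower()
    for agents, disallows in records:
        if any(token in agent.lower() for agent in agents):
            return disallows
    for agents, disallows in records:
        if "*" in agents:
            return disallows
    return []  # no applicable record: nothing is disallowed


def is_allowed(path, disallows):
    """A path is allowed unless it starts with a non-empty Disallow value."""
    return not any(rule and path.startswith(rule) for rule in disallows)


SAMPLE = """\
User-agent: ExampleBot
Disallow: /private/

User-agent: *
Disallow: /
"""

rules = select_record(parse_records(SAMPLE), "ExampleBot")
print(rules, is_allowed("/index.html", rules), is_allowed("/private/a.html", rules))
```

In this sample the robot named in the first record may fetch everything outside /private/, while any robot without a record of its own falls back to the "*" record and is excluded from the whole site.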

Before you submit your site for inclusion in our database (index), are there pages you don't want indexed? If so, put the following in the head of any web page you want excluded. Our crawler (Spider Monkey) will obey this instruction and skip the document.

<META NAME="robots" CONTENT="noindex">
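As a rough sketch of how a crawler might honour this tag, the following uses Python's standard `html.parser` module to look for a robots meta tag and decide whether to index the page. The class and function names are invented for the example and do not describe Spider Monkey's internals.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are treated alike.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def should_index(html):
    """Return False when the page asks robots not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


page = '<html><head><META NAME="robots" CONTENT="noindex"></head><body>Hi</body></html>'
print(should_index(page))
```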

Do you use meta description tags? You should at least describe the content of the page as succinctly as possible. If present, this description will become the introduction to your page in the search results our visitors see. An example follows:

<meta name="Description" content="Learn, laugh and enjoy at the same time. International Information Technology firm has superb entertainment website for clients, employees and guests.">
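To preview the text a search result might show, the description can be pulled out of a page with Python's standard `html.parser` module. This is a minimal sketch with invented names; how Spider Monkey itself extracts or trims the description is not documented here.

```python
from html.parser import HTMLParser


class DescriptionParser(HTMLParser):
    """Remember the content of the first <meta name="description"> tag."""

    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta" and self.description is None:
            attributes = {name: (value or "") for name, value in attrs}
            if attributes.get("name", "").lower() == "description":
                # Collapse runs of whitespace so the text displays on one line.
                self.description = " ".join(attributes.get("content", "").split())


def extract_description(html):
    """Return the page's meta description, or None if it has none."""
    parser = DescriptionParser()
    parser.feed(html)
    return parser.description


page = (
    '<html><head><meta name="Description" content="Learn, laugh and enjoy '
    'at the same time."></head><body></body></html>'
)
print(extract_description(page))
```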

You can link words and numbers together into phrases if you want specific words or numbers to appear together in your result pages. If you want to find an exact phrase, use "double quotation marks" around the phrase when you enter words in the search box.
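The way a search box might separate quoted phrases from ordinary words can be sketched like this. It is an illustration only: the function names are invented, and the rule that every phrase and every word must be present is an assumption, since the text above does not say how unquoted words are combined.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words and numbers."""
    return re.findall(r"\w+", text.lower())


def parse_query(query):
    """Split a query into exact phrases (in double quotes) and single terms."""
    phrases = re.findall(r'"([^"]+)"', query)
    remainder = re.sub(r'"[^"]*"', " ", query)
    return phrases, remainder.split()


def contains_phrase(doc_tokens, phrase_tokens):
    """True if the phrase's words appear next to each other, in order."""
    n = len(phrase_tokens)
    return any(
        doc_tokens[i:i + n] == phrase_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def matches(document, query):
    """Assumed rule: every phrase and every single term must be present."""
    doc_tokens = tokenize(document)
    phrases, terms = parse_query(query)
    return all(contains_phrase(doc_tokens, tokenize(p)) for p in phrases) and all(
        token in doc_tokens for term in terms for token in tokenize(term)
    )


query = '"search engine" robot'
print(parse_query(query))
print(matches("Our robot feeds the search engine index.", query))
print(matches("The engine can search; the robot crawls.", query))
```

The second document contains all three words but not the phrase "search engine" as a unit, so it does not match.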

Some Terminology Related To Search Engines

Boolean search: A search allowing the inclusion or exclusion of documents containing certain words through the use of operators such as AND, NOT and OR.

Concept search: A search for documents related conceptually to a word, rather than specifically containing the word itself.

Full-text index: An index containing every word of every document cataloged, including stop words (defined below).

Fuzzy search: A search that will find matches even when words are only partially spelled or misspelled.

Index: The searchable catalog of documents created by search engine software. Also called "catalog." Index is often used as a synonym for search engine.

Keyword search: A search for documents containing one or more words that are specified by a user.

Phrase search: A search for documents containing an exact sentence or phrase specified by a user.

Precision: The degree to which the documents a search engine lists actually match a query. The larger the share of listed documents that match, the higher the precision. For example, if a search engine lists 80 documents found to match a query but only 20 of them contain the search words, then the precision would be 25%.

Proximity search: A search where users can specify that documents returned should have the words near each other.

Query-By-Example: A search where a user instructs an engine to find more documents that are similar to a particular document. Also called "find similar."

Recall: Related to precision, this is the degree to which a search engine returns all the matching documents in a collection. There may be 100 matching documents, but a search engine may only find 80 of them. It would then list these 80 and have a recall of 80%.

Relevancy: How well a document provides the information a user is looking for, as measured by the user.

Spider: The software that scans documents and adds them to an index by following links. Spider is often used as a synonym for search engine.

Stemming: The ability for a search to include the "stem" of words. For example, stemming allows a user to enter "swimming" and get back results also for the stem word "swim."

Stop words: Conjunctions, prepositions, articles and other words such as AND, TO and A that appear often in documents yet alone may contain little meaning.

Thesaurus: A list of synonyms a search engine can use to find matches for particular words if the words themselves don't appear in documents.
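The figures in the Precision and Recall entries above can be checked with a short calculation. This is a sketch in which documents are represented simply by hypothetical ID numbers; the function names are invented for the example.

```python
def precision(listed, matching):
    """Share of the listed documents that actually match the query."""
    listed, matching = set(listed), set(matching)
    return len(listed & matching) / len(listed) if listed else 0.0


def recall(listed, matching):
    """Share of all matching documents that the engine managed to list."""
    listed, matching = set(listed), set(matching)
    return len(listed & matching) / len(matching) if matching else 0.0


# Precision example: 80 documents listed, only 20 of them really match.
p = precision(listed=range(80), matching=range(20))

# Recall example: 100 matching documents exist, the engine lists 80 of them.
r = recall(listed=range(80), matching=range(100))

print(f"precision = {p:.0%}, recall = {r:.0%}")
```

The two measures answer different questions: precision looks only at what was listed, while recall compares the list against everything that should have been found.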