About our Robot and Search Engine [ Robot Tech. Specs. ]
Here is a sample of what you see when you visit SpiderMonkey
Our search robot is registered among the top search engines of the world. It is a sophisticated software "engine" that runs from our primary internet server and reaches out across the web to fetch content for our (SQL) database index. Our programmers have automated it to systematically traverse the World Wide Web's hypertext structure: it retrieves a document and then recursively retrieves all documents linked from within that initial target document. "Recursive" here doesn't limit the definition to any specific traversal algorithm. Even if a robot applies some heuristic rule to the selection and order of the documents it visits, and spaces its requests out over a long span of time, it is still a "robot". By contrast, normal Web browsers are not robots, because they are operated by a person and don't automatically retrieve referenced documents.
Robots are sometimes referred to as Wanderers, Crawlers, or Spiders. Although arguably apt, these names can mislead the lay person if they give the impression that the software itself moves between sites like a virus; this is not the case. The robot is software permanently resident on its own computer. From that computer it sends requests for website documents to the other computers (the document servers) on which the target site resides.
A search engine is a software programme, resident on a computer, that searches through a (usually massive) database. In the context of the World Wide Web, the term "search engine" is most often used for search forms that search through databases of HTML documents gathered by a robot.
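The recursive retrieval described above can be illustrated with a minimal sketch. It is not Spider Monkey's actual code: the in-memory SAMPLE_WEB table, the crawl function and the max_documents limit are all hypothetical names chosen for the illustration, and a breadth-first order is only one of the many traversal orders the definition allows.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs its document links to.
# A real robot would fetch each document over HTTP and extract its links instead.
SAMPLE_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def sample_fetch_links(url):
    """Return the links found in the document at url (empty if unknown)."""
    return SAMPLE_WEB.get(url, [])


def crawl(start_url, fetch_links, max_documents=100):
    """Retrieve start_url, then every document reachable from it, once each."""
    visited = []                # documents retrieved, in retrieval order
    seen = {start_url}          # URLs already queued, so loops are not followed twice
    queue = deque([start_url])
    while queue and len(visited) < max_documents:
        url = queue.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


if __name__ == "__main__":
    for url in crawl("http://example.com/", sample_fetch_links):
        print(url)
```

Note that the program itself never leaves the machine it runs on; only its requests travel to the document servers.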
Like most search engine service providers, we store URLs submitted directly by our visitors in a temporary database, for both quality and security reasons, before they are finally crawled and entered into the main search engine's index. We allow interested visitors to view the temporary database. Use the "Pre-Index" engine here in one of three ways: enter key words, enter your site name, or leave the search field blank and press "Pre-Index" to see the entire list of recent submissions. You can see how others describe their sites and get some ideas for your own. If you have submitted your site using our Add Url form, you can check here and see how it looks. If you don't like it, remember that the final index entry will be derived from your web page, so spend your time working on your web page and its meta tags instead of resubmitting.
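As a rough illustration of the three ways to use the "Pre-Index" engine, here is a small sketch. The Submission record and the pre_index_search function are hypothetical, and matching a submission when any one term appears in its URL or description is an assumption; the real engine's matching rule is not documented here.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical record for one submitted URL awaiting the crawl."""
    url: str
    description: str


def pre_index_search(submissions, query=""):
    """Return matching submissions; a blank query returns the whole list."""
    terms = query.lower().split()
    if not terms:
        return list(submissions)
    matches = []
    for submission in submissions:
        haystack = (submission.url + " " + submission.description).lower()
        # Assumption: one matching term (key word or site name) is enough.
        if any(term in haystack for term in terms):
            matches.append(submission)
    return matches


if __name__ == "__main__":
    pending = [
        Submission("http://example.com/", "Hand-made wooden toys"),
        Submission("http://example.org/", "DNS tutorials for beginners"),
    ]
    print(pre_index_search(pending, "dns"))
    print(len(pre_index_search(pending)))
```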
Our Crawler (Spider Monkey) visits and checks URLs during server off-peak load times and feeds the results to the index. All realms of the main database are refreshed at least once every 30 days. The temporary database is crawled at least twice monthly. Although each URL is fetched from the actual site, its entry here remains for roughly 60 days so that we can verify when and how it was submitted. Note: URLs submitted to our own Site Submit Service, or submitted remotely by other authorized servers, do not appear in the temporary database but can be found using the Mouse House Search Engine.
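The two time spans mentioned above lend themselves to a short date calculation. This is only a sketch: the function names are hypothetical, and the 60-day figure is treated as exact even though the retention period is described as approximate.

```python
from datetime import date, timedelta

# Figures taken from the text above; the 60-day retention is approximate.
RETENTION = timedelta(days=60)
MAX_REFRESH_INTERVAL = timedelta(days=30)


def pre_index_expiry(submitted_on):
    """Approximate date on which a temporary-database entry is dropped."""
    return submitted_on + RETENTION


def refresh_due_by(last_crawled_on):
    """Latest date by which a main-database entry should be refreshed."""
    return last_crawled_on + MAX_REFRESH_INTERVAL


if __name__ == "__main__":
    print(pre_index_expiry(date(2023, 1, 1)))
    print(refresh_due_by(date(2023, 1, 1)))
```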
Spider Monkey abides by the Robots Exclusion Standard (RES), specifically the 1994 standard. Where the 1996 proposed standard supersedes the 1994 standard, the proposed standard is followed.
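For site owners who want to check how a robots.txt file would be read by a compliant robot, here is a minimal sketch using Python's standard urllib.robotparser module. It is not Spider Monkey's own parser. The sample file, the is_allowed helper and the example.com URLs are hypothetical, and the user-agent token "Spider_Monkey" is an assumption about how the robot identifies itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for the assumed "Spider_Monkey" token,
# followed by a catch-all record that shuts out every other robot.
SAMPLE_ROBOTS_TXT = """\
User-agent: Spider_Monkey
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("Spider_Monkey", "http://example.com/index.html"),
        ("Spider_Monkey", "http://example.com/private/notes.html"),
        ("OtherBot", "http://example.com/index.html"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

The standard-library parser uses the first record whose User-agent matches the robot's name and falls back to the "*" record otherwise, so a robot-specific record takes precedence over the catch-all one.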
Spider Monkey will obey the first record in the robots.txt file with a User-Agent containing "Spider_Monkey". If there is no such record, it will obey the first entry with a User-Agent of "*".
Before you submit your site for inclusion in our database (index), are there pages you don't want indexed? If so, put the following in the head of any web page you want excluded. Our crawler (Spider Monkey) will obey this instruction and skip the document.
<META NAME="robots" CONTENT="noindex">
Do you use meta content tags? You should at least set out the content of the page as succinctly as possible in a Description tag. If present, this will become the introduction to your page in the search results our visitors see. An example follows:
<meta name="Description" content="Learn, laugh and enjoy at the same time. International Information Technology firm has superb entertainment website for clients, employees and guests.">
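To check how a page's meta tags might be read, the following sketch uses Python's standard html.parser module to pick out the robots instruction and the description. It only illustrates the idea and is not the crawler's own code: the MetaTagReader class and the read_meta function are hypothetical names, and keeping the first value when a tag is repeated is an assumption.

```python
from html.parser import HTMLParser


class MetaTagReader(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...>
        # and <meta name=...> are handled alike.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content")
        if name and content is not None:
            # Assumption: if a tag is repeated, the first value is kept.
            self.meta.setdefault(name, content)


def read_meta(html):
    """Return (noindex, description) for the given HTML text."""
    reader = MetaTagReader()
    reader.feed(html)
    reader.close()
    noindex = "noindex" in reader.meta.get("robots", "").lower()
    return noindex, reader.meta.get("description")


if __name__ == "__main__":
    page = (
        '<html><head><META NAME="robots" CONTENT="noindex">'
        '<meta name="Description" content="A short summary of the page.">'
        "</head><body></body></html>"
    )
    print(read_meta(page))
```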
Searched for dns | 1-10 of 73 | 901155 pages searched |
Our search engine finds documents at Mouse House and throughout the World Wide Web. Here's how it works: you tell our search engine what you're looking for by typing keywords, phrases, or questions in the search box. Our search engine responds with a list of all the Web pages in our crawler's index that relate to those topics. (We call the crawler SpiderMonkey, and its technical details are available from the WWW robot registry.) The most relevant content will appear at the top of your results. Our search engine ignores most foul language, so it is not a tool for seeking porn sites.
Spider Monkey's index is a large, growing, organized collection of data composed of Web pages, their content and location, and discussion group pages from around the world. The index becomes larger every day as people send us the addresses of new Web pages and as our systems administrators search for new material. We own sophisticated technology that crawls the Web daily, during periods of lower server load, looking for links to new pages. When you use the Mouse House search engine, you search the entire collection using keywords or phrases, just as you would with other search engines such as Yahoo or Alta Vista.
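The relationship between an index and a keyword search can be sketched in a few lines. This is an illustration only, not the Mouse House implementation: the function names and sample pages are hypothetical, and ranking pages by how often the query words occur in them is an assumed, deliberately simple stand-in for whatever relevance measure the real engine uses.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to a Counter of {url: number of occurrences}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def search(index, query):
    """Return matching URLs, highest score first (ties broken by URL)."""
    scores = Counter()
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [url for url, _ in ranked]


if __name__ == "__main__":
    sample_pages = {
        "http://example.com/dns": "DNS basics: how DNS lookups work",
        "http://example.com/mail": "Mail servers and DNS records",
        "http://example.com/cats": "Pictures of cats",
    }
    for url in search(build_index(sample_pages), "dns"):
        print(url)
```

Because the index is built ahead of time by the crawler, a search only has to look up the query words rather than re-read every page.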