Search Engine Spiders Miss Pages
Errors Which Confuse And Block The Search Engine Spiders
Originally Published: January, 2004
Introduction
Even after your site has been spidered and indexed by the major search engines, you may find that some pages aren't visited by the search engine spiders. This article discusses some of the potential problems and solutions for this incomplete spidering.
Bad HTTP Redirect
A poorly coded 301 redirect can block the spiders from your page. Use our HTTP redirect tester to confirm the server response code for your page.
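As a rough illustration of what such a check involves, the sketch below requests a page without following redirects and describes the response. It is a minimal sketch, not the tester mentioned above: the function names `classify_response` and `fetch_status` are hypothetical, and the wording of the messages is an assumption about what is worth flagging.

```python
import http.client
from urllib.parse import urlsplit


def classify_response(status, location=None):
    """Describe one HTTP response the way a cautious spider might see it."""
    if status in (301, 308):
        if not location:
            return "broken permanent redirect: no Location header"
        return f"permanent redirect to {location}"
    if status in (302, 303, 307):
        if not location:
            return "broken temporary redirect: no Location header"
        return f"temporary redirect to {location}"
    if 200 <= status < 300:
        return "ok"
    return f"error: status {status}"


def fetch_status(url):
    """Send a HEAD request without following redirects.

    Returns (status code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

To check a live page, pass the result of `fetch_status("http://www.example.com/old-page.htm")` to `classify_response`; the URL here is a placeholder. A 301 that arrives without a usable `Location` header is the kind of poorly coded redirect that can stop a spider.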
Orphaned Pages
If your page is an orphan (has no incoming links from your other pages) then the spider won't find it. Some site development applications (e.g. Dreamweaver) can check your site for orphaned pages.
You can also check your site for orphaned pages using our search engine spider map creator. This finds simple links in your pages, so if this spider finds your page then the search engine spider should be able to as well.
You should also link-check your site, because broken links can create orphaned files.
Note that non-standard links built with JavaScript, Java applets, or Flash may effectively create orphan pages for the search engine spiders because the spiders can't follow these links. If your site has this problem then you should create simple links pointing to these "orphaned" files.
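The idea behind an orphan check can be sketched in a few lines. The example below assumes you already have a map of each page to the simple links it contains; the `site` dictionary and the function name `find_orphans` are made up for illustration, and a real tool would build the map by crawling your files.

```python
def find_orphans(links, home):
    """Return the pages that cannot be reached by following links from home.

    links maps each page to the list of pages it links to.
    """
    seen = {home}
    stack = [home]
    while stack:
        page = stack.pop()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    # Any known page that was never reached is an orphan.
    return sorted(set(links) - seen)


# Hypothetical site: /old-promo.htm links out, but nothing links to it.
site = {
    "/index.htm": ["/about.htm", "/products.htm"],
    "/about.htm": ["/index.htm"],
    "/products.htm": ["/widget.htm"],
    "/widget.htm": [],
    "/old-promo.htm": ["/index.htm"],
}

orphans = find_orphans(site, "/index.htm")
print(orphans)  # ['/old-promo.htm']
```

A page whose only incoming link is a script-driven or Flash link would be missing from the map of simple links, so it would show up as an orphan here in the same way.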
Pages Too Many Clicks From Default Page
Search engine spiders vary in how deeply they will spider your site. Often, pages more than 3-4 clicks from your home page are never spidered. This problem is easily solved by creating a search engine spider map for your site.
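To see why a spider map helps, the following sketch counts how many clicks each page is from the home page using a breadth-first search. The page names, the limit of 3 clicks, and the function names are assumptions chosen for the example, not figures from any particular search engine.

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from home to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def too_deep(links, home, max_clicks=3):
    """List reachable pages that are more than max_clicks from home."""
    depths = click_depths(links, home)
    return sorted(page for page, depth in depths.items() if depth > max_clicks)


# Hypothetical site where each page links only to the next one.
chain = {
    "/index.htm": ["/a.htm"],
    "/a.htm": ["/b.htm"],
    "/b.htm": ["/c.htm"],
    "/c.htm": ["/d.htm"],
    "/d.htm": [],
}

# The same site with a spider map page linked from the home page.
with_map = dict(chain)
with_map["/index.htm"] = ["/a.htm", "/sitemap.htm"]
with_map["/sitemap.htm"] = ["/a.htm", "/b.htm", "/c.htm", "/d.htm"]

deep_before = too_deep(chain, "/index.htm")
deep_after = too_deep(with_map, "/index.htm")
print(deep_before)  # ['/d.htm']
print(deep_after)   # []
```

Linking the map from the home page puts every listed page within two clicks, which is why a single map page is usually enough.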
Site Has Too Many Pages
Some engines limit the number of pages they will spider and index. If you have a very large site you may be hitting this limit.
You may get more pages spidered by using the robots.txt file to block spiders from pages which won't bring traffic (such as a contact page) to make room for pages which will bring traffic. If your site is far larger than the limit then you should consider splitting the content across multiple domains.
Mangled HTML
Sometimes your HTML code is so badly malformed that the spider cannot parse it and appears to ignore the file. Use an HTML validation tool (http://www.w3.org/) to be sure your pages can be parsed as valid HTML. Learn more about search engine spiders and validated HTML.
Bad robots.txt File
The robots.txt file is created to block search engine robots and it can be tricky to set up correctly. If you don't find a file named "robots.txt" in your root directory you can skip this section.
You should read through your existing robots.txt file for problems which may be blocking the wrong files on your site. Here is a subtle situation which can cause problems:
User-agent: * # applies to all robots
Disallow: /help # intended to block all files in the directory named "help"
This robots.txt entry is intended to block all spiders from the directory named "help", but it will also block all spiders from any file at the same level whose name starts with "help", such as "helpful.htm" or "help_notes.php", or even a directory such as "helpful-pages/".
You should avoid this type of ambiguity by writing the entry as:
User-agent: * # applies to all robots
Disallow: /help/ # blocks all files in the directory named "help"
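The difference between the two entries can be checked with the `urllib.robotparser` module in Python's standard library, which applies the same prefix matching. This is a small sketch for experimenting, not a replacement for testing against your real file; the helper `blocked_paths` and the sample paths are made up for the example.

```python
from urllib.robotparser import RobotFileParser


def blocked_paths(robots_lines, paths, agent="*"):
    """Return the paths that the given robots.txt lines disallow for agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return [path for path in paths if not parser.can_fetch(agent, path)]


# Hypothetical paths on the site.
paths = [
    "/help/index.htm",
    "/helpful.htm",
    "/help_notes.php",
    "/helpful-pages/tips.htm",
    "/contact.htm",
]

ambiguous = ["User-agent: *", "Disallow: /help"]
precise = ["User-agent: *", "Disallow: /help/"]

blocked_ambiguous = blocked_paths(ambiguous, paths)
blocked_precise = blocked_paths(precise, paths)

print(blocked_ambiguous)
# ['/help/index.htm', '/helpful.htm', '/help_notes.php', '/helpful-pages/tips.htm']
print(blocked_precise)
# ['/help/index.htm']
```

With the trailing slash, only files inside the "help" directory are blocked; without it, everything whose path begins with "/help" is blocked.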
Use the robots.txt tester to check your robots file for errors or create a new one from scratch using the Simple robots.txt Creator.
See http://www.robotstxt.org/wc/norobots.html and Search Engine Spider Control with robots.txt for more information.
Bad Meta Index Tag
Your pages may contain an incorrect robots META tag. Look in the header of your HTML file for tags like these:
<META NAME="robots" CONTENT="noindex,follow">
<META NAME="robots" CONTENT="index,nofollow">
<META NAME="robots" CONTENT="noindex,nofollow">
Any of these can keep the spider from indexing a page ("noindex"), from following the links on it ("nofollow"), or both, even on pages you want indexed.
In general, the robots META tag should be avoided. For more information see:
http://www.robotstxt.org/wc/meta-user.html
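If you have many pages to check, a short script can look for the tag for you. The sketch below uses Python's standard `html.parser` module to collect the directives from any robots META tag in a page; the class and function names are hypothetical, and the sample page is invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html):
    """Return the set of robots META directives in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.directives


# Invented sample page containing one of the tags shown above.
page = (
    "<html><head>"
    '<META NAME="robots" CONTENT="noindex,follow">'
    "</head><body>Hello</body></html>"
)

found = robots_directives(page)
if "noindex" in found or "nofollow" in found:
    print("Restrictive robots META tag found:", sorted(found))
```

An empty result means the page has no robots META tag, so this tag is not what is keeping the page out of the index.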
This article covers some of the major reasons that pages don't get indexed. Hopefully this guide will help you find and fix any of these problems on your site.
Editor's Note:
This article discusses why the search engine spiders may not be finding or indexing some pages on your site. For more information about indexing and spidering problems see Search Engines Indexing Problems, 22 Reasons Why Your Page Did Not Get Indexed.
©2005 SearchEnginePromotionHelp.com, All Rights Reserved.