  The Robots.txt file
22-Dec-2006, 02:50 AM (GMT -8)
Anilrgowda (Administrator)

A search engine crawler, or spider, is a Web robot and, as such, normally chooses to follow the robots.txt file if one is present. The robots.txt protocol was developed at the end of 1993 and is still the Web's standard for controlling how robots access a particular Web site. Most major search engines claim to support it, but no robot, search engine spiders included, is obliged to obey it.
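To make the point about voluntary compliance concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module. The robots.txt content, the `www.example.com` URL, the `ExampleBot` user agent and the `may_fetch` helper are all hypothetical. A robot that asks the parser first respects the file; a robot that skips the check is not stopped by anything in the protocol itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str,
              obeys_robots_txt: bool = True) -> bool:
    """Decide whether a robot goes on to fetch `url`.

    A well-behaved robot (obeys_robots_txt=True) consults robots.txt first.
    Nothing enforces this: a robot that skips the check fetches anyway.
    """
    if not obeys_robots_txt:
        return True  # the protocol is voluntary; robots.txt cannot stop this robot
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


page = "https://www.example.com/private/report.html"
print(may_fetch(ROBOTS_TXT, "ExampleBot", page))                          # False
print(may_fetch(ROBOTS_TXT, "ExampleBot", page, obeys_robots_txt=False))  # True
```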
The purpose of the robots.txt protocol is to give Web servers a way to tell search engine crawlers which parts of the server they should not access; in other words, to keep robots from reading parts of the server that could contain sensitive or confidential information. How does this purpose relate to preventing a search engine from indexing a particular resource? Unfortunately, the general answer to this question is "It doesn't".
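A robots.txt file that keeps robots out of sensitive areas might look like the one embedded in this sketch. The directory names and the `ExampleBot`/`OtherBot` agents are hypothetical; the checking is done with Python's standard `urllib.robotparser` module, which applies a group naming a specific robot in preference to the catch-all `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep every robot out of two sensitive directories,
# and keep one (hypothetical) crawler, ExampleBot, out of /drafts/ as well.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /admin/
Disallow: /customer-data/
Disallow: /drafts/

User-agent: *
Disallow: /admin/
Disallow: /customer-data/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

paths = ["/index.html", "/admin/login.php",
         "/customer-data/orders.csv", "/drafts/new-page.html"]

# List which paths each robot is permitted to read.
for agent in ("ExampleBot", "OtherBot"):
    readable = [p for p in paths if parser.can_fetch(agent, p)]
    print(agent, readable)
```

Bear in mind that robots.txt is itself publicly readable, so a file like this also tells anyone who looks where the sensitive directories are. It keeps well-behaved robots out; it is not access control.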
If the robots.txt file can be used to block access to certain parts of a Web site, it can also block access to the whole site! In my practice, I have more than once found the robots.txt file to be the main reason a site was not listed in certain search engines. Once I corrected it, the site was listed. If the robots.txt file is not written correctly, it can cause all kinds of problems, and the worst part is that you will probably never notice by looking at your HTML code alone. When a client asks us to analyse a Web site that has been online for about a year but is not listed in certain engines, the first place we look is the robots.txt file. Once we have corrected that and optimized the site for its most important keywords and key phrases, rankings usually rise substantially within thirty to sixty days.
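When a site is missing from an engine, a quick first check is whether robots.txt blocks the site's root URL for that engine's crawler. This sketch relies on Python's standard `urllib.robotparser`; the `home_page_blocked` helper and the two sample files are hypothetical. It shows the classic mistake: `Disallow: /` shuts every obedient robot out of the whole site, while an empty `Disallow:` permits everything.

```python
from urllib.robotparser import RobotFileParser


def home_page_blocked(robots_txt: str, user_agent: str) -> bool:
    """Return True if robots.txt tells `user_agent` not to read the root URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, "/")


# A single slash disallows the entire site for every robot that obeys it...
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
# ...whereas an empty Disallow value disallows nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

print(home_page_blocked(BLOCK_ALL, "Googlebot"))  # True
print(home_page_blocked(ALLOW_ALL, "Googlebot"))  # False
```

A fuller check would run the same test over the site's most important pages, since a file can leave the home page readable while still blocking key directories.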
More on the robots.txt file

The Disallow line in a robots.txt file means "disallow reading", not "disallow indexing". In other words, a disallowed resource may still be listed in a search engine's index, even if the search engine follows the protocol. The most obvious demonstration of this is Google, which can add files to its index without reading them, merely by considering links to those files. In theory, Google could build an index of an entire Web site without ever visiting that site or retrieving its robots.txt file. In doing so it would not be breaking the robots.txt protocol, because it would not be reading any disallowed resources; it would simply be reading other Web sites' links to those resources, which Google constantly uses in its PageRank algorithm.
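The difference between "disallow reading" and "disallow indexing" can be modelled with a toy crawler. Everything here is hypothetical (the three-page site held in a dictionary, the `crawl` function, the `ExampleBot` agent), and it is not how any real search engine is implemented. It only shows that a robot which obeys robots.txt never reads `/secret/plans.html`, yet still learns that address from a link on a page it is allowed to read.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical toy site: each path maps to the links found on that page.
SITE_PAGES = {
    "/": ["/products.html", "/secret/plans.html"],
    "/products.html": ["/"],
    "/secret/plans.html": [],
}
ROBOTS_TXT = "User-agent: *\nDisallow: /secret/\n"


def crawl(start: str, robots_txt: str, agent: str = "ExampleBot"):
    """Breadth-first crawl of SITE_PAGES that obeys robots.txt.

    Returns (fetched, url_only): pages whose content was read, and pages
    known only because an allowed page links to them.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    fetched, url_only = set(), set()
    queue, seen = deque([start]), {start}
    while queue:
        url = queue.popleft()
        if not parser.can_fetch(agent, url):
            url_only.add(url)  # never read, but its address is now known
            continue
        fetched.add(url)  # "read" from the toy dictionary, not the network
        for link in SITE_PAGES.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return fetched, url_only


fetched, url_only = crawl("/", ROBOTS_TXT)
print(sorted(fetched))   # ['/', '/products.html']
print(sorted(url_only))  # ['/secret/plans.html']
```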
A Web site does not necessarily need to be read in order to be indexed. So how can the robots.txt file be used to keep a particular resource out of a search engine's index? In practice, most search engines have placed their own interpretation on the robots.txt file that allows it to serve this purpose: they treat a resource disallowed by robots.txt as one they should not add to their index, and if it is already in their index (placed there by earlier crawling), they remove it. This last point is important, as the following example illustrates.
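The interpretation described above (do not add a disallowed resource, and drop it if it is already indexed) can be sketched as a filtering step over an existing index. The `apply_robots_policy` function, the sample index and the `ExampleBot` agent are hypothetical, and each real engine applies its own policy; Python's standard `urllib.robotparser` does the rule matching.

```python
from urllib.robotparser import RobotFileParser


def apply_robots_policy(index: dict[str, str], robots_txt: str,
                        agent: str) -> dict[str, str]:
    """Return a copy of `index` without the entries robots.txt disallows.

    Models the behaviour described in the post: a disallowed URL already
    in the index from earlier crawling is dropped. The input is not modified.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: text for url, text in index.items()
            if parser.can_fetch(agent, url)}


# Hypothetical index built by an earlier crawl, before /old-offers/ was disallowed.
index = {
    "/": "Welcome page",
    "/old-offers/2005.html": "Expired offer",
}
robots_txt = "User-agent: *\nDisallow: /old-offers/\n"

pruned = apply_robots_policy(index, robots_txt, "ExampleBot")
print(sorted(pruned))  # ['/']
```

Python's parser applies the rules of a group in file order using simple prefix matching, so individual search engines may interpret the same file somewhat differently.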
The anomalies and inadequacies of the robots.txt file and the robots meta tag point to what can sometimes be a bigger problem. It is impossible to prevent any directly accessible resource on a site from being linked to by external sites, be they partner sites, competitors' sites or search engines. Even with a robots.txt file in place, there is no legal or technical reason why its rules must be obeyed, least of all by humans creating links, for whom the standards were not even written. This may not seem a bad thing, but there are many cases where a site owner would rather that a particular page never be linked to from any other site on the Web. In such cases, the robots.txt file will help the site owner achieve that goal, but only to a certain degree.