  Search robots in disguise
Posted by Anilrgowda (Administrator), 08-Jan-2007, 11:17 PM

There are plenty of bots out there and, as a result, some conventions have arisen. Well-behaved bots identify themselves with a unique user-agent. They also follow the robots.txt conventions, which allow webmasters to control how their sites are crawled.

Here at Live Search, our crawlers are identified by the user-agent ‘MSNBot’. This may seem a little non-intuitive, but many webmasters depend on it, so we have chosen not to change it. To make things a little more transparent, we also identify our different types of crawlers. The complete list is as follows:

User-agent          Content crawled               Site
MSNBot              Main web crawler              www.live.com
MSNBot-Media        Images and all other media    images.live.com
MSNBot-NewsBlogs    News and blogs                search.live.com/news
MSNBot-Products     Products and shopping         products.live.com
MSNBot-Academic     Academic search               academic.live.com
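
As a rough illustration of how these names combine with robots.txt, here is a minimal sketch using Python's standard urllib.robotparser. The /photos/ path and the rules themselves are hypothetical, and it is an assumption (not stated in this post) that each crawler type honors a robots.txt section addressed to its own name. The sketch only shows how a standards-following client would read such rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep the media crawler out of /photos/,
# and leave everything else open to all other bots.
ROBOTS_TXT = """\
User-agent: MSNBot-Media
Disallow: /photos/

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Ask whether a given crawler name may fetch a given path.
media_may_fetch_photo = parser.can_fetch("MSNBot-Media", "/photos/cat.jpg")
main_may_fetch_photo = parser.can_fetch("MSNBot", "/photos/cat.jpg")
```

Note that robots.txt is only a request. It does nothing against a bot that ignores it or lies about its name.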

But what about crawlers that aren’t so well-behaved? After all, anyone could call themselves ‘MSNBot’, and proceed to be as rude and aggressive as they like. Fortunately, there is a way you can catch these impersonators. Here is how it works:

  1. When you get a page view request, it specifies a user-agent and an IP address. As I described above, all requests from Live Search use a user-agent starting with the word ‘MSNBot’.
  2. If you see the MSNBot user-agent, it’s time to check the identity of the bot. Starting with the IP address (e.g. 207.46.98.149), you can use a reverse DNS lookup to find the registered name of the machine.
  3. Once you have the host name (in this case, livebot-207-46-98-149.search.live.com), you can check that it really is coming from Live Search. The names of all Live Search crawlers end with ‘search.live.com’. If the name doesn’t end with ‘search.live.com’, you know it’s not really our crawler.
  4. Finally, you need to verify that the name is accurate, because whoever controls an IP address can make its reverse DNS record say anything. To do this, use a forward DNS lookup to find the IP addresses associated with the host name. One of them should match the IP address you used in Step 2; if none does, the name was faked.
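
The four steps above can be sketched in Python with the standard socket module. This is a minimal illustration, not an official implementation. The function names are made up for this example, the case-insensitive user-agent match and the leading dot in the suffix are assumptions added here, and the lookups are IPv4-only. The two resolver functions are passed in as parameters so the logic can be tried without touching real DNS.

```python
import socket
from typing import Callable, List

# Suffix from the post; the leading dot is an added assumption so that
# only hosts *under* search.live.com are accepted.
LIVE_SEARCH_SUFFIX = ".search.live.com"


def reverse_lookup(ip: str) -> str:
    """Step 2: reverse DNS lookup, returning the primary host name."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """Step 4: forward DNS lookup, returning the host's IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def claims_msnbot(user_agent: str) -> bool:
    """Step 1: does the request claim to be a Live Search crawler?"""
    return user_agent.lower().startswith("msnbot")


def verify_crawler_ip(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Steps 2 to 4: True only if reverse and forward DNS agree."""
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(LIVE_SEARCH_SUFFIX):
            return False  # Step 3: name is not under search.live.com
        return ip in forward(host)  # Step 4: name must map back to the IP
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False


def is_impostor(user_agent: str, ip: str) -> bool:
    """A request that claims to be MSNBot but fails DNS verification."""
    return claims_msnbot(user_agent) and not verify_crawler_ip(ip)
```

DNS lookups are slow compared with serving a page, so in practice you would probably cache the verdict per IP address rather than repeat the lookups on every request.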

By verifying the crawler’s identity, you can catch masquerading crawlers. When you do catch one, you can simply return an HTTP error, which blocks it from seeing your content.
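
One way to return that error is sketched below as a small WSGI wrapper in Python. The post does not say which HTTP error to use, so the 403 status is a choice made for this example. The name block_impostors and the is_impostor callable (which takes the user-agent and IP address and returns True for a fake) are hypothetical.

```python
from typing import Callable


def block_impostors(app, is_impostor: Callable[[str, str], bool]):
    """Wrap a WSGI app so that requests flagged by is_impostor get a 403."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        ip = environ.get("REMOTE_ADDR", "")
        if is_impostor(user_agent, ip):
            body = b"Forbidden"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Everyone else reaches the real application unchanged.
        return app(environ, start_response)

    return middleware
```

If your server sits behind a proxy or load balancer, REMOTE_ADDR will be the proxy's address, and you would need to take the client IP from wherever your setup records it.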

We are constantly looking for your feedback to help improve our engine – please send it our way using this link.