The Yahoo SLURP Crawler
07-Dec-2006, 10:38 PM (GMT -8), post #1
Anilrgowda (Administrator)

Posts: 18,704
Join Date: Jan 2006

As SEOs and webmasters, we're always looking for ways to get the search engine spiders to crawl our sites, and the deeper, the better. This article shows you how to target Yahoo's crawler and convince it to stop by regularly.
The search engine wars are fought with strategies, alliances, and robots. As Yahoo! positions itself as the leading challenger to Google for market share, websites that want to optimize for Yahoo must study how Yahoo ranks and indexes pages, and that means studying its web crawler, SLURP. Your server logs have probably recorded visits from various robots, including SLURP. If they show no SLURP visits, this article offers tips on getting SLURP to crawl (ideally, deep crawl) your site.
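One way to check for SLURP visits is to scan your access log for its user-agent string. The Python sketch below makes several assumptions. It assumes the Apache/NCSA Combined Log Format. It also assumes that SLURP's user agent contains "Slurp" (Yahoo's crawler has announced itself as "Yahoo! Slurp"). The sample lines are made up for illustration. The sketch also records which clients asked for /robots.txt, since requests for that file normally come from crawlers. Keep in mind that anyone can fake a user-agent string, so treat the result as a strong hint rather than proof.

```python
import re
from collections import Counter

# Assumed Combined Log Format:
# host ident user [day:time zone] "METHOD path PROTOCOL" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:\]]+):[^\]]*\] '
    r'"\S+ (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def crawler_report(lines):
    """Count SLURP requests per day and collect the user agents that fetched /robots.txt."""
    slurp_hits_per_day = Counter()
    robots_txt_agents = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        if "Slurp" in agent:  # assumption: SLURP's user agent contains "Slurp"
            slurp_hits_per_day[match.group("day")] += 1
        if match.group("path") == "/robots.txt":
            robots_txt_agents.add(agent)
    return slurp_hits_per_day, robots_txt_agents


# Made-up sample lines (documentation IP ranges), for illustration only.
SLURP_UA = "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
SAMPLE_LOG = [
    f'192.0.2.10 - - [07/Dec/2006:10:38:00 -0800] "GET /robots.txt HTTP/1.0" 200 120 "-" "{SLURP_UA}"',
    f'192.0.2.10 - - [07/Dec/2006:10:38:05 -0800] "GET /index.php?id=3 HTTP/1.0" 200 5120 "-" "{SLURP_UA}"',
    f'198.51.100.7 - - [07/Dec/2006:11:02:13 -0800] "GET /index.php HTTP/1.1" 200 5120 "-" "{BROWSER_UA}"',
    f'203.0.113.5 - - [08/Dec/2006:01:15:42 -0800] "GET /robots.txt HTTP/1.1" 200 120 "-" "{GOOGLEBOT_UA}"',
]

hits, agents = crawler_report(SAMPLE_LOG)
print(dict(hits))  # {'07/Dec/2006': 2}
```

To run it on a real log, open the file and pass it in, e.g. with open("access.log") as log: hits, agents = crawler_report(log). Here access.log is a placeholder for your server's log path.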
The Preamble
Yahoo SLURP evolved from Inktomi's SLURP crawler and is an upgrade of it. Yahoo used Inktomi's search engine to replace Google, which had previously supplied Yahoo's search results. This officially triggered the second search engine war (Google won the first without ever declaring hostilities).
Yahoo has at least 130 million registered users on its network. Granted, Google is the definitive search engine, but Yahoo is large enough that it should not be ignored.
SLURP crawls websites, scans their contents and meta tags, and follows the links on each page. It then brings the information back for the search engine to index. Yahoo SLURP 2.0 stores the full text of each page it crawls and returns it to Yahoo's searchable database. This is one of SLURP's distinguishing features, since not all search engine crawlers store the entire text of the pages they crawl.
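To make the crawl, scan, and follow loop concrete, here is a toy Python sketch. It is not Yahoo's code. The names crawl and PageScanner are hypothetical. The in-memory SITE dictionary stands in for real HTTP requests, and its URLs are placeholders. Like the crawler described above, it keeps the full text of every page it visits along with its meta tags, then queues the page's links. It follows only a href links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageScanner(HTMLParser):
    """Collects a page's text content, <meta name=... content=...> tags and <a href> links."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.meta = {}
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])

    def handle_data(self, data):
        self.text_parts.append(data)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl: store each page's full text and meta tags, then follow its links."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    while queue and len(index) < max_pages:
        url = queue.popleft()
        html = fetch(url)  # hypothetical fetch function: returns HTML text or None
        if html is None:
            continue
        scanner = PageScanner()
        scanner.feed(html)
        scanner.close()
        index[url] = {"text": "".join(scanner.text_parts), "meta": scanner.meta}
        for href in scanner.links:
            link = urljoin(url, href)  # resolve relative links against the current page
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


# Hypothetical three-page site standing in for real HTTP requests.
SITE = {
    "http://www.example.com/": '<html><head><meta name="description" content="Widgets">'
                               '</head><body>Welcome <a href="/products.php?id=1">Products</a> '
                               '<a href="about.html">About</a></body></html>',
    "http://www.example.com/products.php?id=1": '<html><body>Blue widget <a href="/">Home</a></body></html>',
    "http://www.example.com/about.html": "<html><body>About us</body></html>",
}

index = crawl("http://www.example.com/", SITE.get)
```

Passing SITE.get as the fetch function keeps the example offline. A real crawler would also honor robots.txt and the robots meta tags before storing or following anything.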
While SLURP has some features unique to it, it also obeys robots.txt, the file that tells crawlers which pages they may visit. This matters because it gives you control over which pages the crawler crawls and indexes. You can use it to keep pages out of the index if they contain information you would rather not expose (hackers regularly try to infiltrate search engine databases), or if you don't want them indexed at all, for whatever reason. Bear in mind that robots.txt is only a request that well-behaved crawlers honor, and anyone can read the file, so truly sensitive pages also need real access control such as a password.
Another advantage of robots.txt is that it lets you target specific robots, so you can block Googlebot from a particular page while still letting SLURP crawl it. This is useful if you have optimized different pages for different search engines. That approach gives you flexibility, but a search engine that sees both versions may treat them as duplicate pages and penalize you. Careful use of robots.txt therefore belongs on any list of ways to make a website more search engine friendly. So how do you use it? Open a plain text editor such as Notepad and type the following lines:
User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
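Save the file as robots.txt and upload it to your site's root directory, since crawlers look for it only there. Each Disallow value is a path relative to the site root, beginning with a slash. You can disallow as many pages for each crawler as you want. To set rules for another crawler, start a new record that begins with its own User-Agent line: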
User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

User-Agent: Googlebot
Disallow: /page_optimized_for_yahoo.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
To give the same rules to all crawlers, replace the user-agent name with the wildcard (*), as in User-Agent: *.
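Before uploading the file, you can check how a rule-following crawler would read it. Python's standard urllib.robotparser module implements the robots exclusion rules. The sketch below feeds it the two-record example, with paths starting at /, and asks what Slurp and Googlebot may fetch. It shows the standard matching logic, not necessarily every detail of how SLURP itself parses the file. The host www.example.com and the helper allowed are placeholders.

```python
from urllib.robotparser import RobotFileParser

# The two-record example; Disallow paths are relative to the site root.
ROBOTS_TXT = """\
User-Agent: Slurp
Disallow: /whatsisname.html
Disallow: /page_optimized_for_google.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html

User-Agent: Googlebot
Disallow: /page_optimized_for_yahoo.html
Disallow: /credit_card_list.html
Disallow: /whatnot.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() accepts the file's lines directly


def allowed(agent, path, site="http://www.example.com"):
    """True if a crawler calling itself `agent` may fetch `path` under these rules."""
    return parser.can_fetch(agent, site + path)


print(allowed("Slurp", "/page_optimized_for_google.html"))      # False
print(allowed("Googlebot", "/page_optimized_for_google.html"))  # True
# No record names msnbot and there is no * record, so everything is allowed for it.
print(allowed("msnbot", "/credit_card_list.html"))              # True
```

The last check shows why a User-Agent: * record is worth adding if you want a default for crawlers you did not name.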
Used carefully, robots.txt helps keep you from being penalized or banned by search engines. It can also help you pinpoint crawlers when they come calling. Browsers don't normally request robots.txt, so requests for it in your server logs almost always come from crawlers.
Another way of shutting out SLURP is the noindex robots meta tag. Yahoo SLURP obeys this tag in the document's head. The code to insert between your document's <head> and </head> tags is
<META NAME="robots" CONTENT="noindex">
This snippet ensures that Yahoo SLURP does not add the document to the search engine's database. Another useful directive is nofollow. The code to insert is
<META NAME="robots" CONTENT="nofollow">
This snippet ensures that the links on the page are not followed.
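A crawler that honors these tags has to find them in the page and read their CONTENT values. The Python sketch below shows that convention; it is not SLURP's actual parser, and the names RobotsMetaParser and robots_policy are hypothetical. It uses the standard html.parser module, which lowercases tag and attribute names, so <META NAME=...> and <meta name=...> are treated alike. It also accepts both directives combined in one tag, as in CONTENT="noindex, nofollow".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from every <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute names arrive lowercased
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            for token in (attrs.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def robots_policy(html):
    """Return whether a compliant crawler may index the page and follow its links."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


print(robots_policy('<head><META NAME="robots" CONTENT="noindex"></head>'))
# {'index': False, 'follow': True}
```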
Dynamic Page Indexing
Dynamic page indexing is SLURP's real charm. Most search engine crawlers don't bother crawling and indexing dynamic pages (.php, .asp, .jsp), because their content changes so quickly that an indexed copy soon goes out of date. Yahoo SLURP, however, recrawls the dynamic pages it has indexed every day to refresh their content. It also runs biweekly crawls that let the search engine discover new content and add it to its index incrementally. This allows URLs on complex sites, including URLs generated by forms and content management software, to be indexed.
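The two-speed schedule described above can be sketched in a few lines of Python. The intervals are assumptions. "Daily" is taken as 24 hours. "Biweekly" is taken as every 14 days, although it could also mean twice a week. The helper plan_crawl and the sample URLs are hypothetical; this is not Yahoo's scheduler.

```python
from datetime import datetime, timedelta

REFRESH_INTERVAL = timedelta(days=1)     # daily refresh of already-indexed dynamic pages
DISCOVERY_INTERVAL = timedelta(days=14)  # assumption: "biweekly" read as every two weeks


def plan_crawl(last_refreshed, last_discovery, now):
    """Return (URLs whose daily refresh is due, whether a discovery crawl is due)."""
    refresh_due = sorted(
        url for url, when in last_refreshed.items() if now - when >= REFRESH_INTERVAL
    )
    discovery_due = now - last_discovery >= DISCOVERY_INTERVAL
    return refresh_due, discovery_due


now = datetime(2006, 12, 7, 12, 0)
refresh, discover = plan_crawl(
    {"/catalog.php?id=1": datetime(2006, 12, 6, 9, 0),   # 27 hours ago: due
     "/catalog.php?id=2": datetime(2006, 12, 7, 8, 0)},  # 4 hours ago: not yet
    last_discovery=datetime(2006, 11, 30, 12, 0),         # 7 days ago: not yet
    now=now,
)
print(refresh, discover)  # ['/catalog.php?id=1'] False
```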
These frequent crawls show up in your server logs as a stream of download requests while the crawler moves, stops, and restarts. Yahoo says these repeated requests should not be a cause for alarm.
SLURP's ability to index dynamic pages and keep their content fresh is a great relief to web designers (like me) who use dynamic pages for fast loading and rapid updating. Websites that were once unfriendly to search engines are suddenly in contention for the number one ranking.
The downside is that SLURP may never crawl your dynamic pages unless you trigger it through techniques that Yahoo encourages (to the benefit of its bottom line).
Getting Framed
Yahoo SLURP also supports frames, although it will not follow SRC attribute links to stand-alone framesets; it follows only HREF links (as all good crawlers do).
With all that said about Yahoo SLURP, there remains the small matter of getting this particular spider to crawl your site. There are several ways to go about it, and here we begin to see hints of what would be the order of the day in a search engine market dominated by Yahoo! (which seems very, very concerned about its bottom line).
Linking
The first strategy is good old linking. Get a link on a site that Yahoo! crawls regularly, and voilà, SLURP comes knocking on your door. You can do this by contacting the owners of sites that rank well on Yahoo, or by submitting your website to directories that SLURP crawls regularly (you can find these by searching Yahoo for "directories"). If SLURP deep crawls your site regularly (crawling many pages instead of just one or two), you have a good chance of ranking well for the keyword or topic your site is optimized for.
Yahoo Companion Toolbar
The Yahoo Companion Toolbar is supposed to trigger SLURP to crawl your site. It also lets searchers search within your site, which adds value for your audience while attracting Yahoo SLURP.
Sitematch
With Sitematch, you pay Yahoo's fees and submit your site. This guarantees that your site will be added to the index (at a price), but it does not guarantee your website's ranking in the SERPs.
This is a scary service, and some reviewers speculate that it is a foretaste of what site owners would face in a market dominated by Yahoo. It is carried over from Overture (which Yahoo purchased) and involves an annual fee for submitted pages. The URLs are submitted into Yahoo’s index and are then crawled by SLURP every 48 hours.
On top of the annual fee, a cost-per-click fee is charged for each lead sent to your site (so you had better have deep pockets).
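As a back-of-the-envelope formula, a year of Sitematch costs roughly the following. The symbols are introduced here for illustration only; this article gives no actual fee levels.

```latex
C_{\text{year}} = F + c\,N
```

Here F is the annual fee for your submitted pages, c is the cost per click, and N is the number of clicks Yahoo sends you that year. The annual fee is fixed, but the per-click term keeps growing with your traffic, which is why the deep pockets matter.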
Besides SLURP visiting every two days, you also get listed in searches on about.com, Excite, Overture, and other Yahoo partners. However, there is still no guarantee of a high ranking, and frankly I do not like this method (because I absolutely love free stuff).
There is also a way to submit your site for free; however, Yahoo does not guarantee that websites submitted this way will ever be crawled by SLURP.
By now you should know enough about SLURP to spot it, track it, attract it, and prevent it from crawling specific pages of your site.