You, Some SEO and a Spider
Administrator
What do you imagine when you think of successful seduction? Right now I’m thinking of thousands of tiny spiders crawling over my computer screen. No, I’m not mentally ill. I’m talking about making your website seductive; or rather, attractive to webspiders and net-bots.

What are webspiders, what are net-bots?

Web-spiders, ants and crawlers are just some of the names for the automatic scripts that browse the Internet in a methodical fashion. They harvest data for different kinds of processing. They can be used internally (a website may employ a net-bot to check for broken links), or they can be used by search engines to index new and updated websites. For some examples of these webcrawlers, please have a browse through Wikipedia’s selection: http://en.wikipedia.org/wiki/Web_cra...f_web_crawlers

Why would I seduce a spider?

Never thought I’d write that. Crawlers are good for your website because they let the search engines find you. Without them your website would be very difficult to find. The benefits of webcrawlers:
Spiders like Googlebot (please see How Google Crawls my site for more details) want to index your website and they will find you if you have:
However, you do not want a crawler to index all the information in your website. It would be a waste of time having your /image directory listed on Google, for example, so you should disallow the crawlers from accessing this content. You may also want to protect your e-mail addresses from malignant crawlers (please see ‘Are all crawlers safe?’ below). To do this you should create a robots.txt file.

A robots.txt file is a simple, but potent, document that every website should keep in its root directory. This file is your ‘fart in the lift’; it is small, but very powerful in effect. With it you may stop a crawler harvesting certain pages or even entire directories by using the command - Disallow:

A mini robots.txt tutorial:

1. Start a Notepad document and name it robots.txt

2. Address the webcrawlers like this:

User-agent: *

The ‘User-agent’ line denotes that you are addressing a webcrawler. If you place an asterisk in the way that I have done here you will address every webcrawler that happens upon your website. If you wish to address individual crawlers you should list them by name like this:

User-agent: Googlebot

But you must list the disallowed pages/directories for each crawler individually. For example:

User-agent: *
Disallow: /user-list/email/
Disallow: /products/images/
Disallow: /articles/contributors/

All files and folders in these directories will be blocked for any crawler that obeys the file, and will not be crawled. Bear in mind that crawlers only look for robots.txt at the root of your website, and every path you list is read relative to that root. A file placed lower down, for example:

http://www.yoururl.co.uk/index/robots.txt

will not be read by crawlers at all, so it cannot control the ‘index/’ directory or anything above it.

3. You may also want to disallow certain files; you can do so like this:

Disallow: /articles/jubjub.html
Disallow: /index/error_page.html

Are all crawlers safe?

No, some can and will bite you. There are many webcrawlers and they may visit your website for reasons other than indexing. You should attempt to protect certain information by disallowing the crawlers as I have shown you in the tutorial above. Remember, though, that robots.txt is only a request: well-behaved crawlers honour it, but a malignant crawler is free to ignore it.

Malignant Crawlers

They can be (much to my upset) used for spamming. Malignant crawlers look through your website with a view to capturing all the e-mail addresses and other useful data displayed there. If they do this you can expect an inbox full of spam. I discovered 20 e-mails from a Japanese adult dating website in my Herds of Words inbox today. I was not a happy bunny. However, you can avoid this (I was just that little bit too late) if you encode the addresses differently, making it harder for these evil bots to trap you.

If you are using Cascading Style Sheets (.css):

1. Create an html-tag to fit around the text you want to use as an e-mail address.

2. In the css file you must define that tag, so:

postmaster:after { content: "postmaster\40herdsofwords.co.uk"; }

Here \40 is the CSS escape code for the @ sign, so the address is written out by the stylesheet and does not appear as plain text in the page itself.

If that doesn’t help you, or you don’t use cascading style sheets, please have a look through this useful article by Daniel Cody: http://evolt.org/article/Using_Apach...8/15126
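If you want to check what your Disallow rules actually block before you upload them, here is a rough sketch in Python using the standard library's urllib.robotparser. The rules are the ones from the tutorial; the helper name is_allowed and the sample paths (logo.png, /articles/index.html) are made up for illustration, and it only tells you how a crawler that obeys the file would behave.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the tutorial; paths are relative to the site root.
RULES = """\
User-agent: *
Disallow: /user-list/email/
Disallow: /products/images/
Disallow: /articles/contributors/
Disallow: /articles/jubjub.html
Disallow: /index/error_page.html
"""

# Placeholder domain taken from the post, not a real site to fetch.
SITE = "http://www.yoururl.co.uk"


def is_allowed(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if a rule-obeying crawler may fetch `path` (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so nothing is downloaded.
    parser.parse(RULES.splitlines())
    return parser.can_fetch(user_agent, SITE + path)


if __name__ == "__main__":
    samples = (
        "/articles/jubjub.html",      # single disallowed file
        "/products/images/logo.png",  # inside a disallowed directory
        "/articles/index.html",       # not covered by any rule
    )
    for path in samples:
        verdict = "allowed" if is_allowed(path) else "blocked"
        print(path, "->", verdict)
```

Because the rules sit under User-agent: *, a named crawler such as Googlebot falls back to them. If you add a separate User-agent: Googlebot section, that crawler follows only its own section, which is why the disallowed paths have to be repeated for each crawler you name.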