
Bobby, 10-Oct-2007, 04:21 AM

I originally discussed this at HighRankings and am in the process of refining my thoughts on the subject. Matt Cutts has mentioned "reputation" a couple of times on his blog. A lot of people speak about "reputation", and it has accumulated many contexts within the search engine and search optimization industries.

I'll start out with a few definitions.

Web site (aka website) - A collection of Web pages that form a collective whole. A Web site may cover one topic or many topics but it is structurally unified in the creator's view and intent.

Host - Any domain (domain.name) or sub-domain (sub.domain.name). This expression is used in the technical patents and papers published by search engineers in the academic and professional communities. I don't believe it should be equated with "Web site". A host may contain many distinct, separately owned Web sites within its content (e.g., Geocities.com, Stormpages.com, et al.).

Link Popularity - The measure of a page's importance or value to the Web community as determined by the raw number of links pointing to it. Variations on link popularity have been proposed, such as qualifying links before counting them, disqualifying links, and normalizing links that point to secondary pages by counting them as if they point to the main pages of sites.
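
To make the difference between the raw count and one of the proposed variations concrete, here is a rough Python sketch. The link list, the example.* URLs, and both function names are made up for illustration; the "normalized" version drops duplicate links and credits each link to the main page of the target host, which is only one possible reading of the variations described above.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical (source_url, target_url) pairs; not real crawl data.
links = [
    ("http://a.example/", "http://c.example/page1"),
    ("http://a.example/", "http://c.example/page1"),  # duplicate link
    ("http://b.example/", "http://c.example/"),
    ("http://b.example/", "http://c.example/page2"),
]

def raw_link_popularity(links):
    """Count every link pointing at each target URL, duplicates included."""
    return Counter(target for _, target in links)

def normalized_link_popularity(links):
    """Variation: drop duplicate links and credit the target's main page."""
    unique = set()
    for source, target in links:
        parts = urlsplit(target)
        main_page = f"{parts.scheme}://{parts.netloc}/"
        unique.add((source, main_page))
    return Counter(main_page for _, main_page in unique)

raw = raw_link_popularity(links)
normalized = normalized_link_popularity(links)
print(raw["http://c.example/page1"])     # 2
print(normalized["http://c.example/"])   # 2
```

In the raw count the duplicate link inflates page1; in the normalized count each linking page is counted once and all of the credit lands on the main page.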

Click Popularity - The measure of a page's importance or value to the surfing community as determined by the raw number of clicks on the page's URL in a directory or search engine. Variations have been proposed such as qualifying clicks by time spent on target sites, whether users click on the BACK button, etc.
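
A similar sketch for click popularity. The click log, the 10-second threshold, and the function names are all assumptions of mine; no search engine has published the rules it actually uses to qualify clicks.

```python
from collections import Counter

# Hypothetical click log: (url, seconds_on_target, pressed_back)
clicks = [
    ("http://a.example/", 45.0, False),
    ("http://a.example/", 2.0, True),    # quick bounce back to the results
    ("http://b.example/", 30.0, False),
    ("http://a.example/", 120.0, False),
]

MIN_SECONDS = 10.0  # assumed threshold, chosen only for the example

def raw_click_popularity(clicks):
    """Count every click on each URL."""
    return Counter(url for url, _, _ in clicks)

def qualified_click_popularity(clicks, min_seconds=MIN_SECONDS):
    """Variation: count only clicks with enough time on the target and no BACK."""
    return Counter(
        url
        for url, seconds, pressed_back in clicks
        if seconds >= min_seconds and not pressed_back
    )

raw_clicks = raw_click_popularity(clicks)
qualified_clicks = qualified_click_popularity(clicks)
print(raw_clicks["http://a.example/"])        # 3
print(qualified_clicks["http://a.example/"])  # 2
```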

PageRank - Larry Page and Sergey Brin's controversial method for measuring the importance or value of a document to both the Web and surfing communities as determined by the number and value of links pointing to that document. PageRank is a probabilistic measurement of the chances that a surfer will land on a given page by randomly clicking on links. The combined sum of all PageRanks cannot exceed 1 (probabilities are measured as values between 0 and 1).

PageRank is arbitrarily assigned evenly to all indexed documents (individual Web pages, not sites or hosts). A series of iterative processes then follows in which PageRank values are adjusted on the basis of the value of links pointing to the documents. A link's value is equal to the current PageRank of its mother document divided by the number of normalized links contained in the document as adjusted by an arbitrary damping factor. It is assumed that normalization includes discarding duplicate links so that each document is treated as pointing to any other document only once.

Links pointing to unindexed documents are discarded until the last iteration, when the unindexed documents are assigned their PageRanks.

Documents with no outbound links are treated as if they link to every other document in the collection.

In the Link Popularity model all links share a single value. In the PageRank model, each link's value is dependent upon its document's PageRank value and the number of outbound links on the document.
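
The iterative process described above can be sketched in a few lines of Python. This is my own simplified illustration, not Google's implementation: the function name, the toy graph, the damping value of 0.85, and the fixed 50 iterations are assumptions. Links to documents outside the collection are simply discarded (the final step of assigning PageRank to unindexed documents is left out), and a document with no outbound links is treated as linking to every other document.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """graph maps each document to the list of documents it links to."""
    docs = list(graph)
    n = len(docs)
    # Normalize links: drop duplicates and links to unindexed documents.
    out = {d: {t for t in graph[d] if t in graph} for d in docs}
    # Start with PageRank assigned evenly to all documents.
    rank = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new = {d: (1.0 - damping) / n for d in docs}
        for d in docs:
            # No outbound links: act as if it links to every other document.
            targets = out[d] or (set(docs) - {d}) or {d}
            # Link value: the parent's current PageRank divided by its
            # number of normalized links, adjusted by the damping factor.
            share = damping * rank[d] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# Toy collection; "D" links to "C" twice, which counts only once.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C", "C"]}
ranks = pagerank(web)
for doc, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(doc, round(value, 4))
```

In this toy collection "C" ends up with the highest value and the values sum to 1. Under Link Popularity "C" would also win with three inbound links, but each of those links would count the same; here the link from "B" is worth more than either link from "A" because "B" passes all of its value through a single link.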

Mike Grehan reports that engineers at Ask and Yahoo! believe Google has not yet (fully) implemented PageRank. Google does not actually rank search results by PageRank, except in its directory. But they do claim that PageRank is one of more than 100 factors used to rank search results. Several technical papers published by academic and professional search engineers have proposed methods for reducing the amount of time and resources required to calculate PageRank for the Web.

Link Popularity, Click Popularity, and PageRank are all vulnerable to manipulation. PageRank researchers have proposed a variety of methods for refining the PageRank calculation process to account for (and filter out) manipulative links.

Like other people who have followed these issues through the years, I have studied the various technical papers and patents. My feeling is that Google, Yahoo!, Ask, and MSN probably maintain core sets of Good or Trusted Sites from which value (like PageRank) is conferred out to other sites. Based on comments made by Matt Cutts and other people, my feeling is that these search engines also maintain core sets of Flagged or Suspicious Sites from which outgoing value is reduced or blocked.
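
To show what I mean by value flowing out from a trusted core, here is a speculative Python sketch. It is a guess at the general idea, not a description of any search engine's system: the function name, the graph, the damping value, and the rule that flagged documents pass nothing on are all my assumptions. Value starts only at the trusted seeds and is divided among outbound links as it travels.

```python
def propagate_trust(graph, trusted, flagged=frozenset(),
                    damping=0.85, iterations=50):
    """Spread value outward from trusted seeds; flagged documents pass none on."""
    docs = list(graph)
    seeds = [d for d in docs if d in trusted]
    if not seeds:
        raise ValueError("at least one trusted seed is required")
    # All starting value sits on the trusted seeds.
    base = {d: (1.0 / len(seeds) if d in trusted else 0.0) for d in docs}
    trust = dict(base)
    for _ in range(iterations):
        new = {d: (1.0 - damping) * base[d] for d in docs}
        for d in docs:
            targets = {t for t in graph[d] if t in graph}
            if d in flagged or not targets:
                continue  # assumed rule: outgoing value is blocked
            share = damping * trust[d] / len(targets)
            for t in targets:
                new[t] += share
        trust = new
    return trust

# Hypothetical documents.
web = {
    "seed": ["good"], "good": ["far"], "far": [],
    "spam": ["victim"], "victim": [],
}
trust = propagate_trust(web, trusted={"seed"})
print(trust["good"] > trust["far"] > 0.0)  # True: value fades with distance
print(trust["spam"], trust["victim"])      # 0.0 0.0: unreachable from the seed
```

Passing flagged={"good"} to the same call would leave "far" with no value at all, which is the blocking effect I am describing.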

Google seems to be taking punitive action against Web sites in one of three ways:

Delisting - This is the most radical action. A document is completely removed from the searchable index and won't even come up for its own title tag or URL.

Penalization - A document appears in the index but won't rank well for any search expressions except the most obscure. You usually have to find the document by URL.

Devaluation - A document appears in the index and may even rank well on the basis of its own factors. But none of its outbound links confer PageRank or reputation. The links may still be crawled, but Matt Cutts has not indicated whether this is so.
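
The three actions can be told apart by a short chain of checks. The sketch below only restates my definitions in Python; the names are hypothetical, and the third input (whether outbound links confer value) cannot be observed directly from outside Google, so in practice it has to be inferred.

```python
from enum import Enum

class Action(Enum):
    NONE = "none"
    DELISTED = "delisted"
    PENALIZED = "penalized"
    DEVALUED = "devalued"

def classify(in_index, ranks_for_own_terms, links_confer_value):
    """Map the symptoms described above to one of the three punitive actions."""
    if not in_index:
        return Action.DELISTED   # not found even by its own title or URL
    if not ranks_for_own_terms:
        return Action.PENALIZED  # indexed, but only findable by URL
    if not links_confer_value:
        return Action.DEVALUED   # ranks itself, but passes nothing on
    return Action.NONE

print(classify(True, True, False))  # Action.DEVALUED
```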

Google also affects page performance indirectly as a consequence of taking punitive actions against other sites. That is, innocent documents may experience:

Scope Reduction - Having done nothing wrong, a document suddenly loses position or all ranking for one or more (but not all) of its targeted search expressions.

Rank Loss - Having done nothing wrong, a document suddenly loses significant position within the search results.

Rank Depression - Having done nothing wrong, a document suddenly loses all position within the search results. It suffers as if it has been Penalized for no apparent reason.

Scope Reduction, Rank Loss, and Rank Depression are believed to be due to the loss of the value of inbound links from other documents. Very specifically, the July 2005 and October 2005 Google updates either delisted or devalued many documents matching the footprint (form and structure) of directory pages. Not all directory page-like documents were devalued or delisted, but many were. The most notable delistings were for SpamAd pages created to rank solely for the purpose of presenting third-party ads to visitors.
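
Here is a toy Python illustration of how an innocent document can lose value when a page linking to it is devalued. It uses a stripped-down PageRank-style calculation; the graph, the function name, and the rule that a devalued document's links confer nothing are assumptions for the example. To keep it short, documents with no outbound links simply pass nothing on, so the scores do not sum to 1.

```python
def link_scores(graph, devalued=frozenset(), damping=0.85, iterations=50):
    """PageRank-style scores where devalued documents confer no link value."""
    docs = list(graph)
    n = len(docs)
    score = {d: 1.0 / n for d in docs}
    for _ in range(iterations):
        new = {d: (1.0 - damping) / n for d in docs}
        for d in docs:
            if d in devalued:
                continue  # assumed: outbound links pass nothing on
            targets = {t for t in graph[d] if t in graph}
            for t in targets:
                new[t] += damping * score[d] / len(targets)
        score = new
    return score

# Hypothetical collection: a directory page links to two innocent documents.
web = {"directory": ["shop", "blog"], "news": ["blog"], "shop": [], "blog": []}
before = link_scores(web)
after = link_scores(web, devalued={"directory"})
for doc in ("shop", "blog", "directory"):
    print(doc, round(before[doc], 4), "->", round(after[doc], 4))
```

"shop" relied entirely on the directory link and falls back to the baseline score, "blog" loses part of its value but keeps the link from "news", and the directory page's own score does not change.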

Every document appears to have an inherent Reputation within Google's internal database. This Reputation may consist of a single valuation or it may be a function of several separate, distinct valuations. Document Reputation appears to reflect something like PageRank (Importance), Trust (Conferring value), and Status (Good, Unknown, or Bad).
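
If Reputation really is a function of several valuations, it might be modeled along these lines. This is purely a thought experiment in Python: the field names mirror the three components I listed, while the combining rule (importance times trust, zero for Bad documents) is invented for illustration.

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    GOOD = "good"
    UNKNOWN = "unknown"
    BAD = "bad"

@dataclass(frozen=True)
class Reputation:
    importance: float  # something like PageRank
    trust: float       # how much value the document may confer, 0 to 1
    status: Status = Status.UNKNOWN

    def conferred_value(self) -> float:
        """Value passed along outbound links under the assumed rule."""
        if self.status is Status.BAD:
            return 0.0
        return self.importance * self.trust

print(Reputation(0.4, 0.5, Status.GOOD).conferred_value())
print(Reputation(0.4, 0.5, Status.BAD).conferred_value())
```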