NebuPookins.net - NP-Complete - An introduction to Google's PageRank
 

An introduction to Google's PageRank
[Computer]

When you use a search engine, you typically type in a few keywords describing what it is you are looking for, and you expect the search engine to return documents (typically from the Internet) that contain the information you wanted. Now it's very easy for a computer to check if a particular document contains the keywords you specified, but it's rather difficult to determine, amongst all the documents that contain your keywords, which one is the most relevant one.

I won't go into the details of every strategy involved, only Google's, because Google was the first to use its innovative "PageRank™" system, which is what made it so popular.

Google describes PageRank as "[relying] on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page's value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves 'important' weigh more heavily and help to make other pages 'important.'"

How does Google actually perform these calculations? Well, it starts by downloading every page on the Internet. Seriously. Then it creates a directed graph in which each page is a node, and each link is an edge coming from the page that contains it and going to the page that it links to. Then it assigns every node a default score. Let's just say, for the sake of simplicity, that the default score for every page is 1, so the "score of the entire Internet" is the number of pages on the Internet.
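To make the graph idea concrete, here is a minimal sketch in Python. It is not Google's code: the function name build_graph, the three example.com-style pages, and the choice to count repeated links only once are all assumptions made up for illustration.

```python
def build_graph(pages):
    """Turn {url: [urls it links to]} into a graph plus a default score per node."""
    graph = {}
    for url, links in pages.items():
        # Assumption: a link repeated on the same page counts only once.
        targets = list(dict.fromkeys(links))
        graph[url] = targets
        # A page that is linked to but was never downloaded still becomes a node.
        for target in targets:
            graph.setdefault(target, [])
    # Every node starts with the same default score of 1.
    scores = {url: 1.0 for url in graph}
    return graph, scores


# Hypothetical three-page web, invented for this example.
pages = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example"],
    "c.example": ["a.example"],
}
graph, scores = build_graph(pages)
print(sum(scores.values()))  # 3.0, one point per page
```

Each key is a node and each entry in its list is an outgoing edge, so the "score of the entire Internet" here is just the number of keys.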

Then, in each round, every node gives away 85% of its current score, split equally among the nodes that lie on its outgoing edges, and starts the next round with a fixed base of 0.15 (15% of the default score) plus whatever shares it received from the nodes linking to it. This procedure is repeated until the numbers settle down (might take a day or two), and in the end, the total score of the Internet should not change, but nodes with lots of inbound links should have a higher score, while nodes with far fewer inbound links should have a lower score. This is how the voting process is done.
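Here is a rough sketch of that repeated voting in Python. It follows the formula usually given for PageRank, where a page's new score is a fixed base of 0.15 plus 85% of the shares sent to it by the pages linking to it. The function name, the tolerance, the round limit, the four-page web, and especially the handling of pages with no outgoing links are my own assumptions, not anything Google has confirmed.

```python
def pagerank(graph, damping=0.85, tolerance=1e-9, max_rounds=1000):
    """Iterate scores over {page: [pages it links to]} until they settle.

    Every link target must also appear as a key in graph.
    """
    n = len(graph)
    scores = {page: 1.0 for page in graph}
    for _ in range(max_rounds):
        # Every page starts the round with the fixed base share.
        new_scores = {page: 1.0 - damping for page in graph}
        for page, targets in graph.items():
            if targets:
                # Split 85% of the current score equally over outgoing links.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # share over every page, so the total stays constant.
                share = damping * scores[page] / n
                for target in graph:
                    new_scores[target] += share
        change = sum(abs(new_scores[p] - scores[p]) for p in graph)
        scores = new_scores
        if change < tolerance:
            break
    return scores


# Hypothetical four-page web: nobody links to "d", three pages link to "c".
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
final = pagerank(web)
for page in sorted(final, key=final.get, reverse=True):
    print(page, round(final[page], 3))
```

On this toy web the total stays at 4, page "c" ends up on top because three pages vote for it, and page "d" is left with only the 0.15 base because nothing links to it.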

This is just a guess, but the page http://www.microsoft.com/ probably has a lot of other pages linking to it. Maybe a billion pages point to that page. We know there's at least one (this page). However, probably not so many pages point to this page. Maybe two or three (there'll be a link to this page from the np.net archive, maybe one from the np.net frontpage temporarily, and maybe some blogger somewhere will find this page interesting enough to link to as well). As a result, even though both this page and Microsoft's page contain the keyword "Microsoft", if you search for that keyword in Google, the result pointing to http://www.microsoft.com/ is probably going to be ranked higher than the result pointing to this page, just 'cause http://www.microsoft.com/ has a higher PageRank than this page.
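As a toy illustration of that last step, the sketch below keeps only the pages that contain the keyword and then sorts them by score. The search function, the page texts, the third URL, and all of the score values are invented for the example; a real search engine presumably weighs many more signals than a single number per page.

```python
def search(keyword, documents, scores):
    """Return URLs whose text contains the keyword, highest score first."""
    keyword = keyword.lower()
    matches = [url for url, text in documents.items() if keyword in text.lower()]
    # Pages without a known score sort last.
    return sorted(matches, key=lambda url: scores.get(url, 0.0), reverse=True)


# Made-up texts and scores, for illustration only.
documents = {
    "http://www.nebupookins.net/": "An article that mentions Microsoft",
    "http://www.microsoft.com/": "Microsoft home page",
    "http://www.example.org/": "A page about something else",
}
made_up_scores = {
    "http://www.nebupookins.net/": 1.2,
    "http://www.microsoft.com/": 900.0,
    "http://www.example.org/": 3.0,
}
results = search("Microsoft", documents, made_up_scores)
print(results)
# ['http://www.microsoft.com/', 'http://www.nebupookins.net/']
```

Both matching pages contain the keyword, so the score alone decides which one comes first.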

That is why, BTW, I humbly ask you all to link to my site wherever possible, so that I may increase my PageRank. The URL to use is http://www.nebupookins.net/. Thanks.

 