The algorithm considers only a small fragment of the hyperlink-induced Web graph, on the order of a few hundred to a few thousand documents. The node set of this graph is constructed in two steps:
Finally, the edge set for the entire set of nodes is constructed by fetching all documents and analyzing their hyperlinks. The actual link analysis computes an authority score x and a hub score y for each document in the node set S, based on the following mutual reinforcement relationship:
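In the standard HITS notation (an assumption here: E denotes the hyperlink edge set over the node set S), this relationship can be written as

$$
x_p \;=\; \sum_{(q,\,p)\,\in\,E} y_q , \qquad\qquad y_p \;=\; \sum_{(p,\,q)\,\in\,E} x_q .
$$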
The intuition behind these mutually recursive definitions is that a good hub is a page that points to many good authorities, and a good authority is a page that is pointed to by many good hubs. The recursion in the above formulas is resolved by computing the x and y vectors iteratively: we start by assigning initial hub and authority scores uniformly, and then compute x(i) and y(i) from the previous iteration's values x(i-1) and y(i-1); after each iteration the two vectors are normalized to unit length (i.e., L2 norm one).
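In matrix notation (assuming A is the adjacency matrix of the graph over S, with A_{pq} = 1 iff p links to q), one iteration can be sketched as

$$
x^{(i)} = A^{\top} y^{(i-1)} , \qquad y^{(i)} = A\, x^{(i-1)} ,
$$

followed by rescaling both vectors to unit L2 norm.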
Since we are not really interested in the absolute values of the authority and hub scores of all documents in S but merely want to identify the top N authorities and hubs (with N on the order of 100), we interpret the notion of convergence loosely and simply stop after a fixed number of iterations (the current setup of BINGO! uses 30 iterations).
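As an illustrative sketch of this iteration (not BINGO!'s actual implementation; the `out_links` adjacency representation and the parameter defaults below are assumptions), the computation might look as follows:

```python
import math

def hits(out_links, iterations=30, top_n=100):
    """Iterative hub/authority computation over a small link graph.

    out_links: dict mapping each node to the set of nodes it links to
    (an illustrative representation of the node set S and its edges).
    """
    nodes = set(out_links) | {q for qs in out_links.values() for q in qs}
    # Uniform initialization of authority (x) and hub (y) scores.
    x = {p: 1.0 for p in nodes}
    y = {p: 1.0 for p in nodes}

    for _ in range(iterations):
        # Authority score x(i): sum of the previous hub scores of all pages linking to q.
        new_x = {p: 0.0 for p in nodes}
        for p, targets in out_links.items():
            for q in targets:
                new_x[q] += y[p]
        # Hub score y(i): sum of the previous authority scores of all pages p links to.
        new_y = {p: sum(x[q] for q in out_links.get(p, ())) for p in nodes}
        # Normalize both vectors to unit length (L2 norm).
        for vec in (new_x, new_y):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for p in vec:
                vec[p] /= norm
        x, y = new_x, new_y

    # Report only the top-N authorities and hubs, as described in the text.
    top_authorities = sorted(nodes, key=x.get, reverse=True)[:top_n]
    top_hubs = sorted(nodes, key=y.get, reverse=True)[:top_n]
    return top_authorities, top_hubs
```

A call such as `hits({"a": {"b", "c"}, "b": {"c"}, "c": set()})` returns the nodes ranked by authority and by hub score, respectively.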