Friday, December 5, 2008

Searching the web - Part II & III

In my continuing series on searching the web, I will look at the ARC (Automatic Resource Compilation) algorithm and the SALSA algorithm.

ARC Algorithm
ARC is an extension of the HITS algorithm and keeps its notion of hubs and authorities. Like HITS, it uses a term-based search engine to create a root set. The main difference is that ARC also performs textual analysis of the web pages: it looks at the text around each link, and links whose surrounding text matches the query terms are given more weight when the hub and authority scores are computed.
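To make the weighting idea concrete, here is a minimal sketch of a HITS-style iteration with text-weighted links. It is not the ARC implementation: the function names, the input format (a list of source, target, anchor text triples) and the weight formula (1 plus the number of query terms found in the anchor text) are assumptions chosen for illustration.

```python
from math import sqrt


def link_weight(anchor_text, query_terms):
    """Assumed weighting: 1 plus the number of query-term matches in the anchor text."""
    words = anchor_text.lower().split()
    return 1 + sum(words.count(term.lower()) for term in query_terms)


def normalize(scores):
    """Scale scores to unit Euclidean length (left unchanged if all are zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def weighted_hits(links, query_terms, iterations=50):
    """links: iterable of (source, target, anchor_text) triples.

    If the same (source, target) pair appears twice, the last one wins.
    Returns (hub_scores, authority_scores) as dicts keyed by page.
    """
    weights = {(s, t): link_weight(text, query_terms) for s, t, text in links}
    pages = sorted({p for pair in weights for p in pair})
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority update: weighted sum of the hub scores pointing at the page.
        new_auth = {p: 0.0 for p in pages}
        for (s, t), w in weights.items():
            new_auth[t] += w * hub[s]
        # Hub update: weighted sum of the authority scores the page points at.
        new_hub = {p: 0.0 for p in pages}
        for (s, t), w in weights.items():
            new_hub[s] += w * new_auth[t]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    example_links = [
        ("a", "x", "salsa dancing lessons"),
        ("a", "y", "click here"),
        ("b", "x", "salsa music"),
        ("b", "y", "more"),
    ]
    hubs, authorities = weighted_hits(example_links, ["salsa"])
    for page, score in sorted(authorities.items(), key=lambda kv: -kv[1]):
        print(page, round(score, 3))
```

In the toy data, pages x and y receive the same number of links, but the links to x mention the query term, so x ends up with the higher authority score.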

SALSA algorithm
SALSA (Stochastic Approach for Link-Structure Analysis) is also an extension of the HITS algorithm and uses the same concepts of hub and authority pages; however, it uses the theory of Markov chains to perform two random walks on the web graph. One walk is conducted on the authority side of the web graph (the authority chain) and the other on the hub side (the hub chain). The algorithm builds a matrix of the links between pages and uses it to form a transition matrix for each chain, which is then applied iteratively. What is produced are the principal eigenvectors of the hub and authority matrices. The web pages with the largest entries in these eigenvectors are the highest ranked.
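The authority-side random walk can be sketched in a few lines. This is an illustrative simplification rather than a full SALSA implementation: the function name and input format are assumptions, only the authority chain is shown (the hub chain mirrors it with the link directions swapped), and the walk is run as a plain power iteration from a uniform start.

```python
def salsa_authority_scores(links, iterations=100):
    """links: iterable of (source, target) pairs.

    One step of the authority chain: from authority a, follow a random
    in-link back to a hub h, then a random out-link of h forward to an
    authority b. Returns the distribution reached after `iterations` steps.
    """
    out_links = {}
    in_links = {}
    for source, target in sorted(set(links)):
        out_links.setdefault(source, []).append(target)
        in_links.setdefault(target, []).append(source)

    authorities = sorted(in_links)
    score = {a: 1.0 / len(authorities) for a in authorities}

    for _ in range(iterations):
        next_score = {a: 0.0 for a in authorities}
        for a, mass in score.items():
            back_share = mass / len(in_links[a])  # pick one in-link at random
            for hub in in_links[a]:
                forward_share = back_share / len(out_links[hub])  # then one out-link
                for b in out_links[hub]:
                    next_score[b] += forward_share
        score = next_score
    return score


if __name__ == "__main__":
    example_links = [
        ("h1", "a"), ("h1", "b"),
        ("h2", "a"), ("h2", "c"),
        ("h3", "a"),
    ]
    for page, value in sorted(salsa_authority_scores(example_links).items()):
        print(page, round(value, 3))
```

When the link graph is connected, as in this toy example, the walk settles on scores proportional to each page's number of incoming links, so page a (three in-links) ranks above b and c (one each).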

I have not found any practical applications that use these algorithms. As soon as I find them I will post the links.
