Kevin's Blog: November 2008

This week I took a one-week Cuil challenge. I was curious to find out how this search engine stacks up against Google's. My aim was to use no search engine other than Cuil. However, within three days I had to switch. The search results were poor: a search for, say, 'agile development' returned no Wikipedia hits. There was no spell checker, and there were no add-ons for Firefox.

One reason for the abrupt end to my one-week Cuil challenge was that I was introduced to another search engine called Clusty. Clusty not only returned better results than Cuil, but instead of delivering millions of search results in one long list, it grouped similar results together into clusters. Clusters help you see your search results by topic so you can zero in on exactly what you’re looking for or discover unexpected relationships between items. What is great is that you can search within clusters.

Clusty uses a clustering algorithm for its search engine. Most readers will be familiar with Google's PageRank algorithm; clustering works differently: it separates unrelated documents and groups related documents together. Using the contents of web pages and their link information, the content-link hypertext clustering algorithm groups similar web pages into clusters that can be searched or combined into larger clusters. To generate clusters, the algorithm uses two similarity functions: one that examines the hyperlinks of the pages and one that examines the contents of the pages. Combining the hyperlink and content similarity functions iteratively produces clusters of similar web pages.
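To make the idea concrete, here is a minimal sketch of content-plus-link clustering in Python. It is not Clusty's actual implementation, which I have not seen. The specific choices are my own assumptions: cosine similarity over word counts for content, Jaccard overlap of outgoing links for hyperlinks, an equal weighting of the two, a merge threshold of 0.3, and the made-up page names and `.example` URLs.

```python
import math
from collections import Counter
from itertools import combinations


def content_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def link_similarity(links_a, links_b):
    """Jaccard overlap between two sets of outgoing links."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def combined_similarity(a, b, weight=0.5):
    """Weighted mix of content and link similarity (weight is an assumption)."""
    return (weight * content_similarity(a["text"], b["text"])
            + (1 - weight) * link_similarity(a["links"], b["links"]))


def merge(a, b):
    """Treat a merged cluster as one larger pseudo-page."""
    return {
        "members": a["members"] + b["members"],
        "text": a["text"] + " " + b["text"],
        "links": a["links"] | b["links"],
    }


def cluster_pages(pages, threshold=0.3, weight=0.5):
    """Repeatedly merge the most similar pair until no pair reaches threshold."""
    clusters = [
        {"members": [name], "text": p["text"], "links": set(p["links"])}
        for name, p in pages.items()
    ]
    while len(clusters) > 1:
        # Score every pair of clusters and pick the most similar one.
        (i, j), best = max(
            (((i, j), combined_similarity(clusters[i], clusters[j], weight))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = merge(clusters[i], clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]


# Hypothetical pages: names, texts, and links are invented for illustration.
pages = {
    "scrum": {"text": "agile scrum sprint planning",
              "links": {"agile.example", "scrum.example"}},
    "xp": {"text": "agile extreme programming sprint",
           "links": {"agile.example"}},
    "pasta": {"text": "pasta tomato sauce recipe",
              "links": {"recipes.example"}},
}
clusters = cluster_pages(pages)
print(clusters)
```

With these three pages, the two agile pages share both words and a link, so they end up in one cluster, while the recipe page stays on its own. Raising the threshold makes the clusters smaller and tighter; lowering it merges more aggressively.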

Other web search algorithms of note are:
HITS - Hypertext Induced Topic Selection
ARC - Automatic Resource Compilation
SALSA - Stochastic Approach for Link-Structure Analysis

I will discuss these in my next blog post.


Tuesday, November 25, 2008

Searching the web 1.5

Wednesday, November 12, 2008

vmware 2.0

Saturday, November 8, 2008

Searching the web Part I
