My blog has moved!

You will be automatically redirected to the new address. If that does not occur, visit
http://www.kdmcgregor.wordpress.com
and update your bookmarks.

Tuesday, November 25, 2008

Searching the web 1.5

As the semantic web forges ahead, there is now a semantic search engine called hakia. I have not tried it out yet, so I guess a hakia one-week challenge is in order.

Wednesday, November 12, 2008

VMware 2.0

I finally upgraded to VMware 2.0 on my Debian file server. I'm loving it. It uses a web interface.
One thing I like about it is that your mouse moves freely from the host computer to the virtual machine. In the older version of VMware, when your mouse was in the virtual machine you had to type a control sequence to go back to the host machine.
When you install VMware 2.0 and start it, you have to enter a user name and password. The default will be the administrative credentials for the host machine (root for Linux). You may need to install a plugin in the browser to start the VM.

I am going to need more memory.

Saturday, November 8, 2008

Searching the web Part I

This week I took a Cuil one-week challenge. I was curious to find out how this search engine stacks up against Google's search engine. My aim was to use no other search engine apart from Cuil. However, within three days of the challenge, I had to switch. Search results were poor. A search on, say, 'agile development' resulted in no Wikipedia hits. There was no spell checker, and there were no add-ons for Firefox.

One reason for the abrupt end to my Cuil one-week challenge was that I was introduced to another search engine called Clusty. Clusty not only returned better results than Cuil, but instead of delivering millions of search results in one long list, it grouped similar results together into clusters. Clusters help you see your search results by topic so you can zero in on exactly what you’re looking for or discover unexpected relationships between items. What is great is that you can search within clusters.

Clusty used a clustering algorithm for its search engine. Everyone is familiar with Google's PageRank algorithm. Clustering is a different idea: it separates unrelated documents and groups related documents together. Using the contents of web pages and their link information, the content-link hypertext clustering algorithm groups similar web pages into clusters that can be searched or combined into larger clusters. To generate clusters, the algorithm uses two similarity functions: one that examines the hyperlinks of the pages and one that examines the contents of the pages. Combining the hyperlink and content similarity functions iteratively produces clusters of similar web pages.
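
To make the idea concrete, here is a rough sketch in Python of how a content similarity and a link similarity could be combined and applied iteratively. This is not Clusty's actual code. The choice of cosine similarity for content, Jaccard overlap for links, the `link_weight` and `threshold` parameters, and the sample pages are all my own assumptions for illustration.

```python
import math
from collections import Counter


def content_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words vectors of two page texts."""
    words_a = Counter(text_a.lower().split())
    words_b = Counter(text_b.lower().split())
    dot = sum(words_a[w] * words_b[w] for w in words_a.keys() & words_b.keys())
    norm_a = math.sqrt(sum(c * c for c in words_a.values()))
    norm_b = math.sqrt(sum(c * c for c in words_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_similarity(links_a, links_b):
    """Jaccard overlap between the sets of URLs two pages link to."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def page_similarity(page_a, page_b, link_weight=0.5):
    """Weighted mix of link and content similarity (the weight is an assumption)."""
    content = content_similarity(page_a["text"], page_b["text"])
    links = link_similarity(page_a["links"], page_b["links"])
    return link_weight * links + (1 - link_weight) * content


def cluster_pages(pages, threshold=0.3, link_weight=0.5):
    """Repeatedly merge the two most similar clusters until no pair reaches the threshold."""
    clusters = [[name] for name in pages]  # start with one cluster per page

    def cluster_similarity(c1, c2):
        # Average pairwise similarity between the pages of two clusters.
        total = sum(
            page_similarity(pages[a], pages[b], link_weight) for a in c1 for b in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best_score, i, j = max(
            (cluster_similarity(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best_score < threshold:
            break  # the remaining clusters are too dissimilar to merge
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sample pages: two about agile development, one unrelated.
pages = {
    "scrum": {"text": "agile development scrum sprint",
              "links": {"agilemanifesto.org", "scrum.org"}},
    "xp": {"text": "agile development extreme programming",
           "links": {"agilemanifesto.org"}},
    "pasta": {"text": "pasta recipe tomato sauce",
              "links": {"recipes.example"}},
}

print(cluster_pages(pages))  # [['scrum', 'xp'], ['pasta']]
```

The two agile pages share words and an outgoing link, so they end up in one cluster, while the recipe page stays on its own. A real search engine would work on far more pages and would need something smarter than comparing every pair of clusters on each pass.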

Other web search algorithms of note are:
HITS - Hypertext Induced Topic Selection
ARC - Automatic Resource Compilation
SALSA - Stochastic Approach for Link Structure Analysis

These I will discuss in my next blog post.