Sam Palmisano, at IBM's Annual Stockholders Meeting, is reported to have said (and we quote):
"Here's something neat — our Almaden Research Lab in California has developed a unique on demand service called WebFountain, which sits squarely at the intersection of technology innovation and business transformation.
With privacy and security concerns of paramount importance, WebFountain crawls through computer networks to collect, analyze and store massive amounts of text, and then it discovers patterns, trends and relationships that otherwise would never have been detected. Every few weeks or so, we literally capture the entirety of the Internet on our servers so we can study it!
The business implications for this technology are endless — in optimizing marketing resources, in gaining whole new insights, in creating new kinds of applications and services."
WebFountain will mine the Internet (chat rooms, newsgroups, message boards, news, web pages, etc.), extract semantically meaningful and relevant information from these pages, and apply text analytics and heuristics that could well change the landscape of business intelligence software.
Think hardware and processing speed (!): IBM is going to use a 1000-node Intel Linux cluster and half a petabyte of storage for this project. So there is something to marvel at here, both in the web mining itself and in IBM's use of its supercomputing power to do it.
WebFountain appears to have grown out of projects such as Web Graph Structure and Grand Central Station at the Almaden Research Lab in California.
WebFountain is something worth watching out for!