Emerging Face of
Information Search: The Complete Report
August 04, 2004
1. Understanding Users’ Intention.
Online search has been gaining in prominence
ever since Google disclosed their intention of taking the IPO route
about six months ago. It is being viewed in a new light - as a technology,
as well as a marketing method. While the Google IPO, the search and
advertising markets and web-site promotion through Google’s newly
spawned cottage industry, “Search Engine Optimizers”, have grabbed
center stage, there are many other important issues and questions
waiting in the wings that need to be addressed. This article series
focuses on a number of aspects related to search and tries to evaluate
“the
emerging face of information search” today.
The initial focus is on two key issues:
- How well does a search engine understand the
users’ intention?
- What are the challenges and questions
that arise while interpreting the users’ intent?
1.1 User Intention and Search
The majority of search engine users seldom
have an idea of how searches work. The technology itself, which
is usually shrouded in secrecy, provides very little assistance
to the user. Of course, here we are not talking about “advanced
search” instructions given by search engines like Google.
Usually the end user is an easily satisfied person. He/she is in
awe of certain technology, has little understanding of what goes
on behind searches and is also unaware of all that one can do with
the concept of search. While on the one hand this ignorance creates
a user whose needs are simple and easily met, on the other hand
we also have a very confused user. The ignorance combined with an
overload of information and choices that some of the major search
engines are now unleashing on the unsuspecting user often leaves
him overwhelmed and baffled.
So, although the primary function of a search engine is information
retrieval, it also needs to understand what users want when they
key in a word or phrase into the search box. The search equation
is based on understanding the users’ intention and matching
that against information that is available.
1.2 User Intention Vis-à-vis Recent Search Engine
User / Usability Surveys
Recently, two surveys were widely reported in
the search media, both done from a search engine marketing perspective:
Inside the Mind of the Searcher, done by Enquiro, and Search
Engine User Attitudes Survey Results (April-May 2004), done by
iProspect. Although these surveys focus mainly on how paid listings
hold their own against organic listings in order to gauge the popularity
of paid search ads on major search engines, they also offer some
insights into how search engines try to understand users’
intentions.
Our purpose here being to understand the interplay between search
engines and users’ intention, we would like to maintain a
distinction between usability analysis and analysis of users’
intent. It is also important to note here that merely understanding
the keywords entered by the user will not give a complete understanding
of the user. The way users interact with the search engines also
forms an integral part of understanding the search user as a whole.
1.3 Simple vs. Complex Queries
Simple vs. complex queries are an old
problem for search engines - a problem mostly handled with an
“advanced search” page. The OneStat
report that came out in February this year, and which set off
discussions in the media, said that users have started using two
or more terms while searching.
As the number of search users increases and search engines become
the default gateway to the web, it is obvious that searchers
are not going to restrict themselves to simple queries.
Here is a short comparison of simple and complex queries:
| Simple Queries: | Usually contain a single keyword/term; contain no qualifying word; tend to be more generic in nature, e.g., "birds" or "resorts" |
| Complex Queries: | Usually contain two or more keywords/terms; contain qualifying word/s; tend to be more specific in nature, e.g., "low carb diet benefits" |
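The distinction in the comparison above can be sketched in a few lines of Python. This is only an illustration under a loud assumption: it uses the number of terms as the sole signal, whereas a real system would also need to recognize qualifying words. The function name classify_query is ours and does not come from any search engine.

```python
def classify_query(query: str) -> str:
    """Label a query 'simple' or 'complex' purely by its term count.

    Assumption for illustration: one term (or none) is 'simple',
    two or more terms is 'complex'. Qualifying words are not detected.
    """
    terms = query.split()
    return "simple" if len(terms) <= 1 else "complex"


if __name__ == "__main__":
    for q in ["birds", "resorts", "low carb diet benefits"]:
        print(q, "->", classify_query(q))
```

Even this crude rule separates the examples given in the comparison, which suggests how little information a one-word query carries about what the user wants.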
1.4 Complex Queries and Search Engines
Complex user queries pose a big challenge
to search engines today, which have difficulty understanding and
making intelligent sense of these queries. In this context, there
have also been some interesting comments by Udi Manber, founder
of A9. According to a report on his keynote address at the recent
WWW2004 conference, he sees users’ dependence on one-word searches
as a huge barrier to the advancement of search technologies. It is quite
obvious that balancing simplicity and advancement is a tightrope
walk.
It is interesting to note here that the Enquiro report mentioned
earlier contained a very relevant piece of information: search
terms become more specific as the user closes in on the purchase
of an item. The more specific the item the user
is searching for, the more complex the queries become.
This brings us to the key issue faced by search engines today:
understanding complex queries is very hard for a completely automated
system. Even though companies like Google are putting huge might
behind the “brute force” of computing to understand
the users’ intent, they still rely (within the framework
of PageRank, of course) on either finding exact matches or breaking
the query down into individual keywords to work out what the user is saying.
The problem is that, since most search engines are just trying to match
keywords in the documents, the documents themselves must contain
the exact term the users are searching for; otherwise the search
engines will quickly break the term down into its constituent words
and then do a search. We think that brute-force NLP alone is not
going to help in understanding the intent of the users.
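The matching behaviour described above (try the exact term first, then fall back to its constituent words) can be sketched as follows. This is a toy illustration over an in-memory dictionary of documents, not a description of how any particular search engine is implemented; the function name search and the sample documents are hypothetical.

```python
def search(query, documents):
    """Return (mode, matching document ids).

    First look for the whole query as an exact phrase; if no document
    contains it, fall back to documents containing every query word.
    """
    phrase = query.lower()
    exact = [doc_id for doc_id, text in documents.items()
             if phrase in text.lower()]
    if exact:
        return "exact", exact

    # Fallback: break the query into its constituent words.
    words = phrase.split()
    partial = [doc_id for doc_id, text in documents.items()
               if all(w in text.lower().split() for w in words)]
    return "words", partial


if __name__ == "__main__":
    docs = {
        "a": "Benefits of a low carb diet",
        "b": "low carb diet benefits explained",
        "c": "bird watching",
    }
    print(search("low carb diet benefits", docs))
    print(search("diet benefits carb", docs))
```

Note that in the fallback case the engine has thrown away word order entirely, which is exactly where the user's intent is most easily lost.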
Two aspects of user intent:
- Developing search engine technology
that tries to make sense of users’ words and phrases
- Making users aware of their interactions
with search engines.
It will be interesting to watch how
search engines ramp up their technology to understand their users
better and how searchers equip themselves to cope with the vagaries
of the search engines. It is here that the question regarding the
relevance of ranking results will be highlighted.
2. Relevance Ranking of Results
One of the major concerns of search engines
today, besides understanding the users’ intent, is that of
ranking information. A search engine application on the web or inside
an enterprise needs to match information with the search query.
The sheer volume of information that is available makes it imperative
that search results are ranked against the user’s query. We
will now examine the world of information ranking mainly by analyzing
information ranking methods deployed by most popular search engines.
2.1 Information Relevance Ranking
We have seen so far that search engines
deploy only simple ways to understand user queries, yet the user comes
back feeling great because her expectations are partly limited by
what is being offered to her or by her knowledge of how search works.
Also, the sheer volume of information on the Internet and inside
enterprise networks is so huge that you are bound to get some results
back.
On their part, search engines like Google and Teoma (Ask Jeeves)
have made significant improvements in information retrieval technologies
and processes, attempting to make search a more useful and meaningful
experience for the user. But as we will see later in this article,
it seems that they are just scratching the surface of what’s
possible.
Here are some of the methods/technologies used by online search
engines to rank information against a query:
Google: Uses more than 100 methods
of ranking information, including its trademark PageRank system.
This famous and sometimes infamous system tries to be democratic
by assessing the popularity of a webpage based on how many other
webpages link to it.
In terms of information ranking, to
quote from one of our earlier articles on K-Praxis, “Contextualized
Tabbed OR Categorized Indexes and the Future of Search”,
the PageRank system (to a great extent) assumes that the more linked a web
page is, the greater is its value. And whatever algorithm Google
uses to normalize this effect - to bring in other aspects such as
keywords, relatedness of the content and so forth - because the
basic system is PageRank, the results produced by Google
tilt towards a theory where the more “networked” you
are, the more popular and trustworthy you are.
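To make the "more linked means more valuable" idea concrete, here is a minimal sketch of the textbook PageRank iteration. It is an assumption-laden simplification: the damping factor of 0.85, the fixed number of iterations and the tiny three-page graph are illustrative choices, and the code does not claim to reflect what Google actually runs in production.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified textbook PageRank by repeated redistribution of rank.

    links maps each page to the list of pages it links to. Every link
    target must itself be a key of links. Pages without outgoing links
    spread their rank evenly over all pages.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 3))
```

In this toy graph page "c" is linked from two pages and ends up with the highest score, while page "b", which nobody links to, ends up with the lowest, regardless of what either page actually says.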
Teoma/AskJeeves: Teoma uses,
among other well-known techniques, technology based on the Subject-Specific
Popularity method to rank web pages. In this method a document is
ranked higher because of its affinity to well-recognized expert
documents on a related subject/topic. Despite attempts by Google to
compete with Teoma with its Hilltop algorithm, Teoma still works
wonders for many search terms. Maybe the search engine optimization
(SEO) community has not attacked Teoma/AskJeeves as yet because Google,
the brand, is so powerful for them; or maybe it is almost impossible
for the SEO community to spam these results with their techniques.
Vivisimo: Vivisimo re-groups, re-organizes and
ranks results based on its clustering technology, which allows on-the-fly
clustering of results from other engines such as MSN, Lycos, Looksmart,
Wisenut, Open Directory and Overture. This is an interesting way
to rank information, allowing users to discover themes and concepts
in the pages they are looking for.
Yahoo/MSN: Use Inktomi/Yahoo crawling and search
technology that organizes results based on various factors such as
link and domain popularity and keyword analysis. Although not much
information is available, Yahoo ranking algorithms possibly use
technologies put together from Inktomi, AllTheWeb and Altavista.
Besides the ranking algorithms enumerated above, search engines also
use the following ways of ranking information:
1. Text in the Title
2. Keyword Frequency and Density
3. Keyword Positioning
4. Information in the meta tags
5. Content Analysis
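The first three signals in the list above are simple enough to compute directly. The sketch below is a hedged illustration only: the function keyword_signals and its output fields are our own invention, tokenization is a plain whitespace split, and no real engine is claimed to weight or combine the signals this way.

```python
def keyword_signals(keyword, title, body):
    """Compute a few on-page signals for a single keyword.

    Returns whether the keyword is in the title, how often it occurs
    in the body, its density (occurrences / total words) and the
    position of its first occurrence in the body (0-based, or None).
    """
    kw = keyword.lower()
    words = body.lower().split()
    count = words.count(kw)
    return {
        "in_title": kw in title.lower().split(),
        "frequency": count,
        "density": count / len(words) if words else 0.0,
        "first_position": words.index(kw) if count else None,
    }


if __name__ == "__main__":
    print(keyword_signals(
        "birds",
        "Birds of India",
        "birds are everywhere and birds sing",
    ))
```

Because every one of these signals is under the page author's control, they are also the easiest ones for search engine optimizers to manipulate.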
2.2 Information Relevance Ranking and Enterprise Search
Enterprise search engines face slightly different
problems and hence have to follow different strategies. Enterprise-level
documents are mostly longer and denser than web content and
do not have the luxury of using any form of link popularity; but
they are not as unorganized and unstructured as web pages. Major
enterprise search vendors (aka Unstructured Data Management players)
like Autonomy, Verity, Inxight, Google Search Appliance, Fast Search
& Transfer, etc., use various methodologies to rank search results,
including information clustering, classification and categorization.
2.3 Information Relevance Ranking: Issues
Despite all the efforts made by both web search
engines and enterprise search companies, there is still a lot more
work to be done before search technologies perfect the
art of ordering information against the search terms. For online
search engines this task is much tougher because not only are they
engaged in providing quality organic/original results, but they
are also engaged in commercial activities, and it is/will be difficult
for them to separate the two while keeping the relevance of organic/original
results intact. Besides, they also face huge problems from spammers
and search engine optimizers who are ready to do anything to get
better ranking in the search results.
As the commercial buzz around search reaches its crescendo, the
relevance of ranking could become the major point in this battle,
as the commercial aspect of search results - especially for web
search - is directly linked to information ranking.
Another important aspect of search is the interface for the search
and display of results.
3. Search Interfaces and Information Display
3.1 Online Search Interfaces and Information Display
Google has led the minimalist revolution in
search interfaces that has almost forced everybody to look at search
from a very simple and clean interface perspective (even MSN
Search or a far-off Sensis
could not resist the temptation!), which has helped users get a
clutter-free search experience. So far so good; but what happens
when Google, the king of simple interfaces, needs to expand? Oddly
enough, Google has cleaned up even whatever was left on the home
page. It replaced its search tabs with links, making the page almost “bare”
minimalist.
In the light of Google’s minimalist search
interface revolution, the question that needs some inquiry is how
search interfaces allow information to be displayed in a particular
manner and what that means for users as they interact with this
information. Restricting our attention to just Google would, however,
limit the scope of this article.
Most of the search engines provide the following
components on their search interfaces:
1. List view: Almost all of
the search engines offer a list of search results, ranked and numbered
mostly on the basis of relevance or date. This is usually available
through advanced interfaces. Ranked list view of search result display
seems to be a dominant metaphor in search interfaces - a metaphor
that represents a top to bottom and hierarchical view of information.
This view is deeply ingrained in how we
look at information. Since it is so familiar, it is very easy
to navigate and use. One downside of this view is that only the
first few results are seen by the user and, as we saw in the earlier
discussion of relevance ranking, since the information ranking done by
search engines is
still not very reliable, there is a huge possibility that what you
are looking for is lost in those thousands or hundreds of thousands
of results that you do not see when you search for something.
The list view is not limited to the web; even in
enterprise search this metaphor seems to be very dominant.
2. Title, Text Snippet or Summary:
Most search engines now offer a text snippet with the search terms
highlighted, a tradition not started but popularized
by Google, so searchers get a preview of the web page. Many enterprise
search engines use technologies that automatically generate summaries
of documents.
3. Other information about the URL:
Search engines offer other information like “cache”
or saved copy of the web page or the URL. Google offers information
like “similar pages”; others offer ability to view pages
in new windows or inline preview of the web pages (Vivisimo), so
that you don’t have to open a new window to see the page.
Besides this, search engines offer update time (Google), RSS feed
if available (Yahoo), File types, ability etc. In the enterprise
search environment, taxonomies and date-wise selection are very common.
3.2 Visual Search and Search Engine Interfaces
It is important to understand the juxtaposition
of ideas of Visual Search, Information Visualization and search
interfaces. K-Praxis looked at Visual Search at length in earlier
articles, Visual
Search in the Context of Information Visualization and Grokker:
Visual Search and Information Visualization, which defined visual
search as follows:
Broadly speaking, information visualization
is a graphical presentation for manipulating information extracted
from a larger document corpus or an information database. This ability
to represent information in a graphical user interface enables users
to understand and grasp the information faster, recognize and discover
meaningful trends, patterns and important information clusters.
This provides the user with more actionable information, adding
to his/her decision-making capacity. So information visualization
in a way shifts the focus of
information retrieval to information processing
from the lexical to the spatial and visual sphere.
Visual search - used either for web information retrieval or for
non-internet information retrieval, is then the ability to browse
search results by using 2D or 3D color graphics and animation. These
search results can reveal the structure of information giving it
a spatial dimension allowing users to navigate and interact with
it in a completely different way than text-based results.
Interestingly, some of the examples given in
the articles Visual
Search in the Context of Information Visualization and Grokker:
Visual Search and Information Visualization do try to present
a different method of providing search results in a visual format
(KarToo, Anacubis, WebBrain, Google Browser, Browse3D, Google Viewer,
MapStan and Grokker).
3.3 Attempts at Alternative Search Interfaces
There have been several attempts at alternative interfaces by a growing
community of designers and usability experts known as “information
architects” and by the search engines themselves. Google Viewer,
mentioned above, which allowed users to view Google results
as a slide show, is a good example, as is Vivisimo, which clusters
results for better organization.
In the case of Vivisimo, it is only regrouping results from other search
engines - no doubt a valuable service - but managing both information
retrieval and clustering could be a difficult proposition. See how
Find.com
is attempting to have a go at this idea in its
effort to offer a different view of results.
Ask Jeeves has also recently introduced a new
preview tool in the form of a binoculars icon next to the result
link. Bringing your cursor over it gives you a preview of the page.
But apart from these experiments, the list view seems to be
dominant across the Internet.
In the enterprise search arena companies like
Inxight (StarTree) do provide some ways of information visualization
but even there the list view seems to be the dominant way of displaying
search results.
3.4 Future of Interfaces
Wired magazine recently, as part of its coverage
of Google mania, asked various artists to redraw the Google
interface; but the best interface that came out of this experiment
- done by Joshua Davis - seemed to have nothing to do with how
a user will interact with a search engine and looked more like an information
designer’s or information architect’s fantasy of what
Google might look like.
New ways of thinking about how information from
search results is presented to users are required, and at
least at this point Google seems to be ruling the roost as far as interfaces
are concerned. Interestingly, however, shopping search on the Internet
seems to have gone for a more categorized view of information, since
it deals with individual shopping items rather than documents that
can be very multi-thematic.
4. Paid Listings Vs Organic Listings
As search engines strive to keep up with advertising
demand and become the de facto information intermediaries between
advertisers and buyers, the debate over paid vs. organic listings
is going to haunt the search players - more so when the issue of
trust
between users of search engines and search players becomes an overriding
factor for success or failure for the search industry.
4.1 Paid Inclusion Controversy
Search media has been reporting on the various
pros and cons of paid inclusion - a system used by many search engines
to accept payment for preferred listing in their indexes - singling
out players like Yahoo, Ask Jeeves and MSN for their paid inclusion
programs. Both Ask and MSN have reportedly dropped
their paid inclusion programs, but Yahoo, it seems, has still not
come clean on the issue of paid
submission and continues to accept payment for paid listings
through its
Site Match and Site Match Xchange programs.
Interestingly, many of the search media advocates
and bigwigs seem to have taken a clear stand against paid listing
almost unanimously. Many argue that the FTC
guidelines of 2002 are not enough and that Yahoo may finally have
to cave in to this demand - however, as yet there seems to be no
sign of Yahoo relenting.
4.2 Paid Inclusion vs. Algorithmic Search Results
Argument for maintaining the purity of search
results: The argument is fairly simple. Search engines are not
just commercial entities, but because they are crawling and maintaining
a database of information that is publicly available and cater to
the general need for information they have the responsibility to
maintain the integrity of search results. Remember that even though
search algorithms like PageRank are written by humans, we largely
trust that a company like Google maintains the integrity of the
PageRank algorithm’s search results and does not tamper with them unless
and until this is stated explicitly - a good example is the Ethics
Committee at Google. The point here is that if a search engine
maintains purity of results, users are more likely to trust the
search engine because they are confident that the information they
are seeking and getting is free from any editorial control by the
search intermediary.
Demarcating Paid Inclusion: But how does
a search engine demarcate search results? The only guidelines made
available by the FTC suggest that search players should say that an
ad is an ad, yet retain control over how they want to represent
search results. Search media (including Search
Engine Watch) have many times argued that most of the players
are trying to indicate and clearly demarcate paid listings, allowing
users to choose between organic and paid listings. Users also often
find paid listings useful, and if they are looking for a specific
type of product information they find paid listings an easy way
to satisfy that information need.
But are organic listings really organic?:
Maybe this is one of the most relevant yet most difficult questions
to ask. Given the attempts to fudge search engine results
and create search spam, it is possible that webmasters
could try to cheat search engines by using various methods that
are rampant in search engine optimization. Can the user be sure
that what he/she is seeing is organic in the real sense, where people
create information in “good faith”? A recent search engine
optimization competition is a very interesting example of this phenomenon.
Two months back, when you searched for “Nigritude
Ultramarine”, you did not get a single result, and today
a similar query throws up 369,000 results. Even though in a real sense
the results of this competition were a near triumph for the blogging
community, this competition throws open a number of questions about
organic vs. non-organic listings.
4.3 Organic Vs Paid Listing, Shopping Sites and Internet Yellow Pages (IYPs)
It is interesting to understand the logic of
organic vs. paid listing from the point of view of IYPs and shopping
sites. In the case of IYPs, each of their listings is a paid listing
if data is coming from print yellow pages; this makes it very difficult
for the users to understand the demarcation between the paid and
organic listings. There is a similar debate going on in another segment.
Shopping sites often do not disclose whether they are showing
organic results or paid results. Most of their advertisers
naturally seem to want better search positioning, and many times
these sites override what is retrieved from the databases organically.
5. Crawling and Indexing
5.1 Information Crawling and Indexing: Introduction
So far this series has focused on how several
facets of search affect the user of the search results. In order
to give completeness to our understanding of information search,
we need to pay heed to another very important and crucial facet:
information crawling and indexing. The ability of search engines
(online as well as enterprise) to ferret out information from all
the nooks and crannies of the Internet and the enterprise network
forms the very nerves of a search engine. These nerves allow search
engines to gather and harvest information to be served up in the
search results. Against this backdrop and the race towards bigger
indexes started by online search engines, the future of our search
experience will depend on how search engines innovate in the
areas of information crawling and indexing. K-Praxis continues its look
at search
by formulating certain pertinent questions regarding crawling
technologies and processes.
5.2 Online Information Indexes: Do Bigger Indexes Always Mean Better Indexes?
Online information indexes have grown by leaps
and bounds and the search
media has many times delved into the race between search
engines to grow their indexes. But nobody seems to be asking the
right question. Is this growth commensurate with the growth of
online information? An ongoing survey called “How
Much Information” at the University of California, Berkeley
estimated that in 2003 the World Wide Web contained about 170 terabytes
of information on its surface. According to the survey, this is equivalent
to seventeen times the size of the Library of Congress print collections.
It is a simple and straightforward
fact that indexes have grown and search engines have improved their
capability to crawl the web through massive and effective use of
hardware and processing power (e.g. Google Linux clusters), but
it appears that search engines are still lagging behind the
growth of online information. Online information is growing at a
much faster pace than the ability of search engines to crawl and
index it. So one could argue that the so-called “race”
is not between different search engines but between search engine
capabilities and the amount of information that is out there.
Here are a few factors that could challenge
the theory that assumes that bigger indexes are better indexes:
1. Many times it appears that even though the top
pages of a site are crawled, the inner pages are missing from the
search indexes, and search engines often seem not to keep
track of what is indexed and what is not. Many times indexes are
so volatile that pages keep appearing and disappearing.
2. Although now most of the search engines have
started indexing major file formats, penetration of search engines
into these formats is still limited.
3. The biggest issue among the ones
listed here is search spamming by search engine marketers
and search engine optimizers (SEOs). The example of the recent
search engine competition is very pertinent here: an increase
in pages from 0 to 500k in just 2 months. Just imagine how many of the pages
indexed by search engines could be similar spam from webmasters
trying to secure a higher ranking position in the search results.
4. There are also minor issues like duplicate pages and
pages from one site appearing many times over for a search query.
Search engines like Google seem to have had good success in tackling
this issue but the problem still remains.
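As an illustration of the duplicate-page issue in point 4, the simplest possible form of duplicate detection hashes the normalized text of each page and keeps only the first URL seen for each hash. This is a sketch under stated assumptions: it catches only pages whose text is identical after lowercasing and whitespace normalization, the function name deduplicate is hypothetical, and near-duplicate detection as practised by real search engines needs considerably more than this.

```python
import hashlib


def deduplicate(pages):
    """Keep the first URL for each distinct page text.

    pages maps URL -> page text. Text is lowercased and its whitespace
    collapsed before hashing, so trivial formatting differences do not
    hide a duplicate. Only exact (normalized) duplicates are caught.
    """
    first_url_by_digest = {}
    for url, text in pages.items():
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        first_url_by_digest.setdefault(digest, url)
    return list(first_url_by_digest.values())


if __name__ == "__main__":
    crawl = {
        "http://example.com/a": "Hello   World",
        "http://example.com/a?ref=1": "hello world",
        "http://example.com/b": "Something else",
    }
    print(deduplicate(crawl))
```

An index that counts both of the first two URLs above is bigger, but it is not better.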
So it seems that the quality of indexes in a
way has nothing to do with the numbers that are being flashed around
by search engines. Online search engines will have to start looking
seriously at the quality of their indexes rather than just inflating them
and boasting about the numbers for marketing purposes.
5.3 Innovations in Information Crawling and Indexing
There are a number of innovations that are likely
to take place (or are at least being talked about) in the near future
that will have an impact on how search engines crawl and index information.
One significant idea doing the rounds is that of focused crawling
and focused indexing, in other words subject-specific crawling.
As pointed out elsewhere on K-Praxis, the biggest problem with focused
or subject-specific crawling could be that these systems will have
to depend on statistical, language-neutral technologies to make
them work, and since these technologies have a rather infamous
history of not working, much more commercial
and real-world work is required in this field rather than just academic
research.
Another initiative being talked about is the
effort going into indexing the so-called “deep
web” or “invisible web”, an idea that has never really
taken off, as it requires going behind public databases, and
many times there is a question of who holds the rights to the information.
Making databases and the information extracted from them available
through search engines could impinge on the copyrights in that information.
Of course one should not overlook the efforts
being made at improving XML, RSS and Atom feed standardization and
interoperability; these efforts could revolutionize as well as economize
the way information is crawled and indexed. XML and RSS feeds are
already making huge inroads into news and blog crawling and aggregation.
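To see why feeds could make crawling cheaper, consider how little work it takes to extract clean titles and links from an RSS 2.0 document compared with parsing arbitrary HTML. The sketch below uses Python's standard xml.etree.ElementTree module; the sample feed, its URLs and the function name items_from_rss are made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed used only for this illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/second</link>
    </item>
  </channel>
</rss>"""


def items_from_rss(xml_text):
    """Return a list of (title, link) pairs for every <item> in the feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


if __name__ == "__main__":
    for title, link in items_from_rss(SAMPLE_FEED):
        print(title, "->", link)
```

The publisher has already done the structuring work, so the indexer receives a ready-made list of documents instead of having to discover and interpret them.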
In the near future we could see crawling being tackled
by smaller players from a completely different angle, one that aims
for the requisite size and quality rather than what we see with big search
engine players. Let us now see what the road ahead has to offer
us.
6. The Road Ahead
6.1 The Future of Search and Search Engine Users
Search for “the future of search”
(Google,
Yahoo,
Teoma)
on any of the major search engines and you get at least a dozen
perspectives on what is going to be the future of search technology.
Maybe it is too early to start putting on our thinking hats and
predicting a definitive future of information search. In keeping
with K-Praxis’ analytical method, we have examined all the
major components in this dossier and we are now looking ahead at
the possible road map that search engine innovations and improvement
could take - especially from the perspective of search engine users.
6.2 A March Towards Understanding User’s Intention
Possibly the most important element in the search
- where a small search input box on a web page tries to figure out
what the searchers want - is the ability of search engine technology
to understand users’ intent. What does he/she mean when a search
term is entered? This almost seems like the classic AI problem
(remember Turing’s
test?). Can computers understand human intentions? It is still
a long journey ahead before search engines will be able to understand
what users really want. Unless of course users are ready to part
with their personal data and search engines are able to use that
information - either from user’s PC or stored online - with
security and privacy guarantees provided to the users. And also
provided that users are ready to trust the search engines.
Another important thing about understanding
users’ intention is that search engines - while building the
techniques and algorithms to do so - will have to make sure they
balance the “perceived” notion of what the
user wants against attempts to get closer to the user’s
real intention. Up until now the technology world has been very
good at building imaginary castles out of what users would or
could want, but very few technologies have really tried to be “humble”
towards users and understand what they really want. Search engines
will have to put this perspective at the top of their future agenda.
It is quite clear that some of the ideas that are
being tossed around (personalization and contextual search) do
indicate that search engines have started working on these issues.
Google, being the trailblazer of the search engine industry, seems
to have several ongoing projects that could be crucial for the road
map towards understanding users’ intentions.
Recently the Director of Marketing at Vivisimo wrote
to us about their perspective on how clustering could help search
engines let users make unstructured queries. The email interaction
with Saman Haqqi suggested that “Vivisimo’s main contention
is that with clustering, search engines do not need to make a presumption
of intent and users do not need to frame exact queries. Since results
are subdivided into the main ideas contained in them, the user can
easily identify the folder of interest and focus on it - thus engaging
in ‘selective ignorance’ - ignoring results based upon
knowledge rather than blindly.”
But we also believe that it is one thing to
organize results after they have been retrieved and another
to integrate clustering at the retrieval stage itself.
Given the problems faced by statistical, a-contextual natural language
processing technologies, clustering algorithms could really obfuscate
search results. Unless there is some revolution in these technologies,
the current state of the algorithms is just not sophisticated enough to
deal with the diverse set of information found on web pages.
6.3 Relevance Ranking Of Search Results, and Paid Vs. Organic
Listings
It is important to note that all the facets
of search we have talked about are so interlinked that many times
it is difficult to segregate them. Understanding user’s intention
is closely connected with ranking the search results. Making the
right connection between these two facets is important for search
engines so that the user can make optimum sense of search results.
Since most of the search engines are engaged
in making money out of the search results, it will be very important
for them to maintain the distinction between algorithmic ranking and
commercial ranking of search results. The issue of trust discussed
earlier will largely depend on the ability of search engines to
clearly demarcate the experience of search and its commercialization.
The battle between paid search and organic search results is going
to be fought along these lines.
Again, collaboration, personalization and conceptualization
seem to be the possible paths search engines could take in order
to achieve better ranking of search results. The biggest hurdles
on the path to these innovations are search engine spammers,
adware and spyware programmers, and the craze for obsessive search
engine optimization.
6.4 Possibilities in Search Interfaces and Information Display
Eventually search engines will have to find
ways to innovate beyond the existing list view. This does
not mean the list view of search results is not useful, but it offers
very limited possibilities, and search engines will have to lead
and nudge users towards newer ways of using information.
On the other hand, all the attempts being made right now in alternative
information displays and information visualization space appear
to offer so cluttered a view of information that users might prefer
to stick with the list view.
6.5 Future of Information Crawling and Indexing
As suggested earlier, perhaps the future of
crawling and information indexing will largely be guided by the
possible adoption of XML, RSS or Atom standards by site publishers
and content management technologies. These formats could possibly
embed the ideas of indexing and crawling into the site itself. This
could mean that search engines do not have to do all the information
indexing they do now to provide clean results. And maybe this itself
could pose a competitive threat to search engines? This could
mean that potentially anybody could build a search engine?
6.6 Future of Search: A Few Tangential Ideas
Having done some constrained analysis of possibilities
in this field, maybe it is time to throw in some slightly tangential
ideas:
How about search engine capabilities embedded
in the hardware at the microchip level? Scalability of search engines
with respect to information availability is going to be an issue,
so projects like this one from Huazhong
University of Science and Technology in China could really
mean a lot in this game. Or think about search embedded in storage
networks built by companies like EMC.
It seems that CCortex from Artificial
Development is based on “Autonomous Cognitive Model (“ACM”),
a realistic representation of the workflow of a functioning human
cortex. The ACM may have immediate applications for data mining,
network security, search engine technologies and natural language
processing”. Maybe something like this will bring a completely
new dimension to search?
So whichever path search takes, the future of
search seems very exciting indeed!