Making sense of online textual information and information management technologies
   
 

   
K-Praxis Editorial: Understanding Context, Keywords and Natural Language Processing (NLP)
July 1, 2004

Anybody who has worked on or used a solution that processes natural language will tell you that, many a time, context in natural language simply refuses to yield to brute-force automation. Because of this, many companies that deal with natural language take a convenient, a-contextual statistical route, which makes things even more complicated. In any case, core NLP research is left to the computer science and linguistics departments of universities. K-Praxis raises a few questions about automatically understanding context from natural language text.

Do NLP-Based Approaches Work?

There are a number of examples of vendors claiming to have a solution for clustering documents or extracting entities (a number of players now claim to have systems capable of extracting entities such as place names, organization names and people's names); however, these vendors seem to be telling only half-truths about their capabilities. Named Entity Extraction (NEE) is an interesting example. Anybody who has tested these products will tell you that computers go completely haywire trying to tell Abu Ghraib as a place name from Abu Ghraib as the name of a person, and such examples can easily be multiplied. Some research in a few computer science labs must have produced a solution, or a partial solution, to this problem; but, as is the case with much academic research, the researcher's motivation is often to complete her thesis or research. Once this is done, the researcher usually moves on and ends up working in a completely unrelated or only partially related field. The same is true of document clustering: most of the time it does not seem to work, barring a few interesting players like Vivisimo, who are at least savvy enough to have cleaned up the labels for each cluster.
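
To make the ambiguity concrete, here is a minimal Python sketch of a dictionary-based entity lookup. Everything in it is an assumption made for illustration: the gazetteer entries, the labels, the cue words and the function names are hypothetical and do not describe any vendor's product. It shows only that a name listed under two types cannot be resolved from the name alone, and that a naive one-word context cue fails as soon as the cue is missing.

```python
# Toy gazetteer: every entry and label here is a made-up illustration.
GAZETTEER = {
    "abu ghraib": {"PLACE", "PERSON"},
    "washington": {"PLACE", "PERSON"},
    "k-praxis": {"ORGANIZATION"},
}

# Hypothetical cue words that hint at one reading over another.
CUES = {
    "PLACE": {"in", "at", "near", "from"},
    "PERSON": {"mr.", "said", "met"},
}


def candidate_labels(name):
    """Return every label the gazetteer allows for a name."""
    return GAZETTEER.get(name.lower(), set())


def guess_label(name, sentence):
    """Pick a label from the word just before the name; None if undecided."""
    labels = candidate_labels(name)
    if len(labels) <= 1:
        # Zero or one candidate: nothing to disambiguate.
        return next(iter(labels), None)
    words = sentence.lower().split()
    first = name.lower().split()[0]
    if first not in words:
        return None
    position = words.index(first)
    previous = words[position - 1] if position > 0 else ""
    matches = [
        label for label in sorted(labels)
        if previous in CUES.get(label, set())
    ]
    # Commit only when exactly one reading is supported by the cue.
    return matches[0] if len(matches) == 1 else None
```

With this sketch, "The prisoners were held in Abu Ghraib" resolves to a place because of the preceding "in", while "Abu Ghraib was in the news" stays undecided: the lookup alone carries no context.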

As for grammar-based NLP approaches, it seems that the commercial world regards them either as a tool for translation or as a continuing research field for university labs, and keeps a safe distance from them. The question is whether grammar-based NLP approaches are useful. There are other approaches to NLP, but very few of them have really been accessed and used by commercial vendors.

NLP and Search Engines

It seems that search engines like Google have managed to use NLP for various text-processing tasks. Since most search algorithms depend heavily on keywords (for processing text, handling queries and displaying results), and equally on PageRank or equivalent algorithms, most search engines seem to have circumvented the problems faced by NLP researchers and vendors. Can keywords lead you to context? Are "keywords", then, the answer to some of the big issues being tackled by NLP players? These are interesting questions, and we need to seek answers to them.
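
One way to see what keywords can and cannot do is a toy keyword index. The Python sketch below is an illustration under stated assumptions, not a description of how Google or any other search engine works: the three documents, the "jaguar" example and the function names are all hypothetical, and ranking (PageRank or otherwise) is left out entirely.

```python
from collections import defaultdict

# Three made-up documents; "jaguar" appears in two unrelated senses.
DOCS = {
    1: "the jaguar is a large cat found in south america",
    2: "the jaguar dealership sells luxury cars",
    3: "search engines rank pages by links",
}


def build_index(docs):
    """Map each word to the set of ids of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Return the ids of documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


INDEX = build_index(DOCS)
```

A query for "jaguar" returns documents 1 and 2 alike, because the index has no notion of which sense is meant. Adding a second keyword ("jaguar cars") narrows the result to document 2, which suggests that keywords approximate context only to the extent that the user supplies it.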

NLP and Understanding the Context

Can NLP help us reach the Holy Grail of context? Or is it impossible to understand context because it is so dynamic, variable and specific to time and place? Is it possible to build applications or services such as personalized search, contextual search, automated clustering or entity extraction through NLP (statistical or otherwise)? Or is it better to focus on how many pages a search engine can crawl, how many file formats a vendor can understand, or how fast one can process data, even if that processing adds no value or is a-contextual?

Please feel free to send your comments and questions to us at info@k-praxis.com

 
Copyright © 2003-2004 K-Praxis. All rights reserved.