Making sense of online textual information and information management technologies
   
 
XBRL and ebXML: A Second Look
January 23, 2004

Semi-structured and unstructured data have always presented problems for information management and information processing, including text mining and opinion mining. Two ways of encoding and presenting data have been in the news recently: XBRL (eXtensible Business Reporting Language) and ebXML. More and more vendors have started integrating these two languages into their platforms. But one bigger question remains unanswered: can these frameworks and languages help in organizing and making sense of textual data?

The idea of a reporting language has been around for some time, but business terms have to be standardized for these approaches to succeed fully. Both now seem to be further along the road to full acceptance by businesses for reporting and presenting data. Attempts are also being made to include the ability to build a limited ontology, as in the DAML example. A comparison between XML, RDF (Resource Description Framework) and DAML (DARPA Agent Markup Language) is available here. Information stored and presented in this manner should prove more usable and easier to put to further use.

The non-profit organizations behind XBRL and ebXML are now closer to establishing XML-based standards that will enable information exchange across businesses and applications. This will allow easier sharing of semi-structured and unstructured data.

XBRL received a boost in early January 2004 when its newest version was announced. The press release for the version said:

“In this latest release of the specification, the manner in which business reporting software applications produce or consume XBRL-tagged data has been clarified and enhanced,” said Walter Hamscher, XBRL International Steering Committee Member at Large and Acting Chair. “We worked with the leading software developers and users represented in our consortium over the last few months and this collaboration succeeded in enhancing the specification to make it more readily usable in their XBRL application development.”

Even though these languages will make it easier for organizations to organize information, and eventually easier for businesses to find that information, these standards still deal with preformatted data. We also need to think about new ways to store, retrieve and present text data that is not easy to classify and store in taxonomies or ontologies. Attempts at standardization must naturally be encouraged, but we must also remember that the very process of standardization can restrict some of the possible ways of thinking about storage, retrieval and presentation.
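To make the distinction concrete, here is a minimal sketch in Python. The XML fragment is a hypothetical, simplified document written in the spirit of an XBRL instance; the namespace, element names and values are invented for illustration and do not come from any real taxonomy. It shows that tagged numeric facts can be picked up mechanically, while a narrative element comes back as an undifferentiated block of text that the tags do nothing to interpret.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified fragment loosely modelled on an XBRL instance.
# The namespace URI, element names and figures are invented for illustration.
DOC = """\
<report xmlns:ex="http://example.com/taxonomy">
  <ex:Revenue contextRef="FY2003" unitRef="USD">1250000</ex:Revenue>
  <ex:NetIncome contextRef="FY2003" unitRef="USD">98000</ex:NetIncome>
  <ex:ManagementDiscussion contextRef="FY2003">Demand softened in the second half, although management expects a recovery.</ex:ManagementDiscussion>
</report>
"""

NS = "{http://example.com/taxonomy}"


def extract_facts(xml_text):
    """Split child elements into numeric facts and free-text narrative."""
    root = ET.fromstring(xml_text)
    numeric, narrative = {}, {}
    for element in root:
        name = element.tag.replace(NS, "")
        text = (element.text or "").strip()
        unit = element.get("unitRef")
        if unit is not None:
            # Preformatted data: the tag and unit tell us how to read the value.
            numeric[name] = (int(text), unit)
        else:
            # Free text: the tag labels the block but says nothing about its content.
            narrative[name] = text
    return numeric, narrative


numeric, narrative = extract_facts(DOC)
for name, (value, unit) in numeric.items():
    print(name, value, unit)
for name, text in narrative.items():
    print(name, "->", len(text.split()), "words of unanalyzed text")
```

The numeric facts are ready for comparison or aggregation as soon as they are parsed. The narrative element still needs some form of text analysis before it can be searched or classified in any meaningful way.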

Consider, for example, the really long lists of business name tags that are available for ebXML or other encodings. Eventually such lists might turn out to be too cumbersome, if not unmanageable. It is here that a text analysis solution could perhaps be found and made to intervene. Text analysis in collaboration with XML-type taxonomies or DAML-type ontologies might solve that future problem. These issues become even starker once one starts thinking of compliance regulations such as Sarbanes-Oxley, HIPAA, GLBA and SEC rules.