Classification and Organizing
Once a body of data has been collected, the immediate task is to sort and organize it. An individual researcher faced with several kinds of documents is usually compelled to adopt parameters of classification, under which documents are set aside for one particular reading or another. In classifying the material collected for a study, several nuances are overlooked, owing to the size of the document corpus and the prejudices of the researcher:
The size of the document corpus: While a small document corpus is inadequate for drawing scientific inferences, a sizable one creates problems of its own. Even a diligent and meticulous researcher cannot capture every nuance of the text and may thus overlook important evidence.
Prejudices of the researcher: Because human understanding inevitably follows set patterns, a researcher tends to view a given body of evidence in one particular manner. Though the researcher may not overlook any aspect outright, the varying facets of the same text may not always register in the analysis.
It is here that automation can again compensate for human failings. Clusters of data can be formed on the basis of parameters that are not predetermined by the researcher but generated from within the documents themselves. What the classification yields, then, is not only sorted data but also clues to the parameters of classification, arrived at through a scientific method.
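As a rough illustration of clustering on parameters drawn from the documents themselves, the following Python sketch weights each word by TF-IDF and groups the documents with a simple k-means on cosine similarity. The sample documents, the function names, and the choice of k-means with two clusters are assumptions made for this example, not a method prescribed by the text. Note that the number of clusters is still supplied by hand.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Turn each document into a unit-length sparse TF-IDF vector (a dict)."""
    counts_per_doc = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for counts in counts_per_doc for term in counts)
    vectors = []
    for counts in counts_per_doc:
        # Terms found in every document get weight 0 and so carry no signal.
        vec = {t: c * math.log((1 + n) / (1 + df[t])) for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Dot product of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Normalized sum of a group of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {t: w / norm for t, w in total.items()}


def cluster_documents(docs, k, iterations=10):
    """Assign each document a cluster label in range(k)."""
    vectors = tfidf_vectors(docs)
    # Deterministic start: take the first document, then repeatedly add the
    # document least similar to the centers chosen so far.
    centers = [vectors[0]]
    while len(centers) < k:
        centers.append(min(vectors, key=lambda v: max(cosine(v, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [max(range(k), key=lambda j: cosine(v, centers[j])) for v in vectors]
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centers[j] = centroid(members)
    return labels


# Hypothetical mini-corpus, invented purely for illustration.
docs = [
    "The harvest failed and grain prices rose sharply",
    "Grain prices rose after the poor harvest",
    "The council voted on the new election law",
    "Election law reform passed the council vote",
]
labels = cluster_documents(docs, k=2)
print(labels)
```

In this toy corpus the two harvest reports end up in one cluster and the two council reports in the other, without the researcher having named "economy" or "politics" as categories in advance.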
Hypothesis
Very little has been said about the generation of the hypothesis. It remains to this day one of the most unscientific stages of even the most scientific experiments. For example (going back to our favorite scientist), Newton is said to have hypothesized gravity because an apple fell on his head! Beyond creativity and chance (no doubt a bonus, but unpredictable and unscientific), how does one generate a hypothesis from a given body of data? The automated clustering of data throws up several parameters or grids within which the data is classified. These provide the clues for the hypothesis.
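One way to read such clues off a set of clusters is to list the terms that are common inside a cluster and rare outside it. The Python sketch below assumes the documents have already been assigned cluster labels; the sample documents, the labels, and the scoring rule (share of cluster documents containing a term minus the share of other documents containing it) are all assumptions chosen for illustration.

```python
import re
from collections import Counter, defaultdict


def term_set(text):
    """The set of distinct lowercase words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def distinctive_terms(docs, labels, top_n=3):
    """For each cluster, return the terms that best set it apart from the rest."""
    by_cluster = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_cluster[label].append(term_set(doc))

    clues = {}
    for label, members in by_cluster.items():
        others = [terms for other, group in by_cluster.items()
                  if other != label for terms in group]
        inside = Counter(t for terms in members for t in terms)
        outside = Counter(t for terms in others for t in terms)
        scores = {}
        for term in inside:
            share_in = inside[term] / len(members)
            share_out = outside[term] / len(others) if others else 0.0
            scores[term] = share_in - share_out
        # Highest score first; ties broken alphabetically for a stable result.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        clues[label] = ranked[:top_n]
    return clues


# Hypothetical documents and cluster labels, invented for illustration.
docs = [
    "The harvest failed and grain prices rose sharply",
    "Grain prices rose after the poor harvest",
    "The council voted on the new election law",
    "Election law reform passed the council vote",
]
labels = [0, 0, 1, 1]

clues = distinctive_terms(docs, labels)
for label, terms in sorted(clues.items()):
    print(label, terms)
```

A list such as "grain, harvest, prices" does not state a hypothesis by itself, but it points the researcher toward the grid along which the data has divided.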
Interpretation
The way in which the hypothesis is drawn out from the clusters of opinions allows for an understanding based on criteria drawn from within the data, rather than one based on the creativity or insight of the researcher. This eliminates most of the biases and prejudices that the researcher may inadvertently carry. At the same time, automating the classification, and hence the generation of the hypothesis, still leaves room for the researcher's own interpretation. The terms of the study are not dictated by the automated process. The classified and sorted documents in fact present a neatly organized bundle of information, which allows the researcher to interpret it and present an opinion arising from a scientific analysis.