Data mining

The building blocks of today's data mining techniques date back to the 1950s when the work of mathematicians, logicians, and

005, p. 5).

According to Weiss et al. (2005), the process of getting text ready for text mining is very much like the knowledge discovery steps described earlier. In text mining, the text is usually converted first to XML format for consistency. It is then converted to a series of tokens (sometimes punctuation is interpreted as a token, sometimes as a delimiter). Next, some form of stemming is applied to the tokens to create a standardized dictionary. Familiar IR/data mining techniques such as TF-IDF can then be applied to assign different weights to the tokens. Once this has been done, classification and clustering algorithms are applied.

Depending on the goal of the text mining operation, it may or may not be important to incorporate linguistic processing in the text mining process. Examples of linguistic processing include marking certain types of words (part-of-speech tagging), clarifying the meaning of words (disambiguation), and parsing sentences. Per Benoit (2002):

[Text] mining brings researchers closer to computational linguistics, as it tends to be highly focused on natural language elements in texts (Knight, 1999). This means TM applications (Church & Rau, 1995) discover knowledge through automatic content summarization (Kan & McKeown, 1999), content searching, document categorization, and lexical, grammatical, semantic, and linguistic analysis (Mattison, 1999). (p. 291)
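The tokenize, stem, and weight steps described above can be illustrated with a minimal Python sketch. It is not the procedure from Weiss et al. (2005): the function names (`tokenize`, `crude_stem`, `tf_idf`) and the sample documents are hypothetical, punctuation is treated as a delimiter (one of the two options mentioned), the stemmer is a toy suffix stripper standing in for a real algorithm such as Porter's, and the weighting uses one common TF-IDF variant, tf * log(N / df). The XML conversion and the final classification or clustering step are omitted.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters; punctuation acts as a delimiter."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip a few common English suffixes. A toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the token remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tf_idf(documents):
    """Return one {stem: weight} dict per document, using tf * log(N / df)."""
    stemmed = [[crude_stem(t) for t in tokenize(doc)] for doc in documents]
    n_docs = len(stemmed)
    # Document frequency: in how many documents does each stem occur?
    doc_freq = Counter(stem for doc in stemmed for stem in set(doc))
    weights = []
    for doc in stemmed:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        weights.append({
            stem: (count / total) * math.log(n_docs / doc_freq[stem])
            for stem, count in counts.items()
        })
    return weights


# Hypothetical sample corpus, for illustration only.
docs = [
    "Mining patterns from text",
    "Text mining finds patterns",
    "Clustering groups similar documents",
]
weights = tf_idf(docs)
```

In this sketch, a stem that occurs in only one document (such as "from" in the first document) receives a higher weight than a stem shared across documents (such as "text"), which is the behavior that makes TF-IDF useful for separating distinctive terms from common ones. The resulting weight dictionaries could then be passed to a classification or clustering algorithm.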




Data mining is a synonym for knowledge discovery. Data mining also refers to a specific step in the knowledge discovery process, one that focuses on applying specific algorithms to identify interesting patterns in the data repository. These patterns are then conveyed to an end user, who converts them into useful knowledge and makes use of that knowledge.

Data mining has evolved out of the need to make sense of huge quantities of information. Usama M. Fayyad says that stored data is doubling every nine months and the "demand for data mining and reduction tools increase exponentially" (Fayyad, Piatetsky-Shapiro, & Uthurusamy, 2003, p. 192). In 2006, $6 billion in text and data mining activities are anticipated (Zanasi, Brebbia, & Ebecken, 2005).

The U.S. government is involved in many data mining initiatives aimed at improving services, detecting fraud and waste, and detecting terrorist activities. One such activity, the work of Able Danger, had identified one of the men who would, one year later, participate in the 9/11 attacks (Waterman, 2005). This fact emphasizes the importance of the final step of the knowledge discovery process: putting the knowledge to use.

The U.S. government's data mining activities have helped stir concerns about data mining and its impact on privacy (Boyd, 2006). Privacy-preserving data mining has only recently caught the attention of researchers (Verykios, Bertino, Fovino, Provenza, Saygin, & Theodoridis, 2004).

There is much work to be done in the area of knowledge discovery and data mining, and its future depends on developing tools and techniques that yield useful knowledge without causing undue threats to individuals' privacy.





