Data mining is a set of techniques that facilitate decision-making by identifying relationships, patterns, and rules in data and building useful models from them. In increasingly competitive business environments, a company's success has become ever more reliant on acquiring and analyzing data. Moreover, the most frequently asked question about data has shifted from the retrospective “what has happened” to the prospective “what will happen”. The study of data mining is flourishing as the amount of available data explodes alongside fast-advancing computing technology, to the point where the term “big data analytics” is often used.
There are two kinds of data mining: descriptive and predictive. The former focuses on gaining insight by finding relational patterns and clusters in the data. The latter focuses on forecasting through classification and prediction from pre-designed models. The data involved can be structured or unstructured: numerical data are usually structured, while text, image, and video data are unstructured.
Data mining is used in various fields such as finance, marketing, communications, medicine, energy, and manufacturing. In finance, points of interest include credit assessment, credit card fraud detection, stock price forecasting, and portfolio evaluation, among many others. In marketing, it is used to analyze customer spending patterns and trends and to target customers with the right campaigns. In medicine, data mining techniques are used to improve the diagnostic process and are used extensively in the study of genetics. In the field of energy conservation, it helps provide insight into energy consumption detection and resource exploration. In manufacturing, it is used in new product development, defect detection, automation, and supply chain management.
There is a definite shortage of data mining experts around the world, and data mining technology is consequently not being used to its full potential. The New York Times reported in February 2012 that the US alone needs 140,000 to 190,000 data analysts.
Since we started as the Neural Network Lab at POSTECH in 1993, our lab has conducted research on customer response modeling, keystroke-based security modeling, manufacturing automation and fault detection, financial forecasting, medical assessment, and intelligent sampling. We have published over 160 research papers, received a number of patents, and produced commercial software products. As of March 2013, our lab has graduated 8 students with a Ph.D. degree and 55 with a Master’s degree. Currently, we have 12 Ph.D. students and 10 Master’s students. Our current research focuses on data sampling, anomaly detection, text analysis, and keystroke-based biometrics.