Improving the adaptability of classification algorithms in text streams by detecting and handling concept drift

Instituto Informática

May 3rd, 2016


On Monday, April 18, Professor Matthieu Vernier gave a talk on “Improving the adaptability of classification algorithms in text streams by detecting and handling concept drift”.

In data mining, “supervised classification” is a classical task that consists of identifying which class, within a more understandable structure, a new data item belongs to. Typically, this type of problem is solved with a supervised learning approach that builds a statistical model, called a classifier, which learns to classify data from examples. The classifier is then used to classify new incoming data. Historically, classifiers have been learned “offline” from a set of examples centralized in the memory of a single computer. Over the last decade, the growth of real-world applications that use large data streams has posed a new challenge: how can the scalability of data mining techniques and classification algorithms be improved to deal with the volume, velocity and variability of such streams? In particular, the meaning of data can change over time, and an important challenge is to detect this drift and automatically adapt the classifier so that it does not become obsolete. The adaptability of classification algorithms is particularly critical for data streams that contain textual data, because natural language is especially prone to meaning drift. At the same time, handling concept drift is very challenging in this context because textual data is naturally unstructured, ambiguous and noisy. We give an overview of this problem and present some lines of thought on how to approach it.
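To make the idea concrete, here is a minimal Python sketch of one possible way to handle drift in a text stream. It is not the method presented in the talk. It assumes a small multinomial naive Bayes classifier that learns one document at a time, plus a deliberately simple drift detector: if the error rate over the last `window` predictions rises above `threshold`, the classifier is discarded and relearned from the incoming data. The names `OnlineNaiveBayes`, `run_stream`, `window` and `threshold`, and the toy stream in which the word "sick" changes meaning, are all illustrative assumptions.

```python
from collections import Counter, defaultdict, deque
import math


class OnlineNaiveBayes:
    """Multinomial naive Bayes, updated one document at a time."""

    def __init__(self):
        self.class_counts = Counter()            # documents seen per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    def learn(self, tokens, label):
        self.class_counts[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        if not self.class_counts:
            return None  # nothing learned yet
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab) or 1
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(n_docs / total_docs)
            for token in tokens:
                # Laplace smoothing so unseen tokens do not zero the score.
                score += math.log((counts[token] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def run_stream(stream, window=20, threshold=0.3):
    """Test-then-train loop with a naive windowed drift detector (assumption)."""
    model = OnlineNaiveBayes()
    recent_errors = deque(maxlen=window)
    drift_points = []
    for i, (text, label) in enumerate(stream):
        tokens = text.lower().split()
        recent_errors.append(int(model.predict(tokens) != label))
        if len(recent_errors) == window and sum(recent_errors) / window > threshold:
            # Too many recent mistakes: assume the concept changed and start over.
            drift_points.append(i)
            model = OnlineNaiveBayes()
            recent_errors.clear()
        model.learn(tokens, label)
    return drift_points, model


# Toy stream: "sick" is negative at first, then becomes positive slang.
phase_1 = [("sick", "negative"), ("nice", "positive")] * 20
phase_2 = [("sick", "positive"), ("awful", "negative")] * 30
drift_points, model = run_stream(phase_1 + phase_2)
print(drift_points, model.predict(["sick"]))
```

Resetting the whole classifier is the bluntest possible reaction to drift. It forgets everything, including knowledge that is still valid, so it should be read only as a baseline for the kind of adaptation discussed above.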
