Classifying unlabeled short texts using a fuzzy declarative approach.
Date
2013
Abstract
Web 2.0 provides user-friendly tools that allow people to create and
publish content online. User-generated content often takes the form of short texts
(e.g., blog posts, news feeds, and snippets). This has motivated an increasing interest
in the analysis of short texts and, specifically, in their categorisation. Text categorisation
is the task of classifying documents into a certain number of predefined
categories. Traditional text classification techniques are mainly based on statistical
analysis of word frequencies and have proved inadequate for the classification
of short texts, where word occurrences are too few. Moreover, the classic
approach to text categorisation is based on a learning process that requires a large
number of labeled training texts to achieve accurate performance. However,
labeled documents might not be available, even when unlabeled documents can be easily
collected. This paper presents an approach to text categorisation which does not
need a pre-classified set of training documents. The proposed method only requires
the category names as user input. Each one of these categories is defined by means
of an ontology of terms modelled by a set of what we call proximity equations.
Hence, our method is not based on the frequency of occurrence of a category, but depends
heavily on the definition of that category and on how well the text fits that definition.
Therefore, the proposed approach is an appropriate method for short text classification, where the
frequency of occurrence of a category is very small or even zero. Another feature of
our method is that the classification process relies on the flexible matching and
knowledge representation capabilities of Bousi~Prolog, an extension of the standard
Prolog language. This declarative approach provides a text classifier which is quick
and easy to build, and a classification process which is easy for the user to
understand. The results of experiments showed that the proposed method achieved a
reasonably useful performance.
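
The idea of scoring a text against category definitions, rather than counting occurrences of the category name, can be illustrated with a minimal Python sketch. It is not the authors' Bousi~Prolog implementation: the proximity degrees, the threshold, the sum-based aggregation, and the names PROXIMITY, proximity and classify are all assumptions made for illustration only.

```python
import re

# Hypothetical proximity equations: (term, category) -> degree in [0, 1].
# These values are invented for illustration; they do not come from the paper.
PROXIMITY = {
    ("football", "sport"): 0.9,
    ("goal", "sport"): 0.7,
    ("match", "sport"): 0.6,
    ("election", "politics"): 0.9,
    ("minister", "politics"): 0.8,
    ("match", "politics"): 0.1,
}


def proximity(term, category):
    """Return the degree to which `term` is close to `category` (symmetric lookup)."""
    if term == category:
        return 1.0
    return max(
        PROXIMITY.get((term, category), 0.0),
        PROXIMITY.get((category, term), 0.0),
    )


def classify(text, categories, threshold=0.5):
    """Score each category by summing the proximity degrees of the text's words.

    Degrees below `threshold` are ignored. Summation is an assumed aggregation,
    chosen only to keep the sketch short.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    scores = {}
    for category in categories:
        degrees = [proximity(token, category) for token in tokens]
        scores[category] = sum(d for d in degrees if d >= threshold)
    best = max(scores, key=scores.get)
    # Return no label when no word is close enough to any category.
    label = best if scores[best] > 0 else None
    return label, scores


if __name__ == "__main__":
    label, scores = classify(
        "Late goal decides the football match", ["sport", "politics"]
    )
    print(label, scores)
```

Note that the sample text never contains the word "sport", yet it is assigned to that category because its words are close to the category under the assumed proximity relation. This mirrors the abstract's point that the frequency of occurrence of the category itself may be zero.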
Keywords
Text categorization, Ontologies, Thesauri, Unlabeled short texts
Citation
ROMERO, F. P. et al. Classifying unlabeled short texts using a fuzzy declarative approach. Language Resources and Evaluation, v. 47, p. 151-178, 2013. Available at: <https://link.springer.com/article/10.1007/s10579-012-9203-2>. Accessed: 28 Jul. 2017.