Dynamic topic hierarchies and segmented rankings in textual OLAP technology.
Date
2017
Abstract
OLAP technology emerged 20 years ago and has recently been redesigned so that its dimensions,
hierarchies and measures can accommodate the particularities of textual data. Topic hierarchies offer a
way to organize textual data hierarchically. Currently, the topic hierarchy is defined only once
in the data cube, i.e., for the entire lattice of cuboids. However, such a hierarchy is sensitive to the
content of the document collection: a data cube cell can contain a collection of documents distinct from
those of other cells in the same cube, which can change the topic hierarchy. Furthermore, the text segment
used in the OLAP analysis also changes this hierarchy. In this work, we present a textual data cube with
multiple dynamic topic hierarchies for each cube cell. The hierarchies are multiple because the presented
approach builds one topic hierarchy per text segment. Another contribution of this work concerns query
response. State-of-the-art approaches normally return the top-k documents for the topic selected in the
query. We go further by also returning other text segments, such as the most significant titles, abstracts
and paragraphs. The approach is designed in four additional steps, each of which further attenuates the
impact of building multiple topic hierarchies and segmented rankings per cube cell. Experiments using part
of the DBLP papers as a document collection support our hypotheses.
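The idea of segmented rankings per cube cell can be illustrated with a minimal Python sketch. It is not the authors' method: the documents, the dimensions (venue, year), the function names and the term-count score are all hypothetical, and the topic is given as a fixed set of terms rather than taken from a learned topic hierarchy. The sketch only shows how one cell can yield a separate top-k ranking for each text segment.

```python
import re
from collections import defaultdict

# Hypothetical toy collection: dimension values plus two text segments per document.
DOCS = [
    {"id": 1, "venue": "VLDB", "year": 2015,
     "title": "Topic hierarchies for text cubes",
     "abstract": "A topic hierarchy organizes documents. Each topic groups related terms."},
    {"id": 2, "venue": "VLDB", "year": 2015,
     "title": "Indexing spatial data",
     "abstract": "We index spatial data; no topic model is used."},
    {"id": 3, "venue": "SIGMOD", "year": 2016,
     "title": "Ranking topic segments",
     "abstract": "Ranking returns the best segments."},
]

SEGMENTS = ("title", "abstract")


def build_cells(docs, dims):
    """Group documents into cube cells keyed by their dimension values."""
    cells = defaultdict(list)
    for doc in docs:
        cells[tuple(doc[d] for d in dims)].append(doc)
    return cells


def score(text, topic_terms):
    """Count topic-term occurrences in the text (a stand-in relevance score)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sum(1 for tok in tokens if tok in topic_terms)


def segmented_top_k(cell_docs, topic_terms, k):
    """Return, for each segment type, the k best (score, doc_id, text) entries."""
    rankings = {}
    for seg in SEGMENTS:
        scored = [(score(d[seg], topic_terms), d["id"], d[seg]) for d in cell_docs]
        scored = [s for s in scored if s[0] > 0]  # drop segments that miss the topic
        # Highest score first; ties broken by ascending document id.
        rankings[seg] = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    return rankings


cells = build_cells(DOCS, ("venue", "year"))
result = segmented_top_k(cells[("VLDB", 2015)], {"topic"}, k=2)
for seg, ranked in result.items():
    print(seg, [(s, doc_id) for s, doc_id, _ in ranked])
```

In this toy setup the same cell produces different rankings for titles and for abstracts, which is the behaviour the abstract describes as segmented rankings. A real system would replace the fixed term set with topics drawn from the hierarchy built for that cell and segment.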
Keywords
Data cube, Text database, Ranking, Topic hierarchy
Citation
SOUZA, A. N. de P. e; FORTES, R. S.; LIMA, J. de C. Dynamic topic hierarchies and segmented rankings in textual OLAP technology. Journal of Convergence Information Technology, Gyeongju, v. 12, p. 1-17, 2017. Available at: <http://www.globalcis.org/jcit/home/index.html>. Accessed: 16 Jan. 2018.