Self-training author name disambiguation for information scarce scenarios.
Date
2014
Abstract
We present a novel 3-step self-training method for
author name disambiguation—SAND (self-training associative
name disambiguator)—which requires no manual
labeling, no parameterization (in real-world scenarios)
and is particularly suitable for the common situation in
which only the most basic information about a citation
record is available (i.e., author names, and work and
venue titles). In the first step, real-world heuristics
on coauthors produce highly pure (although
fragmented) clusters. In the second step, the most
representative of these clusters are selected to serve as
training data for the third, supervised author assignment step. The third
step exploits a state-of-the-art transductive disambiguation
method capable of detecting unseen authors not
included in any training example and incorporating reliable
predictions into the training data. Experiments conducted
with standard public collections, using the
minimum set of attributes present in a citation, demonstrate
that our proposed method outperforms all representative
unsupervised author grouping disambiguation
methods and is very competitive with fully supervised
author assignment methods. Thus, unlike other
bootstrapping methods that exploit privileged, hard-to-obtain
information such as self-citations and personal
information, our proposed method delivers top-notch
performance with no (manual) training data or parameterization,
even when information is scarce.
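
The first two steps described in the abstract can be illustrated with a minimal sketch. This is not the SAND implementation: the abstract does not specify the coauthor heuristics or the selection criterion, so the sketch assumes a single heuristic (records of the same ambiguous name that share a coauthor belong together) and a simple size-based selection. The names `coauthor_clusters`, `select_training_clusters`, the `coauthors` field, and the `min_size` threshold are all hypothetical.

```python
from collections import defaultdict


def coauthor_clusters(records):
    """Group citation records that share at least one coauthor name.

    Assumption: every record refers to the same ambiguous author name and
    carries a "coauthors" list. Union-find merges records linked by a
    shared coauthor, which tends to give pure but fragmented clusters.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[max(ri, rj)] = min(ri, rj)

    first_seen = {}  # coauthor name -> index of the first record listing it
    for idx, rec in enumerate(records):
        for name in rec["coauthors"]:
            if name in first_seen:
                union(idx, first_seen[name])
            else:
                first_seen[name] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values(), key=lambda g: g[0])


def select_training_clusters(clusters, min_size=2):
    """Keep clusters with at least min_size records (hypothetical criterion)."""
    return [c for c in clusters if len(c) >= min_size]


records = [
    {"title": "Paper A", "coauthors": ["b. lee", "c. wu"]},
    {"title": "Paper B", "coauthors": ["c. wu"]},
    {"title": "Paper C", "coauthors": ["d. kim"]},
    {"title": "Paper D", "coauthors": ["d. kim", "e. roy"]},
    {"title": "Paper E", "coauthors": []},
]
clusters = coauthor_clusters(records)
training = select_training_clusters(clusters)
```

Records left outside the selected clusters (here, the single-author Paper E) would be handled by the third, supervised assignment step, which this sketch does not cover.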
Citation
FERREIRA, A. A. et al. Self-training author name disambiguation for information scarce scenarios. Journal of the Association for Information Science and Technology, v. 65, n. 6, p. 1257-1278, Jun. 2014. Available at: <http://onlinelibrary.wiley.com/doi/10.1002/asi.22992/epdf>. Accessed: 17 Feb. 2017.