On the combination of domain-specific heuristics for author name disambiguation : the nearest cluster method.
Date
2015
Abstract
Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but with the burden of having to rely on manually labeled training sets for the learning process. Moreover, most supervised solutions just apply some type of generic machine learning solution and do not exploit specific knowledge about the problem. In this article, we follow a similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions, and apply supervision only to optimize their parameters for each particular dataset. As our experiments show, the result is a very effective, efficient and practical author name disambiguation method that can be used in many different scenarios. In fact, we show that our method can beat state-of-the-art supervised methods in terms of effectiveness in many situations while being orders of magnitude faster. It can also run without any training information, using only default parameters, and still be very competitive when compared to these supervised methods (beating several of them) and better than most existing unsupervised author name disambiguation solutions.
Keywords
Supervised methods
Citation
SANTANA, A. F. et al. On the combination of domain-specific heuristics for author name disambiguation : the nearest cluster method. International Journal on Digital Libraries, n. 16, p. 229-246, 2015. Available at: <https://link.springer.com/article/10.1007/s00799-015-0158-y>. Accessed: 20 Jan. 2017.