Browsing by Author "Santana, Alan Filipe"
Now showing 1 - 2 of 2
Item
Incremental author name disambiguation by exploiting domain-specific heuristics. (2017)
Santana, Alan Filipe; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Ferreira, Anderson Almeida
The vast majority of current author name disambiguation solutions are designed to disambiguate a whole digital library (DL) at once, considering the entire repository. However, besides being very expensive and scaling poorly, these solutions may not benefit from occasional manual corrections, which may be lost whenever the entire repository has to be disambiguated again. In the real world, in which repositories are updated on a daily basis, incremental solutions that disambiguate only the newly introduced citation records are likely to produce better results in the long run. However, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article, we present a new author name disambiguation method specifically designed for the incremental scenario. In our experiments, the new method largely outperforms recent incremental proposals reported in the literature as well as the current state-of-the-art non-incremental method.

Item
On the combination of domain-specific heuristics for author name disambiguation: the nearest cluster method. (2015)
Santana, Alan Filipe; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Ferreira, Anderson Almeida
Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but with the burden of having to rely on manually labeled training sets for the learning process. Moreover, most supervised solutions just apply some type of generic machine learning solution and do not exploit specific knowledge about the problem. In this article, we follow a similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions, and apply supervision only to optimize their parameters for each particular dataset. As our experiments show, the result is a very effective, efficient and practical author name disambiguation method that can be used in many different scenarios. In fact, we show that our method can beat state-of-the-art supervised methods in terms of effectiveness in many situations while being orders of magnitude faster. It can also run without any training information, using only default parameters, and still be very competitive when compared to these supervised methods (beating several of them) and better than most existing unsupervised author name disambiguation solutions.
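
The abstracts above do not spell out the algorithms, so the following is only an illustrative sketch of the general idea they describe: a newly arrived citation record is compared against existing author clusters and attached to the nearest one, or starts a new cluster, without reprocessing the whole repository. It is not the authors' method. The `Record` fields, the weighted Jaccard score, the weights and the threshold are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A citation record; the fields are illustrative assumptions."""
    author_name: str
    coauthors: frozenset
    title_words: frozenset


@dataclass
class Cluster:
    """Records currently attributed to one real-world author."""
    records: list = field(default_factory=list)


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(record, cluster, w_coauthor=0.6, w_title=0.4):
    """Hypothetical score: best weighted Jaccard match against cluster members."""
    return max(
        (
            w_coauthor * jaccard(record.coauthors, r.coauthors)
            + w_title * jaccard(record.title_words, r.title_words)
            for r in cluster.records
        ),
        default=0.0,
    )


def assign_incrementally(record, clusters, threshold=0.2):
    """Add `record` to its most similar cluster, or open a new one.

    Only the new record is compared; existing assignments are left
    untouched, so earlier manual corrections would be preserved.
    """
    best = max(clusters, key=lambda c: similarity(record, c), default=None)
    if best is not None and similarity(record, best) >= threshold:
        best.records.append(record)
        return best
    new_cluster = Cluster(records=[record])
    clusters.append(new_cluster)
    return new_cluster


# Usage: records arrive one at a time, as in a daily repository update.
clusters = []
r1 = Record("A. Santana", frozenset({"goncalves", "laender"}),
            frozenset({"author", "name", "disambiguation"}))
r2 = Record("A. Santana", frozenset({"goncalves", "ferreira"}),
            frozenset({"incremental", "name", "disambiguation"}))
r3 = Record("A. Santana", frozenset({"silva"}),
            frozenset({"protein", "folding"}))
for rec in (r1, r2, r3):
    assign_incrementally(rec, clusters)
```

In this toy setting the weights and threshold play the role of the parameters that, according to the 2015 abstract, can either be left at defaults or tuned with supervision for a particular dataset; the values used here are arbitrary.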