Browsing by Author "Veloso, Adriano Alonso"
Now showing 1 - 4 of 4
Cost-effective on-demand associative author name disambiguation. (2012) Veloso, Adriano Alonso; Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Meira Júnior, Wagner
Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only few citations), may prevent the full potential of such techniques. In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), that extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), that extends LAND with self-training capabilities, thus drastically reducing the amount of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set.
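The demand-driven rule extraction described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lazy_associative_predict`, the feature encoding (strings such as `"c:goncalves"` for a coauthor or `"v:jcdl"` for a venue), the rule-size limit, and the confidence-sum voting are all assumptions made for illustration. Self-training and the detection of unseen authors are not shown.

```python
from collections import defaultdict
from itertools import combinations


def lazy_associative_predict(train, citation, max_rule_size=2):
    """Score candidate authors for one citation (illustrative sketch only).

    train: list of (set_of_features, author_label) pairs.
    citation: set of features of the citation to disambiguate.
    Returns (best_author_or_None, {author: score}).
    """
    citation = set(citation)
    # Demand-driven projection: keep only the features each training
    # example shares with the citation, and drop examples sharing none.
    projected = [(set(feats) & citation, author) for feats, author in train]
    projected = [(feats, author) for feats, author in projected if feats]

    cover = defaultdict(int)  # antecedent -> examples containing it
    hits = defaultdict(int)   # (antecedent, author) -> matching examples
    for feats, author in projected:
        for size in range(1, max_rule_size + 1):
            for antecedent in combinations(sorted(feats), size):
                cover[antecedent] += 1
                hits[(antecedent, author)] += 1

    # Each rule "antecedent -> author" votes with its confidence
    # (assumed voting scheme: plain sum of confidences).
    scores = defaultdict(float)
    for (antecedent, author), count in hits.items():
        scores[author] += count / cover[antecedent]

    if not scores:
        return None, {}
    return max(scores, key=scores.get), dict(scores)
```

Because rules are generated only from the features of the citation at hand, the rule space stays small even when the full training set has many features and authors.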
Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical.

Self-training author name disambiguation for information scarce scenarios. (2014) Ferreira, Anderson Almeida; Veloso, Adriano Alonso; Gonçalves, Marcos André; Laender, Alberto Henrique Frade
We present a novel 3-step self-training method for author name disambiguation, SAND (self-training associative name disambiguator), which requires no manual labeling, no parameterization (in real-world scenarios) and is particularly suitable for the common situation in which only the most basic information about a citation record is available (i.e., author names, and work and venue titles). During the first step, real-world heuristics on coauthors are able to produce highly pure (although fragmented) clusters. The most representative of these clusters are then selected to serve as training data for the third supervised author assignment step. The third step exploits a state-of-the-art transductive disambiguation method capable of detecting unseen authors not included in any training example and incorporating reliable predictions to the training data. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation, demonstrate that our proposed method outperforms all representative unsupervised author grouping disambiguation methods and is very competitive with fully supervised author assignment methods.
Thus, different from other bootstrapping methods that explore privileged, hard to obtain information such as self-citations and personal information, our proposed method produces top-notch performance with no (manual) training data or parameterization and in the presence of scarce information.

SyGAR – A synthetic data generator for evaluating name disambiguation methods. (2009) Ferreira, Anderson Almeida; Gonçalves, Marcos André; Almeida, Jussara Marques de; Laender, Alberto Henrique Frade; Veloso, Adriano Alonso
Name ambiguity in the context of bibliographic citations is one of the hardest problems currently faced by the digital library community. Several methods have been proposed in the literature, but none of them provides the perfect solution for the problem. More importantly, basically all of these methods were tested in limited and restricted scenarios, which raises concerns about their practical applicability. In this work, we deal with these limitations by proposing a synthetic generator of ambiguous authorship records called SyGAR. The generator was validated against a gold standard collection of disambiguated records, and applied to evaluate three disambiguation methods in a relevant scenario.

A tool for generating synthetic authorship records for evaluating author name disambiguation methods. (2012) Ferreira, Anderson Almeida; Gonçalves, Marcos André; Almeida, Jussara Marques de; Laender, Alberto Henrique Frade; Veloso, Adriano Alonso
The author name disambiguation task has to deal with uncertainties related to the possible many-to-many correspondences between ambiguous names and unique authors. Despite the variety of name disambiguation methods available in the literature to solve the problem, most of them are rarely compared against each other.
Moreover, they are often evaluated without considering a time evolving digital library, susceptible to dynamic (and therefore challenging) patterns such as the introduction of new authors and the change of researchers’ interests over time. In order to facilitate the evaluation of name disambiguation methods in various realistic scenarios and under controlled conditions, in this article we propose SyGAR, a new Synthetic Generator of Authorship Records that generates citation records based on author profiles. SyGAR can be used to generate successive loads of citation records simulating a living digital library that evolves according to various publication patterns. We validate SyGAR by comparing the results produced by three representative name disambiguation methods on real as well as synthetically generated collections of citation records. We also demonstrate its applicability by evaluating those methods on a time evolving digital library collection generated with the tool, considering several dynamic and realistic scenarios.
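
The idea of generating successive loads of citation records from author profiles can be sketched as follows. This is only an illustration under stated assumptions, not SyGAR itself: the profile layout (lists of `coauthors`, `venues`, and title `terms`), the function `generate_load`, uniform sampling, and the way a new author is introduced (cloning an existing profile under a fresh identifier) are all hypothetical.

```python
import random


def generate_load(profiles, n_records, rng, new_author_prob=0.0, title_len=3):
    """Generate one load of synthetic citation records (illustrative sketch).

    profiles: dict mapping an author id to a dict with the keys
        "coauthors", "venues" and "terms" (each a non-empty list).
    rng: a random.Random instance, so that loads are reproducible.
    new_author_prob: chance that a record belongs to a newly created author.
    The profiles dict is updated in place when a new author is introduced.
    """
    records = []
    for _ in range(n_records):
        if rng.random() < new_author_prob:
            # Assumed mechanism: a new author starts as a copy of an
            # existing profile, registered under a fresh identifier.
            template = profiles[rng.choice(sorted(profiles))]
            author = f"new-author-{len(profiles)}"
            profiles[author] = {key: list(vals) for key, vals in template.items()}
        else:
            author = rng.choice(sorted(profiles))
        profile = profiles[author]
        n_coauthors = min(2, len(profile["coauthors"]))
        records.append({
            "author": author,
            "coauthors": rng.sample(profile["coauthors"], k=n_coauthors),
            "title": " ".join(rng.choice(profile["terms"]) for _ in range(title_len)),
            "venue": rng.choice(profile["venues"]),
        })
    return records
```

Calling `generate_load` repeatedly on the same `profiles` dictionary yields successive loads, and a nonzero `new_author_prob` lets the set of authors grow between loads, which is the kind of evolving collection the abstract refers to.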