Browsing by Author "Laender, Alberto Henrique Frade"
Item: A brief survey of automatic methods for author name disambiguation. (2012)
Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade
Name ambiguity in the context of bibliographic citation records is a hard problem that affects the quality of services and content in digital libraries and similar systems. The challenges of dealing with author name ambiguity have led to a myriad of disambiguation methods. Generally speaking, the proposed methods either attempt to group citation records of the same author by finding some similarity among them or try to assign them directly to their respective authors. Both approaches may exploit either supervised or unsupervised techniques. In this article, we propose a taxonomy for characterizing the current author name disambiguation methods described in the literature, present a brief survey of the most representative ones and discuss several open challenges.

Item: Cost-effective on-demand associative author name disambiguation. (2012)
Veloso, Adriano Alonso; Ferreira, Anderson Almeida; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Meira Júnior, Wagner
Authorship disambiguation is an urgent issue that affects the quality of digital library services and for which supervised solutions have been proposed, delivering state-of-the-art effectiveness. However, particular challenges such as the prohibitive cost of labeling vast amounts of examples (there are many ambiguous authors), the huge hypothesis space (there are several features and authors from which many different disambiguation functions may be derived), and the skewed author popularity distribution (few authors are very prolific, while most appear in only a few citations) may prevent such techniques from reaching their full potential.
In this article, we introduce an associative author name disambiguation approach that identifies authorship by extracting, from training examples, rules associating citation features (e.g., coauthor names, work title, publication venue) to specific authors. As our main contribution we propose three associative author name disambiguators: (1) EAND (Eager Associative Name Disambiguation), our basic method that explores association rules for name disambiguation; (2) LAND (Lazy Associative Name Disambiguation), which extracts rules on a demand-driven basis at disambiguation time, reducing the hypothesis space by focusing on examples that are most suitable for the task; and (3) SLAND (Self-Training LAND), which extends LAND with self-training capabilities, thus drastically reducing the number of examples required for building effective disambiguation functions, besides being able to detect novel/unseen authors in the test set. Experiments demonstrate that all our disambiguators are effective and that, in particular, SLAND is able to outperform state-of-the-art supervised disambiguators, providing gains that range from 12% to more than 400%, being extremely effective and practical.

Item: Incremental author name disambiguation by exploiting domain-specific heuristics. (2017)
Santana, Alan Filipe; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Ferreira, Anderson Almeida
The vast majority of the current author name disambiguation solutions are designed to disambiguate a whole digital library (DL) at once, considering the entire repository. However, these solutions, besides being very expensive and having scalability problems, may also fail to benefit from any manual corrections, as these may be lost whenever the entire repository has to be disambiguated again.
In the real world, in which repositories are updated on a daily basis, incremental solutions that disambiguate only the newly introduced citation records are likely to produce improved results in the long run. However, the problem of incremental author name disambiguation has been largely neglected in the literature. In this article we present a new author name disambiguation method, specifically designed for the incremental scenario. In our experiments, our new method largely outperforms recent incremental proposals reported in the literature as well as the current state-of-the-art non-incremental method.

Item: Incremental unsupervised name disambiguation in cleaned digital libraries. (2011)
Carvalho, Ana Paula de; Ferreira, Anderson Almeida; Laender, Alberto Henrique Frade; Gonçalves, Marcos André
Name ambiguity in the context of bibliographic citations is one of the hardest problems currently faced by the Digital Library (DL) community. Here we deal with the problem of disambiguating new citation records inserted into a cleaned DL, without the need to process the whole collection, which is usually necessary for unsupervised methods. Although supervised solutions can deal with this situation, there is the costly burden of generating training data, besides the fact that these methods cannot handle well the insertion of records of new authors not already existent in the repository. In this article, we propose a new unsupervised method that identifies the correct authors of the new citation records to be inserted in a DL. The method is based on heuristics that are also used to identify whether the new records belong to authors already in the digital library or not, correctly identifying new authors in most cases.
Our experimental evaluation, using synthetic and real data sets, shows gains of up to 19% when compared to a state-of-the-art method, without the cost of having to disambiguate the whole DL at each new load (as done by unsupervised methods) or the need for any training (as done by supervised methods).

Item: On the combination of domain-specific heuristics for author name disambiguation: the nearest cluster method. (2015)
Santana, Alan Filipe; Gonçalves, Marcos André; Laender, Alberto Henrique Frade; Ferreira, Anderson Almeida
Author name disambiguation has been one of the hardest problems faced by digital libraries since their early days. Historically, supervised solutions have empirically outperformed those based on heuristics, but with the burden of having to rely on manually labeled training sets for the learning process. Moreover, most supervised solutions just apply some type of generic machine learning solution and do not exploit specific knowledge about the problem. In this article, we follow a similar reasoning, but in the opposite direction. Instead of extending an existing supervised solution, we propose a set of carefully designed heuristics and similarity functions, and apply supervision only to optimize their parameters for each particular dataset. As our experiments show, the result is a very effective, efficient and practical author name disambiguation method that can be used in many different scenarios. In fact, we show that our method can beat state-of-the-art supervised methods in terms of effectiveness in many situations while being orders of magnitude faster.
It can also run without any training information, using only default parameters, and still be very competitive when compared to these supervised methods (beating several of them) and better than most existing unsupervised author name disambiguation solutions.

Item: Projeto/Reprojeto de bancos de dados relacionais: a ferramenta DB-Tool. (1997)
Ferreira, Anderson Almeida; Laender, Alberto Henrique Frade; Silva, Altigran Soares da
This paper describes a tool that supports the design and redesign of relational databases. The tool produces optimized relational representations of entity-relationship (ER) schemas and is implemented using Informix as its target database management system (DBMS). The tool operates in two phases. In the first phase, it receives as input an ER schema and generates a list of commands to implement the corresponding Informix schema. In the second phase, it receives a list of redesign commands specifying changes to the ER schema and generates a redesign plan to restructure the database accordingly. An example illustrates the use of the tool.

Item: Reducing fragmentation in incremental author name disambiguation. (2014)
Espiridião, Luciano Vilas Boas; Ferreira, Anderson Almeida; Laender, Alberto Henrique Frade; Gonçalves, Marcos André; Gomes, David Menotti; Tavares, Andréa Iabrudi; Assis, Guilherme Tavares de
Author name ambiguity is a hard problem that occurs when several authors publish articles under the same name or when the same author publishes articles under different names. Traditionally, automatic disambiguation methods process the author names of all citation records in a repository. Aiming at efficiency, incremental methods disambiguate author names only when new citation records are inserted into the repository. As a side effect, several citation records of the same author may be associated with different authors, which is known as the fragmentation problem.
To diminish this problem, we propose a new merge-oriented incremental method capable of reducing this side effect, without the need to apply a traditional disambiguation method on the whole repository. Our experimental evaluation shows that our method produces significant improvements when compared to an incremental baseline and is very competitive with batch-mode methods.

Item: Self-training author name disambiguation for information scarce scenarios. (2014)
Ferreira, Anderson Almeida; Veloso, Adriano Alonso; Gonçalves, Marcos André; Laender, Alberto Henrique Frade
We present a novel 3-step self-training method for author name disambiguation, SAND (self-training associative name disambiguator), which requires no manual labeling, no parameterization (in real-world scenarios) and is particularly suitable for the common situation in which only the most basic information about a citation record is available (i.e., author names, and work and venue titles). During the first step, real-world heuristics on coauthors are able to produce highly pure (although fragmented) clusters. In the second step, the most representative of these clusters are selected to serve as training data for the third, supervised author assignment step. The third step exploits a state-of-the-art transductive disambiguation method capable of detecting unseen authors not included in any training example and incorporating reliable predictions into the training data. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation, demonstrate that our proposed method outperforms all representative unsupervised author grouping disambiguation methods and is very competitive with fully supervised author assignment methods.
Thus, unlike other bootstrapping methods that explore privileged, hard-to-obtain information such as self-citations and personal information, our proposed method produces top-notch performance with no (manual) training data or parameterization and in the presence of scarce information.

Item: SyGAR – A synthetic data generator for evaluating name disambiguation methods. (2009)
Ferreira, Anderson Almeida; Gonçalves, Marcos André; Almeida, Jussara Marques de; Laender, Alberto Henrique Frade; Veloso, Adriano Alonso
Name ambiguity in the context of bibliographic citations is one of the hardest problems currently faced by the digital library community. Several methods have been proposed in the literature, but none of them provides the perfect solution for the problem. More importantly, basically all of these methods were tested in limited and restricted scenarios, which raises concerns about their practical applicability. In this work, we deal with these limitations by proposing a synthetic generator of ambiguous authorship records called SyGAR. The generator was validated against a gold standard collection of disambiguated records, and applied to evaluate three disambiguation methods in a relevant scenario.

Item: A tool for generating synthetic authorship records for evaluating author name disambiguation methods. (2012)
Ferreira, Anderson Almeida; Gonçalves, Marcos André; Almeida, Jussara Marques de; Laender, Alberto Henrique Frade; Veloso, Adriano Alonso
The author name disambiguation task has to deal with uncertainties related to the possible many-to-many correspondences between ambiguous names and unique authors. Despite the variety of name disambiguation methods available in the literature to solve the problem, most of them are rarely compared against each other.
Moreover, they are often evaluated without considering a time-evolving digital library, susceptible to dynamic (and therefore challenging) patterns such as the introduction of new authors and the change of researchers’ interests over time. In order to facilitate the evaluation of name disambiguation methods in various realistic scenarios and under controlled conditions, in this article we propose SyGAR, a new Synthetic Generator of Authorship Records that generates citation records based on author profiles. SyGAR can be used to generate successive loads of citation records simulating a living digital library that evolves according to various publication patterns. We validate SyGAR by comparing the results produced by three representative name disambiguation methods on real as well as synthetically generated collections of citation records. We also demonstrate its applicability by evaluating those methods on a time-evolving digital library collection generated with the tool, considering several dynamic and realistic scenarios.