Fingerprint generation for retrieving similar documents on the Web (original title: Geração de impressão digital para recuperação de documentos similares na web)
Date
2004
Abstract
This paper presents a mechanism for generating the “fingerprint” of a Web document. The mechanism is part of a system for detecting and retrieving Web documents that are similar to a suspicious document. The process has three stages: (a) generating a fingerprint of the suspicious document, (b) gathering candidate documents from the Web, and (c) comparing each candidate document with the suspicious document. In the first stage, the fingerprint of the suspicious document serves as its identification. The fingerprint is composed of representative sentences of the document. In the second stage, the sentences composing the fingerprint are submitted as queries to a search engine. The documents identified by the URLs that the search engine returns are collected to form a set of candidate similar documents. In the third stage, the candidate documents are compared “in place” with the suspicious document. This work focuses on generating the fingerprint of the suspicious document. Experiments were performed on a collection of plagiarized documents constructed specifically for this work. For the best fingerprint evaluated, on average 87.06% of the source documents used to compose the plagiarized document were retrieved from the Web.
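The three stages described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say how representative sentences are chosen or how documents are compared, so the sketch picks the sentences with the most words and uses word-shingle Jaccard resemblance as stand-ins. All function names, the parameters `k` and `n`, and the `search` callable (a placeholder for a real search engine client) are hypothetical.

```python
import re
from typing import Callable, Iterable, List, Set, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def fingerprint(text: str, k: int = 3) -> List[str]:
    """Stage 1: pick k 'representative' sentences.

    Assumption: representative means having the most words. The paper
    evaluates its own selection strategies, which are not shown here.
    """
    sentences = split_sentences(text)
    ranked = sorted(sentences, key=lambda s: len(s.split()), reverse=True)
    return ranked[:k]


def gather_candidates(
    queries: Iterable[str], search: Callable[[str], Iterable[str]]
) -> Set[str]:
    """Stage 2: submit each fingerprint sentence as a query, collect URLs.

    `search` is a hypothetical callable standing in for a search engine.
    """
    urls: Set[str] = set()
    for query in queries:
        urls.update(search(query))
    return urls


def shingles(text: str, n: int = 3) -> Set[Tuple[str, ...]]:
    """Set of overlapping n-word sequences from the lowercased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def resemblance(a: str, b: str, n: int = 3) -> float:
    """Stage 3 stand-in: Jaccard resemblance of the two shingle sets."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

In use, `fingerprint(suspicious_text)` would yield the query sentences, `gather_candidates` would return the candidate URLs, and each downloaded candidate would be scored against the suspicious document with `resemblance`. Fetching the pages and choosing a similarity threshold are left out because the abstract gives no details on either.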
Citation
PEREIRA JUNIOR, A. R.; ZIVIANI, N. Geração de impressão digital para recuperação de documentos similares na web. In. II Workshop de Tecnologia da Informação e Linguística, II. 2004. Salvador. Anais. Salvador: Workshop de Tecnologia da Informação e Linguística, 2004. Disponível em: <http://homepages.dcc.ufmg.br/~nivio/papers/til04.pdf>. Acesso em: 18/10/2012.