Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes.

dc.contributor.author: Siqueira, Gustavo Oliveira de
dc.contributor.author: Assis, Guilherme Tavares de
dc.contributor.author: Ferreira, Anderson Almeida
dc.contributor.author: Mangaravite, Vítor
dc.contributor.author: Pádua, Flávio Luis Cardeal
dc.date.accessioned: 2018-10-15T12:22:04Z
dc.date.available: 2018-10-15T12:22:04Z
dc.date.issued: 2017
dc.description.abstract: The great popularity and, especially, the rapid growth of the Web have led to the proposal and analysis of new techniques that help users locate the information they need effectively, in a satisfactory time and without much difficulty. Traditional crawlers are unable to identify the relevant sub-spaces of the Web related to a specific theme; focused crawlers, however, can solve this problem effectively and efficiently. Usually, a focused crawling process requires a specific value, called the similarity threshold, to determine whether a crawled Web page is relevant to a topic of interest; this value is distinct for each topic. In this work, we propose three strategies for determining this value automatically for focused crawlers that follow a genre-aware approach. In our experimental evaluation, the best result was 100% precision and 98% F1 for a specific crawling process whose similarity threshold was determined automatically: a strong result compared with the baseline. [pt_BR]
dc.identifier.citation: SIQUEIRA, G. O. de et al. Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes. IADIS International Journal on WWW/Internet, v. 15, p. 15-30, 2017. Available at: <http://www.iadisportal.org/ijwi/papers/2017151102.pdf>. Accessed on: 16 Jun. 2018. [pt_BR]
dc.identifier.issn: 16457641
dc.identifier.uri: http://www.repositorio.ufop.br/handle/123456789/10363
dc.identifier.uri2: http://www.iadisportal.org/ijwi/papers/2017151102.pdf [pt_BR]
dc.language.iso: en_US [pt_BR]
dc.rights: restricted [pt_BR]
dc.subject: Similarity threshold [pt_BR]
dc.subject: Web crawling [pt_BR]
dc.subject: Focused crawling [pt_BR]
dc.title: Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes. [pt_BR]
dc.type: Article published in a journal [pt_BR]
Files
Original bundle
Name: ARTIGO_StrategiesAutomaticDetermination.pdf
Size: 618.5 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 924 B
Format: Item-specific license agreed upon to submission
Description: