Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes.
dc.contributor.author | Siqueira, Gustavo Oliveira de | |
dc.contributor.author | Assis, Guilherme Tavares de | |
dc.contributor.author | Ferreira, Anderson Almeida | |
dc.contributor.author | Mangaravite, Vítor | |
dc.contributor.author | Pádua, Flávio Luis Cardeal | |
dc.date.accessioned | 2018-10-15T12:22:04Z | |
dc.date.available | 2018-10-15T12:22:04Z | |
dc.date.issued | 2017 | |
dc.description.abstract | The great popularity and, especially, the fast growth of the Web have led to the proposal and analysis of new techniques that help users locate the information they need effectively, in a satisfactory time and without much difficulty. Traditional crawlers are not capable of identifying relevant sub-spaces of the Web related to a specific theme; focused crawlers, however, can solve this problem effectively and efficiently. Usually, a focused crawling process requires a specific value, called the similarity threshold, to determine whether a crawled Web page is relevant to a topic of interest; this value is distinct for each topic. To determine such a value automatically for focused crawlers following a genre-aware approach, we propose three strategies in this work. As its best result, our experimental evaluation achieved 100% precision and 98% F1 for a specific crawling process whose similarity threshold value was determined automatically, a strong result compared with the baseline. | pt_BR |
dc.identifier.citation | SIQUEIRA, G. O. de et al. Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes. IADIS International Journal on WWW/Internet, v. 15, p. 15-30, 2017. Available at: <http://www.iadisportal.org/ijwi/papers/2017151102.pdf>. Accessed: 16 Jun. 2018. | pt_BR |
dc.identifier.issn | 1645-7641 | |
dc.identifier.uri | http://www.repositorio.ufop.br/handle/123456789/10363 | |
dc.identifier.uri2 | http://www.iadisportal.org/ijwi/papers/2017151102.pdf | pt_BR |
dc.language.iso | en_US | pt_BR |
dc.rights | restricted | pt_BR |
dc.subject | Similarity threshold | pt_BR |
dc.subject | Web crawling | pt_BR |
dc.subject | Focused crawling | pt_BR |
dc.title | Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes. | pt_BR |
dc.type | Article published in a journal | pt_BR |
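The abstract describes a threshold test: a crawled page is treated as relevant when its similarity to the topic of interest reaches a topic-specific threshold. The following is a minimal sketch of that test only, assuming cosine similarity over naive bag-of-words vectors. The function names (`term_vector`, `cosine_similarity`, `is_relevant`) are hypothetical, and neither the paper's genre-aware page representation nor its three strategies for determining the threshold automatically are reproduced here.

```python
import math
from collections import Counter


def term_vector(text: str) -> Counter:
    """Naive bag-of-words vector (assumption: lowercase + whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors, in [0, 1]."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_relevant(page_text: str, topic_text: str, threshold: float) -> bool:
    """Keep a crawled page only if its similarity to the topic reaches the threshold."""
    return cosine_similarity(term_vector(page_text), term_vector(topic_text)) >= threshold


if __name__ == "__main__":
    topic = "focused crawler web crawling"
    print(is_relevant("a focused crawler for web pages", topic, threshold=0.3))
    print(is_relevant("cooking recipes for dinner", topic, threshold=0.3))
```

Here the threshold is passed in by hand; the point of the work described above is that this value differs per topic and should be determined automatically rather than fixed manually.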
Files
Original bundle
- Name: ARTIGO_StrategiesAutomaticDetermination.pdf
- Size: 618.5 KB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 924 B
- Format: Item-specific license agreed to upon submission