Browsing by Author "Siqueira, Gustavo Oliveira de"
Strategies for automatic determination of similarity threshold for genre-aware focused crawling processes (2017)
Siqueira, Gustavo Oliveira de; Assis, Guilherme Tavares de; Ferreira, Anderson Almeida; Mangaravite, Vítor; Pádua, Flávio Luis Cardeal
The great popularity and, especially, the fast growth of the Web have led to the proposal and analysis of new techniques that help users locate the information they need effectively, in a satisfactory time, and without much difficulty. Traditional crawlers are not capable of identifying relevant sub-spaces of the Web related to a specific theme; focused crawlers, however, can solve this problem effectively and efficiently. Usually, a focused crawling process requires a specific value, called the similarity threshold, to determine whether a crawled Web page is relevant to a topic of interest; this value is distinct for each topic. In this work, we propose three strategies for automatically determining this value for focused crawlers that follow a genre-aware approach. In our experimental evaluation, the best result was 100% precision and 98% F1 for a specific crawling process whose similarity threshold was determined automatically, a great result compared with the baseline.
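The role of a similarity threshold in a focused crawler can be illustrated with a minimal sketch. It is not the method of the paper: the abstract does not describe the three proposed strategies or the similarity function used, so the code assumes cosine similarity over plain term-frequency vectors and a manually supplied threshold. The names `cosine_similarity`, `is_relevant`, `topic_terms`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm_a * norm_b)


def is_relevant(page_text: str, topic_terms: list[str], threshold: float) -> bool:
    """Assumed decision rule: a page is relevant if its similarity
    to the topic description reaches the threshold."""
    page_vec = Counter(page_text.lower().split())
    topic_vec = Counter(t.lower() for t in topic_terms)
    return cosine_similarity(page_vec, topic_vec) >= threshold


# Hypothetical topic description and pages, for illustration only.
topic = ["syllabus", "course"]
print(is_relevant("database systems course syllabus", topic, 0.5))
print(is_relevant("football match results", topic, 0.5))
```

In this sketch, raising the threshold makes the crawler stricter (fewer pages accepted, typically higher precision and lower recall), which is why a value suited to one topic may not suit another.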