Web content extraction by using decision tree learning

Uzun E.; Agun H.V.; Yerlikaya T.

dc.authorscopusid	54783608800
dc.authorscopusid	55293388500
dc.authorscopusid	16232085100
dc.contributor.author	Uzun E.
dc.contributor.author	Agun H.V.
dc.contributor.author	Yerlikaya T.
dc.date.accessioned	2024-06-12T10:25:24Z
dc.date.available	2024-06-12T10:25:24Z
dc.date.issued	2012
dc.description	2012 20th Signal Processing and Communications Applications Conference, SIU 2012 -- 18 April 2012 through 20 April 2012 -- Fethiye, Mugla -- 90786	en_US
dc.description.abstract	Information extraction techniques allow web pages to serve as a source of datasets for studies in areas such as natural language processing and data mining. However, uninformative sections such as advertisements, menus, and links are increasingly common. Cleaning web pages of these sections and extracting the informative content has therefore become an important problem. In this study, we present a decision tree learning approach over DOM-based features that aims to clean the uninformative sections and extract informative content in three classes: title, main content, and additional information. Unlike previous studies, this approach constructs the learning model for main content extraction on DIV and TD tags. The proposed method achieved 95.58% accuracy in cleaning uninformative sections and extracting the informative content. For the extraction of the main block in particular, an F-measure of 0.96 was obtained. © 2012 IEEE.	en_US
dc.identifier.doi	10.1109/SIU.2012.6204476
dc.identifier.isbn	9.78147E+12
dc.identifier.scopus	2-s2.0-84863462457	en_US
dc.identifier.scopusquality	N/A	en_US
dc.identifier.uri	https://doi.org/10.1109/SIU.2012.6204476
dc.identifier.uri	https://hdl.handle.net/20.500.14551/16328
dc.indekslendigikaynak	Scopus	en_US
dc.language.iso	tr	en_US
dc.relation.ispartof	2012 20th Signal Processing and Communications Applications Conference, SIU 2012, Proceedings	en_US
dc.relation.publicationcategory	Conference Item - International - Institutional Faculty Member	en_US
dc.rights	info:eu-repo/semantics/closedAccess	en_US
dc.subject	Data Sets; Decision Tree Learning; F-Measure; Information Extraction Techniques; Learning Models; Natural Language Processing; Web Content; Computational Linguistics; Data Mining; Decision Trees; Natural Language Processing Systems; Signal Processing; Websites; Information Retrieval Systems	en_US
dc.title	Web content extraction by using decision tree learning	en_US
dc.title.alternative	Karar ağacı öğrenmesi kullanarak web içerik çıkarımı	en_US
dc.type	Conference Object	en_US

Collection

Scopus Indexed Publications Collection
