decision_tree_learning

Revisions compared: decision_tree_learning [2018/08/14 07:25] (created by mwpark) and decision_tree_learning [2020/04/14 08:25] (current)

Line 5:

- CHAID (CHi-squared Automatic Interaction Detector). Performs multi-level splits when computing classification trees.[11]
- MARS: extends decision trees to handle numerical data better.

# Automatic Web Content Extraction by Combination of Learning and Grouping

Created: Nov 09, 2018 6:38 PM
Tags: Paper

# 1. INTRODUCTION

- The main content of a web page is often accompanied by a lot of additional and often distracting content, such as branding banners, navigation elements, advertisements, and copyright notices.
- The web pages on the World Wide Web are highly heterogeneous.
- Previous work
    - Heuristic
    - Template-based approach
    - TED

# 2. RELATED WORK

- CETR
- CETD
- VIPS

# 3. PROBLEM FORMULATION AND SOLUTION

![](Untitled-c3f13dfd-e5f1-486c-aaf6-4580e50223b5.png)

# 4. FEATURE SELECTION

$$F_x(v_i)=F'_x(v_i)\bigcup\{{\bigcup_{v_j\subseteqq Children(v_i)}F_x(v_j)}\}$$

## 4.1 Position and Area Features

- We consider the left, right, top, bottom, horizontal center, and vertical center positions.

$$POS\_LEFT = 1 - |BEST\_LEFT - LEFT|$$

## 4.2 Font Features

$$FONT\_COLOR\_POPULARITY=\sum_i\varphi_{ki} \varphi_{ri}$$

$$FONT\_SIZE=\sum_i{\rho_{ki}(z_i-z_{min}) \over (z_{max}-z_{min})}$$

## 4.3 Text, Tag and Link Features

$$TEXT\_RATIO={A_{text} \over A_{text} +A_{image} + 1}$$

$$TAG\_DENSITY={numTags \over numChars+1}$$

$$LINK\_DENSITY={numLinks \over numTags+1}$$

# 5 LEARNING

# 6 GROUPING AND REFINING

1. Grouping
2. Group Selection
3. Refining

# 7 EXPERIMENTAL EVALUATION

1. Evaluation Data Set and Metrics
2. Comparison with the Baseline Methods
    - LR_A
    - SVM_A
    - LR
    - SVM
    - MSS
3. Parameter Sensitivity Analysis

# 8 CONCLUSIONS
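
The three ratios in Section 4.3 (TEXT_RATIO, TAG_DENSITY, LINK_DENSITY) are simple enough to sketch in code. The snippet below is only an illustration of the formulas as written in these notes, not the paper's implementation. The function and parameter names are hypothetical, and it assumes that A_text and A_image are rendered areas (for example in pixels) and that the counts are taken over a single DOM node and its subtree.

```python
def text_ratio(text_area: float, image_area: float) -> float:
    """TEXT_RATIO = A_text / (A_text + A_image + 1)."""
    # Assumption: both areas are non-negative and use the same unit.
    return text_area / (text_area + image_area + 1)


def tag_density(num_tags: int, num_chars: int) -> float:
    """TAG_DENSITY = numTags / (numChars + 1)."""
    return num_tags / (num_chars + 1)


def link_density(num_links: int, num_tags: int) -> float:
    """LINK_DENSITY = numLinks / (numTags + 1)."""
    return num_links / (num_tags + 1)


if __name__ == "__main__":
    # Hypothetical node: mostly text, few tags, a couple of links.
    print(text_ratio(text_area=5000.0, image_area=1200.0))
    print(tag_density(num_tags=12, num_chars=840))
    print(link_density(num_links=2, num_tags=12))
```

The `+ 1` in each denominator keeps the ratio defined for empty nodes (no text, no characters, or no tags), so each function returns 0.0 instead of raising a division error in that case.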
decision_tree_learning.txt · Last modified: 2020/04/14 08:25 (external edit)