decision_tree_learning

Differences

Shows the differences between the two selected revisions of the page.

decision_tree_learning [2018/11/13 07:49]
mwpark
decision_tree_learning [2021/04/13 06:54]
Line 1: Line 1:
-* Algorithms 
-  - ID3 (Iterative Dichotomiser 3) 
-  - C4.5 (successor of ID3) 
-  - CART (Classification And Regression Tree) 
-  - CHAID (CHi-squared Automatic Interaction Detector). Performs multi-level splits when computing classification trees.[11] 
-  - MARS: extends decision trees to handle numerical data better. 
  
- 
-# Automatic Web Content Extraction by Combination of Learning and Grouping 
- 
-Created: Nov 09, 2018 6:38 PM 
-Tags: Paper 
- 
-# 1. INTRODUCTION 
- 
-- The main content of a web page is often accompanied by a lot of additional and often distracting content, such as branding banners, navigation elements, advertisements, and copyright notices. 
-- The web pages in the World Wide Web are highly heterogeneous. 
-- Previous work 
-    - Heuristic 
-    - Template based Approach 
-        - TED 
- 
-# 2. RELATED WORK 
- 
-- CETR 
-- CETD 
-- VIPS 
- 
-# 3. PROBLEM FORMULATION AND SOLUTION 
- 
-![](Untitled-c3f13dfd-e5f1-486c-aaf6-4580e50223b5.png) 
- 
-# 4. FEATURE SELECTION 
- 
-$$F_x(v_i)=F'_x(v_i)\cup\bigcup_{v_j\in Children(v_i)}F_x(v_j)$$ 
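
The recursion above can be sketched as a bottom-up walk over the DOM tree: a node's feature set is its own set joined with the sets of all its children. This is only an illustrative sketch; the `Node` class, its `own` field, and `collect_features` are hypothetical names, and the notes do not say what the elements of these sets actually are.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    # Hypothetical stand-in for a DOM node v_i.
    own: Set[str] = field(default_factory=set)             # F'_x(v_i): the node's own items
    children: List["Node"] = field(default_factory=list)   # Children(v_i)


def collect_features(node: Node) -> Set[str]:
    """Return F_x(v_i): the node's own set united with every child's F_x."""
    result = set(node.own)
    for child in node.children:
        result |= collect_features(child)
    return result
```

For a root holding `{"a"}` with a single child holding `{"b"}`, the collected set contains both items.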
- 
-## 4.1 Position and Area Features 
- 
-- We consider the left, right, top, bottom, horizontal center and vertical center positions. 
- 
-$$POS\_LEFT = 1 - |BEST\_LEFT - LEFT|$$ 
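
A minimal sketch of the position feature, reading the formula as one minus the absolute distance between a reference ("best") left position and the node's left position. Both the reading and the normalization of coordinates to the range [0, 1] are assumptions; `position_feature` is a hypothetical name, and the same function would apply to the right, top, bottom, and center positions.

```python
def position_feature(best: float, actual: float) -> float:
    """Score in [0, 1]; 1.0 means the node sits exactly at the reference position.

    Assumes both coordinates are already normalized to [0, 1],
    for example by dividing by the page width or height.
    """
    return 1.0 - abs(best - actual)
```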
- 
-## 4.2 Font Features 
- 
-$$FONT\_COLOR\_POPULARITY=\sum_i\varphi_{ki} \varphi_{ri}$$ 
- 
-$$FONT\_SIZE=\sum_i{\rho_{ki}(z_i-z_{min}) \over (z_{max}-z_{min})}$$ 
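
The two font features can be sketched as weighted sums. The notes do not define the symbols, so this sketch assumes that $\varphi_{ki}$ and $\varphi_{ri}$ are the shares of text using color $i$ in the node and in the whole page, and that $\rho_{ki}$ is the share of the node's text set in font size $z_i$. The function names and the zero fallback for a page with a single font size are also assumptions.

```python
from typing import Mapping


def font_color_popularity(node_share: Mapping[str, float],
                          page_share: Mapping[str, float]) -> float:
    """Sum over colors of (share in node) * (share in page)."""
    return sum(share * page_share.get(color, 0.0)
               for color, share in node_share.items())


def font_size_feature(size_share: Mapping[float, float],
                      z_min: float, z_max: float) -> float:
    """Share-weighted font size, min-max normalized to [0, 1]."""
    if z_max == z_min:
        return 0.0  # assumption: no size variation on the page
    return sum(share * (z - z_min) / (z_max - z_min)
               for z, share in size_share.items())
```

Under these assumptions, a node whose text is half 10 pt and half 20 pt on a page ranging from 10 pt to 20 pt gets a font size feature of 0.5.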
- 
-## 4.3 Text, Tag and Link Features 
- 
-$$TEXT\_RATIO={A_{text} \over A_{text} +A_{image} + 1}$$ 
- 
-$$TAG\_DENSITY={numTags \over numChars+1}$$ 
- 
-$$LINK\_DENSITY={numLinks \over numTags+1}$$ 
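
The three ratios above translate directly into code. This sketch takes the areas and counts as already measured inputs; how they are extracted from a rendered page is not covered in these notes, and the function names are hypothetical. The `+ 1` in each denominator keeps the ratio defined for empty nodes.

```python
def text_ratio(text_area: float, image_area: float) -> float:
    """Share of the node's area covered by text rather than images."""
    return text_area / (text_area + image_area + 1)


def tag_density(num_tags: int, num_chars: int) -> float:
    """Number of tags per character of text in the node."""
    return num_tags / (num_chars + 1)


def link_density(num_links: int, num_tags: int) -> float:
    """Number of links per tag in the node."""
    return num_links / (num_tags + 1)
```

A node with 5 links among 9 tags has a link density of 0.5, and a node with no tags and no text has a tag density of 0.0.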
- 
-# 5. LEARNING 
- 
-# 6. GROUPING AND REFINING 
- 
-1. Grouping 
-2. Group Selection 
-3. Refining 
- 
-# 7. EXPERIMENTAL EVALUATION 
- 
-1. Evaluation Data Set and Metrics 
-2. Comparison with the Baseline Methods 
-    - LR_A 
-    - SVM_A 
-    - LR 
-    - SVM 
-    - MSS 
-3. Parameter Sensitivity Analysis 
- 
-# 8. CONCLUSIONS 
decision_tree_learning.txt · Last modified: 2021/04/13 06:54 (external edit)