decision_tree_learning

decision_tree_learning [2020/04/14 08:25]
decision_tree_learning [2021/04/13 06:54] (current)
 +* Algorithms
 +  - ID3 (Iterative Dichotomiser 3)
 +  - C4.5 (successor of ID3)
 +  - CART (Classification And Regression Tree)
 +  - CHAID (CHi-squared Automatic Interaction Detector). Performs multi-level splits when computing classification trees.[11]
 +  - MARS (Multivariate Adaptive Regression Splines): extends decision trees to handle numerical data better.
  
 +
 +# Automatic Web Content Extraction by Combination of Learning and Grouping
 +
 +Created: Nov 09, 2018 6:38 PM
 +Tags: Paper
 +
 +# 1. INTRODUCTION
 +
 +- The main content of a webpage is often accompanied by a lot of additional, often distracting content such as branding banners, navigation elements, advertisements, and copyright notices.
 +- The web pages in the World Wide Web are highly heterogeneous.
 +- Previous work
 +    - Heuristic
 +    - Template based Approach
 +        - TED
 +
 +# 2. RELATED WORK
 +
 +- CETR
 +- CETD
 +- VIPS
 +
 +# 3. PROBLEM FORMULATION AND SOLUTION
 +
 +![](Untitled-c3f13dfd-e5f1-486c-aaf6-4580e50223b5.png)
 +
 +# 4. FEATURE SELECTION
 +
 +$$F_x(v_i)=F'_x(v_i)\cup\{\bigcup_{v_j\in Children(v_i)}F_x(v_j)\}$$
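
The recursive definition above reads as: the value set of a node is its own values together with the value sets of all its children. The following is a minimal sketch of that recursion, not the paper's implementation. The `Node` class, its field names, and the font-size strings used as sample values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a DOM node v_i."""
    own_values: set = field(default_factory=set)   # F'_x(v_i): values found directly on the node
    children: list = field(default_factory=list)   # Children(v_i)


def collect(node: Node) -> set:
    """Return F_x(v_i): the node's own values plus those of all descendants."""
    values = set(node.own_values)
    for child in node.children:
        values |= collect(child)
    return values


# Small example tree: a root with one leaf and one wrapper that holds another leaf.
leaf_a = Node({"12px"})
leaf_b = Node({"14px", "12px"})
root = Node({"16px"}, [leaf_a, Node(set(), [leaf_b])])
print(sorted(collect(root)))  # ['12px', '14px', '16px']
```

Because the union is taken bottom-up, a wrapper node with no values of its own still exposes everything found in its subtree.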
 +
 +## 4.1 Position and Area Features
 +
 +- We consider the left, right, top, bottom, horizontal center and vertical center positions.
 +
 +$$POS\_LEFT = 1 - |BEST\_LEFT - LEFT|$$
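
As a small worked example, the sketch below assumes the position feature is one minus the absolute distance between a block's left edge and a reference "best" left edge. It also assumes both coordinates are normalized to the range [0, 1], for example by dividing by the page width. The notes state neither the normalization nor how the best position is chosen, so both are assumptions, and the function name and sample values are hypothetical.

```python
def pos_left(left: float, best_left: float) -> float:
    """1 - |best_left - left|, assuming both values are normalized to [0, 1]."""
    return 1.0 - abs(best_left - left)


print(pos_left(0.25, 0.25))  # 1.0: block starts exactly at the reference position
print(pos_left(0.75, 0.25))  # 0.5: block starts half a page width away
```

Under these assumptions the feature is 1 for a perfectly placed block and decreases linearly with distance. The same shape would apply to the right, top, bottom, and center positions.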
 +
 +## 4.2 Font Features
 +
 +$$FONT\_COLOR\_POPULARITY=\sum_i\varphi_{ki} \varphi_{ri}$$
 +
 +$$FONT\_SIZE=\sum_i{\rho_{ki}(z_i-z_{min}) \over (z_{max}-z_{min})}$$
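
The notes do not define the symbols in the two font formulas, so the sketch below rests on an assumed reading: $\varphi_{ki}$ is the share of characters in node $k$ that use color $i$, $\varphi_{ri}$ is the same share for the whole page (root), $\rho_{ki}$ is the share of characters in node $k$ with font size $z_i$, and $z_{min}$, $z_{max}$ are the smallest and largest font sizes on the page. All function names are hypothetical, and inputs are given as one entry per character.

```python
from collections import Counter


def distribution(values):
    """Map each distinct value to its share of the input sequence."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def font_color_popularity(node_colors, page_colors):
    """sum_i phi_ki * phi_ri over colors i (assumed meaning of the symbols)."""
    phi_k = distribution(node_colors)
    phi_r = distribution(page_colors)
    return sum(share * phi_r.get(color, 0.0) for color, share in phi_k.items())


def font_size(node_sizes, z_min, z_max):
    """sum_i rho_ki * (z_i - z_min) / (z_max - z_min)."""
    if z_max == z_min:
        return 0.0  # assumption: a page with a single font size carries no signal
    rho_k = distribution(node_sizes)
    return sum(share * (z - z_min) / (z_max - z_min) for z, share in rho_k.items())


# One list entry per character in the node or page.
print(font_color_popularity(["black"] * 4, ["black"] * 3 + ["blue"]))  # 0.75
print(font_size([12, 12, 16, 16], z_min=8, z_max=16))                  # 0.75
```

Under this reading, a node scores high on color popularity when its text uses the colors that dominate the page, and high on font size when its text is large relative to the page's range.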
 +
 +## 4.3 Text, Tag and Link Features
 +
 +$$TEXT\_RATIO={A_{text} \over A_{text} +A_{image} + 1}$$
 +
 +$$TAG\_DENSITY={numTags \over numChars+1}$$
 +
 +$$LINK\_DENSITY={numLinks \over numTags+1}$$
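
The three ratios above translate directly into code. The sketch below assumes that $A_{text}$ and $A_{image}$ are the rendered areas of text and images inside a block and that the counts are taken over the block's subtree. The notes do not spell out either point, and the function and parameter names are hypothetical.

```python
def text_ratio(text_area: float, image_area: float) -> float:
    """TEXT_RATIO = A_text / (A_text + A_image + 1)."""
    return text_area / (text_area + image_area + 1)


def tag_density(num_tags: int, num_chars: int) -> float:
    """TAG_DENSITY = numTags / (numChars + 1)."""
    return num_tags / (num_chars + 1)


def link_density(num_links: int, num_tags: int) -> float:
    """LINK_DENSITY = numLinks / (numTags + 1)."""
    return num_links / (num_tags + 1)


print(text_ratio(300, 99))   # 0.75
print(tag_density(5, 99))    # 0.05
print(link_density(2, 7))    # 0.25
print(tag_density(0, 0))     # 0.0, the +1 keeps empty blocks well defined
```

The `+ 1` in each denominator keeps the ratios defined for blocks with no text, characters, or tags.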
 +
 +# 5. LEARNING
 +
 +# 6. GROUPING AND REFINING
 +
 +1. Grouping
 +2. Group Selection
 +3. Refining
 +
 +# 7. EXPERIMENTAL EVALUATION
 +
 +1. Evaluation Data Set and Metrics
 +2. Comparison with the Baseline Methods
 +    - LR_A
 +    - SVM_A
 +    - LR
 +    - SVM
 +    - MSS
 +3. Parameter Sensitivity Analysis
 +
 +# 8. CONCLUSIONS