Shows the differences between the two selected revisions of the document.
decision_tree_learning [2018/11/13 07:49] mwpark |
decision_tree_learning [2021/04/13 06:54] |
||
---|---|---|---|
Line 1: | Line 1: | ||
- | * Algorithms | ||
- | - ID3 (Iterative Dichotomiser 3) | ||
- | - C4.5 (successor of ID3) | ||
- | - CART (Classification And Regression Tree) | ||
- | - CHAID (CHi-squared Automatic Interaction Detector). Performs multi-level splits when computing classification trees. | ||
- | - MARS: extends decision trees to handle numerical data better. | ||
- | |||
- | # Automatic Web Content Extraction by Combination of Learning and Grouping | ||
- | |||
- | Created: Nov 09, 2018 6:38 PM | ||
- | Tags: Paper | ||
- | |||
- | # 1. INTRODUCTION | ||
- | |||
- | - The main content of a web page is often accompanied by additional, often distracting content such as branding banners, navigation elements, advertisements, and copyright notices. | ||
- | - The web pages in the World Wide Web are highly heterogeneous. | ||
- | - Previous work | ||
- | - Heuristic | ||
- | - Template based Approach | ||
- | - TED | ||
- | |||
- | # 2. RELATED WORK | ||
- | |||
- | - CETR | ||
- | - CETD | ||
- | - VIPS | ||
- | |||
- | # 3. PROBLEM FORMULATION AND SOLUTION | ||
- | |||
- | ![](Untitled-c3f13dfd-e5f1-486c-aaf6-4580e50223b5.png) | ||
- | |||
- | # 4. FEATURE SELECTION | ||
- | |||
- | $$F_x(v_i)=F' | ||
- | |||
- | ## 4.1 Position and Area Features | ||
- | |||
- | - We consider the left, right, top, bottom, horizontal center and vertical center positions. | ||
- | |||
- | $$POS\_LEFT = 1 - |BEST\_LEFT - LEFT|$$ | ||
- | |||
- | ## 4.2 Font Features | ||
- | |||
- | $$FONT\_COLOR\_POPULARITY=\sum_i\varphi_{ki} \varphi_{ri}$$ | ||
- | |||
- | $$FONT\_SIZE=\sum_i{\rho_{ki}(z_i-z_{min}) \over (z_{max}-z_{min})}$$ | ||
- | |||
- | ## 4.3 Text, Tag and Link Features | ||
- | |||
- | $$TEXT\_RATIO={A_{text} \over A_{text} +A_{image} + 1}$$ | ||
- | |||
- | $$TAG\_DENSITY={numTags \over numChars+1}$$ | ||
- | |||
- | $$LINK\_DENSITY={numLinks \over numTags+1}$$ | ||
- | |||
- | # 5. LEARNING | ||
- | |||
- | # 6. GROUPING AND REFINING | ||
- | |||
- | 1. Grouping | ||
- | 2. Group Selection | ||
- | 3. Refining | ||
- | |||
- | # 7. EXPERIMENTAL EVALUATION | ||
- | |||
- | 1. Evaluation Data Set and Metrics | ||
- | 2. Comparison with the Baseline Methods | ||
- | - LR_A | ||
- | - SVM_A | ||
- | - LR | ||
- | - SVM | ||
- | - MSS | ||
- | 3. Parameter Sensitivity Analysis | ||
- | |||
- | # 8. CONCLUSIONS |
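
The ratio features in sections 4.2 and 4.3 of the notes can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. The function and parameter names are hypothetical. The reading of $\rho_{ki}$ as the share of block $k$'s characters rendered at font size $z_i$ is an assumption, as is returning 0 when all font sizes on the page are equal.

```python
from typing import Mapping


def text_ratio(text_area: float, image_area: float) -> float:
    """TEXT_RATIO = A_text / (A_text + A_image + 1)."""
    return text_area / (text_area + image_area + 1)


def tag_density(num_tags: int, num_chars: int) -> float:
    """TAG_DENSITY = numTags / (numChars + 1)."""
    return num_tags / (num_chars + 1)


def link_density(num_links: int, num_tags: int) -> float:
    """LINK_DENSITY = numLinks / (numTags + 1)."""
    return num_links / (num_tags + 1)


def font_size_feature(
    size_share: Mapping[float, float], z_min: float, z_max: float
) -> float:
    """FONT_SIZE = sum_i rho_ki * (z_i - z_min) / (z_max - z_min).

    size_share maps a font size z_i to rho_ki, assumed here to be the
    share of the block's characters rendered at that size.
    """
    if z_max == z_min:
        # Assumption: with a single font size the feature carries no signal.
        return 0.0
    span = z_max - z_min
    return sum(share * (z - z_min) / span for z, share in size_share.items())
```

The `+ 1` in each denominator keeps the ratios defined for empty blocks (no text, no tags, no characters), so a block with zero tags has a link density of 0 instead of raising a division error.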