# Database Lab


* Algorithms

1. ID3 (Iterative Dichotomiser 3)
2. C4.5 (successor of ID3)
3. CART (Classification And Regression Tree)
4. CHAID (CHi-squared Automatic Interaction Detector). Performs multi-level splits when computing classification trees.
5. MARS: extends decision trees to handle numerical data better.
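The split criterion is what mainly distinguishes these algorithms: ID3 and C4.5 split on information gain (or gain ratio), while CART uses Gini impurity. A minimal sketch of the ID3 criterion in plain Python (function names here are illustrative, not from any particular library):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(labels, feature_values):
    """ID3 split criterion: reduction in entropy after partitioning
    the labels by the values of one categorical feature."""
    n = len(labels)
    groups = {}
    for label, value in zip(labels, feature_values):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

ID3 greedily picks the feature with the highest gain at each node; C4.5 divides the gain by the split's intrinsic entropy to penalize many-valued features.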

# Automatic Web Content Extraction by Combination of Learning and Grouping

Created: Nov 09, 2018 6:38 PM Tags: Paper

# 1. INTRODUCTION

1. Heuristic
2. Template-based approach
   1. TED

# 2. RELATED WORK

- CETR
- CETD
- VIPS

# 3. PROBLEM FORMULATION AND SOLUTION

![](Untitled-c3f13dfd-e5f1-486c-aaf6-4580e50223b5.png)

# 4. FEATURE SELECTION

$$F_x(v_i)=F'_x(v_i)\cup\bigcup_{v_j\in Children(v_i)}F_x(v_j)$$
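The equation propagates feature sets bottom-up through the DOM tree: a node's feature set of type x is its own features unioned with the (recursively collected) feature sets of its children. A minimal sketch, with a hypothetical `Node` class standing in for a parsed DOM element:

```python
class Node:
    """Hypothetical DOM-tree node: its own feature values plus children."""
    def __init__(self, own_features, children=None):
        self.own_features = set(own_features)   # F'_x(v_i)
        self.children = children or []

def collect_features(node):
    """F_x(v_i) = F'_x(v_i) union the F_x of every child, recursively."""
    features = set(node.own_features)
    for child in node.children:
        features |= collect_features(child)
    return features
```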

## 4.1 Position and Area Features

- We consider the left, right, top, bottom, horizontal center and vertical center positions.

$$POS\_LEFT = 1 - |BEST\_LEFT - LEFT|$$
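Assuming positions are normalized to [0, 1] by page width/height and that the feature compares a block's position against a "best" position (e.g. the typical position of main-content blocks in training pages), each position feature is 1 minus that distance, so an exact match scores 1.0. A sketch under those assumptions:

```python
def pos_feature(pos, best_pos):
    """Position feature such as POS_LEFT: 1 - |best_pos - pos|.
    Both arguments are assumed normalized to [0, 1]; 'best_pos'
    (the reference position) is an assumption here, e.g. learned
    from the positions of labeled main-content blocks."""
    return 1.0 - abs(best_pos - pos)
```

The same function applies to the left, right, top, bottom, and the two center positions, each with its own reference value.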

## 4.2 Font Features

$$FONT\_COLOR\_POPULARITY=\sum_i\varphi_{ki} \varphi_{ri}$$

$$FONT\_SIZE=\sum_i{\rho_{ki}(z_i-z_{min}) \over (z_{max}-z_{min})}$$
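Reading $\rho_{ki}$ as the fraction of node k's text rendered at font size $z_i$ (an assumption from the surrounding formulas), and $z_{min}$, $z_{max}$ as the extreme font sizes on the page, the feature is a text-weighted min-max normalization of font size:

```python
def font_size_feature(size_fractions, z_min, z_max):
    """FONT_SIZE = sum_i rho_ki * (z_i - z_min) / (z_max - z_min).
    size_fractions maps each font size z_i to the fraction rho_ki of
    the node's text at that size (fractions sum to 1); the meaning
    of rho is an assumption, inferred from the adjacent formulas."""
    if z_max == z_min:
        return 0.0  # degenerate page with a single font size
    return sum(rho * (z - z_min) / (z_max - z_min)
               for z, rho in size_fractions.items())
```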

## 4.3 Text, Tag and Link Features

$$TEXT\_RATIO={A_{text} \over A_{text} +A_{image} + 1}$$

$$TAG\_DENSITY={numTags \over numChars+1}$$

$$LINK\_DENSITY={numLinks \over numTags+1}$$
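These three ratios follow directly from per-block counts; the +1 in each denominator avoids division by zero on empty blocks. A direct transcription:

```python
def text_ratio(area_text, area_image):
    """TEXT_RATIO: share of a block's area devoted to text vs. images."""
    return area_text / (area_text + area_image + 1)

def tag_density(num_tags, num_chars):
    """TAG_DENSITY: markup tags per character of visible text."""
    return num_tags / (num_chars + 1)

def link_density(num_links, num_tags):
    """LINK_DENSITY: fraction of tags that are links (high for nav bars)."""
    return num_links / (num_tags + 1)
```

Intuitively, main content tends to have a high text ratio and low tag/link density, while navigation and boilerplate show the opposite pattern.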

# 5 LEARNING

# 6 GROUPING AND REFINING

1. Grouping
2. Group Selection
3. Refining

# 7 EXPERIMENTAL EVALUATION

1. Evaluation Data Set and Metrics
2. Comparison with the Baseline Methods
   - LR_A
   - SVM_A
   - LR
   - SVM
   - MSS
3. Parameter Sensitivity Analysis

# 8 CONCLUSIONS

decision_tree_learning.txt · Last modified: 2018/11/13 16:49 by mwpark
