Representative Hyperparameters of ML Algorithms

Stevelee_HMC 2022. 8. 25. 07:31

This post briefly introduces the main hyperparameters of four of the most widely used algorithms (CatBoost, LightGBM, XGBoost, Random Forest) and typical value ranges for each.

1) CatBoost

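The default settings are already well tuned, so extensive extra tuning usually brings little benefit. If you do tune, the parameters reportedly worth looking at are iterations, learning_rate, random_strength, and l2_leaf_reg (the L2 regularization coefficient).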

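iterations : number of boosting iterations (trees to build), 500 ~ 10000 (the upper end only if the dataset is light)
learning_rate : learning rate, 0.01 ~ 0.3
random_strength : amount of randomness added when scoring splits, used to prevent overfitting, 0 ~ 100 (integer steps are enough, although the parameter itself accepts floats)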
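The ranges above can be turned into a small random-search sampler. This is a minimal sketch using only the standard library; the helper name `sample_catboost_params` is hypothetical, and sampling the learning rate on a log scale is an assumption of this sketch, not something the ranges themselves prescribe.

```python
import math
import random


def sample_catboost_params(rng: random.Random) -> dict:
    """Draw one candidate configuration from the ranges listed above."""
    return {
        # Number of boosting iterations: 500 to 10000.
        "iterations": rng.randint(500, 10000),
        # Log-uniform draw between 0.01 and 0.3 (assumption: small rates
        # deserve as much coverage as large ones).
        "learning_rate": 10 ** rng.uniform(math.log10(0.01), math.log10(0.3)),
        # Integer steps between 0 and 100.
        "random_strength": rng.randint(0, 100),
    }


rng = random.Random(0)  # fixed seed so the candidates are reproducible
catboost_candidates = [sample_catboost_params(rng) for _ in range(5)]
for params in catboost_candidates:
    print(params)
```

Assuming the catboost package is installed, each dictionary could be passed on as `CatBoostClassifier(**params)` and scored with cross-validation.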

2) LightGBM

See https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html

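num_leaves : should be smaller than 2^n (n = max_depth), e.g. with max_depth = 4, roughly 10 ~ 15, which stays below 16
max_depth : depth of the trees (1 ~ 10), when there are not many features (100 or fewer)
min_data_in_leaf : prevents overfitting, a value between 100 and 1000 is sufficient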
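The constraint between num_leaves and max_depth is easy to violate when the two are sampled independently, so it helps to derive the num_leaves range from the sampled depth. A minimal sketch follows; the helper names and the lower bound of 60% of 2^max_depth are assumptions chosen so that max_depth = 4 reproduces the 10 ~ 15 example above.

```python
import math
import random


def num_leaves_range(max_depth: int, leaf_fraction: float = 0.6) -> tuple:
    """Return (low, high) for num_leaves, keeping it strictly below 2**max_depth."""
    cap = 2 ** max_depth
    # leaf_fraction is an assumption of this sketch, not a LightGBM setting.
    low = max(2, math.ceil(leaf_fraction * cap))
    return low, cap - 1


def sample_lgbm_params(rng: random.Random) -> dict:
    # Start at depth 2: with max_depth = 1 the cap is 2, and LightGBM
    # requires num_leaves > 1, so no value below the cap would be valid.
    max_depth = rng.randint(2, 10)
    low, high = num_leaves_range(max_depth)
    return {
        "max_depth": max_depth,
        "num_leaves": rng.randint(low, high),
        "min_data_in_leaf": rng.randint(100, 1000),
    }


print(num_leaves_range(4))  # (10, 15)
rng = random.Random(0)
lgbm_candidates = [sample_lgbm_params(rng) for _ in range(100)]
print(lgbm_candidates[0])
```

Assuming the lightgbm package is installed, a sampled dictionary could be used as the `params` argument of `lightgbm.train(params, train_set)`.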

3) XGBoost

See https://zzinnam.tistory.com/entry/XGboost-%EC%A3%BC%EC%9A%94-%ED%95%98%EC%9D%B4%ED%8D%BC%ED%8C%8C%EB%9D%BC%EB%AF%B8%ED%84%B0-with-%ED%8C%8C%EC%9D%B4%EC%8D%AC (main XGBoost hyperparameters, with Python)

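eta : same parameter as learning_rate, between 0.01 and 0.3
max_depth : depth of the trees (1 ~ 10), used to control overfitting
colsample_bytree : fraction of columns (features) sampled for each tree, use a value between 0.5 and 1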
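For a space this small, an exhaustive grid is also feasible. The sketch below enumerates every combination with the standard library; the specific grid points are illustrative picks inside the ranges above, not recommended values.

```python
from itertools import product

# Illustrative grid points inside the ranges listed above (assumptions).
eta_values = [0.01, 0.05, 0.1, 0.3]
max_depth_values = [3, 6, 10]
colsample_bytree_values = [0.5, 0.75, 1.0]

# One dictionary per combination of the three parameters.
xgb_grid = [
    {"eta": eta, "max_depth": depth, "colsample_bytree": colsample}
    for eta, depth, colsample in product(
        eta_values, max_depth_values, colsample_bytree_values
    )
]

print(len(xgb_grid))  # 36
print(xgb_grid[0])
```

Assuming the xgboost package is installed, each dictionary could serve as the `params` argument of `xgboost.train(params, dtrain)`. In the scikit-learn style wrapper `XGBClassifier`, the same setting is spelled `learning_rate` instead of `eta`.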

4) RandomForest

See https://injo.tistory.com/30 ([Chapter 4. Classification] Random Forest)

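n_estimators : 10 ~ 1000 (number of trees; more trees generally improve performance, with diminishing returns, but slow down training)
min_samples_split : 2 ~ 10 (higher values guard against overfitting); the minimum number of samples a node needs before it can be split. If it is too small, the tree splits too finely -> overfitting
max_depth : 2 ~ 10 (higher values give deeper trees that overfit more easily)
min_samples_leaf : 2 ~ 10 (higher values guard against overfitting)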
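These ranges can be written down as a search space before any tuning starts. A minimal sketch, with parameter names spelled as in scikit-learn's `RandomForestClassifier`; the `n_estimators` grid points are illustrative assumptions inside the 10 ~ 1000 range.

```python
import math

# Search space built from the ranges listed above.
rf_param_distributions = {
    "n_estimators": [10, 50, 100, 300, 1000],  # illustrative grid points
    "min_samples_split": list(range(2, 11)),   # 2 to 10
    "max_depth": list(range(2, 11)),           # 2 to 10
    "min_samples_leaf": list(range(2, 11)),    # 2 to 10
}

# Size of the full grid, to judge whether exhaustive search is affordable.
n_combinations = math.prod(len(values) for values in rf_param_distributions.values())
print(n_combinations)  # 3645
```

With several thousand combinations, sampling a subset is usually more practical than a full grid. Assuming scikit-learn is installed, the dictionary could be passed as `RandomizedSearchCV(RandomForestClassifier(), rf_param_distributions, n_iter=20)`.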

Use the parameters and ranges that match the algorithm you are working with.