Started this discussion. Last reply by Vincent Granville May 28, 2018. 3 Replies 1 Like

Hi everybody, here's a summary of my study, followed by a few questions on random forest.
Population: 3300 observations, minority class 150 observations (~4%)
Predictors: ~70, just 1 numerical, all others…

Started this discussion. Last reply by Fabrice JOURDAN May 14, 2018. 2 Replies 1 Like

Tim Matteson liked Fabrice JOURDAN's discussion How to check/optimize cross validation with randomforest on imbalanced classes ?

May 31, 2018

Vincent Granville replied to Fabrice JOURDAN's discussion How to check/optimize cross validation with randomforest on imbalanced classes ?

"Yes 10 predictors are OK, but the data set seems a bit small, so the risk of over-fitting is higher than with (say) 50,000 observations."

May 28, 2018

Fabrice JOURDAN replied to Fabrice JOURDAN's discussion How to check/optimize cross validation with randomforest on imbalanced classes ?

"Thanks Vincent,
I understand; for over-sampling, I'll see what to do.
Why did you say "I would use less than 5 predictors"? Is it in comparison with the 150 observations in the minority class?
Let's say 30 observations per predictor?…"

May 28, 2018

Vincent Granville replied to Fabrice JOURDAN's discussion How to check/optimize cross validation with randomforest on imbalanced classes ?

"Your data set is a bit small. The classic solution is to over-sample under-represented classes. I've been doing it routinely but on data sets with 50+ million observations, where the class "fraud" (versus "non fraud")…"

May 27, 2018
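
The over-sampling suggestion in the reply above can be sketched in a few lines. This is only a minimal illustration using the Python standard library, not the method either poster actually used: the function name `oversample_minority`, the toy data, and the 80/20 split are all assumptions made for the example. The one point it tries to show is that duplication is applied to the training part only, so that copies of the same minority row cannot end up on both sides of the split.

```python
import random
from collections import Counter

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        idx = [i for i, y in enumerate(labels) if y == cls]
        # Sample with replacement to fill the gap up to the majority-class size.
        for i in rng.choices(idx, k=target - n):
            out_rows.append(rows[i])
            out_labels.append(cls)
    return out_rows, out_labels

# Toy data with the proportions quoted in the thread: 3300 rows, 150 in the minority class.
labels = [1] * 150 + [0] * 3150
rows = [[i] for i in range(len(labels))]

# Hypothetical 80/20 split; the held-out part is never over-sampled.
train_idx = [i for i in range(len(labels)) if i % 5 != 0]
test_idx = [i for i in range(len(labels)) if i % 5 == 0]
train_rows = [rows[i] for i in train_idx]
train_labels = [labels[i] for i in train_idx]

bal_rows, bal_labels = oversample_minority(train_rows, train_labels)
print(Counter(train_labels), Counter(bal_labels))
```

With cross validation, the same call would go inside each fold, on that fold's training rows.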

Fabrice JOURDAN's discussion was featured

### How to check/optimize cross validation with randomforest on imbalanced classes ?

Hi everybody, here's a summary of my study, followed by a few questions on random forest.

- Population: 3300 observations, minority class 150 observations (~4%)
- Predictors: ~70, just 1 numerical, all others are boolean
- I use feature selection to reduce the number of predictors
- I remove predictors with the lowest variance and the lowest correlation with my target variable; I also use a t-test (mean difference between the 2 classes)
- I keep around 20 predictors for 150 observations in my signal

NB: I didn't use yet…

May 25, 2018
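
The filtering steps described in the post (drop low-variance predictors, then rank the rest by a two-class t-test) could look roughly like the sketch below. It is an illustration under stated assumptions, not the poster's code: `welch_t`, `select_features`, the `min_var` cutoff, and the toy data are hypothetical, and Welch's unequal-variance form of the t statistic is a choice made here. Because the ranking uses the target, it would normally be computed on the training rows of each cross-validation fold rather than on the full data set, otherwise the validation score tends to be optimistic.

```python
import math
import random
import statistics

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return 0.0 if se == 0 else (statistics.mean(a) - statistics.mean(b)) / se

def select_features(X, y, min_var=0.01, top_k=20):
    """Drop near-constant columns, then keep the top_k columns ranked by |t|."""
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        if statistics.pvariance(col) < min_var:
            continue  # near-constant predictor: skip it
        pos = [v for v, label in zip(col, y) if label == 1]
        neg = [v for v, label in zip(col, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:top_k])

# Toy boolean predictors: column 0 is informative, column 1 is constant, the rest are noise.
rng = random.Random(0)
y = [1] * 150 + [0] * 3150
X = []
for label in y:
    informative = 1 if rng.random() < (0.8 if label == 1 else 0.2) else 0
    noise = [rng.randint(0, 1) for _ in range(4)]
    X.append([informative, 0] + noise)

selected = select_features(X, y, top_k=3)
print(selected)
```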

Fabrice JOURDAN posted a discussion

### How to check/optimize cross validation with randomforest on imbalanced classes ?

May 25, 2018

Fabrice JOURDAN replied to Fabrice JOURDAN's discussion RandomForest for imbalanced classes

""F1-score"
I determine the F1-score during "Parameters tuning" of RandomForest. For each set of parameters (a few hundred), I determine the threshold which gives me the best F1-score (mostly between 0.09 and 0.13). So I do not use display…"

May 14, 2018
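
The threshold search mentioned in the reply above (pick the probability cutoff that gives the best F1-score) can be sketched as follows. This is a minimal standard-library illustration, not the poster's tuning code: `f1_at`, `best_f1_threshold`, and the eight toy scores are assumptions made for the example. In practice the scores would be the forest's predicted minority-class probabilities on validation data, and the chosen cutoff would then be checked on data not used to pick it.

```python
def f1_at(scores, labels, threshold):
    """F1-score when every score >= threshold is predicted as the positive class."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def best_f1_threshold(scores, labels):
    """Try each distinct score as a cutoff and return the (threshold, f1) pair with the best F1."""
    candidates = sorted(set(scores))
    return max(((t, f1_at(scores, labels, t)) for t in candidates), key=lambda p: p[1])

# Toy predicted probabilities for the minority class, with their true labels.
scores = [0.02, 0.05, 0.08, 0.10, 0.12, 0.15, 0.30, 0.40]
labels = [0, 0, 0, 1, 0, 1, 1, 1]

threshold, f1 = best_f1_threshold(scores, labels)
print(threshold, round(f1, 3))
```

With a rare positive class, the best cutoff is often well below the default 0.5, which is consistent with the 0.09 to 0.13 range reported in the reply.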

Danylo Zherebetskyy replied to Fabrice JOURDAN's discussion RandomForest for imbalanced classes

"These are some questions that, hopefully, may help to move on:
- for f1-score, what is the probability threshold for the classification? is it standard 0.5 or you determined it from AUROC curves?
- since there is one continuous feature, the trees…"

May 14, 2018

Fabrice JOURDAN's discussion was featured

### RandomForest for imbalanced classes

May 9, 2018

Fabrice JOURDAN posted a discussion

### RandomForest for imbalanced classes

May 9, 2018

- Professional Status
- Consultant

- Years of Experience:
- 15

- Your Job Title:
- Consultant

- How did you find out about DataScienceCentral?
- Internet

- Interests:
- Finding a new position, Networking, New venture

© 2019 Data Science Central ®