Steven Smiley
1 min read · Jan 13, 2020

Thank you, Peter. Cross-validation is done with GridSearchCV using 10 folds. Separately, 20% of the data is held out as an unseen test set. It's common to split data into training, validation, and test sets; here the cross-validation folds play the validation role. Sorry that wasn't clear.
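For readers who want to see that setup concretely, here is a minimal sketch of a 20% holdout combined with 10-fold GridSearchCV in scikit-learn. The synthetic dataset, the LogisticRegression model, and the `C` grid are hypothetical stand-ins, not the model from the article. Placing MinMaxScaler inside a Pipeline is also an assumption on my part, but it ensures the scaler is re-fit on each training fold.

```python
# A minimal sketch, assuming a generic classification task; the dataset,
# model (LogisticRegression), and parameter grid are hypothetical stand-ins.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data standing in for the real dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Hold out 20% of the data as an unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# The scaler lives inside the pipeline, so it is fit on the training folds
# only and nothing leaks from the validation fold into the scaling.
pipe = Pipeline([
    ("scaler", MinMaxScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Hypothetical grid over the regularization strength.
param_grid = {"clf__C": [0.01, 0.1, 1.0, 10.0]}

# 10-fold cross-validation on the training split acts as the validation step.
search = GridSearchCV(pipe, param_grid, cv=10)
search.fit(X_train, y_train)

# Final check on the 20% that was never used during the search.
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("CV accuracy:", round(search.best_score_, 3))
print("test accuracy:", round(test_accuracy, 3))
```

The test set is touched exactly once, after the grid search has picked its hyperparameters, so its score is an estimate on genuinely unseen data.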

I actually tried several forms of scaling and standardization and went with MinMaxScaler because it gave the best diagnostic results. It's true that the outliers are not reduced as much as they would be by standardizing the data; however, MinMaxScaler keeps the shape of the original distribution intact. MinMaxScaler is commonly recommended as the first scaler to try. Here is an article on that:

https://towardsdatascience.com/scale-standardize-or-normalize-with-scikit-learn-6ccc7d176a02
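To make the comparison concrete, here is a minimal sketch that runs MinMaxScaler and StandardScaler side by side on a hypothetical one-feature column containing a single large outlier. The values are made up for illustration and are not from the article's dataset.

```python
# A minimal sketch comparing the two scalers on a hypothetical 1-D feature
# with one large outlier (values are made up for illustration).
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [100.0]])

# MinMaxScaler: x' = (x - min) / (max - min), so every value lands in [0, 1].
minmax = MinMaxScaler().fit_transform(X)

# StandardScaler: z = (x - mean) / std (population std), giving mean 0 and
# unit variance but no fixed output range.
standard = StandardScaler().fit_transform(X)

for raw, mm, st in zip(X.ravel(), minmax.ravel(), standard.ravel()):
    print(f"{raw:6.1f}  minmax={mm:.3f}  standard={st:+.3f}")
```

Both transforms are linear, so neither changes the ordering or relative spacing of the values; what differs is the output scale. With MinMaxScaler the outlier is pinned to 1 and the five ordinary values are squeezed into roughly 0 to 0.04, whereas StandardScaler's output has no fixed bounds.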


Written by Steven Smiley

Lead Machine Learning Engineer who also enjoys writing about Data Science, CV, DL, ML, AI, Python https://www.linkedin.com/in/stevensmiley1989/