I'm working with a data set of 60,000 records and 60 variables. My objective is to predict the target variable, which is binary (0/1); the 1's make up about 5% of the cases.
I am currently using a backward-elimination approach: building n-1 models (n being the number of variables), choosing the best model, then building another n-2 models, and so forth.
The problem is this: if I use the same model, with exactly the same parameters and variables, but on a different structure, I get very different results in the classification (confusion) matrix. I know the holdout seed is random, so minor variations are to be expected, but these are large variations, and my data set should be big enough to avoid them.
In fact, my best model from structure A, when applied to structure B, performs much worse than most of the models discarded from structure A.
Could it be that the partition into training, validation, and test sets does not preserve the proportion of the target variable?
Also, is there a way to retrieve the random holdout seed values used in one of the models?
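For illustration, here is a minimal sketch of what I mean by preserving the target proportion and fixing the seed. I'm assuming a Python/scikit-learn workflow here purely as an example (my actual tool may differ), and the data below is synthetic, standing in for my 60,000-record set:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a 60,000-row data set with a ~5% positive rate.
rng = np.random.default_rng(0)
X = rng.normal(size=(60_000, 60))
y = (rng.random(60_000) < 0.05).astype(int)

# First carve off the test set, then split the remainder into training
# and validation. stratify=... keeps the ~5% positive proportion in
# every partition, and a fixed random_state makes the split (the
# "holdout seed") reproducible instead of random.
X_tmp, X_test, y_tmp, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_tmp, y_tmp, test_size=0.25, stratify=y_tmp, random_state=42)

for name, part in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, len(part), round(part.mean(), 3))
```

With stratification, each partition's positive rate stays close to the overall 5%; without it, a rare-target split can drift enough to change the confusion matrix noticeably between runs.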
Thanks in advance.