I am new to data mining and am following the basic data mining tutorial on AdventureWorks. The tutorial shows how to find the optimal model using the decision tree algorithm. When I open the
explorer, it shows that 8 variables are used to predict bike buyers. I know that I can use the slider to find the predictors with the strongest links to bike buyers. However, what are the criteria for selecting the optimal number of variables?
In the basic tutorial example, what’s the optimal number of variables? How can I view the model as a linear combination of the independent variables?
I come from a statistics background, where we use R^2 and p-values to judge the strength of a model and whether a particular variable is statistically significant. We also use coefficients to
build a model, but I don’t know how to do that in SQL Server 2008. What metrics should I use to assess the fit of the model after selecting the appropriate algorithm?