I am taking a BI Developer course at a school where the teacher knows nothing about data mining.
So I designed a project for myself and am trying to get over the learning curve without the benefit of a mentor.
Please have patience with me.
I have a database full of documents, where the entire text of each document resides in a single field (nvarchar(max)).
I want to find documents that are similar to the one I pick.
I am not sure which approach I should take to get there:
1) Create clusters for the entire database and then see which cluster my document is in.
2) Somehow train a model on the single document and then let the DM engine go find the matches.
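To make approach 1 concrete, here is a rough sketch in plain Python rather than in the DM engine. It is only an illustration under assumptions: documents are reduced to word counts, the clustering is a toy k-means-style loop seeded with the first k documents, and the names `cluster` and `same_cluster` are made up for this post.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a document into a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs, k, rounds=10):
    """Toy k-means-style clustering on cosine similarity.

    Assumption: seeds are simply the first k documents, so the
    result depends on input order. Returns one label per document.
    """
    vectors = [vectorize(d) for d in docs]
    centroids = [Counter(v) for v in vectors[:k]]
    labels = [0] * len(docs)
    for _ in range(rounds):
        # Assign each document to its most similar centroid.
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        # Rebuild each centroid as the summed counts of its members
        # (cosine ignores scale, so a sum works as well as a mean).
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                total = Counter()
                for m in members:
                    total.update(m)
                centroids[c] = total
    return labels

def same_cluster(docs, picked, k):
    """Indices of the other documents that share the picked document's cluster."""
    labels = cluster(docs, k)
    return [i for i, lab in enumerate(labels)
            if lab == labels[picked] and i != picked]

docs = [
    "sql server index tuning",
    "chocolate cake recipe",
    "index tuning for sql server queries",
    "easy chocolate cake recipe with butter",
]
print(same_cluster(docs, 0, k=2))
```

The weakness I suspect here is that a cluster only says "these documents are roughly in the same group", not how close any one of them is to the document I picked.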
I have tried approach 1, but the similarity between the documents in a cluster is not very high (at least in the few thousand I looked at).
I do not see how to do approach 2, unless I cluster with a test set size of 1, and that does not seem right.
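To illustrate the behaviour I am after with approach 2, here is a rough sketch in plain Python, not in the DM engine. It rests on my own reading of the idea: treat the picked document as a query and rank every other document by cosine similarity of TF-IDF weighted word counts. The smoothed IDF formula and the names `tfidf_vectors` and `rank_similar` are assumptions made up for this post.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one TF-IDF weighted word vector per document.

    Assumption: whitespace tokenization and a smoothed IDF,
    log((1 + n) / (1 + df)) + 1, so rare words count for more.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log((1 + n) / (1 + c)) + 1 for w, c in df.items()}
    return [{w: tf * idf[w] for w, tf in Counter(toks).items()}
            for toks in tokenized]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_similar(docs, picked, top_n=3):
    """Return (index, score) pairs for the documents most similar to docs[picked]."""
    vecs = tfidf_vectors(docs)
    scores = [(cosine(vecs[picked], v), i)
              for i, v in enumerate(vecs) if i != picked]
    scores.sort(key=lambda s: s[0], reverse=True)
    return [(i, round(score, 3)) for score, i in scores[:top_n]]

docs = [
    "sql server index tuning",
    "chocolate cake recipe",
    "index tuning for sql server queries",
    "easy chocolate cake recipe with butter",
]
for index, score in rank_similar(docs, picked=0):
    print(index, score, docs[index])
```

Nothing is trained on the single document here; it is a direct pairwise comparison, which may or may not map onto what the DM engine offers.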
Is clustering even the way to go?
Thanks, struggling in Austin