Clustering Goodness Measure?

Posted Date: April 10, 2011


In the MS Sequence Clustering algorithm there is a parameter named “CLUSTER_COUNT”, and its definition states that if we set this value to 0, the algorithm automatically chooses the best number of clusters.

So far, so good.

Assume we run the algorithm with CLUSTER_COUNT=0 and get 5 clusters.

My question is:

How does the algorithm decide that the best number of clusters is 5? How can it be shown that 5 clusters is better than 4 or 6 clusters? In other words, what goodness measure does the algorithm use?

Thanks for any help
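
The post does not say which measure the Microsoft algorithm uses, and the sketch below does not answer that. It only illustrates one generic way to compare candidate cluster counts: run a clustering for each count and keep the count with the highest mean silhouette score. The helper names (kmeans_1d, silhouette, best_cluster_count) and the sample data are hypothetical, and plain 1-D k-means stands in for the real algorithm.

```python
from statistics import mean


def kmeans_1d(values, k, iterations=50):
    """Plain k-means on 1-D data with deterministic, evenly spread start centers."""
    data = sorted(values)
    centers = [data[(2 * i + 1) * len(data) // (2 * k)] for i in range(k)]
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        updated = []
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            updated.append(mean(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return data, labels


def silhouette(data, labels):
    """Mean silhouette score: values near 1 mean tight, well-separated clusters."""
    clusters = sorted(set(labels))
    scores = []
    for i, x in enumerate(data):
        own = [abs(x - y) for j, y in enumerate(data)
               if labels[j] == labels[i] and j != i]
        if not own or len(clusters) < 2:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = mean(own)  # average distance inside the point's own cluster
        b = min(       # average distance to the nearest other cluster
            mean([abs(x - y) for y, lab in zip(data, labels) if lab == c])
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)


def best_cluster_count(values, candidates):
    """Return the candidate k with the highest silhouette, plus all scores."""
    scores = {k: silhouette(*kmeans_1d(values, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical sample: three well-separated groups of values.
sample = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
best, scores = best_cluster_count(sample, [2, 3, 4])
print(best, scores)
```

Other measures (for example BIC or held-out log-likelihood for probabilistic models) could be swapped in for the silhouette function without changing the selection loop.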


More Related Resource Links

CLR Inside Out: Measure Early and Often for Performance, Part 2


In the second of a two-part series, Vance Morrison delves into the meaning of performance measurements, explaining what the numbers mean to you.

Vance Morrison

MSDN Magazine May 2008

CLR Inside Out: Measure Early and Often for Performance, Part 1


In this month's column, get the inside scoop on how to build performance into your apps from the start, rather than dealing with the fallout after you deploy them.

Vance Morrison

MSDN Magazine April 2008

Decision tree dependency measure

The dependency network viewer executes the stored procedure System.Microsoft.AnalysisServices.System.DataMining.DecisionTreesDepNet.DTGetNodeGraph, which yields an integer measure representing the strength of influence of the input variables on the output variable. How are these measures calculated (information gain, chi-square, correlation, etc.)?

It seems as if conditional influences are not taken into account: if A and B are two major factors that affect variable C, but A and B are so strongly correlated that C given A does not depend on B, the dependency algorithm will still depict B as the second major factor. Am I right? So is there no analogue of tests such as conditional chi-square, conditional mutual information, or partial correlation?

It seems to me that MS Data Mining lacks some kind of Bayesian network algorithm, which would illustrate conditional dependencies. That would give useful insight into how the various factors are related to each other and through what chains a change in an input variable propagates to the output.

Thank you.
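
The A, B, C situation described above can be made concrete with a small sketch of conditional mutual information. This is not how DTGetNodeGraph computes its measure (the post leaves that open). The function names and the synthetic data are hypothetical: B and C each agree with A 90% of the time and are independent of each other once A is known.

```python
from collections import Counter
from itertools import product
from math import log2


def mutual_information(pairs):
    """I(X;Y) in bits, estimated from a list of (x, y) samples."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def conditional_mutual_information(triples):
    """I(X;Y|Z) in bits: the p(z)-weighted average of I(X;Y) within each z group."""
    n = len(triples)
    groups = {}
    for x, y, z in triples:
        groups.setdefault(z, []).append((x, y))
    return sum(len(g) / n * mutual_information(g) for g in groups.values())


# Synthetic (c, b, a) samples: given A, B and C are drawn independently,
# each matching A in 9 cases out of 10.
samples = []
for a in (0, 1):
    noisy_copy = [a] * 9 + [1 - a]
    for b, c in product(noisy_copy, noisy_copy):
        samples.append((c, b, a))

marginal = mutual_information([(c, b) for c, b, _ in samples])
conditional = conditional_mutual_information(samples)
print(f"I(C;B) = {marginal:.3f} bits, I(C;B|A) = {conditional:.3f} bits")
```

On this data B looks informative about C when examined alone, yet adds nothing once A is known, which is the effect the post suspects the dependency network does not capture.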


Hi, I am a newbie to data mining. As part of a project I need to categorise a grocery's customers based on their age, gender, and average spending. I believe I should use the Clustering algorithm. I created a mining model, a structure, and a cube from the existing database, but I can't get much further than that. What kind of queries should I write to "cluster" the customers and get their relevant clusters?

Would you use clustering to find documents similar to this one?

Hi, I am taking a BI Developer course and enrolled at a school where the teacher knows nothing about data mining. So I designed a project for myself and am trying to surmount the learning curve without the benefit of a mentor. Please have patience with me.

I have a database full of documents where the entire text resides in a single field (nvarchar(max)). I want to find documents that are similar to the one I pick. I am not sure what approach I should take to get there:

1) create clusters for the entire database and then see which cluster my document is in, OR
2) somehow train a model on the single document and then let the DM engine go find the matches.

I have tried approach 1, but the similarity between the documents is not very high (at least in the few thousand I looked at). I do not see how to do approach 2, unless I cluster with a test set size of 1, and that doesn't seem right.

Is clustering even the way to go?

Thanks, struggling in Austin
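
As a point of comparison outside the DM engine, "find documents similar to the one I pick" can also be treated as a ranking problem rather than a clustering problem. The sketch below is a swapped-in technique, not something the post or SQL Server provides: it ranks documents by cosine similarity of raw word counts. The function names and the tiny document set are hypothetical.

```python
from collections import Counter
from math import sqrt
import re


def term_vector(text):
    """Bag-of-words counts over lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = sqrt(sum(c * c for c in u.values())) * sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def most_similar(query_text, documents, top_n=3):
    """Return the ids of the top_n documents most similar to query_text."""
    query = term_vector(query_text)
    ranked = sorted(documents.items(),
                    key=lambda item: cosine(query, term_vector(item[1])),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_n]]


# Hypothetical documents keyed by id.
docs = {
    "a": "cube processing memory error",
    "b": "clustering customers by age and spending",
    "c": "recipe for apple pie",
}
print(most_similar("clustering grocery customers", docs, top_n=1))
```

Weighting terms (for example with TF-IDF) usually separates documents better than raw counts; raw counts are used here only to keep the sketch short.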

Clustering on cube slices

Hi, I would like to use the clustering algorithm "separately" on each cube slice.

The problem description: having many employees and their arrival-to-work times, I created a cube with a fact table and 3 dimensions (Employee, Time, Office). I would like to run the clustering algorithm separately for each employee to get that employee's arrival-to-work clusters and to find the exceptions. For example, for the employee John, who has the following arrival time records: 08:05, 07:55, 8:10, 8:07, 10:30, 10:31, 10:28, 15:00, I want to get two clusters. The first one will include 08:05, 07:55, 8:10, 8:07. The second one will include 10:30, 10:31, 10:28 (and the exception will be 15:00).

Having many employees, I would like to create one data mining model and use the cube's slice and dice options.

Searching the web and MSDN, I didn't find whether and how this can be done. I would very much appreciate your help.

Thanks in advance, Lora
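
The grouping the post asks for can be sketched outside the cube with a simple gap rule. This is not the Microsoft Clustering algorithm and does not use cube slices; the 30-minute gap, the minimum cluster size of 2, and the function names are assumptions chosen to reproduce John's example.

```python
def to_minutes(hhmm):
    """Convert 'H:MM' or 'HH:MM' to minutes after midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def cluster_arrivals(times, max_gap=30, min_size=2):
    """Split sorted arrival times wherever the gap exceeds max_gap minutes.

    Groups smaller than min_size are reported as exceptions.
    """
    if not times:
        return [], []
    ordered = sorted(times, key=to_minutes)
    groups = [[ordered[0]]]
    for previous, current in zip(ordered, ordered[1:]):
        if to_minutes(current) - to_minutes(previous) <= max_gap:
            groups[-1].append(current)
        else:
            groups.append([current])
    clusters = [g for g in groups if len(g) >= min_size]
    exceptions = [t for g in groups if len(g) < min_size for t in g]
    return clusters, exceptions


# One entry per employee, so each employee is clustered separately.
arrivals = {
    "John": ["08:05", "07:55", "8:10", "8:07", "10:30", "10:31", "10:28", "15:00"],
}
for employee, times in arrivals.items():
    clusters, exceptions = cluster_arrivals(times)
    print(employee, clusters, exceptions)
# John [['07:55', '08:05', '8:07', '8:10'], ['10:28', '10:30', '10:31']] ['15:00']
```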

display non measure, non dimension fields in drill down

Hi All, I created my fact table with more fields than just the foreign keys linking to the dimensions and the field(s) to be used as measures. I did this hoping that on drill down I would be able to see the extra fields, so that the user would have access to detail information on the records making up the measure amount. The extra fields do not appear on drill down. How can I make them appear, or am I on the wrong track? Thanks for any help

SQL 2008 R2 Full Clustering

I was recently informed during a Microsoft Roadmap meeting that SQL 2008 R2 can now be fully clustered, that is, full load balancing across SQL servers and not just failover. However, having come away from the meeting and done some investigation, I cannot find any references on how to set up SQL 2008 R2 in anything other than a failover cluster. My question is: is SQL 2008 R2 fully clusterable, and if yes, where can I find some documentation to help with its setup? Thanks in advance, Derek

20,000 foot view: Sql Server clustering vs. Windows OS Clustering

I read that SQL Server failover clustering requires Windows Server clustering to be in place first (the former builds on the latter). Why is this the case? In other words, what does "SQL Server clustering" bring to the mix that normal Windows clustering doesn't supply? TIA, Barkingdog

troubleshooting clustering

I need step-by-step instructions for installing clustering in a SQL Server 2008 environment. Please help me with this.

Clustering of Sql server

Hi Everybody, please answer these questions; this will be very helpful to me.

1. Discuss at a high level how to add a new node to an existing SQL cluster in both SQL 2008 and SQL 2005. Begin with SQL 2005.
2. You are a DBA. The SAN admin has provisioned a LUN on the SAN, presented it to the nodes, and made it clustered storage. At a high level, in either 2005 or 2008, walk through adding the new storage to the SQL instance. What is very important not to forget?
3. Beginning in 2005, there is a SERVERPROPERTY we can now use in T-SQL to discover which node the SQL instance is running on. Which one is it?

Thank you. Regards, rcnj

SQL clustering 2008

While trying to install SQL 2008 on a 2008 OS (SP1), I am unable to install; setup fails with the error "Managed SQL Server Landing Page has stopped working".

Measure Dependent Calculated Measure

Hi, I was looking to create a calculated measure that needs to depend on another column of the fact table. I need the average of a particular measure over the set of rows that share a common related dimension value. An example gives a better picture.

My dimensions are STUDENT, TEACHER, COURSE, etc., and the measures on my fact table are the grades, class standing based on marks for the subject, percentage of marks scored, percentile, marks scored, StudentID, the related PKs, etc. Here I want the average of the marks scored by the students whose grade is, say, A or B or C.

I was looking to create a calculated measure using MDX, as follows (NOT THE CORRECT SYNTAX, JUST THE IDEA):

SUM(MARKS SCORED) / COUNT(DISTINCT StudentID) WHERE GRADE = A

This would be measured against a particular subject, teacher, or course. I was hoping I could get help putting this in the proper SYNTAX.

Thanks in advance.
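
The intended calculation (not the MDX syntax the post asks for) can be pinned down with a small sketch over fact rows. The row layout, field names, and sample values are hypothetical; the arithmetic follows the post's idea of total marks divided by the distinct student count for one grade.

```python
def average_marks_for_grade(rows, grade):
    """SUM(marks) / COUNT(DISTINCT student_id) over the rows with the given grade."""
    selected = [row for row in rows if row["grade"] == grade]
    students = {row["student_id"] for row in selected}
    if not students:
        return None  # no rows for this grade, so the average is undefined
    return sum(row["marks"] for row in selected) / len(students)


# Hypothetical fact rows: one row per student per course.
fact_rows = [
    {"student_id": 1, "course": "Math", "grade": "A", "marks": 90},
    {"student_id": 1, "course": "Physics", "grade": "A", "marks": 80},
    {"student_id": 2, "course": "Math", "grade": "A", "marks": 70},
    {"student_id": 3, "course": "Math", "grade": "B", "marks": 60},
]
print(average_marks_for_grade(fact_rows, "A"))  # 240 / 2 students = 120.0
```

Because the denominator counts distinct students rather than rows, a student with several grade-A rows contributes all of their marks but is counted once. If a per-row average is wanted instead, the denominator would be len(selected).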

Please help with converting calculated measure.

Hi, I have two dimensions: [Currency] with members CAD and USD, and [Convert To] with members None, CAD, USD. The measures involved in the calculation are [Amount Billed] and [Amount Received]. The problem calculation is defined pretty simply:

MEMBER CURRENTCUBE.[Measures].[AR] AS (abs([Measures].[Amount Billed] - [Measures].[Amount Received])>0.1,[Measures].[Amount Billed] - [Measures].[Amount Received] ,null);

and it worked perfectly without currency conversion. I've read some info about currency conversion and designed this calculation to convert:

scope (leaves([Time 2]));
  scope([Convert To].[Convert To].&[USD],[Currency].[Currency].&[CAD]);
    [Measures].[Amount Received] = ([Measures].[Amount Received],[Convert To].[Convert To].&[None])/validmeasure([Measures].[Cdrate]);
    [Measures].[Amount Billed] = ([Measures].[Amount Billed],[Convert To].[Convert To].&[None])/validmeasure([Measures].[Cdrate]);
  End Scope;
  scope([Convert To].[Convert To].&[CAD],[Currency].[Currency].&[USD]);
    [Measures].[Amount Received] = ([Measures].[Amount Received],[Convert To].[Convert To].&[None])*validmeasure([Measures].[Cdrate]);
    [Measures].[Amount Billed] = ([Measures].[Amount Billed],[Convert To].[Convert To].&[None])*validmeasure([Measures].[Cdrate]);
  End Scope;
End Scope;

That calculation provides correct results for both [Amount Billed] and [Amount

processing measure group : memory error : the operation cannot be completed because the memory quota

Hi, I'm stuck with this problem. Until last week, the cube processed without any problem. Since last week, I have been getting this error. I have searched different forums and tried some suggestions, such as changing the memory limit properties, but it only got worse, so I reset all properties to their defaults again.

I am running SQL Server + MS-AS 2005 SP2 on a server with 4 GB of memory. This is a dedicated server; nothing else is running on it. The fact table has +/- 14 million records, several dimensions, and 2 measure groups. I don't have problems processing the dimensions, but when I try to process the cube, or the measure groups of that cube separately, the error persists.

I have changed the data source view and replaced the fact table with a named query. Even when I add a 'WHERE datepart( year , fact_date ) >= 2009 ' clause to reduce the number of records to +/- 5 million, I still get the error. I don't understand what is wrong; the cube has always processed for +/- 2 years.

As I said, I have found a lot of issues of this kind on different websites and have tried changing some properties, but this still does not solve the problem. Could it be that the MS-AS settings are corrupt somewhere? Is it a good idea to re-install MS-AS 2005 + SP1 + SP2? Or is there another possible reason? I really appreciate any kind of help, because I'm

Optimize Calculated Measure containing COUNT EXISTING

Hi, my goal is to change the text color of all cells that contain aggregated values. Currently I achieve it like this: I COUNT the members of all attributes of all dimensions. To be multi-select-safe, I use the EXISTING keyword. If the member count of at least one attribute is not 1, then the text color of the cell is changed.

CREATE MEMBER CURRENTCUBE.[Measures].[SingleCellSelected] AS
  iif ((COUNT (Existing ([Dim1].[Attr1].[Attr1].MEMBERS ))=1)
   AND (COUNT (Existing ([Dim1].[Attr2].[Attr2].MEMBERS ))=1)
   AND (COUNT (Existing ([Dim2].[Attr3].[Attr3].MEMBERS ))=1)
   AND (COUNT (Existing ([Dim2].[Attr4].[Attr4].MEMBERS ))=1)
   AND (COUNT (Existing ([Dim3].[Attr5].[Attr5].MEMBERS ))=1),1,0),
  VISIBLE = 0 ;

SCOPE ([Measures].AllMembers );
  FORE_COLOR (this ) = iif ([Measures].[SingleCellSelected]=1,0,16744448);
END SCOPE ;

This approach works, but it performs badly for attributes with many members. Do you have any idea how to optimize this? Thank you!

Measure - New Customer Count

I need to create a measure which counts the number of new customers for each time period. My fact table contains the customer number, and it's easy to create a distinct count of customers per month/week/year. I'm thinking I need to obtain the first order date for each customer and check whether it falls within the selected period, then include or exclude the customer accordingly.
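
The approach described above (first order date per customer, then count by period) can be sketched outside the cube as follows. The order tuples and the function name are hypothetical, and the period is fixed to calendar months for brevity.

```python
from collections import Counter
from datetime import date


def new_customers_per_month(orders):
    """Count customers by the (year, month) of their first order.

    orders is an iterable of (customer_number, order_date) pairs.
    """
    first_order = {}
    for customer, order_date in orders:
        if customer not in first_order or order_date < first_order[customer]:
            first_order[customer] = order_date
    return Counter((d.year, d.month) for d in first_order.values())


# Hypothetical fact rows: (customer number, order date).
orders = [
    ("c1", date(2011, 1, 5)),
    ("c1", date(2011, 2, 1)),
    ("c2", date(2011, 2, 10)),
    ("c3", date(2011, 2, 20)),
    ("c2", date(2011, 3, 1)),
]
counts = new_customers_per_month(orders)
print(counts[(2011, 1)], counts[(2011, 2)], counts[(2011, 3)])  # 1 2 0
```

A customer is counted only in the period of their earliest order, so repeat orders in later months (c1 in February, c2 in March) do not add to the count.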