Hello everyone,

I am currently trying to figure out how the k-means implementation in SSAS works, especially how the distance between two data points (with continuous and discrete variables) is measured.

I found out that when there are only continuous variables, the squared Euclidean distance is used.

Furthermore, I found out that for discrete variables the distance is defined as 1 minus the probability of that value in a cluster.
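To make the two measures concrete, here is a minimal Python sketch of how I understand them. The function names and the dictionary of per-cluster probabilities are my own assumptions for illustration; this is not SSAS code and may not match what SSAS does internally.

```python
def squared_euclidean(point, centroid):
    """Squared Euclidean distance between two equal-length numeric sequences."""
    if len(point) != len(centroid):
        raise ValueError("point and centroid must have the same length")
    return sum((p - c) ** 2 for p, c in zip(point, centroid))


def discrete_distance(value, cluster_probabilities):
    """Distance for one discrete attribute: 1 - P(value | cluster).

    cluster_probabilities is a hypothetical dict mapping each discrete
    value to its probability within the cluster. A value that never
    occurs in the cluster is treated as probability 0 (an assumption).
    """
    return 1.0 - cluster_probabilities.get(value, 0.0)


# Continuous example: differences of 3 and 4 give 3^2 + 4^2.
d_cont = squared_euclidean([1.0, 2.0], [4.0, 6.0])

# Discrete example: "red" has probability 0.6 in this made-up cluster.
d_disc = discrete_distance("red", {"red": 0.6, "blue": 0.4})
```

With these made-up inputs the continuous distance is 25 and the discrete distance is about 0.4, so the two parts can end up on very different scales.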

My questions are:

-How does the algorithm work with continuous AND discrete variables? Is the distance calculated by summing the squared Euclidean distance (e.g. 13.2) and 1 minus the probability of the discrete value in a cluster (e.g. 0.4)?

-Is it necessary to scale the continuous variables to a smaller range, for example -1 to 1? How does SSAS handle this?
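Here is a small Python sketch of what I mean by both questions. It is only my guess, not confirmed SSAS behaviour: `naive_mixed_distance` is the plain sum I am asking about, and `scale_to_range` is an ordinary min-max rescaling that I would apply myself if SSAS does not do it. Both names are hypothetical.

```python
def scale_to_range(values, low=-1.0, high=1.0):
    """Min-max rescale a list of numbers into [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values identical: map everything to the middle of the range.
        return [(low + high) / 2.0 for _ in values]
    span = v_max - v_min
    return [low + (v - v_min) * (high - low) / span for v in values]


def naive_mixed_distance(continuous_part, discrete_part):
    """Hypothetical combination: simply add the two partial distances."""
    return continuous_part + discrete_part


# With the numbers from my question, the continuous part dominates the sum.
total = naive_mixed_distance(13.2, 0.4)

# Rescaling keeps the continuous attribute within -1 to 1.
scaled = scale_to_range([0.0, 5.0, 10.0])
```

If the distance really is a plain sum, the unscaled continuous part (13.2) would dominate the discrete part (0.4), which is why I am asking about scaling.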

Unfortunately, I could not find sufficient documentation on this.

Regards,

BigMarchy

