.NET Tutorials, Forums, Interview Questions And Answers

Clustering algorithms

Posted Date: May 22, 2011

I am trying to understand clustering algorithms and how to use them to get information about outliers.

I am trying to get this information from my audit logs, which contain fields such as:

Session (unique key)

Source IP

Time spent

Start time

A few of the outliers I want to be able to report are:

-- One user making multiple attempts within x amount of time (let's say 10 times in 30 minutes)

-- The same user ID coming in from different IPs

-- The amount of time spent being more than x hours

-- Start times that fall outside peak hours
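The four rules above can also be checked directly, without a clustering model, which is useful as a baseline to compare any model against. The following is only a minimal sketch: the `user_id` field, the 8-hour limit, and the 08:00 to 18:00 peak window are assumptions made for illustration (the post only fixes the "10 times in 30 minutes" example), and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical audit-log rows: (session_id, user_id, source_ip, start_time, hours_spent)
SESSIONS = [
    ("a1", "alice", "10.0.0.1", datetime(2011, 5, 20, 10, 0), 1.0),
    ("a2", "alice", "10.0.0.9", datetime(2011, 5, 20, 14, 0), 1.0),
    ("c1", "carol", "10.0.0.3", datetime(2011, 5, 20, 2, 0), 9.5),
    ("d1", "dave", "10.0.0.4", datetime(2011, 5, 20, 11, 0), 2.0),
]
# bob: 10 sessions, 3 minutes apart, all from the same IP
SESSIONS += [
    (f"b{i}", "bob", "10.0.0.2", datetime(2011, 5, 20, 9, 0) + timedelta(minutes=3 * i), 0.1)
    for i in range(10)
]


def flag_outliers(sessions, max_attempts=10, window=timedelta(minutes=30),
                  max_hours=8.0, peak_start=8, peak_end=18):
    """Return {user_id: set of rule names} for users that break any rule."""
    by_user = defaultdict(list)
    for _session_id, user, ip, start, hours in sessions:
        by_user[user].append((start, ip, hours))

    flags = defaultdict(set)
    for user, rows in by_user.items():
        rows.sort()
        starts = [r[0] for r in rows]

        # Rule 1: at least `max_attempts` session starts inside any sliding window.
        left = 0
        for right, t in enumerate(starts):
            while t - starts[left] > window:
                left += 1
            if right - left + 1 >= max_attempts:
                flags[user].add("burst")

        # Rule 2: same user seen from more than one source IP.
        if len({ip for _, ip, _ in rows}) > 1:
            flags[user].add("multi_ip")

        # Rule 3: any session longer than `max_hours`.
        if any(hours > max_hours for _, _, hours in rows):
            flags[user].add("long_session")

        # Rule 4: any session starting outside the assumed peak window.
        if any(not (peak_start <= s.hour < peak_end) for s in starts):
            flags[user].add("off_peak")

    return dict(flags)


if __name__ == "__main__":
    for user, rules in sorted(flag_outliers(SESSIONS).items()):
        print(user, sorted(rules))
```

Fixed thresholds like these only catch what you already know to look for; a clustering model is more useful for patterns you have not named in advance.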


My questions are as follows. In order to develop the clustering model, how do I need to present the data in the DSV (data source view)?

Do I need to aggregate the data? A user can have more than one session, and each one will always show up as a different session.

Do I need to define threshold values to predict outliers against?

I just want to detect anomalies in the pattern of the data.

How would I present the data for a single user who logged in 10 times in 30 minutes and has 10 different session IDs, but is coming from the same IP address?
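One possible way to present such data is to aggregate the session rows into one case per user and time bucket, so that the 10 sessions become a single row with a session count of 10 and a distinct-IP count of 1. The sketch below assumes a `user_id` column exists in the log and uses fixed 30-minute buckets; the function and field names are hypothetical, and the sample rows are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical rows: (session_id, user_id, source_ip, start_time, minutes_spent)
SESSIONS = [
    (f"b{i}", "bob", "10.0.0.2", datetime(2011, 5, 20, 9, 0) + timedelta(minutes=2 * i), 1.0)
    for i in range(10)
]
SESSIONS.append(("a1", "alice", "10.0.0.1", datetime(2011, 5, 20, 9, 45), 20.0))


def aggregate_sessions(sessions, bucket_minutes=30):
    """Collapse per-session rows into one row per (user, time bucket).

    `bucket_minutes` is assumed to divide 60 evenly (e.g. 10, 15, 30, 60).
    """
    buckets = defaultdict(lambda: {"sessions": set(), "ips": set(), "minutes": 0.0})
    for session_id, user_id, source_ip, start, minutes_spent in sessions:
        # Round the start time down to the beginning of its bucket.
        bucket_start = start.replace(
            minute=start.minute - start.minute % bucket_minutes,
            second=0,
            microsecond=0,
        )
        cell = buckets[(user_id, bucket_start)]
        cell["sessions"].add(session_id)
        cell["ips"].add(source_ip)
        cell["minutes"] += minutes_spent

    return [
        {
            "user_id": user_id,
            "bucket_start": bucket_start,
            "session_count": len(cell["sessions"]),
            "distinct_ips": len(cell["ips"]),
            "total_minutes": cell["minutes"],
        }
        for (user_id, bucket_start), cell in sorted(buckets.items())
    ]


if __name__ == "__main__":
    for row in aggregate_sessions(SESSIONS):
        print(row)
```

Rows shaped like this can be exposed through a view and used as the case table. Note that fixed buckets can split a burst across two adjacent rows, so a sliding window may be a better fit if that matters.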


Could someone point me in the right direction?


Thank you 




More Related Resource Links

Windows with C++: Exploring High-Performance Algorithms


See how you can gain efficiency in surprising ways by looking closely at your algorithms, the data they operate on, and the hardware you're designing for.

Kenny Kerr

MSDN Magazine October 2008

Genetic Algorithms: Survival of the Fittest: Natural Selection with Windows Forms


Genetic Programming is an evolutionary algorithm that employs reproduction and natural selection to breed better and better executable computer programs. It can create programs that implement subtle, non-intuitive solutions to complex problems. By taking a well-known example from the Genetic Programming community and implementing it with the .NET Framework, this article demonstrates that CodeDOM and Reflection provide all the facilities that are needed to do Genetic Programming effectively.

Brian Connolly

MSDN Magazine August 2004

Tamper-Resistant Apps: Cryptographic Hash Algorithms Let You Detect Malicious Code in ASP.NET


Cryptographic hash algorithms produce fixed-length sequences based on input of arbitrary length. A given input always produces the same output, called a hash code. Using these algorithms, you can compute and validate hash codes to ensure that code running on your machine has not been tampered with or otherwise changed. ASP.NET provides a software mechanism for validating hash code fingerprints for every page requested by a client. In this article, the author shows how to use hash codes with ASP.NET applications to detect tampering and prevent malicious code from running when tampering is detected.

Jason Coombs

MSDN Magazine September 2002

C++ and STL: Take Advantage of STL Algorithms by Implementing a Custom Iterator


There are many benefits to using the Standard Template Library (STL) for C++ development, including the ability to use generic data structures and algorithms. To use the STL algorithms, an STL-conforming container is required. Iterating through the Internet Explorer cache is an informative exercise, but the cache is not an STL-conforming container. So, to use the STL algorithms to search and enumerate the Internet Explorer cache, an adapter is needed. Building such an adapter, an STL-conforming iterator, is the topic of this article. Also provided is an overview of the components of the STL and the Win32 Internet APIs used.

Samir Bajaj

MSDN Magazine April 2001


Hi, I am a newbie to data mining. As part of a project I need to categorise the customers of a grocery store based on their age, gender, and average spending. I believe I should use the Clustering algorithm. I created a mining model, a mining structure, and a cube from the existing database, but I can't get much further than that. What kind of queries should I write to "cluster" the customers and get their relevant clusters?

Would you use clustering to find documents similar to this one?

Hi, I am taking a BI Developer course and enrolled at a school where the teacher knows nothing about data mining. So I designed a project for myself and am trying to surmount the curve without the benefit of a mentor. Please have patience with me.

I have a database full of documents where the entire text resides in a single field (nvarchar(max)). I want to find documents that are similar to the one I pick. I am not sure which approach I should take to get there:

1) create clusters for the entire database and then see which cluster my document is in, OR
2) somehow train a model on the single document and then let the DM engine go find the matches.

I have tried approach 1, but the similarity between the documents is not very high (at least in the few thousand I looked at). I do not see how to do approach 2, unless I cluster with a test set size of 1, and that doesn't seem right.

Is clustering even the way to go?

Thanks, struggling in Austin
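For comparison, "find documents similar to this one" can also be treated as a ranking problem rather than a clustering problem: score every document against the chosen one and sort. The sketch below is an illustration only and is not what the SQL Server data mining engine does; it uses raw term counts and cosine similarity in plain Python, with made-up documents and hypothetical function names.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-case the text and count its alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(query_id, documents, top_n=3):
    """Rank all other documents by similarity to the document `query_id`."""
    vectors = {doc_id: term_vector(text) for doc_id, text in documents.items()}
    query = vectors[query_id]
    scores = [(cosine(query, vec), doc_id) for doc_id, vec in vectors.items() if doc_id != query_id]
    scores.sort(reverse=True)
    return [(doc_id, round(score, 3)) for score, doc_id in scores[:top_n]]


if __name__ == "__main__":
    docs = {
        1: "failover cluster setup for sql server",
        2: "sql server failover cluster installation",
        3: "grocery customer spending by age",
    }
    print(most_similar(1, docs))
```

Raw counts let common words dominate, so a real attempt would likely add stop-word removal or TF-IDF weighting before comparing documents.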

Clustering on cube slices

Hi, I would like to use the clustering algorithm "separately" on each cube slice.

The problem description: I have many employees and their arrival-to-work times. I created a cube with a fact table and 3 dimensions (Employee, Time, Office), and I would like to run the clustering algorithm separately for each employee to get his arrival-to-work clusters and to find his exceptions.

For example, for the employee John, who has the following arrival time records: 08:05, 07:55, 8:10, 8:07, 10:30, 10:31, 10:28, 15:00, I want to get two clusters. The first one will include 08:05, 07:55, 8:10, 8:07. The second one will include 10:30, 10:31, 10:28 (and the exception will be 15:00).

Since there are many employees, I would like to create one data mining model and use the cube slice and dice options.

Searching the web and MSDN, I didn't find whether and how this can be done. I would very much appreciate your help.

Thanks in advance, Lora
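To make the expected result for John concrete, here is a minimal sketch of one simple way to get it outside the cube: sort one employee's arrival times and start a new group whenever the gap to the previous time exceeds a limit, then treat groups that are too small as exceptions. This gap-based grouping is a stand-in chosen for illustration, not the Microsoft Clustering algorithm, and the 30-minute gap and minimum group size of 2 are assumptions.

```python
def to_minutes(hhmm):
    """Convert 'H:MM' or 'HH:MM' to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def cluster_arrivals(times, max_gap=30, min_size=2):
    """Group sorted arrival times; groups smaller than `min_size` are exceptions."""
    if not times:
        return [], []
    ordered = sorted(times, key=to_minutes)
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if to_minutes(cur) - to_minutes(prev) > max_gap:
            groups.append([cur])  # gap too large: start a new group
        else:
            groups[-1].append(cur)
    clusters = [g for g in groups if len(g) >= min_size]
    exceptions = [t for g in groups if len(g) < min_size for t in g]
    return clusters, exceptions


if __name__ == "__main__":
    john = ["08:05", "07:55", "8:10", "8:07", "10:30", "10:31", "10:28", "15:00"]
    clusters, exceptions = cluster_arrivals(john)
    print(clusters)    # [['07:55', '08:05', '8:07', '8:10'], ['10:28', '10:30', '10:31']]
    print(exceptions)  # ['15:00']
```

Running the same function once per employee gives the per-slice behaviour described in the post, at the cost of choosing the gap by hand.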

SQL 2008 R2 Full Clustering

I was recently informed during a Microsoft Roadmap meeting that SQL 2008 R2 can now be fully clustered, that is, full load balancing across SQL servers and not just failover. However, having come away from the meeting and done some investigation, I cannot find any references on how to set up SQL 2008 R2 in anything other than a failover cluster. My question is: is SQL 2008 R2 fully clusterable, and if yes, where can I find documentation to help with its setup? Thanks in advance, Derek

20,000 foot view: Sql Server clustering vs. Windows OS Clustering

I read that SQL Server failover clustering requires Windows Server clustering to be in place first (the former builds on the latter). Why is this the case? In other words, what does "SQL Server clustering" bring to the mix that normal Windows clustering doesn't supply? TIA, Barkingdog

troubleshooting clustering

I need step-by-step instructions for installing clustering in a SQL Server 2008 environment. Please help me with this.

Clustering of Sql server

Hi everybody, please answer these questions; it will be very helpful to me.

1. Discuss at a high level how to add a new node to an existing SQL cluster in both SQL 2008 and SQL 2005. Begin with SQL 2005.
2. You are a DBA. The SAN admin has provisioned a LUN on the SAN, presented it to the nodes, and made it clustered storage. At a high level, in either 2005 or 2008, walk through adding the new storage to the SQL instance. What is very important not to forget?
3. Beginning in 2005, there is a SERVERPROPERTY we can use in T-SQL to discover which node the SQL instance is running on. Which one is it?

Thank you. Regards, rcnj

SQL clustering 2008

While trying to install SQL 2008 on a 2008 OS with SP1, the installation fails with the error "Managed SQL Server Landing Page has stopped working".

Getting Started -> clustering

Hello All,

We currently have the following:

SERVER A2000 with SQL Server 2000
SERVER B2005 with SQL Server 2005
SERVER C2005 with SQL Server 2005

SERVER B2005 is our research database server and currently hosts a growing number of survey-based applications, each with its own database {SurveySystem1, SurveySystem2, SurveySystem3, SurveySystem4, etc...}. Each of these applications is a C# .NET application.

Over time we have noticed that if one of the SurveySystem applications (for example, SurveySystem2) has a large simultaneous response (100+ participants all responding to a survey at the same time), the SurveySystem2 website application slows down and some participants are not able to complete the survey.

We think the problem is the SQL Server 2005 database, and that it is not able to handle that many simultaneous users. So we are thinking of upgrading, BOTH to 2008 AND to a clustered environment.

QUESTION: can someone give me an overview of:
(1) What we will need to move to this new environment (software + hardware)?
(2) Do you think this will fix the problems that the participants are experiencing, or is the bottleneck more likely somewhere else?
(3) Any words of caution?

Thanks!

In Windows Server 2008 Standard edition, the Failover Clustering feature is not showing

In Windows Server 2008 Standard edition, the Failover Clustering feature is not showing up. Yet Microsoft says that two-node cluster support is provided on 2008 Server Standard edition. On that basis I advised the company to buy it, and now I am stuck. Please help. "SQLSERVER DBA" "INDIA"

SQL Server 2008 R2 Clustering Questions?

Hi All, I got part of the idea for my question from http://social.msdn.microsoft.com/Forums/en-US/sqlsetupandupgrade/thread/bff2b555-f25e-4644-a58f-264611818971

I have several questions:

1. We have two servers with different configurations (server model, CPU, memory, etc.). Can we do Active-Active failover clustering on them?
2. Can we do database-level clustering in SQL Server 2008 R2?
3. Where can I get documentation for clustering SQL Server 2008 R2?

Thanks, Ashish K, India

Is there any article or webcast on how to set up SQL Server clustering in a Hyper-V environment?

I have already set up a Windows Server 2008 R2 Enterprise x64 machine with Hyper-V installed for trying out different features in SQL Server. I'm interested in trying out SQL Server clustering. So far I have only set up Windows 7 Enterprise x64, Windows 7 Pro x64, and Windows XP Pro SP2 x86 VMs. Can I use these VMs for SQL Server 2005 or 2008 clustering? Is there any article or webcast on how to set up SQL Server clustering in a Hyper-V environment?