Search Results

Advanced Search Tips

Ask Question

Search type	Search syntax
Tags	[tag]
Exact	"words here"
Author	user:1234 user:me (yours)
Score	score:3 (3+) score:0 (none)
Answers	answers:3 (3+) answers:0 (none) isaccepted:yes hasaccepted:no inquestion:1234
Views	views:250
Code	code:"if (foo != bar)"
Sections	title:apples body:"apples oranges"
URL	url:"*.example.com"
Saves	in:saves
Status	closed:yes duplicate:no migrated:no wiki:no
Types	is:question is:answer
Exclude	-[tag] -apples
For more details on advanced search visit our help page

Results tagged with clustering

Search options answers only not deleted user 7290

35 results

Relevance Newest Score Active

Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]

46 votes

Accepted

Clustering methods that do not require pre-specifying the number of clusters

Clustering algorithms are often categorized into broad kingdoms: Partitioning algorithms (like k-means and it's progeny) Hierarchical clustering (as @Tim describes) Density based clustering (such as … It might help you to read an overview of different types of clustering algorithms. The following might be a place to start: Berkhin, P. "Survey of Clustering Data Mining Techniques" (pdf) …

gung - Reinstate Monica

150k

answered Oct 20, 2016 at 14:17

9 votes

Can any dataset be clustered or does there need to be some sort of pattern in the data?

As other answers have pointed out, determining if the clustering represents 'real' latent groups is a very difficult task. … , and the section on evaluating clusterings in Wikipedia's clustering entry). None of the methods are perfect, however. …

gung - Reinstate Monica

150k

answered Aug 22, 2017 at 17:23

4 votes

Accepted

Clustering in data

Thus the 'clustering' in $X$ is not necessarily a problem. In addition, data tend to have less leverage* over the fitted regression line the closer they are to the mean of $X$. …

gung - Reinstate Monica

150k

answered Nov 30, 2015 at 17:45

1 vote

Clustering Matrices

And you can use whatever clustering algorithm you like and find appropriate (e.g., k-means) likewise. …

gung - Reinstate Monica

150k

answered Mar 24, 2015 at 21:19

8 votes

Accepted

Clustering a dense dataset

The clustering algorithms I am familiar with that can do so are fuzzy k-means, Gaussian mixture modeling, and clustering by kernel density estimation. … Clustering via nonparametric density estimation, Statisticis and Computing, 17, 1, pp. 71-80. …

gung - Reinstate Monica

150k

answered Sep 16, 2015 at 23:09

2 votes

Accepted

Visualise clusters and relationship with features; alternative to chord diagram

I agree with @Nick Cox. This figure is pretty, but doesn't seem very good to me except as eye candy. In essence, this is a Sankey plot (a.k.a., river plot or flow diagram) with just two levels where …

gung - Reinstate Monica

150k

answered Apr 23, 2021 at 15:54

1 vote

How to assess the consistency of clustering

You can see an example where I do this here: How to use both binary and continuous variables together in clustering? This works out fine with a small number of clusterings. … strength of association / correlation between nominal variables, or Determine the probability that two patterns are consistent across the clusterings in that if they are clustered together in one clustering …

gung - Reinstate Monica

150k

answered Mar 16, 2021 at 18:57

6 votes

Accepted

Why is clustering data with many categorical variables so slow?

An alternative is to use a clustering algorithm that can operate over a distance matrix. Then you can calculate the distances only once. … here: How to use both binary and continous variables together in clustering? …

gung - Reinstate Monica

150k

answered Aug 9, 2015 at 19:14

2 votes

Is there a distance metric that measures the ratio between the two rows of data?

I agree with @Anony-Mousse. Using anything in statistics, simply because it is the default will never be the best way to go. Take your situation as an example: At the core of Euclidean distance is …

gung - Reinstate Monica

150k

answered Aug 11, 2015 at 2:51

1 vote

Clustering as a means of splitting up data for logistic regression

I want to acknowledge from the beginning that I know relatively little about clustering. However, I don't see the point of the procedure you describe. …

gung - Reinstate Monica

150k

answered Jun 30, 2012 at 22:32

2 votes

Does the sign of the adjusted residuals matter in a crosstable?

The null hypothesis for a $\chi^2$ analysis of a contingency table is that the rows and columns are independent of each other. Under this null model, the expected count is: $$ \hat E_{ij}=\left(\frac …

gung - Reinstate Monica

150k

answered Oct 19, 2012 at 3:51

9 votes

Accepted

Is it necessary to normalize data for hierarchical clustering of mixed variables using compl...

It is common to normalize all your variables before clustering. … The fact that you are using complete linkage vs. any other linkage, or hierarchical clustering vs. a different algorithm (e.g., k-means) isn't relevant. …

gung - Reinstate Monica

150k

answered Jan 24, 2016 at 15:09

7 votes

Accepted

A non parametric clustering algorithm suitable for high dimensional data

The most common clustering technique that meets your requirements would be DBSCAN. This finds points that are continuous by virtue of having shared nearest neighbors. …

gung - Reinstate Monica

150k

answered Jun 24, 2015 at 20:34

2 votes

Accepted

Is it possible to select features from completely unlabeled data?

Having figured out what variables are relevant for what you want to do, and having conducted a clustering, it isn't difficult to see which variables play the strongest roles in determining the the clustering … Concept learning and feature selection based on square-error clustering. Machine Learning, 35:25–39, 1999. …

gung - Reinstate Monica

150k

answered Mar 1, 2021 at 21:37

2 votes

Clustering numeric, categorical, and multivalue categorical data

With mixed data types the basic answer is to use Gower's distance (see @ttnphns' thorough explainer here: Hierarchical clustering with mixed type data - what distance/similarity to use?). … How to use both binary and continuous variables together in clustering?). …

gung - Reinstate Monica

150k

answered Mar 3, 2018 at 2:14

2 3 Next

15 30 50 per page

Stack Exchange Network

Search Results

Clustering methods that do not require pre-specifying the number of clusters

Can any dataset be clustered or does there need to be some sort of pattern in the data?

Clustering in data

Clustering Matrices

Clustering a dense dataset

Visualise clusters and relationship with features; alternative to chord diagram

How to assess the consistency of clustering

Why is clustering data with many categorical variables so slow?

Is there a distance metric that measures the ratio between the two rows of data?

Clustering as a means of splitting up data for logistic regression

Does the sign of the adjusted residuals matter in a crosstable?

Is it necessary to normalize data for hierarchical clustering of mixed variables using compl...

A non parametric clustering algorithm suitable for high dimensional data

Is it possible to select features from completely unlabeled data?

Clustering numeric, categorical, and multivalue categorical data

Hot Network Questions