Search Results
| Search type | Search syntax |
|---|---|
| Tags | [tag] |
| Exact | "words here" |
| Author |
user:1234 user:me (yours) |
| Score |
score:3 (3+) score:0 (none) |
| Answers |
answers:3 (3+) answers:0 (none) isaccepted:yes hasaccepted:no inquestion:1234 |
| Views | views:250 |
| Code | code:"if (foo != bar)" |
| Sections |
title:apples body:"apples oranges" |
| URL | url:"*.example.com" |
| Saves | in:saves |
| Status |
closed:yes duplicate:no migrated:no wiki:no |
| Types |
is:question is:answer |
| Exclude |
-[tag] -apples |
| For more details on advanced search visit our help page | |
Results tagged with clustering
Search options answers only
not deleted
user 7290
Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]
46
votes
Accepted
Clustering methods that do not require pre-specifying the number of clusters
Clustering algorithms are often categorized into broad kingdoms:
Partitioning algorithms (like k-means and it's progeny)
Hierarchical clustering (as @Tim describes)
Density based clustering (such as … It might help you to read an overview of different types of clustering algorithms. The following might be a place to start:
Berkhin, P. "Survey of Clustering Data Mining Techniques" (pdf) …
9
votes
Can any dataset be clustered or does there need to be some sort of pattern in the data?
As other answers have pointed out, determining if the clustering represents 'real' latent groups is a very difficult task. … , and the section on evaluating clusterings in Wikipedia's clustering entry). None of the methods are perfect, however. …
4
votes
Accepted
Clustering in data
Thus the 'clustering' in $X$ is not necessarily a problem.
In addition, data tend to have less leverage* over the fitted regression line the closer they are to the mean of $X$. …
1
vote
Clustering Matrices
And you can use whatever clustering algorithm you like and find appropriate (e.g., k-means) likewise. …
8
votes
Accepted
Clustering a dense dataset
The clustering algorithms I am familiar with that can do so are fuzzy k-means, Gaussian mixture modeling, and clustering by kernel density estimation. … Clustering via nonparametric density estimation, Statisticis and Computing, 17, 1, pp. 71-80. …
2
votes
Accepted
Visualise clusters and relationship with features; alternative to chord diagram
I agree with @Nick Cox. This figure is pretty, but doesn't seem very good to me except as eye candy. In essence, this is a Sankey plot (a.k.a., river plot or flow diagram) with just two levels where …
1
vote
How to assess the consistency of clustering
You can see an example where I do this here: How to use both binary and continuous variables together in clustering?
This works out fine with a small number of clusterings. … strength of association / correlation between nominal variables, or
Determine the probability that two patterns are consistent across the clusterings in that if they are clustered together in one clustering …
6
votes
Accepted
Why is clustering data with many categorical variables so slow?
An alternative is to use a clustering algorithm that can operate over a distance matrix. Then you can calculate the distances only once. … here: How to use both binary and continous variables together in clustering? …
2
votes
Is there a distance metric that measures the ratio between the two rows of data?
I agree with @Anony-Mousse. Using anything in statistics, simply because it is the default will never be the best way to go. Take your situation as an example: At the core of Euclidean distance is …
1
vote
Clustering as a means of splitting up data for logistic regression
I want to acknowledge from the beginning that I know relatively little about clustering. However, I don't see the point of the procedure you describe. …
2
votes
Does the sign of the adjusted residuals matter in a crosstable?
The null hypothesis for a $\chi^2$ analysis of a contingency table is that the rows and columns are independent of each other. Under this null model, the expected count is:
$$
\hat E_{ij}=\left(\frac …
9
votes
Accepted
Is it necessary to normalize data for hierarchical clustering of mixed variables using compl...
It is common to normalize all your variables before clustering. … The fact that you are using complete linkage vs. any other linkage, or hierarchical clustering vs. a different algorithm (e.g., k-means) isn't relevant. …
7
votes
Accepted
A non parametric clustering algorithm suitable for high dimensional data
The most common clustering technique that meets your requirements would be DBSCAN. This finds points that are continuous by virtue of having shared nearest neighbors. …
2
votes
Accepted
Is it possible to select features from completely unlabeled data?
Having figured out what variables are relevant for what you want to do, and having conducted a clustering, it isn't difficult to see which variables play the strongest roles in determining the the clustering … Concept learning and feature selection based on square-error clustering. Machine Learning, 35:25–39, 1999. …
2
votes
Clustering numeric, categorical, and multivalue categorical data
With mixed data types the basic answer is to use Gower's distance (see @ttnphns' thorough explainer here: Hierarchical clustering with mixed type data - what distance/similarity to use?). … How to use both binary and continuous variables together in clustering?). …