I want k-means clustering on a dataset of $5000$ points to select a small number of clusters ($k \le 5$). I used the BIC and the Gap statistic to determine the optimal number of clusters, and both methods indicated an optimal $k$ of $7$ or more. Is there an adjustment I can make (for example, multiplying $s(k+1)$ by a factor $c > 1$, or including it as a penalty term in the BIC) so that I obtain a smaller optimal value of $k$?
The following are the calculations I have performed to compute the Gap statistic and the BIC:
BIC calculation:
$$\text{BIC} = n\ln\left(\frac{W}{n}\right) + m\ln(n)$$
where $W$ is the total within-cluster sum of squares, $m$ is the number of free parameters in the model, and $n$ is the total number of data points. The optimal $k$ is the one with the smallest BIC value.
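For concreteness, here is a minimal sketch of how the BIC formula above could be evaluated over several candidate values of $k$. The function names `bic_kmeans` and `best_k_by_bic` are hypothetical, and counting the free parameters as $m = k \cdot d$ (the centroid coordinates in $d$ dimensions) is an assumption, since the parameter count is not specified above.

```python
import math

def bic_kmeans(W, n, m):
    """BIC = n * ln(W / n) + m * ln(n), as written in the formula above."""
    return n * math.log(W / n) + m * math.log(n)

def best_k_by_bic(wss_by_k, n, d):
    """Return (k with the smallest BIC, dict of BIC scores per k).

    wss_by_k maps each candidate k to its total within-cluster sum of squares W.
    ASSUMPTION: m = k * d free parameters (centroid coordinates only).
    """
    scores = {k: bic_kmeans(W, n, k * d) for k, W in wss_by_k.items()}
    return min(scores, key=scores.get), scores
```

With this layout, a stronger penalty could be tried by scaling the `m * math.log(n)` term, although any such scaling would be a heuristic rather than the standard BIC.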
Gap statistic: I select $k$ using the criterion
$$\text{Gap}(k) \ge \text{Gap}(k+1) - s(k+1)$$
where $\text{Gap}(k) = \frac{1}{B}\sum_{b=1}^{B}\ln\left(W_k^{*(b)}\right) - \ln(W_k)$.
Here $W_k$ is the within-cluster dispersion for $k$ clusters, and $W_k^{*(b)}$ is the within-cluster dispersion, for $k$ clusters, of the $b^\text{th}$ reference dataset (out of $B$ reference datasets in total) generated from a distribution with no apparent clustering.
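The Gap computation and the selection rule can be sketched as follows. This is only an illustration under stated assumptions: the names `gap_and_s` and `select_k` are hypothetical, $s(k)$ is taken to be the usual $\text{sd}_k\sqrt{1 + 1/B}$ from Tibshirani et al. (it is not defined above), the rule is read as "smallest $k$ satisfying the inequality", and the parameter `c` implements the proposed multiplier on $s(k+1)$, which is not part of the standard method.

```python
import math

def gap_and_s(log_w_ref, log_w_obs):
    """Return (Gap(k), s(k)) for a single k.

    log_w_ref: list of ln(W_k^{*(b)}) over the B reference datasets.
    log_w_obs: ln(W_k) for the observed data.
    ASSUMPTION: s(k) = sd_k * sqrt(1 + 1/B), with sd_k the 1/B standard deviation.
    """
    B = len(log_w_ref)
    mean_ref = sum(log_w_ref) / B
    gap = mean_ref - log_w_obs
    sd = math.sqrt(sum((x - mean_ref) ** 2 for x in log_w_ref) / B)
    return gap, sd * math.sqrt(1 + 1 / B)

def select_k(gaps, s, c=1.0):
    """Smallest k with Gap(k) >= Gap(k+1) - c * s(k+1).

    gaps, s: dicts keyed by consecutive integer k.
    c = 1 gives the standard rule; c > 1 is the hypothetical adjustment.
    Falls back to the largest k if no k satisfies the inequality.
    """
    ks = sorted(gaps)
    for k in ks[:-1]:
        if gaps[k] >= gaps[k + 1] - c * s[k + 1]:
            return k
    return ks[-1]
```

Because a larger `c` loosens the inequality, it can only make the rule stop at the same $k$ or an earlier one, which is the effect being asked about.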
Please let me know if I made any errors, whether there are other methods that yield a smaller number of clusters, or whether there are any modifications I can make.