Questions tagged [clustering]
Cluster analysis is the task of partitioning data into subsets of objects according to their mutual "similarity," without using preexisting knowledge such as class labels. [Clustered-standard-errors and/or cluster-samples should be tagged as such; do NOT use the "clustering" tag for them.]
4,047 questions
2 votes, 2 answers, 105 views
Handle outliers in clustering
I’m working on a cluster analysis of Italian provinces based on three fire-related indicators: total burned area (ha), burned area per fire, and fire density.
Because these variables are measured on ...
1 vote, 1 answer, 33 views
Does it make sense to represent monetary value with a discrete distribution?
I am implementing a Bayesian non-parametric clustering model based on the paper "Bayesian clustering of multiple zero-inflated outcomes" by Franzolini et al. (2023) and applying it to my household spending ...
0 votes, 0 answers, 25 views
Modeling recurring monthly transactions with weekend-shift effects: DBSCAN vs rule-based temporal detection?
I have 3 months of categorized bank transaction data and need to identify recurring cash inflows and outflows for lending risk modeling.
Complications:
1. Income dates shift earlier when payday falls ...
0 votes, 0 answers, 35 views
Role of Z-Tests in Kernel Density Estimation for Cluster Classification
In a recent bioinformatics paper, the authors describe a statistical/machine learning approach to classify clusters of cells using kernel density estimation (KDE) and Z-scores. While the details of ...
1 vote, 1 answer, 52 views
Vector direction of individual clusters after PCA
Suppose I have two multi-dimensional population samples - $A$ and $B$.
I hypothesise that $\mathbb{E}[A]$ and $\mathbb{E}[B]$ are orthogonal in this high-dimensional space.
To test this hypothesis, I ...
1 vote, 0 answers, 33 views
Supervised Clustering Algorithms / Full Graph Edge Prediction Algorithms
I have an interesting problem that I am trying to solve, and I cannot find any non-deep-learning methods that address it.
Problem Description
Plain
The real-life problem this relates to is handwritten digits ...
2 votes, 1 answer, 46 views
Pattern analysis for time between events data
I am trying to subset data based on a pattern of "strings" or clusters of food deliveries to young that I see in my data (see plots labeled 2, 4, 5, 6, and 8 in the figure below for the most ...
0 votes, 0 answers, 27 views
How to identify and quantify main tendencies across participants from cluster membership heatmaps?
I'd appreciate your thoughts on the following problem.
I've created a heatmap plot (attached) showing the cluster membership ratio for each participant (in separate subplots) and condition (η).
Now, I'...
2 votes, 1 answer, 122 views
Examining country-level effects based on individual-level data combined with country-level data
I am new to working with country-level effects in comparative OLS regression with individual-level data. Are there any good resources for this?
Suppose my dependent variable is social integration (an ...
0 votes, 0 answers, 45 views
Are there clustering algorithms or preprocessing strategies tailored for zero-inflated and continuous data types?
I am currently working on a project where I need to assign customers across N recipes before A/B testing such that the KPIs for each customer are balanced across recipes (to reduce pre-test bias).
Dataset ...
0 votes, 0 answers, 57 views
How to perform clustering on heavily right-skewed and zero-inflated data
I am currently working on clustering continuous variables (such as AOV, RPV, and conversions (conversion/visits)). The variables are heavily right-skewed with long tails, and one variable is dominated ...
3 votes, 1 answer, 130 views
Bayesian Clustering with a Finite Gaussian Mixture Model with Missing Data
I would like to perform clustering with a finite Gaussian mixture model; however, I have missing data (some features are missing at random). I am using variational inference to fit my Bayesian GMM. Is ...
2 votes, 0 answers, 73 views
Estimating number of clusters using Scikit Bayesian GMM
I am generating clustering data using the Bayesian mixture of Gaussian models described in Bishop's Pattern Recognition and Machine Learning textbook, with model parameters drawn from the following ...
1 vote, 1 answer, 59 views
Mixture-Based Clustering for Ordered Stereotype Model - Distance Scores
I have an ordinal survey data set with 5 variables, each with 3 category levels, e.g. 5 health variables rated 1-3 (good-moderate-poor).
I want to row-cluster different responses. But I also want to determine whether ...
1 vote, 0 answers, 54 views
Are equal and diagonal variance matrices implicitly assumed in k-means clustering?
When applying k-means clustering, I understand that the goal is to partition the dataset by assigning each point to its nearest cluster center. However, I’ve come across statements that k-means can be ...