pydgc.clusterings
pydgc.clusterings.batch_kmeans_gpu module
- initialize(X, num_clusters, seed)[source]
initialize cluster centers
- Parameters:
X – (torch.tensor) matrix
num_clusters – (int) number of clusters
seed – (int) seed for kmeans
- Returns:
(np.array) initial cluster centers
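The seeded initialization described above can be sketched as follows. This is a minimal illustration in plain Python, not the module's code: it assumes that the initial centers are distinct rows of X chosen at random, which the documentation does not state explicitly. The name `initialize_sketch` is hypothetical.

```python
import random


def initialize_sketch(X, num_clusters, seed=None):
    """Pick num_clusters distinct rows of X as initial centers (illustrative only)."""
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    # Sample row indices without replacement so no center is chosen twice.
    indices = rng.sample(range(len(X)), num_clusters)
    return [list(X[i]) for i in indices]


X = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]]
centers_a = initialize_sketch(X, num_clusters=2, seed=0)
centers_b = initialize_sketch(X, num_clusters=2, seed=0)
```

Calling the sketch twice with the same seed selects the same rows, which is the purpose of a seed parameter.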
- kmeans(X, num_clusters, distance='euclidean', batch_size=100000, cluster_centers=[], tol=0.0001, tqdm_flag=True, iter_limit=0, device=device(type='cpu'), gamma_for_soft_dtw=0.001, seed=None)[source]
perform kmeans
Reference: https://github.com/EdisonLeeeee/MAGI/blob/master/magi/batch_kmeans_cuda.py
- Parameters:
X – (torch.tensor) matrix
num_clusters – (int) number of clusters
distance – (str) distance [options: ‘euclidean’, ‘cosine’] [default: ‘euclidean’]
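seed – (int) seed for kmeans [default: None]
tol – (float) threshold [default: 0.0001]
device – (torch.device) device [default: ‘cpu’]
batch_size – (int) batch size [default: 100000]
tqdm_flag – (bool) turns progress logging on or off [default: True]
iter_limit – (int) hard limit on the number of iterations [default: 0]
gamma_for_soft_dtw – (float) soft-DTW smoothing parameter; approaches (hard) DTW as gamma -> 0 [default: 0.001]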
- Returns:
(torch.tensor, torch.tensor) cluster ids, cluster centers
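For orientation, the loop that a k-means routine with `tol` and `iter_limit` parameters typically runs can be sketched as below. This is a plain-Python illustration of Lloyd's algorithm, not the module's implementation. Two points are assumptions: that iteration stops once the total center shift falls below `tol`, and that `iter_limit=0` means no cap. The name `kmeans_sketch` is hypothetical, and unlike `kmeans` it takes the initial centers explicitly.

```python
import math


def squared_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_sketch(X, centers, tol=1e-4, iter_limit=0):
    """Lloyd's algorithm on lists of points (illustrative only)."""
    centers = [list(c) for c in centers]
    iteration = 0
    while True:
        # Assignment step: index of the nearest center for every point.
        ids = [
            min(range(len(centers)), key=lambda k: squared_dist(x, centers[k]))
            for x in X
        ]
        # Update step: move each center to the mean of its assigned points.
        shift = 0.0
        for k in range(len(centers)):
            members = [x for x, i in zip(X, ids) if i == k]
            if not members:
                continue  # an empty cluster keeps its previous center
            new_center = [sum(col) / len(members) for col in zip(*members)]
            shift += math.sqrt(squared_dist(new_center, centers[k]))
            centers[k] = new_center
        iteration += 1
        # Assumed stopping rules: small total shift, or the iteration cap.
        if shift < tol:
            break
        if iter_limit != 0 and iteration >= iter_limit:
            break
    return ids, centers


X = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
ids, centers = kmeans_sketch(X, centers=[X[0], X[2]])
```

The return value mirrors the documented pair: one cluster id per row of X, followed by the final centers.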
- kmeans_predict(X, cluster_centers, batch_size=100000, distance='euclidean', device=device(type='cpu'), gamma_for_soft_dtw=0.001, tqdm_flag=True)[source]
predict using cluster centers
- Parameters:
X – (torch.tensor) matrix
cluster_centers – (torch.tensor) cluster centers
distance – (str) distance [options: ‘euclidean’, ‘cosine’] [default: ‘euclidean’]
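device – (torch.device) device [default: ‘cpu’]
batch_size – (int) batch size [default: 100000]
gamma_for_soft_dtw – (float) soft-DTW smoothing parameter; approaches (hard) DTW as gamma -> 0 [default: 0.001]
tqdm_flag – (bool) turns progress logging on or off [default: True]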
- Returns:
(torch.tensor) cluster ids
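Prediction with fixed centers amounts to a single assignment step. The plain-Python sketch below illustrates this under the assumption of Euclidean distance; `predict_sketch` is a hypothetical name and the code is not the module's implementation.

```python
def predict_sketch(X, cluster_centers):
    """Return the index of the nearest center for each point (illustrative only)."""

    def squared_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # The centers stay fixed; each point is only assigned, never used for an update.
    return [
        min(range(len(cluster_centers)), key=lambda k: squared_dist(x, cluster_centers[k]))
        for x in X
    ]


centers = [[0.0, 0.5], [10.0, 10.5]]
new_points = [[1.0, 1.0], [9.0, 9.0], [0.0, 0.0]]
predicted_ids = predict_sketch(new_points, centers)
```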
- pairwise_distance(data1, data2, batch_size=100000, device=device(type='cpu'), tqdm_flag=True)[source]
compute pairwise distance
- Parameters:
data1 – (torch.tensor) matrix
data2 – (torch.tensor) matrix
batch_size – (int) batch size
device – (torch.device) device [default: ‘cpu’]
tqdm_flag – (bool) turns progress logging on or off [default: True]
- Returns:
(torch.tensor) pairwise distance
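The role of `batch_size` can be illustrated with a small sketch: rows of `data1` are processed in chunks so that only one chunk's distances are computed at a time, and the chunks are concatenated into the full matrix. This is a plain-Python illustration, not the module's code. It assumes unsquared Euclidean distance; the documentation does not say whether the module returns squared or unsquared values. The name `pairwise_distance_sketch` is hypothetical.

```python
import math


def pairwise_distance_sketch(data1, data2, batch_size=100000):
    """Distance from every row of data1 to every row of data2, in chunks."""
    rows = []
    for start in range(0, len(data1), batch_size):
        # Only this slice of data1 is compared against data2 in one pass.
        batch = data1[start:start + batch_size]
        rows.extend([[math.dist(a, b) for b in data2] for a in batch])
    return rows


data1 = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
data2 = [[0.0, 0.0], [3.0, 4.0]]
full = pairwise_distance_sketch(data1, data2)
batched = pairwise_distance_sketch(data1, data2, batch_size=2)
```

Batching changes peak memory use, not the result: `full` and `batched` hold the same 3 x 2 matrix.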
pydgc.clusterings.kmeans_gpu module
- class KMeansGPU(n_clusters, *, distance='euclidean', tol=0.0001, max_iter=500, device='cuda')[source]
Bases: object
Performs K-means clustering on GPU
Reference: https://github.com/yueliu1999/HSAN/blob/main/kmeans_gpu.py
- Parameters:
n_clusters (int) – number of clusters
distance (str) – distance metric [default: ‘euclidean’]
tol (float) – tolerance [default: 1e-4]
max_iter (int) – maximum number of iterations [default: 500]
device (str) – device [default: ‘cuda’]
- initialize(X)[source]
initialize cluster centers
- Parameters:
X (Tensor) – matrix
- Returns:
initial cluster centers
- Return type:
array
- pairwise_distance(data1, data2)[source]
compute pairwise distance
- Parameters:
data1 (Tensor) – matrix
data2 (Tensor) – matrix
- Returns:
pairwise distance
- Return type:
Tensor
- pairwise_cosine(data1, data2)[source]
compute pairwise cosine distance
- Parameters:
data1 (Tensor) – matrix
data2 (Tensor) – matrix
- Returns:
pairwise cosine distance
- Return type:
Tensor
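As a reference point for the cosine variant, the sketch below computes cosine distance under the common definition of one minus cosine similarity. That definition is an assumption here, since the documentation does not spell out the formula. The code is plain Python rather than the tensor-based method, and `pairwise_cosine_sketch` is a hypothetical name.

```python
import math


def pairwise_cosine_sketch(data1, data2):
    """Cosine distance (1 - cosine similarity) between all row pairs."""

    def unit(v):
        # Scale to unit length; zero vectors are not handled in this sketch.
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a_unit = [unit(a) for a in data1]
    b_unit = [unit(b) for b in data2]
    # The dot product of unit vectors is the cosine of the angle between them.
    return [[1.0 - sum(x * y for x, y in zip(a, b)) for b in b_unit] for a in a_unit]


cosine = pairwise_cosine_sketch([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [-1.0, 0.0]])
```

Under this definition the distance depends only on direction: parallel rows give 0, orthogonal rows give 1, and opposite rows give 2, regardless of magnitude.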