pydgc.clusterings

pydgc.clusterings.batch_kmeans_gpu module

initialize(X, num_clusters, seed)

initialize cluster centers

Parameters:
  • X – (torch.tensor) matrix

  • num_clusters – (int) number of clusters

  • seed – (int) seed for kmeans

Returns:

(np.array) initial cluster centers
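As a rough illustration of seeded initialization, the sketch below draws num_clusters distinct rows of X as starting centers. It uses NumPy rather than torch to stay dependency-light, and it assumes random row sampling without replacement; the helper name sample_initial_centers is hypothetical, and the library's actual strategy may differ.

```python
import numpy as np

def sample_initial_centers(X, num_clusters, seed=None):
    """Pick num_clusters distinct rows of X as initial centers (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Assumption: centers are sampled from the data points, without replacement.
    indices = rng.choice(X.shape[0], size=num_clusters, replace=False)
    return X[indices]

X = np.arange(20, dtype=float).reshape(10, 2)
centers_a = sample_initial_centers(X, num_clusters=3, seed=0)
centers_b = sample_initial_centers(X, num_clusters=3, seed=0)
```

Passing the same seed twice yields the same centers, which is the usual reason for exposing a seed argument.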

kmeans(X, num_clusters, distance='euclidean', batch_size=100000, cluster_centers=[], tol=0.0001, tqdm_flag=True, iter_limit=0, device=device(type='cpu'), gamma_for_soft_dtw=0.001, seed=None)

perform kmeans

Reference: https://github.com/EdisonLeeeee/MAGI/blob/master/magi/batch_kmeans_cuda.py

Parameters:
  • X – (torch.tensor) matrix

  • num_clusters – (int) number of clusters

  • distance – (str) distance [options: ‘euclidean’, ‘cosine’] [default: ‘euclidean’]

  • batch_size – (int) batch size [default: 100000]

  • seed – (int) seed for kmeans

  • tol – (float) threshold [default: 0.0001]

  • device – (torch.device) device [default: cpu]

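  • tqdm_flag – (bool) turns progress logging on or off [default: True]

  • iter_limit – (int) hard limit on the number of iterations [default: 0]

  • gamma_for_soft_dtw – (float) soft-DTW smoothing parameter; soft-DTW approaches (hard) DTW as gamma -> 0 [default: 0.001]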

Returns:

(torch.tensor, torch.tensor) cluster ids, cluster centers
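To make the roles of tol and iter_limit concrete, here is a minimal sketch of Lloyd-style k-means in NumPy. It is not the library's implementation: the function name lloyd_kmeans is hypothetical, the convergence test on total center shift is an assumption, and treating iter_limit=0 as "no hard limit" is also an assumption. Only the Euclidean case is covered.

```python
import numpy as np

def lloyd_kmeans(X, num_clusters, tol=1e-4, iter_limit=0, seed=None):
    """Plain Lloyd iterations; illustrative, not the library's code."""
    rng = np.random.default_rng(seed)
    # Assumption: start from randomly chosen rows of X.
    centers = X[rng.choice(X.shape[0], size=num_clusters, replace=False)].copy()
    iteration = 0
    while True:
        # Squared Euclidean distance from every point to every center, shape (n, k).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        cluster_ids = dist.argmin(axis=1)

        new_centers = centers.copy()
        for k in range(num_clusters):
            members = X[cluster_ids == k]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[k] = members.mean(axis=0)

        # Total distance the centers moved in this iteration.
        shift = np.linalg.norm(new_centers - centers, axis=1).sum()
        centers = new_centers
        iteration += 1

        if shift < tol:
            break
        # Assumption: iter_limit == 0 means "no hard limit".
        if iter_limit != 0 and iteration >= iter_limit:
            break
    return cluster_ids, centers

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
cluster_ids, cluster_centers = lloyd_kmeans(X, num_clusters=2, seed=0)
```

On these two well-separated groups the loop stops once the centers no longer move, and the return value has the same (cluster ids, cluster centers) shape as documented above.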

kmeans_predict(X, cluster_centers, batch_size=100000, distance='euclidean', device=device(type='cpu'), gamma_for_soft_dtw=0.001, tqdm_flag=True)

predict using cluster centers

Parameters:
  • X – (torch.tensor) matrix

  • cluster_centers – (torch.tensor) cluster centers

  • batch_size – (int) batch size [default: 100000]

  • distance – (str) distance [options: ‘euclidean’, ‘cosine’] [default: ‘euclidean’]

  • device – (torch.device) device [default: ‘cpu’]

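  • gamma_for_soft_dtw – (float) soft-DTW smoothing parameter; soft-DTW approaches (hard) DTW as gamma -> 0 [default: 0.001]

  • tqdm_flag – (bool) turns progress logging on or off [default: True]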

Returns:

(torch.tensor) cluster ids
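Prediction with fixed centers amounts to a nearest-center lookup. The NumPy sketch below shows that idea for the Euclidean case; assign_to_centers is a hypothetical helper name, not part of the module.

```python
import numpy as np

def assign_to_centers(X, cluster_centers):
    """Return, for each row of X, the index of the closest center."""
    # Squared Euclidean distances, shape (n_points, n_centers).
    dist = ((X[:, None, :] - cluster_centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)

cluster_centers = np.array([[0.0, 0.0], [10.0, 10.0]])
X_new = np.array([[1.0, -1.0], [9.0, 11.0], [4.0, 4.0]])
cluster_ids = assign_to_centers(X_new, cluster_centers)
```

No centers are updated here; the same centers can be reused to label any number of new points.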

pairwise_distance(data1, data2, batch_size=100000, device=device(type='cpu'), tqdm_flag=True)

compute pairwise distance

Parameters:
  • data1 – (torch.tensor) matrix

  • data2 – (torch.tensor) matrix

  • batch_size – (int) batch size

  • device – (torch.device) device [default: ‘cpu’]

  • tqdm_flag – (bool) turns progress logging on or off [default: True]

Returns:

(torch.tensor) pairwise distance
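The batch_size argument suggests that distances are computed chunk by chunk to bound peak memory. A minimal NumPy sketch of that pattern follows; it assumes squared Euclidean distance and batching over the rows of data1, and the name batched_pairwise_sq_euclidean is hypothetical.

```python
import numpy as np

def batched_pairwise_sq_euclidean(data1, data2, batch_size=100000):
    """Squared Euclidean distances between rows of data1 and data2, computed in chunks."""
    n = data1.shape[0]
    out = np.empty((n, data2.shape[0]), dtype=float)
    for start in range(0, n, batch_size):
        chunk = data1[start:start + batch_size]
        # Only a (chunk, m, d) intermediate is alive at any time.
        diff = chunk[:, None, :] - data2[None, :, :]
        out[start:start + batch_size] = (diff ** 2).sum(axis=2)
    return out

data1 = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 1.0]])
data2 = np.array([[0.0, 0.0], [1.0, 0.0]])
dist_batched = batched_pairwise_sq_euclidean(data1, data2, batch_size=2)
dist_full = batched_pairwise_sq_euclidean(data1, data2, batch_size=100)
```

The result does not depend on batch_size; a smaller value only trades speed for a smaller intermediate array.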

pydgc.clusterings.kmeans_gpu module

class KMeansGPU(n_clusters, *, distance='euclidean', tol=0.0001, max_iter=500, device='cuda')

Bases: object

Performs K-means clustering on GPU

Reference: https://github.com/yueliu1999/HSAN/blob/main/kmeans_gpu.py

Parameters:
  • n_clusters (int) – number of clusters

  • distance (str) – distance metric [default: ‘euclidean’]

  • tol (float) – tolerance [default: 1e-4]

  • max_iter (int) – maximum number of iterations [default: 500]

  • device (str) – device [default: ‘cuda’]

initialize(X)

initialize cluster centers

Parameters:

X (torch.Tensor) – matrix

Returns:

(np.array) initial cluster centers

Return type:

array

pairwise_distance(data1, data2)

compute pairwise distance

Parameters:
  • data1 (torch.Tensor) – matrix

  • data2 (torch.Tensor) – matrix

Returns:

(torch.tensor) pairwise distance

Return type:

Tensor

pairwise_cosine(data1, data2)

compute pairwise cosine distance

Parameters:
  • data1 (torch.Tensor) – matrix

  • data2 (torch.Tensor) – matrix

Returns:

(torch.tensor) pairwise cosine distance

Return type:

Tensor
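Cosine distance is commonly defined as one minus the cosine similarity of two vectors. The NumPy sketch below computes it for all row pairs; it assumes that definition, and both the function name pairwise_cosine_distance and the eps guard against zero-length rows are illustrative choices rather than the library's.

```python
import numpy as np

def pairwise_cosine_distance(data1, data2, eps=1e-12):
    """1 - cosine similarity between every row of data1 and every row of data2."""
    # Normalize rows to unit length; eps avoids division by zero for all-zero rows.
    a = data1 / np.maximum(np.linalg.norm(data1, axis=1, keepdims=True), eps)
    b = data2 / np.maximum(np.linalg.norm(data2, axis=1, keepdims=True), eps)
    return 1.0 - a @ b.T

data1 = np.array([[1.0, 0.0], [0.0, 2.0]])
data2 = np.array([[3.0, 0.0], [1.0, 1.0]])
cosine_dist = pairwise_cosine_distance(data1, data2)
```

Parallel vectors get distance 0 and orthogonal vectors get distance 1, regardless of their lengths.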

fit(X)

perform kmeans

Parameters:

X (torch.Tensor) – matrix

Returns:

(torch.tensor, torch.tensor) cluster ids, cluster centers

Return type:

Tuple[Tensor, Tensor]

predict(X)

predict using cluster centers

Parameters:

X (torch.Tensor) – matrix

Returns:

(torch.tensor) cluster ids

Return type:

Tensor
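The fit/predict pair follows the familiar estimator pattern: fit learns and stores the centers, and predict reuses them on new data. The stand-in class below mirrors that interface in NumPy so the flow can be tried without a GPU. TinyKMeans is hypothetical, supports Euclidean distance only, and makes its own assumptions about initialization (random rows, fixed seed) and stopping (center shift below tol, or max_iter reached).

```python
import numpy as np

class TinyKMeans:
    """Hypothetical NumPy stand-in that mirrors a fit/predict interface."""

    def __init__(self, n_clusters, tol=1e-4, max_iter=500, seed=0):
        self.n_clusters = n_clusters
        self.tol = tol
        self.max_iter = max_iter
        self.seed = seed
        self.cluster_centers = None  # set by fit()

    def _distances(self, X):
        # Squared Euclidean distances to the stored centers, shape (n, k).
        return ((X[:, None, :] - self.cluster_centers[None, :, :]) ** 2).sum(axis=2)

    def fit(self, X):
        rng = np.random.default_rng(self.seed)
        start = rng.choice(len(X), size=self.n_clusters, replace=False)
        self.cluster_centers = X[start].copy()
        for _ in range(self.max_iter):
            ids = self._distances(X).argmin(axis=1)
            old = self.cluster_centers.copy()
            for k in range(self.n_clusters):
                if np.any(ids == k):  # empty clusters keep their center
                    self.cluster_centers[k] = X[ids == k].mean(axis=0)
            if np.linalg.norm(self.cluster_centers - old) < self.tol:
                break
        return self.predict(X), self.cluster_centers

    def predict(self, X):
        if self.cluster_centers is None:
            raise RuntimeError("call fit() before predict()")
        return self._distances(X).argmin(axis=1)

train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
model = TinyKMeans(n_clusters=2)
train_ids, centers = model.fit(train)
new_ids = model.predict(np.array([[0.05, 0.05], [5.05, 5.05]]))
```

New points receive the label of whichever learned center is closest, so a point near the first training group shares that group's cluster id.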