pydgc.datasets

pydgc.datasets.utils module

class UserDataset(root, dataset_name)[source]

Bases: InMemoryDataset

Custom user dataset that inherits from PyG's InMemoryDataset.

Parameters:
  • root (str) – Root directory where the data is stored.

  • dataset_name (str) – Name of the dataset.

property raw_file_names: str

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_file_names: str

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

download()[source]

Downloads the dataset to the self.raw_dir folder.

process()[source]

Processes the dataset to the self.processed_dir folder.
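
The skip rule described for raw_file_names and processed_file_names (all listed files must already be present) can be illustrated with the standard library alone. This is a minimal sketch of the idea, not PyDGC or PyG code; the helper name files_exist and the file name toy.npz are hypothetical.

```python
import os
import tempfile


def files_exist(folder, file_names):
    """Return True only if every expected file is present in `folder`."""
    if isinstance(file_names, str):  # a single file name is also accepted
        file_names = [file_names]
    return all(os.path.isfile(os.path.join(folder, name)) for name in file_names)


# Hypothetical layout: <root>/raw holds the files listed by raw_file_names.
with tempfile.TemporaryDirectory() as root:
    raw_dir = os.path.join(root, "raw")
    os.makedirs(raw_dir)
    expected = ["toy.npz"]  # illustrative name, not a file used by PyDGC

    before = files_exist(raw_dir, expected)  # nothing there yet: download needed
    open(os.path.join(raw_dir, "toy.npz"), "wb").close()
    after = files_exist(raw_dir, expected)   # file present: download can be skipped

print(before, after)
```

The same check applied to self.processed_dir and processed_file_names decides whether process() has to run.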

class NonGraphDataset(root, dataset_name, neighbors=1, metric='minkowski', p=2)[source]

Bases: InMemoryDataset

Dataset object that constructs a graph from non-graph data.

Parameters:
  • root (str) – Root directory where the data is stored.

  • dataset_name (str) – Name of the dataset.

  • neighbors (int, optional) – Number of neighbors k for the kNN graph. Defaults to 1.

  • metric (str, optional) – Distance metric used for the kNN search. Defaults to 'minkowski'.

  • p (int, optional) – Power parameter for the Minkowski metric. Defaults to 2.

download()[source]

Downloads the dataset to the self.raw_dir folder.

property raw_file_names: str

The name of the files in the self.raw_dir folder that must be present in order to skip downloading.

property processed_dir: str

property processed_file_names: str

The name of the files in the self.processed_dir folder that must be present in order to skip processing.

process()[source]

Processes the dataset to the self.processed_dir folder.

heat_kernel_knn_graph(x, k)[source]

Construct a heat-kernel kNN graph.

Parameters:
  • x (np.ndarray) – Input data

  • k (int) – Number of neighbors

Returns:

Adjacency matrix

Return type:

Tensor
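
As a rough illustration of what a heat-kernel kNN graph is, the sketch below builds a dense adjacency matrix in plain Python. It is not the library's implementation: the squared Euclidean distance, the weight exp(-d^2 / t), the bandwidth t, and the symmetrization rule are all assumptions of this example, and it works on nested lists instead of np.ndarray and Tensor.

```python
import math


def heat_kernel_knn(points, k, t=1.0):
    """Dense adjacency (list of lists) of a symmetric heat-kernel kNN graph.

    Assumptions of this sketch: squared Euclidean distance, edge weight
    exp(-d^2 / t) with a hypothetical bandwidth `t`, and an edge is kept if
    either endpoint picks the other as one of its k nearest neighbors.
    """
    n = len(points)
    sq_dist = [
        [sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Indices of the k closest points, excluding the point itself.
        order = sorted((j for j in range(n) if j != i), key=lambda j: sq_dist[i][j])
        for j in order[:k]:
            weight = math.exp(-sq_dist[i][j] / t)
            adj[i][j] = weight
            adj[j][i] = weight  # symmetrize
    return adj


points = [[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]]
adj = heat_kernel_knn(points, k=1)
for row in adj:
    print(row)
```

With k=1, the two nearby points are linked with a large weight, the distant point is linked only to its closest neighbor with a much smaller weight, and the diagonal stays zero.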

class DGCGraphDataset(root, dataset_name)[source]

Bases: UserDataset

DGC dataset object that constructs graph data.

Parameters:
  • root (str) – Root directory where the data is stored.

  • dataset_name (str) – Name of the dataset.

download()[source]

Downloads the dataset to the self.raw_dir folder.

Return type:

None

class DGCNonGraphDataset(root, dataset_name, neighbors=1, metric='minkowski', p=2)[source]

Bases: NonGraphDataset

DGC non-graph dataset object that constructs a graph from non-graph data.

Parameters:
  • root (str) – Root directory where the data is stored.

  • dataset_name (str) – Name of the dataset.

  • neighbors (int, optional) – Number of neighbors k for the kNN graph. Defaults to 1.

  • metric (str, optional) – Distance metric used for the kNN search. Defaults to 'minkowski'.

  • p (int, optional) – Power parameter for the Minkowski metric. Defaults to 2.

download()[source]

Downloads the dataset to the self.raw_dir folder.

Return type:

None

load_pyg(dataset_dir, dataset_name)[source]

Load a PyG dataset built into PyDGC.

Parameters:
  • dataset_dir (str) – Root path where the dataset is stored.

  • dataset_name (str) – Dataset name. Available datasets: CORA, CITE, CITESEER, PUBMED, BAT, EAT, UAT, COCS, COPS, AMAC, AMAP, CORNELL, TEXAS, WISC, WIKI, BLOG, PPI, FLICKR, FACEBOOK, TWEIBO, MAG, ACTOR, CORAFULL, DBLPFULL, NELL, REDDIT, REDDIT2, YELP, AMP, LFMA, ROMAN.

Returns:

PyG dataset.

Return type:

Dataset
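
Because load_pyg only covers the names listed above, a caller may want to check a name before trying to load it. The helper below is a hypothetical convenience, not part of PyDGC; it matches names exactly as listed and makes no assumption about how the library treats letter case.

```python
# Names copied from the dataset_name description of load_pyg.
BUILTIN_NAMES = {
    "CORA", "CITE", "CITESEER", "PUBMED", "BAT", "EAT", "UAT", "COCS",
    "COPS", "AMAC", "AMAP", "CORNELL", "TEXAS", "WISC", "WIKI", "BLOG",
    "PPI", "FLICKR", "FACEBOOK", "TWEIBO", "MAG", "ACTOR", "CORAFULL",
    "DBLPFULL", "NELL", "REDDIT", "REDDIT2", "YELP", "AMP", "LFMA", "ROMAN",
}


def is_builtin_name(dataset_name):
    """Hypothetical helper: True if the name appears in the documented list."""
    return dataset_name in BUILTIN_NAMES


print(is_builtin_name("CORA"), is_builtin_name("MNIST"))
```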

load_dgc_graph(dataset_dir, dataset_name)[source]

Load a custom DGC graph dataset.

Parameters:
  • dataset_dir (str) – Root path where the dataset is stored.

  • dataset_name (str) – Dataset name.

Returns:

Custom DGC graph dataset.

Return type:

Dataset

load_dgc_non_graph(dataset_dir, dataset_name, *, neighbors=1, metric='minkowski', p=2)[source]

Load custom non-graph dataset.

Parameters:
  • dataset_dir (str) – Dataset stored root path.

  • dataset_name (str) – Dataset name for non-graph dataset.

  • neighbors (int, optional) – K for KNN. Self is not included. Defaults to 1.

  • metric (str, optional) – Distance type. Defaults to 'minkowski'.

  • p (int, optional) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.

Returns:

Custom non-graph Dataset object (NonGraphDataset).

Return type:

Dataset
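
The effect of p on the Minkowski metric can be checked with a few lines of plain Python. This is an illustrative sketch with a hypothetical helper name; it does not call PyDGC.

```python
def minkowski_distance(u, v, p=2):
    """Minkowski distance (l_p) between two equal-length vectors."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


u, v = [0.0, 0.0], [3.0, 4.0]
d1 = minkowski_distance(u, v, p=1)  # Manhattan distance (l1): |3| + |4|
d2 = minkowski_distance(u, v, p=2)  # Euclidean distance (l2): sqrt(9 + 16)
print(d1, d2)
```

For these two points the l1 distance is 7 and the l2 distance is 5, so the choice of p can change which samples count as nearest neighbors.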

load_ogb(dataset_dir, dataset_name)[source]

Load an OGB dataset.

Parameters:
  • dataset_dir (str) – Root path where the dataset is stored.

  • dataset_name (str) – Dataset name.

Returns:

OGB dataset.

Return type:

Dataset

load_dataset(dataset_dir, dataset_name, p=2, is_custom=False, custom_is_graph=True, metric='minkowski')[source]

Load raw datasets.

Parameters:
  • dataset_dir (str) – Root path where the dataset is stored.

  • dataset_name (str) – Dataset name.

  • p (int, optional) – Power parameter for the Minkowski metric. Defaults to 2.

  • is_custom (bool, optional) – Whether the dataset is custom. Defaults to False.

  • custom_is_graph (bool, optional) – Whether the custom dataset is a graph dataset. Defaults to True.

  • metric (str, optional) – Distance type for non-graph data. Defaults to 'minkowski'.

Returns:

Raw dataset.

Return type:

Dataset
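
A possible calling pattern for load_dataset, based only on the signature above, is sketched below. The import path is assumed from the module heading, and the directory ./data and the custom dataset names my_graph and my_table are hypothetical. The import sits inside the function so the snippet can be read and executed without PyDGC installed; calling the function requires PyDGC and may download data.

```python
def load_examples(dataset_dir="./data"):
    """Usage sketch; not called here because it needs PyDGC and dataset files."""
    # Assumption: the function is importable from the module named in the heading.
    from pydgc.datasets.utils import load_dataset

    # Built-in dataset: is_custom keeps its default of False.
    builtin = load_dataset(dataset_dir, "CORA")

    # Custom graph dataset (hypothetical name).
    custom_graph = load_dataset(
        dataset_dir, "my_graph", is_custom=True, custom_is_graph=True
    )

    # Custom non-graph dataset (hypothetical name), Manhattan distance via p=1.
    custom_table = load_dataset(
        dataset_dir,
        "my_table",
        p=1,
        is_custom=True,
        custom_is_graph=False,
        metric="minkowski",
    )
    return builtin, custom_graph, custom_table
```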

preprocess_custom_data(root, dataset_name, dataset_type='graph')[source]

Transform a dataset stored in the Awesome-Deep-Graph-Clustering format.

Parameters:
  • root (str) – Root path where the dataset is stored.

  • dataset_name (str) – Dataset name.

  • dataset_type (str, optional) – Dataset type. Options: 'graph', 'non-graph'. Defaults to 'graph'.

class LoadAttribute(x)[source]

Bases: Dataset

Load attribute dataset.

Parameters:
  • x (np.ndarray) – Attribute matrix.