pydgc.datasets
pydgc.datasets.utils module
- class UserDataset(root, dataset_name)
Bases: InMemoryDataset
User-defined custom dataset that inherits from PyG's InMemoryDataset.
- Parameters:
root (str) – Root directory where the data is stored.
dataset_name (str) – Name of the dataset.
- property raw_file_names: str
The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
- property processed_file_names: str
The names of the files in the self.processed_dir folder that must be present in order to skip processing.
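These two properties drive PyG's skip logic: downloading is skipped when every file named by raw_file_names already exists under self.raw_dir, and processing is skipped likewise for processed_file_names under self.processed_dir. The stdlib-only sketch below illustrates that presence check under simplifying assumptions; the helper names are hypothetical, and PyG's real Dataset class handles more cases than this.

```python
import os
from typing import List, Sequence, Union


def to_list(names: Union[str, Sequence[str]]) -> List[str]:
    # A single file name (the properties above are typed str) becomes a
    # one-element list; a sequence of names is copied into a list.
    return [names] if isinstance(names, str) else list(names)


def files_exist(folder: str, names: Union[str, Sequence[str]]) -> bool:
    # True only if at least one file is listed and every listed file exists.
    paths = [os.path.join(folder, name) for name in to_list(names)]
    return len(paths) != 0 and all(os.path.exists(path) for path in paths)


def skip_flags(raw_dir: str, raw_file_names: Union[str, Sequence[str]],
               processed_dir: str,
               processed_file_names: Union[str, Sequence[str]]) -> dict:
    # Hypothetical helper: reports which of the two steps would be skipped.
    return {
        "skip_download": files_exist(raw_dir, raw_file_names),
        "skip_process": files_exist(processed_dir, processed_file_names),
    }
```

An empty file list never counts as "present", so a dataset that lists no raw files would always attempt the download step in this sketch.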
- class NonGraphDataset(root, dataset_name, neighbors=1, metric='minkowski', p=2)
Bases: InMemoryDataset
Dataset object for non-graph data; a graph is constructed from the samples using k-nearest neighbors.
- Parameters:
root (str) – Root directory where the data is stored.
dataset_name (str) – Name of the dataset.
neighbors (int, optional) – Number of nearest neighbors (k) used to build the k-NN graph. Defaults to 1.
metric (str, optional) – Distance metric used for the k-NN search. Defaults to ‘minkowski’.
p (int, optional) – Power parameter for the Minkowski metric. Defaults to 2.
- property raw_file_names: str
The names of the files in the self.raw_dir folder that must be present in order to skip downloading.
- property processed_dir: str
- property processed_file_names: str
The names of the files in the self.processed_dir folder that must be present in order to skip processing.
- heat_kernel_knn_graph(x, k)
Construct a heat-kernel-weighted k-NN graph from the input data.
- Parameters:
x (np.ndarray) – Input data
k (int) – Number of neighbors
- Returns:
Adjacency matrix
- Return type:
Tensor
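A heat-kernel k-NN graph commonly links each sample to its k nearest neighbors and weights each edge by exp(-||x_i - x_j||^2 / t). The sketch below assumes that formulation; the bandwidth t, the Euclidean distance, the exclusion of self-loops, and symmetrization by taking the maximum are all assumptions, not details confirmed for PyDGC. The function name is hypothetical, and it returns nested lists instead of a Tensor to stay dependency-free.

```python
import math
from typing import List, Sequence


def heat_kernel_knn_sketch(x: Sequence[Sequence[float]], k: int,
                           t: float = 1.0) -> List[List[float]]:
    # Hypothetical stand-in for heat_kernel_knn_graph; returns a dense
    # adjacency matrix as nested lists instead of a Tensor.
    n = len(x)
    # Pairwise squared Euclidean distances.
    sq = [[sum((a - b) ** 2 for a, b in zip(x[i], x[j])) for j in range(n)]
          for i in range(n)]
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The k nearest *other* samples; ties are broken by index.
        others = [j for j in range(n) if j != i]
        nearest = sorted(others, key=lambda j: (sq[i][j], j))[:k]
        for j in nearest:
            weight = math.exp(-sq[i][j] / t)  # heat kernel weight
            # Symmetrize so the graph is undirected.
            adj[i][j] = max(adj[i][j], weight)
            adj[j][i] = max(adj[j][i], weight)
    return adj
```

Closer pairs receive weights near 1 and distant pairs decay toward 0, so t controls how quickly similarity falls off with distance.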
- class DGCGraphDataset(root, dataset_name)
Bases: UserDataset
DGC dataset object for graph data.
- Parameters:
root (str) – Root directory where the data is stored.
dataset_name (str) – Name of the dataset.
- class DGCNonGraphDataset(root, dataset_name, neighbors=1, metric='minkowski', p=2)
Bases: NonGraphDataset
DGC dataset object that constructs a graph from non-graph data.
- Parameters:
root (str) – Root directory where the data is stored.
dataset_name (str) – Name of the dataset.
neighbors (int, optional) – Number of nearest neighbors (k) used to build the k-NN graph. Defaults to 1.
metric (str, optional) – Distance metric used for the k-NN search. Defaults to ‘minkowski’.
p (int, optional) – Power parameter for the Minkowski metric. Defaults to 2.
- load_pyg(dataset_dir, dataset_name)
Load one of the PyG datasets built into PyDGC.
- Parameters:
dataset_dir (str) – Root path where the dataset is stored.
dataset_name (str) – Dataset name. Available datasets: CORA, CITE, CITESEER, PUBMED, BAT, EAT, UAT, COCS, COPS, AMAC, AMAP, CORNELL, TEXAS, WISC, WIKI, BLOG, PPI, FLICKR, FACEBOOK, TWEIBO, MAG, ACTOR, CORAFULL, DBLPFULL, NELL, REDDIT, REDDIT2, YELP, AMP, LFMA, ROMAN.
- Returns:
PyG dataset.
- Return type:
Dataset
- load_dgc_graph(dataset_dir, dataset_name)
Load a custom DGC graph dataset.
- Parameters:
dataset_dir (str) – Root path where the dataset is stored.
dataset_name (str) – Dataset name.
- Returns:
Custom DGC graph dataset.
- Return type:
Dataset
- load_dgc_non_graph(dataset_dir, dataset_name, *, neighbors=1, metric='minkowski', p=2)
Load a custom non-graph dataset. The arguments neighbors, metric, and p are keyword-only.
- Parameters:
dataset_dir (str) – Root path where the dataset is stored.
dataset_name (str) – Name of the non-graph dataset.
neighbors (int, optional) – Number of nearest neighbors (k) for the k-NN graph; a sample is not counted as its own neighbor. Defaults to 1.
metric (str, optional) – Distance metric. Defaults to ‘minkowski’.
p (int, optional) – Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and to euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used. Defaults to 2.
- Returns:
Custom non-graph dataset (a NonGraphDataset object).
- Return type:
Dataset
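The neighbors and p parameters can be illustrated with a brute-force sketch: the Minkowski distance reduces to the Manhattan distance for p = 1 and to the Euclidean distance for p = 2, and each sample is linked to its nearest other samples, never to itself. The function names are hypothetical and the exhaustive search is for illustration only; PyDGC's own neighbor search and tie-breaking are not specified in this reference.

```python
from typing import List, Sequence, Tuple


def minkowski(a: Sequence[float], b: Sequence[float], p: int = 2) -> float:
    # (sum_i |a_i - b_i|^p)^(1/p): Manhattan for p = 1, Euclidean for p = 2.
    return sum(abs(u - v) ** p for u, v in zip(a, b)) ** (1.0 / p)


def knn_edges(x: Sequence[Sequence[float]], neighbors: int = 1,
              p: int = 2) -> List[Tuple[int, int]]:
    # Brute-force k-NN: link each sample to its `neighbors` nearest other
    # samples; a sample is never counted as its own neighbor.
    edges: List[Tuple[int, int]] = []
    for i, xi in enumerate(x):
        others = [j for j in range(len(x)) if j != i]
        others.sort(key=lambda j: (minkowski(xi, x[j], p), j))
        edges.extend((i, j) for j in others[:neighbors])
    return edges
```

For example, with the points (0, 0), (3, 3), and (5, 0), the nearest neighbor of (0, 0) is (5, 0) under p = 1 (distance 5 versus 6) but (3, 3) under p = 2 (about 4.24 versus 5), so the choice of p can change the constructed graph.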
- load_ogb(dataset_dir, dataset_name)
Load an OGB dataset.
- Parameters:
dataset_dir (str) – Root path where the dataset is stored.
dataset_name (str) – Dataset name.
- Returns:
OGB dataset.
- Return type:
Dataset
- load_dataset(dataset_dir, dataset_name, p=2, is_custom=False, custom_is_graph=True, metric='minkowski')
Load a raw dataset.
- Parameters:
dataset_dir (str) – Root path where the dataset is stored.
dataset_name (str) – Name of the dataset.
p (int, optional) – Power parameter for the Minkowski metric. Defaults to 2.
is_custom (bool, optional) – Whether the dataset is a custom dataset. Defaults to False.
custom_is_graph (bool, optional) – Whether the custom dataset is a graph dataset rather than non-graph data. Defaults to True.
metric (str, optional) – Distance metric for non-graph data. Defaults to ‘minkowski’.
- Returns:
Raw dataset.
- Return type:
Dataset
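Judging only from the parameter descriptions, is_custom and custom_is_graph select which kind of loader handles the request, and p and metric concern non-graph data. The hypothetical function below encodes that reading and returns a label instead of loading anything; it is not PyDGC's code, and how load_dataset chooses among the built-in loaders (such as load_pyg and load_ogb) is not documented here.

```python
def select_loader(is_custom: bool = False, custom_is_graph: bool = True) -> str:
    # Hypothetical reading of load_dataset's flags, not PyDGC's actual code.
    if not is_custom:
        # A built-in dataset (for example one served by load_pyg or
        # load_ogb); custom_is_graph is irrelevant on this path.
        return "built-in"
    if custom_is_graph:
        return "load_dgc_graph"
    # Under this reading, p and metric matter only here, where a graph
    # has to be built from non-graph data.
    return "load_dgc_non_graph"
```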
- preprocess_custom_data(root, dataset_name, dataset_type='graph')
Transform a dataset stored in the format used by Awesome-Deep-Graph-Clustering.
- Parameters:
root (str) – Root path where the dataset is stored.
dataset_name (str) – Name of the dataset.
dataset_type (str, optional) – Dataset type, either ‘graph’ or ‘non-graph’. Defaults to ‘graph’.
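For graph data, one core step in converting to a PyG-style dataset is turning an adjacency matrix into PyG's edge_index: a 2 x E array in COO format whose first row holds source nodes and whose second row holds target nodes. The sketch below assumes the input provides a dense adjacency matrix; which files preprocess_custom_data actually reads is not stated here, the function name is hypothetical, and plain lists stand in for tensors.

```python
from typing import List, Sequence


def dense_to_edge_index(adj: Sequence[Sequence[float]]) -> List[List[int]]:
    # Collect one (source, target) pair per nonzero entry, row by row,
    # so edges come out sorted by source node, then target node.
    sources: List[int] = []
    targets: List[int] = []
    for i, row in enumerate(adj):
        for j, value in enumerate(row):
            if value != 0:
                sources.append(i)
                targets.append(j)
    # Row 0 = source nodes, row 1 = target nodes (PyG's edge_index layout).
    return [sources, targets]
```

An undirected graph therefore contributes each edge twice, once per direction, which matches how PyG represents undirected graphs.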