pydgc.utils

pydgc.utils.command

parse_arguments(dataset_name='ACM', arg_config=None)[source]

Parse arguments.

Parameters:
  • dataset_name (str) – Dataset name.

  • arg_config (dict) – Custom arguments.

Returns:

Arguments.

Return type:

argparse.Namespace

pydgc.utils.config

validate_and_create_path(save_path)[source]

Check whether save_path is valid. If it contains a directory component that is valid but does not exist yet, create that directory.

Parameters:

save_path (str) – Save path.

Returns:

True if save_path is valid, False otherwise.

Return type:

bool
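The behaviour described above can be approximated with the standard library alone. The sketch below is not the pydgc source: the name `validate_and_create_path_sketch` is hypothetical, and treating empty or non-string paths as invalid is an assumption made for illustration.

```python
import os


def validate_and_create_path_sketch(save_path):
    """Hypothetical stand-in for validate_and_create_path (not the pydgc source)."""
    # Assumption: empty or non-string paths count as invalid.
    if not isinstance(save_path, str) or not save_path.strip():
        return False
    directory = os.path.dirname(save_path)
    if directory:
        try:
            # Create the parent directory if it does not exist yet.
            os.makedirs(directory, exist_ok=True)
        except OSError:
            # The directory part could not be created, so the path is unusable.
            return False
    return True
```

A path without a directory component, such as 'cfg.yaml', passes this check without creating anything.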

default_cfg(dataset_name)[source]

Default configuration.

Parameters:

dataset_name (str) – Dataset name.

Returns:

Default configuration.

Return type:

CN

yaml_to_cfg(yaml_data)[source]

Transform YAML into CfgNode.

Parameters:

yaml_data (dict) – Data loaded from yaml.

Returns:

Transformed CfgNode.

Return type:

CN

dump_cfg(cfg, save_path=None)[source]

Records the configuration of this experiment.

Parameters:
  • cfg (CN) – Configuration.

  • save_path (str, optional) – Save path. Defaults to None.

load_dataset_specific_cfg(cfg_file_path, dataset_name)[source]

Load the configuration for the specified dataset.

Parameters:
  • cfg_file_path (str) – Path of config file.

  • dataset_name (str) – Name of specific dataset.

Returns:

Config of specific dataset.

Return type:

CN

check_required_cfg(cfg, dataset_name, auto_complete=True)[source]

Check required config items.

Parameters:
  • cfg (CN) – Configuration.

  • dataset_name (str) – Name of specific dataset.

  • auto_complete (bool, optional) – Whether to auto-complete missing config items. Defaults to True.

Returns:

True if all required config items are present, False otherwise.

Return type:

bool

generate_default_cfg(datasets, save_path=None)[source]

Generate default config.

Parameters:
  • datasets (str or list) – Name(s) of dataset(s).

  • save_path (str, optional) – Save path. Defaults to None.

Returns:

Default config.

Return type:

CN

pydgc.utils.device

@Reference: https://github.com/snap-stanford/GraphGym/blob/master/graphgym/utils/device.py

count_parameters(model)[source]

Count the number of parameters in the input model.

Note: The return value is expressed in millions (M) if the count exceeds 1,000,000.

Parameters:

model (torch.nn.Module) – The model instance you want to count.

Returns:

The number of model parameters, in millions (M).

Return type:

float

get_gpu_memory_map()[source]

Get the current GPU usage.

Returns:

The current GPU memory usage.

Return type:

np.ndarray

get_current_gpu_usage(gpu_mem, device)[source]

Get the current GPU memory usage.

Parameters:
  • gpu_mem (np.ndarray) – The current GPU memory usage.

  • device (str) – The device.

Returns:

The current GPU memory usage.

Return type:

int

auto_select_device(logger, cfg, memory_max=8000, memory_bias=200, strategy='random')[source]

Automatically select a device for the experiment. Useful on machines with multiple GPUs.

Parameters:
  • logger – Logger.

  • cfg (CN) – Config.

  • memory_max (int, optional) – Threshold of existing GPU memory usage. GPUs with memory usage beyond this threshold will be deprioritized. Defaults to 8000.

  • memory_bias (int, optional) – A constant added to the memory usage of every GPU to avoid a division-by-zero error. Defaults to 200.

  • strategy (str, optional) – ‘random’ (select a GPU at random) or ‘greedy’ (greedily select a GPU). Defaults to ‘random’.

Returns:

Config.

Return type:

CN
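The two strategies can be sketched in pure Python. This follows the selection logic of the GraphGym file referenced at the top of this module, but it is an illustration rather than the pydgc implementation: the name `pick_gpu_sketch`, the `rng` parameter, and the plain list of per-GPU memory readings are hypothetical, and giving over-threshold GPUs a weight of zero is an assumption about what "deprioritized" means.

```python
import random


def pick_gpu_sketch(memory_used, memory_max=8000, memory_bias=200,
                    strategy="random", rng=random):
    """Pick a GPU index from per-GPU memory usage readings (illustrative only)."""
    indices = range(len(memory_used))
    # Greedy, or every GPU is over the threshold: take the least-used GPU.
    if strategy == "greedy" or all(m > memory_max for m in memory_used):
        return min(indices, key=lambda i: memory_used[i])
    if strategy != "random":
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # Weight each GPU by the inverse of its usage; memory_bias keeps the
    # denominator non-zero for an idle GPU. Over-threshold GPUs get weight 0.
    weights = [
        0.0 if m > memory_max else 1.0 / (m + memory_bias)
        for m in memory_used
    ]
    return rng.choices(list(indices), weights=weights, k=1)[0]
```

Under these assumptions, 'random' still favours lightly loaded GPUs, because the sampling weights shrink as usage grows.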

pydgc.utils.logger

get_formatted_time()[source]

Get formatted time.

Returns:

Formatted time in the format of ‘YYYY-MM-DD HH-MM-SS’.

Return type:

str
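A timestamp in this format can be produced with `datetime.strftime`. The function name and the optional `now` argument below are hypothetical and exist only to make the sketch testable.

```python
from datetime import datetime


def get_formatted_time_sketch(now=None):
    """Return a timestamp such as '2024-01-31 09-05-07' (illustrative only)."""
    now = now or datetime.now()
    # The time part uses hyphens rather than colons, as in the documented format.
    return now.strftime("%Y-%m-%d %H-%M-%S")
```

Because the string contains no colons, it can also be used as part of a file name on most platforms.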

create_logger(logger_name, log_mode='both', log_file_path=None, encoding='utf-8')[source]

Create logger.

Parameters:
  • logger_name (str) – Name of the logger.

  • log_mode (str, optional) – Print mode. Options: [file, stdout, both]. Defaults to ‘both’.

  • log_file_path (str, optional) – Path of the log file. Must be specified when output is written to a file. Defaults to None.

  • encoding (str, optional) – File encoding. Defaults to ‘utf-8’.

Returns:

Logger.

Return type:

Logger
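To illustrate how the three log modes relate to each other, here is a rough equivalent built on the standard `logging` module. It returns a plain `logging.Logger` rather than the pydgc `Logger` class, and the function name, message format, and error handling are all assumptions.

```python
import logging
import sys


def create_logger_sketch(logger_name, log_mode="both", log_file_path=None,
                         encoding="utf-8"):
    """Build a standard-library logger following the documented options."""
    if log_mode not in ("file", "stdout", "both"):
        raise ValueError(f"Unsupported log_mode: {log_mode!r}")
    if log_mode in ("file", "both") and log_file_path is None:
        raise ValueError("log_file_path is required when logging to a file")

    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    formatter = logging.Formatter("%(asctime)s - %(levelname)s - %(message)s")

    if log_mode in ("stdout", "both"):
        stream_handler = logging.StreamHandler(sys.stdout)
        stream_handler.setFormatter(formatter)
        logger.addHandler(stream_handler)
    if log_mode in ("file", "both"):
        file_handler = logging.FileHandler(log_file_path, encoding=encoding)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)
    return logger
```

With log_mode='stdout' no file path is needed; the other two modes fail early in this sketch if log_file_path is left as None.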

class Logger(name)[source]

Bases: object

Logger.

Parameters:

name (str) – Name of logger.

info(message)[source]

Info level log.

Parameters:

message (str) – Log message.

error(message)[source]

Error level log.

Parameters:

message (str) – Log message.

debug(message)[source]

Debug level log.

Parameters:

message (str) – Log message.

warning(message)[source]

Warning level log.

Parameters:

message (str) – Log message.

flag(message)[source]

Print a flag line that separates the log output above it from the output below it.

Parameters:

message (str) – Log message.

static table(results_dir, dataset_name, results_dict, decimal=4)[source]

Create table.

Parameters:
  • results_dir (str) – Results directory.

  • dataset_name (str) – Dataset name.

  • results_dict (dict) – Results dictionary.

  • decimal (int, optional) – Decimal. Defaults to 4.

loss(epoch, loss, decimal=6)[source]

Loss level log.

Parameters:
  • epoch (int) – Epoch.

  • loss (float) – Loss.

  • decimal (int, optional) – Decimal. Defaults to 6.

model_info(model)[source]

Model info level log.

Parameters:

model (nn.Module) – Model.

pydgc.utils.random

setup_seed(seed)[source]

Fix the random seed.

Parameters:

seed (int) – The random seed.
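Fixing the seed usually means seeding every random number generator that the experiment touches. The sketch below shows one common pattern. Which generators pydgc actually seeds is not stated here, so the set chosen (Python, NumPy, PyTorch) and the cuDNN flags are assumptions, and the optional imports are guarded so the snippet runs without those packages.

```python
import os
import random


def setup_seed_sketch(seed):
    """Seed the common random number generators (illustrative only)."""
    random.seed(seed)
    # Only affects hash randomization in Python processes started afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional in this sketch
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
    except ImportError:
        pass  # PyTorch is optional in this sketch
```

Calling the function twice with the same seed makes subsequent draws from `random` repeat exactly.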

pydgc.utils.transform

get_M(adj, t=2)[source]

Calculate the matrix M by the equation:

$M = (B^1 + B^2 + \cdots + B^t) / t$

Parameters:
  • adj (torch.Tensor) – The adjacency matrix.

  • t (int, optional) – The highest power of B in the sum. Defaults to 2.

Returns:

The matrix M.

Return type:

torch.Tensor
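The averaging of matrix powers can be traced with a small dependency-free sketch. The real function takes a torch.Tensor; nested lists are used here only so the example runs anywhere. The definition of B is not given above, so the sketch assumes B is the row-normalized adjacency matrix (a transition matrix), and the name `get_M_sketch` is hypothetical.

```python
def get_M_sketch(adj, t=2):
    """Average the first t powers of a row-normalized matrix B (illustrative)."""
    n = len(adj)

    def matmul(a, b):
        # Plain matrix product of two n x n nested lists.
        return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    # Assumption: B is adj with each row divided by its sum (zero rows stay zero).
    B = [[v / sum(row) if sum(row) else 0.0 for v in row] for row in adj]
    power = B                      # current power, starting at B^1
    total = [row[:] for row in B]  # running sum B^1 + ... + B^k
    for _ in range(t - 1):
        power = matmul(power, B)   # next power of B
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return [[v / t for v in row] for row in total]
```

For a two-node graph with a single edge, B^2 is the identity, so with t=2 every entry of M is 0.5.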

target_distribution(q)[source]

Target distribution.

Parameters:

q (torch.Tensor) – The input tensor.

Returns:

The target distribution.

Return type:

torch.Tensor
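In deep clustering, a target distribution is often obtained by sharpening the soft assignments q: each entry is squared, divided by the soft frequency of its cluster, and the rows are renormalized. The sketch below shows that common form on nested lists. It is an assumption that target_distribution follows this exact formula, and the function name is hypothetical.

```python
def target_distribution_sketch(q):
    """Sharpen soft assignments q (rows: samples, columns: clusters)."""
    n_clusters = len(q[0])
    # Soft cluster frequencies f_j = sum_i q_ij.
    freq = [sum(row[j] for row in q) for j in range(n_clusters)]
    # Square each assignment and divide by its cluster frequency.
    weight = [[row[j] ** 2 / freq[j] for j in range(n_clusters)] for row in q]
    # Renormalize each row so it sums to 1 again.
    return [[w / sum(row) for w in row] for row in weight]
```

Each row of the result is again a probability distribution. Dividing by the cluster frequency gives small clusters more weight.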

diffusion_adj(adj, mode='ppr', transport_rate=0.2)[source]

Graph diffusion.

Parameters:
  • adj (torch.Tensor) – The adjacency matrix.

  • mode (str, optional) – The mode of graph diffusion. Defaults to “ppr”.

  • transport_rate (float, optional) – The transport rate. Defaults to 0.2.

Returns:

The graph diffusion.

Return type:

torch.Tensor
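For orientation, personalized PageRank (PPR) diffusion is commonly written in the closed form below. Whether diffusion_adj computes exactly this, which normalization it applies to adj, and whether it adds self-loops are not stated here. Reading transport_rate as $\alpha$ and $\hat{A}$ as a normalized adjacency matrix are therefore assumptions:

$S = \alpha \left(I - (1 - \alpha)\hat{A}\right)^{-1}$

Under this reading, the default transport_rate of 0.2 corresponds to $\alpha = 0.2$.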

add_gaussian_noise(x, mean=0, std_dev=0.1)[source]

Add gaussian noise to x.

Parameters:
  • x (torch.Tensor) – The input tensor.

  • mean (int, optional) – The mean of the gaussian noise. Defaults to 0.

  • std_dev (float, optional) – The standard deviation of the gaussian noise. Defaults to 0.1.

Returns:

The tensor with gaussian noise.

Return type:

torch.Tensor
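Additive Gaussian noise amounts to drawing one sample from N(mean, std_dev²) per element and adding it to the input. The sketch below does this for a nested list instead of a torch.Tensor so that it has no dependencies. The name `add_gaussian_noise_sketch` and the `rng` argument are hypothetical.

```python
import random


def add_gaussian_noise_sketch(x, mean=0, std_dev=0.1, rng=random):
    """Return a copy of the 2-D list x with Gaussian noise added element-wise."""
    # One independent draw per element; the input list is left unchanged.
    return [[value + rng.gauss(mean, std_dev) for value in row] for row in x]
```

Setting std_dev to 0 reduces the operation to a constant shift by mean, which is a convenient sanity check.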

perturb_data(data, cfg)[source]

Perturb the data.

Parameters:
  • data (Data) – The input data.

  • cfg (CN) – The configuration.

Returns:

The perturbed data.

Return type:

Data

sparse_mx_to_torch_sparse_tensor(sparse_mx)[source]

Convert a scipy sparse matrix to a torch sparse tensor.

Parameters:

sparse_mx (scipy.sparse.csr_matrix) – The input scipy sparse matrix.

Returns:

The torch sparse tensor.

Return type:

torch.sparse_coo_tensor

normalize_adj_torch(adj, symmetry=True)[source]

Normalize the adjacency matrix.

Parameters:
  • adj (torch.Tensor) – The input adjacency matrix.

  • symmetry (bool, optional) – Whether to apply symmetric normalization. Defaults to True.

Returns:

The normalized adjacency matrix.

Return type:

torch.Tensor
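The two usual normalizations of an adjacency matrix A with degree matrix D are the symmetric form D^{-1/2} A D^{-1/2} and the row-wise form D^{-1} A. The sketch below computes both on nested lists. It assumes that symmetry=False selects the row-wise form and that no self-loops are added, neither of which is confirmed above, and the function name is hypothetical.

```python
def normalize_adj_sketch(adj, symmetry=True):
    """Normalize a dense adjacency matrix given as nested lists (illustrative)."""
    n = len(adj)
    degree = [sum(row) for row in adj]
    if symmetry:
        # Symmetric: D^{-1/2} A D^{-1/2}; isolated nodes get a zero scale.
        scale = [d ** -0.5 if d > 0 else 0.0 for d in degree]
        return [[scale[i] * adj[i][j] * scale[j] for j in range(n)]
                for i in range(n)]
    # Row-wise: D^{-1} A, so each non-empty row sums to 1.
    inv = [1.0 / d if d > 0 else 0.0 for d in degree]
    return [[inv[i] * adj[i][j] for j in range(n)] for i in range(n)]
```

The symmetric form keeps a symmetric input symmetric, while the row-wise form turns each row into a probability distribution over neighbours.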

pydgc.utils.visualization

class DGCVisual(save_path='.', save_format='png', font_family='sans-serif', font_size=20)[source]

Bases: object

A class for visualizing data.

Parameters:
  • save_path (str, optional) – The path to save the images. Defaults to ‘.’.

  • save_format (str, optional) – The format of the images. Defaults to ‘png’.

  • font_family (str or list, optional) – The font family. Defaults to ‘sans-serif’.

  • font_size (int, optional) – The font size. Defaults to 20.

static check_save_format(save_format)[source]

Check if the save format is supported.

Parameters:

save_format (str) – The save format, e.g., ‘png’, ‘pdf’, ‘jpg’, ‘jpeg’, ‘bmp’, ‘tiff’, ‘gif’, ‘svg’, ‘eps’.

Raises:

ValueError – If the save format is not supported.
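A check of this kind typically compares the format against a fixed set and raises on a miss. The sketch below uses the formats listed above as the supported set. Both that exact set and the case-sensitive comparison are assumptions, and the names are hypothetical.

```python
SUPPORTED_FORMATS = {"png", "pdf", "jpg", "jpeg", "bmp", "tiff", "gif", "svg", "eps"}


def check_save_format_sketch(save_format):
    """Raise ValueError unless save_format is in the assumed supported set."""
    if save_format not in SUPPORTED_FORMATS:
        raise ValueError(
            f"Unsupported save format {save_format!r}; "
            f"expected one of {sorted(SUPPORTED_FORMATS)}"
        )
```

Validating the format when the visualizer is constructed surfaces a typo before any figure has been rendered.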

plot_clustering(data, labels, method='tsne', palette='viridis', fig_size=(10, 8), filename='tsne_plot', show_axis=False, legend=False, dpi=300, random_state=42)[source]

Plot the clustering results after t-SNE or UMAP dimensionality reduction.

Parameters:
  • data (np.array) – The input data, shape (n_samples, n_features).

  • labels (np.array) – The data labels.

  • method (str, optional) – The dimensionality reduction method, ‘tsne’ or ‘umap’. Defaults to ‘tsne’.

  • palette (str, optional) – The color palette. Defaults to “viridis”.

  • fig_size (Tuple[int, int], optional) – The figure size. Defaults to (10, 8).

  • filename (str, optional) – The filename to save the plot. Defaults to “tsne_plot”.

  • show_axis (bool, optional) – Whether to show the axis. Defaults to False.

  • legend (bool, optional) – Whether to show the legend. Defaults to False.

  • dpi (int, optional) – The DPI of the plot. Defaults to 300.

  • random_state (int, optional) – The random state. Defaults to 42.

plot_heatmap(data, labels, method='inner_product', color_map='YlGnBu', fig_size=(8, 8), filename='heatmap_plot', show_color_bar=False, show_axis=False, dpi=300)[source]

Plot the heatmap of the data.

Parameters:
  • data (np.array) – The input data, shape (n_samples, n_features).

  • labels (np.array) – The data labels.

  • method (str, optional) – The similarity method, ‘cosine’ or ‘euclidean’ or ‘inner_product’. Defaults to ‘inner_product’.

  • color_map (str, optional) – The color map. Defaults to “YlGnBu”.

  • fig_size (Tuple[int, int], optional) – The figure size. Defaults to (8, 8).

  • filename (str, optional) – The filename to save the plot. Defaults to “heatmap_plot”.

  • show_color_bar (bool, optional) – Whether to show the color bar. Defaults to False.

  • show_axis (bool, optional) – Whether to show the axis. Defaults to False.

  • dpi (int, optional) – The DPI of the plot. Defaults to 300.
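The three method values name standard pairwise measures between sample vectors. The sketch below computes each on nested lists to show how they differ. It does not reproduce plot_heatmap itself: how the library orders samples by label or turns Euclidean distance into a similarity is not stated above, so the function returns raw pairwise values. The name `similarity_matrix_sketch` is hypothetical.

```python
import math


def similarity_matrix_sketch(data, method="inner_product"):
    """Pairwise scores between the rows of data (a list of equal-length lists)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    if method == "inner_product":
        return [[dot(u, v) for v in data] for u in data]
    if method == "cosine":
        norms = [math.sqrt(dot(u, u)) for u in data]
        # A zero vector has no direction; its score is set to 0.0 here.
        return [
            [dot(u, v) / (nu * nv) if nu and nv else 0.0
             for v, nv in zip(data, norms)]
            for u, nu in zip(data, norms)
        ]
    if method == "euclidean":
        # Assumption: raw pairwise distances, with no conversion to a similarity.
        return [[math.dist(u, v) for v in data] for u in data]
    raise ValueError(f"Unknown method: {method!r}")
```

The inner product is sensitive to vector length, whereas cosine similarity depends only on direction, so the two can produce visibly different heatmaps for unnormalized embeddings.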

plot_loss(losses, metrics=None, metrics_name=None, fig_size=(3.149606299212598, 2.3622047244094486), marker='o', line_style='-', color='blue', line_width=2, title=None, dpi=300, filename='loss_curve_plot')[source]

Plot the loss curve, and the metrics curve as well if metrics are provided.

Parameters:
  • losses (list) – The loss values.

  • metrics (list, optional) – The metrics values. Defaults to None.

  • metrics_name (str, optional) – The metrics name. Defaults to None.

  • fig_size (Tuple[float, float], optional) – The figure size. Defaults to (8/2.54, 6/2.54), i.e. approximately (3.15, 2.36).

  • marker (str, optional) – The marker style. Defaults to ‘o’.

  • line_style (str, optional) – The line style. Defaults to ‘-’.

  • color (str, optional) – The line color. Defaults to ‘blue’.

  • line_width (int, optional) – The line width. Defaults to 2.

  • title (str, optional) – The title. Defaults to None.

  • dpi (int, optional) – The DPI. Defaults to 300.

  • filename (str, optional) – The filename. Defaults to “loss_curve_plot”.
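The default fig_size of (8/2.54, 6/2.54) reads as an 8 cm by 6 cm figure converted to inches, assuming the plotting backend expects inches, as Matplotlib does. A small helper for choosing other sizes in centimetres might look like this (the name `cm_to_inches` is hypothetical):

```python
CM_PER_INCH = 2.54


def cm_to_inches(width_cm, height_cm):
    """Convert a figure size in centimetres to a (width, height) tuple in inches."""
    return (width_cm / CM_PER_INCH, height_cm / CM_PER_INCH)
```

For example, `cm_to_inches(8, 6)` reproduces the default shown in the signature up to floating-point rounding.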