pgl.dataset

This package implements several benchmark datasets for graph networks and node representation learning.

class pgl.dataset.CitationDataset(name, symmetry_edges=True, self_loop=True)[source]

Bases: object

Creates the data for the citation datasets (Pubmed and Citeseer).

Parameters
  • name – The name of the dataset (“pubmed” or “citeseer”).

  • symmetry_edges – Whether to create symmetric edges.

  • self_loop – Whether to include self-loop edges.
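
To make the two flags concrete, here is a minimal sketch on a toy edge list. It is not PGL's implementation: the helper `expand_edges` and its `num_nodes` argument are hypothetical, and it assumes that `symmetry_edges` adds the reverse of every edge and that `self_loop` adds one `(n, n)` edge per node.

```python
def expand_edges(edges, num_nodes, symmetry_edges=True, self_loop=True):
    """Toy illustration of the two flags (hypothetical helper, not PGL code)."""
    result = set(edges)
    if symmetry_edges:
        # Assumption: add the reverse (dst, src) of every (src, dst) edge.
        result.update((dst, src) for src, dst in edges)
    if self_loop:
        # Assumption: add one (n, n) edge for every node.
        result.update((n, n) for n in range(num_nodes))
    return sorted(result)


toy_edges = [(0, 1), (1, 2)]
both = expand_edges(toy_edges, num_nodes=3)
plain = expand_edges(toy_edges, num_nodes=3, symmetry_edges=False, self_loop=False)

print(both)   # [(0, 0), (0, 1), (1, 0), (1, 1), (1, 2), (2, 1), (2, 2)]
print(plain)  # [(0, 1), (1, 2)]
```

With both flags off, the edge list is left as given; the sketch does not try to strip self loops that are already present.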

graph

The Graph data object

y

Label for each node.

num_classes

Number of classes.

train_index

Indices of the nodes in the training set.

val_index

Indices of the nodes in the validation set.

test_index

Indices of the nodes in the test set.
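
The attributes above are typically combined by using the index arrays to select labels and predictions for one split. The sketch below shows that pattern on plain Python lists standing in for `y`, `train_index`, `val_index` and `test_index`. The toy values and the helpers `labels_for` and `accuracy` are hypothetical, and the real attributes may be array types rather than lists.

```python
def labels_for(y, index):
    """Select the labels of the nodes listed in `index`."""
    return [y[i] for i in index]


def accuracy(predictions, y, index):
    """Fraction of nodes in `index` whose prediction matches the label."""
    correct = sum(1 for i in index if predictions[i] == y[i])
    return correct / len(index)


# Toy stand-ins for the dataset attributes (not real Pubmed/Citeseer data).
y = [0, 2, 1, 1, 0, 2]
train_index = [0, 1]
val_index = [2, 3]
test_index = [4, 5]

# Hypothetical per-node class predictions from some model.
predictions = [0, 2, 1, 0, 0, 1]

train_labels = labels_for(y, train_index)
val_acc = accuracy(predictions, y, val_index)
test_acc = accuracy(predictions, y, test_index)

print(train_labels)  # [0, 2]
print(val_acc)       # 0.5
print(test_acc)      # 0.5
```

Only the selection pattern is the point here: fit on the training indices, tune on the validation indices, and report on the test indices.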

class pgl.dataset.CoraDataset(symmetry_edges=True, self_loop=True)[source]

Bases: object

Cora dataset implementation.

Parameters
  • symmetry_edges – Whether to create symmetric edges.

  • self_loop – Whether to include self-loop edges.

graph

The Graph data object

y

Label for each node.

num_classes

Number of classes.

train_index

Indices of the nodes in the training set.

val_index

Indices of the nodes in the validation set.

test_index

Indices of the nodes in the test set.

class pgl.dataset.ArXivDataset(np_random_seed=123)[source]

Bases: object

ArXiv dataset implementation

Parameters
  • np_random_seed – The random seed for NumPy.

graph

The Graph data object.

class pgl.dataset.BlogCatalogDataset(symmetry_edges=True, self_loop=False)[source]

Bases: object

BlogCatalog dataset implementation

Parameters
  • symmetry_edges – Whether to create symmetric edges.

  • self_loop – Whether to include self-loop edges.

graph

The Graph data object.

num_groups

Number of groups (classes).

train_index

Indices of the nodes in the training set.

test_index

Indices of the nodes in the test set.

class pgl.dataset.RedditDataset(normalize=True, symmetry=True)[source]

Bases: object