pgl.graph

This package implements a Graph structure for handling graph data.

class pgl.graph.Graph(edges, num_nodes=None, node_feat=None, edge_feat=None, **kwargs)[source]

Bases: object

Implementation of graph interface in pgl.

This is a simple implementation of a graph structure in pgl. pgl.Graph is an alias of pgl.graph.Graph.

Parameters
  • edges – list of (u, v) tuples, 2D numpy.ndarray or 2D paddle.Tensor

  • num_nodes (optional: int, numpy or paddle.Tensor) – Number of nodes in the graph. If not provided, the number of nodes will be inferred from edges.

  • node_feat (optional) – a dict of numpy arrays as node features

  • edge_feat (optional) – a dict of numpy arrays as edge features (in the same order as edges)

Example 1:

  • Create a graph with numpy.

  • Convert it into paddle.Tensor.

  • Run send and recv for a graph neural network.

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
feature = np.random.randn(5, 100).astype(np.float32)
edge_feature = np.random.randn(3, 100).astype(np.float32)
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges,
            node_feat={
                "feature": feature
            },
            edge_feat={
                "edge_feature": edge_feature
            })
graph.tensor()  # convert the graph to paddle.Tensor format (in place by default)

model = pgl.nn.GCNConv(100, 100)
out = model(graph, graph.node_feat["feature"])

Example 2:

  • Create a graph with paddle.Tensor.

  • Run send and recv for a graph neural network.

import paddle
import pgl

num_nodes = 5
edges = paddle.to_tensor([ (0, 1), (1, 2), (3, 4)])
feature = paddle.randn(shape=[5, 100])
edge_feature = paddle.randn(shape=[3, 100])
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges,
            node_feat={
                "feature": feature
            },
            edge_feat={
                "edge_feature": edge_feature
            })

model = pgl.nn.GCNConv(100, 100)
out = model(graph, graph.node_feat["feature"])

property adj_dst_index

Return an EdgeIndex object for dst.

property adj_src_index

Return an EdgeIndex object for src.

static batch(graph_list)[source]

This is an alias of pgl.Graph.disjoint with merged_graph_index=False.

classmethod disjoint(graph_list, merged_graph_index=False)[source]

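This method merges a list of graphs into one big graph as a disjoint union.

Parameters
  • graph_list (Graph List) – A list of Graphs.

  • merged_graph_index – whether to merge the graph index. If False, each node keeps the id of the graph it belongs to; if True, all nodes are assigned to a single graph.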

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges)
joint_graph = pgl.Graph.disjoint([graph, graph], merged_graph_index=False)
print(joint_graph.graph_node_id)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(joint_graph.num_graph)
# 2

joint_graph = pgl.Graph.disjoint([graph, graph], merged_graph_index=True)
print(joint_graph.graph_node_id)
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(joint_graph.num_graph)
# 1
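
To make the behaviour of disjoint more concrete, here is a minimal pure-Python sketch of a disjoint union over edge lists. It is not PGL's implementation: the helper name disjoint_union and its return layout are hypothetical and chosen only for illustration. The node ids of each graph are shifted by the number of nodes that come before it, and a per-node graph id is recorded.

```python
def disjoint_union(graphs, merged_graph_index=False):
    """Merge (num_nodes, edges) pairs into one graph.

    Hypothetical helper for illustration only, not part of pgl.
    Returns (total_nodes, merged_edges, graph_node_id).
    """
    merged_edges = []
    graph_node_id = []
    offset = 0
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        # Shift node ids so they do not collide with earlier graphs.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        # With a merged index, every node is assigned to graph 0.
        owner = 0 if merged_graph_index else graph_id
        graph_node_id.extend([owner] * num_nodes)
        offset += num_nodes
    return offset, merged_edges, graph_node_id


g = (5, [(0, 1), (1, 2), (3, 4)])
total, edges, node_ids = disjoint_union([g, g])
print(edges)     # [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (8, 9)]
print(node_ids)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

The second copy of the graph occupies node ids 5 to 9, so the two components never share a node.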

dump(path)[source]

Dump the graph into a directory.

This function will dump the graph information into the given directory path. The graph can be read back with pgl.Graph.load

Parameters

path – The directory for the storage of the graph.

property edge_feat

Return a dictionary of edge features.

property edges

Return all edges in numpy.ndarray or paddle.Tensor with shape (num_edges, 2).

property graph_edge_id

Return a numpy.ndarray or paddle.Tensor with shape [num_edges] that indicates which graph each edge belongs to.

Examples:

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges)
joint_graph = pgl.Graph.batch([graph, graph])
print(joint_graph.graph_edge_id)
# [0, 0, 0, 1, 1, 1]

property graph_node_id

Return a numpy.ndarray or paddle.Tensor with shape [num_nodes] that indicates which graph each node belongs to.

Examples:

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges)
joint_graph = pgl.Graph.batch([graph, graph])
print(joint_graph.graph_node_id)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
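
One typical use of graph_node_id is graph-level pooling, where per-node values are reduced into one value per graph. The sketch below shows the idea in pure Python on plain lists. The helper name sum_pool is hypothetical, and a real model would work on tensors rather than Python lists.

```python
def sum_pool(node_values, graph_node_id, num_graph):
    """Sum one scalar per node into one scalar per graph (illustrative helper)."""
    pooled = [0.0] * num_graph
    for value, graph_id in zip(node_values, graph_node_id):
        pooled[graph_id] += value
    return pooled


graph_node_id = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
node_values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
print(sum_pool(node_values, graph_node_id, num_graph=2))  # [15.0, 150.0]
```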

indegree(nodes=None)[source]

Return the indegree of the given nodes.

Parameters

nodes – The nodes whose indegree is returned. If None, the indegree of all nodes is returned.

Returns

A numpy.ndarray or paddle.Tensor as the given nodes’ indegree.
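
As a rough illustration of what indegree computes, the following pure-Python sketch counts incoming edges from an edge list. It is an assumption-level re-implementation for explanation, not the code PGL runs; outdegree would count the src column instead.

```python
def indegree(edges, num_nodes, nodes=None):
    """Count incoming edges per node (illustrative, not pgl's implementation)."""
    counts = [0] * num_nodes
    for _, dst in edges:
        counts[dst] += 1
    if nodes is None:
        return counts
    return [counts[n] for n in nodes]


edges = [(0, 1), (1, 2), (3, 4)]
print(indegree(edges, num_nodes=5))                # [0, 1, 1, 0, 1]
print(indegree(edges, num_nodes=5, nodes=[2, 3]))  # [1, 0]
```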

is_tensor()[source]

Return whether the Graph is in paddle.Tensor or numpy format.

classmethod load(path, mmap_mode='r')[source]

Load Graph from path and return a Graph in numpy.

Parameters
  • path – The directory path of the stored Graph.

  • mmap_mode – Default mmap_mode="r". If not None, memory-map the graph.

node_batch_iter(batch_size, shuffle=True)[source]

Node batch iterator

Iterate over all nodes in batches.

Parameters
  • batch_size – The number of nodes in each batch.

  • shuffle – Whether to shuffle the nodes.

Returns

Batch iterator
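
The iteration pattern can be sketched in pure Python as below. This is only an approximation of the behaviour described above: the seed argument is a hypothetical addition to make the shuffle reproducible, and the real method yields node ids in the graph's own array format.

```python
import random


def node_batch_iter(num_nodes, batch_size, shuffle=True, seed=None):
    """Yield lists of node ids, batch_size at a time (illustrative sketch)."""
    node_ids = list(range(num_nodes))
    if shuffle:
        # seed is a hypothetical parameter, not part of the pgl signature.
        random.Random(seed).shuffle(node_ids)
    for start in range(0, num_nodes, batch_size):
        yield node_ids[start:start + batch_size]


batches = list(node_batch_iter(num_nodes=5, batch_size=2, shuffle=False))
print(batches)  # [[0, 1], [2, 3], [4]]
```

The last batch may be smaller than batch_size when num_nodes is not a multiple of it.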

property node_feat

Return a dictionary of node features.

property nodes

Return all node ids, from 0 to num_nodes - 1.

property num_edges

Return the number of edges.

property num_graph

Return the number of graphs.

property num_nodes

Return the number of nodes.

numpy(inplace=True)[source]

Convert the Graph into numpy format.

In numpy format, the graph edges and node features are numpy.ndarray objects. Note that send and recv cannot be used on a graph in numpy format.

Parameters

inplace – (Default True) Whether to convert the graph into numpy inplace.

outdegree(nodes=None)[source]

Return the outdegree of the given nodes.

Parameters

nodes – The nodes whose outdegree is returned. If None, the outdegree of all nodes is returned.

Returns

A numpy.ndarray or paddle.Tensor as the given nodes’ outdegree.

predecessor(nodes=None, return_eids=False)[source]

Find predecessors of given nodes.

Parameters
  • nodes – The nodes whose predecessors are returned. If None, the predecessors of all nodes are returned.

  • return_eids – If True, return the predecessors together with the corresponding edge ids (eids).

Returns

A list of numpy.ndarray, where each array holds the predecessor ids of one given node. If return_eids=True, an additional list of numpy.ndarray is returned, where each array holds the eids of the edges that connect the node to its predecessors.

Example

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
        edges=edges)
pred, pred_eid = graph.predecessor(return_eids=True)

This gives the following output:

pred:
      [[],
       [0],
       [1],
       [],
       [3]]

pred_eid:
      [[],
       [0],
       [1],
       [],
       [2]]
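
The output above can be reproduced with a small pure-Python sketch that walks the edge list once. It is an illustrative re-implementation, assuming that the edge id is the position of the edge in the edges list; it is not the code PGL uses internally.

```python
def predecessor(edges, num_nodes, nodes=None):
    """Return (pred, pred_eid) lists for the requested nodes (illustrative)."""
    pred = [[] for _ in range(num_nodes)]
    pred_eid = [[] for _ in range(num_nodes)]
    for eid, (src, dst) in enumerate(edges):
        # src is a predecessor of dst, reached through edge eid.
        pred[dst].append(src)
        pred_eid[dst].append(eid)
    if nodes is None:
        nodes = range(num_nodes)
    return [pred[n] for n in nodes], [pred_eid[n] for n in nodes]


edges = [(0, 1), (1, 2), (3, 4)]
pred, pred_eid = predecessor(edges, num_nodes=5)
print(pred)      # [[], [0], [1], [], [3]]
print(pred_eid)  # [[], [0], [1], [], [2]]
```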

recv(reduce_func, msg, recv_mode='dst')[source]

Receive messages and aggregate them with reduce_func.

The user-defined reduce_func should have the following format.

def reduce_func(msg):
    '''
        Args:

            msg: A LodTensor or a dictionary of LodTensor whose batch_size
                 is equal to the number of unique dst nodes.

        Return:

            It should return a tensor with shape (batch_size, out_dims). The
            batch size should be the same as msg.
    '''
    pass

Parameters
  • msg – A tensor or a dictionary of tensors created by the send function.

  • reduce_func – A callable user-defined reduce function.

Returns

A tensor with shape (num_nodes, out_dims). The output for nodes with no message will be zeros.
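
The aggregation step can be pictured with the following pure-Python sketch, which sums per-edge messages at their dst node. The helper recv_sum is hypothetical and fixes the reduction to a sum; the real recv accepts any reduce_func and works on tensors.

```python
def recv_sum(edges, messages, num_nodes):
    """Sum per-edge message vectors at their dst node (illustrative sketch)."""
    out_dims = len(messages[0]) if messages else 0
    # Nodes that receive no message keep their zero vector.
    out = [[0.0] * out_dims for _ in range(num_nodes)]
    for (_, dst), msg in zip(edges, messages):
        for k, value in enumerate(msg):
            out[dst][k] += value
    return out


edges = [(0, 1), (1, 2), (3, 4), (0, 2)]
messages = [[1.0, 1.0], [2.0, 0.0], [3.0, 3.0], [4.0, 1.0]]
print(recv_sum(edges, messages, num_nodes=5))
# [[0.0, 0.0], [1.0, 1.0], [6.0, 1.0], [0.0, 0.0], [3.0, 3.0]]
```

Nodes 0 and 3 have no incoming edges, so their rows stay at zero, matching the return value described above.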

sample_predecessor(nodes, max_degree, return_eids=False, shuffle=False)[source]

Sample predecessors of given nodes.

Parameters
  • nodes – The nodes whose predecessors will be sampled.

  • max_degree – The maximum number of predecessors sampled for each node.

  • return_eids – Whether to return the corresponding eids.

Returns

A list of numpy.ndarray, where each array holds the sampled predecessor ids of one given node. If return_eids=True, an additional list of numpy.ndarray is returned, where each array holds the eids of the edges that connect the node to its sampled predecessors.
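
A minimal sketch of degree-capped sampling is shown below in pure Python. It assumes sampling without replacement and adds a hypothetical seed argument for reproducibility; neither detail is stated in this reference, so treat both as assumptions.

```python
import random


def sample_predecessor(edges, nodes, max_degree, seed=None):
    """Sample at most max_degree predecessors per node (illustrative sketch)."""
    rng = random.Random(seed)  # seed is hypothetical, not a pgl parameter
    result = []
    for node in nodes:
        candidates = [src for src, dst in edges if dst == node]
        if len(candidates) > max_degree:
            # Assumption: sample without replacement when there are too many.
            candidates = rng.sample(candidates, max_degree)
        result.append(candidates)
    return result


edges = [(0, 3), (1, 3), (2, 3), (3, 4)]
sampled = sample_predecessor(edges, nodes=[3, 4, 0], max_degree=2, seed=0)
print([len(s) for s in sampled])  # [2, 1, 0]
```

Nodes with fewer than max_degree predecessors keep all of them, so only high-degree nodes are truncated.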

sample_successor(nodes, max_degree, return_eids=False, shuffle=False)[source]

Sample successors of given nodes.

Parameters
  • nodes – Given nodes whose successors will be sampled.

  • max_degree – The maximum number of successors sampled for each node.

  • return_eids – Whether to return the corresponding eids.

Returns

A list of numpy.ndarray, where each array holds the sampled successor ids of one given node. If return_eids=True, an additional list of numpy.ndarray is returned, where each array holds the eids of the edges that connect the node to its sampled successors.

send(message_func, src_feat=None, dst_feat=None, edge_feat=None, node_feat=None)[source]

Send message from all src nodes to dst nodes.

The user-defined message function should have the following format.

def message_func(src_feat, dst_feat, edge_feat):
    '''
        Args:
            src_feat: the node feat dict attached to the src nodes.
            dst_feat: the node feat dict attached to the dst nodes.
            edge_feat: the edge feat dict attached to the
                       corresponding (src, dst) edges.

        Return:
            It should return a tensor or a dictionary of tensors. Each tensor
            should have a shape of (num_edges, dims).
    '''
    return {'msg': src_feat['h']}

Parameters
  • message_func – A user-defined message function.

  • src_feat – a dict {name: tensor,} to build src node feat

  • dst_feat – a dict {name: tensor,} to build dst node feat

  • node_feat – a dict {name: tensor,} to build both src and dst node feat

  • edge_feat – a dict {name: tensor,} to build edge feat

Returns

A dictionary of tensors representing the messages. Each value in the dictionary has shape (num_edges, dim) and should be collected by the recv function.
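
To show how the per-edge inputs of message_func are assembled, here is a pure-Python sketch that gathers node features along each edge. The standalone send helper below is a simplified stand-in: it takes a single node_feat dict for both endpoints and passes an empty edge_feat, which are assumptions made to keep the example short.

```python
def send(edges, message_func, node_feat):
    """Build per-edge src/dst features and apply message_func (illustrative)."""
    # One row per edge, taken from the src endpoint of that edge.
    src_feat = {name: [values[u] for u, _ in edges]
                for name, values in node_feat.items()}
    # One row per edge, taken from the dst endpoint of that edge.
    dst_feat = {name: [values[v] for _, v in edges]
                for name, values in node_feat.items()}
    return message_func(src_feat, dst_feat, {})


def copy_src(src_feat, dst_feat, edge_feat):
    # Each edge carries the feature "h" of its src node.
    return {"msg": src_feat["h"]}


edges = [(0, 1), (1, 2), (3, 4)]
node_feat = {"h": [[0.0], [10.0], [20.0], [30.0], [40.0]]}
msg = send(edges, copy_src, node_feat)
print(msg["msg"])  # [[0.0], [10.0], [30.0]]
```

The result has one row per edge, which is the (num_edges, dim) layout that recv expects.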

send_recv(feature, reduce_func='sum')[source]

This method combines the send and recv function.

Currently, this method supports only the default copy send function and the built-in receive functions (‘sum’, ‘mean’, ‘max’, ‘min’).

Parameters
  • feature (Tensor | Tensor List) – the node feature of a graph.

  • reduce_func (str) – the built-in receive function to use: ‘sum’, ‘mean’, ‘max’ or ‘min’.
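
The combined copy-and-reduce step can be sketched in pure Python on scalar node features. This standalone function is an illustration only. In particular, returning 0.0 for nodes without incoming edges under every reducer is an assumption, extrapolated from the zero-filling that recv documents.

```python
def send_recv(edges, feature, num_nodes, reduce_func="sum"):
    """Copy src scalars along edges and reduce them at dst (illustrative)."""
    reducers = {
        "sum": sum,
        "mean": lambda xs: sum(xs) / len(xs),
        "max": max,
        "min": min,
    }
    if reduce_func not in reducers:
        raise ValueError("unsupported reduce_func: %r" % reduce_func)
    inbox = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        # Default copy send: the message is the src node's feature.
        inbox[dst].append(feature[src])
    # Assumption: nodes without incoming messages get 0.0 for every reducer.
    return [reducers[reduce_func](msgs) if msgs else 0.0 for msgs in inbox]


edges = [(0, 2), (1, 2), (3, 4)]
feature = [1.0, 3.0, 5.0, 7.0, 9.0]
print(send_recv(edges, feature, 5, "sum"))   # [0.0, 0.0, 4.0, 0.0, 7.0]
print(send_recv(edges, feature, 5, "mean"))  # [0.0, 0.0, 2.0, 0.0, 7.0]
print(send_recv(edges, feature, 5, "max"))   # [0.0, 0.0, 3.0, 0.0, 7.0]
```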

sorted_edges(sort_by='src')[source]

Return edges sorted by src or dst nodes.

If sort_by="src", the edges are sorted by src nodes; otherwise they are sorted by dst nodes.

Parameters

sort_by – The key used to sort the edges (“src” or “dst”).

Returns

A tuple of (sorted_src, sorted_dst, sorted_eid).
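
The shape of the returned tuple can be illustrated with a short pure-Python sketch. It assumes a stable sort and that sorted_eid holds the original positions of the edges; both are assumptions for illustration rather than documented guarantees.

```python
def sorted_edges(edges, sort_by="src"):
    """Return (sorted_src, sorted_dst, sorted_eid) (illustrative sketch)."""
    if sort_by not in ("src", "dst"):
        raise ValueError("sort_by should be 'src' or 'dst'")
    key_index = 0 if sort_by == "src" else 1
    # sorted() is stable, so ties keep their original edge order.
    order = sorted(range(len(edges)), key=lambda eid: edges[eid][key_index])
    sorted_src = [edges[eid][0] for eid in order]
    sorted_dst = [edges[eid][1] for eid in order]
    return sorted_src, sorted_dst, order


edges = [(3, 4), (0, 1), (1, 0)]
print(sorted_edges(edges, "src"))  # ([0, 1, 3], [1, 0, 4], [1, 2, 0])
print(sorted_edges(edges, "dst"))  # ([1, 0, 3], [0, 1, 4], [2, 1, 0])
```

Keeping sorted_eid alongside the sorted endpoints makes it possible to reorder edge features in the same way.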

successor(nodes=None, return_eids=False)[source]

Find successors of given nodes.

Parameters
  • nodes – The nodes whose successors are returned. If None, the successors of all nodes are returned.

  • return_eids – If True, return the successors together with the corresponding edge ids (eids).

Returns

A list of numpy.ndarray, where each array holds the successor ids of one given node. If return_eids=True, an additional list of numpy.ndarray is returned, where each array holds the eids of the edges that connect the node to its successors.

Example

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
        edges=edges)
succ, succ_eid = graph.successor(return_eids=True)

This gives the following output:

succ:
      [[1],
       [2],
       [],
       [4],
       []]

succ_eid:
      [[0],
       [1],
       [],
       [2],
       []]

tensor(inplace=True)[source]

Convert the Graph into paddle.Tensor format.

In paddle.Tensor format, the graph edges and node features are in paddle.Tensor format. You can use send and recv in paddle.Tensor graph.

Parameters

inplace – (Default True) Whether to convert the graph into tensor inplace.

to_mmap(path='./tmp')[source]

Turn the Graph into Memmap mode which can share memory between processes.