pgl.graph

This package implements a Graph structure for handling graph data.

class pgl.graph.Graph(edges, num_nodes=None, node_feat=None, edge_feat=None, **kwargs)

Bases: object

Implementation of graph interface in pgl.

This is a simple implementation of a graph structure in pgl. pgl.Graph is an alias of pgl.graph.Graph.

Parameters

edges – list of (u, v) tuples, 2D numpy.ndarray or 2D paddle.Tensor
num_nodes (optional; int, numpy or paddle.Tensor) – Number of nodes in a graph. If not provided, the number of nodes will be inferred from edges.
node_feat (optional) – a dict of numpy arrays as node features
edge_feat (optional) – a dict of numpy arrays as edge features (must follow the same order as edges)
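When num_nodes is omitted, the node count has to come from the edge list. The sketch below shows one plausible rule in plain Python, assuming (the text above does not confirm this) that the count is the largest node id plus one. `infer_num_nodes` is a hypothetical helper, not a pgl function.

```python
def infer_num_nodes(edges):
    """Guess the node count from (u, v) pairs.

    Assumed rule: largest node id + 1. This is an illustration only,
    not pgl's actual implementation.
    """
    if not edges:
        return 0
    return max(max(u, v) for u, v in edges) + 1


edges = [(0, 1), (1, 2), (3, 4)]
inferred = infer_num_nodes(edges)
print(inferred)
```

Under this assumed rule, isolated nodes whose ids exceed every id in edges cannot be recovered, so passing num_nodes explicitly is the safer choice for graphs with such nodes.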

Examples 1:

Create a graph with numpy.
Convert it into paddle.Tensor.
Run send/recv message passing for a graph neural network.

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
feature = np.random.randn(5, 100).astype(np.float32)
edge_feature = np.random.randn(3, 100).astype(np.float32)
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges,
            node_feat={
                "feature": feature
            },
            edge_feat={
                "edge_feature": edge_feature
            })
graph.tensor()

model = pgl.nn.GCNConv(100, 100)
out = model(graph, graph.node_feat["feature"])

Examples 2:

Create a graph with paddle.Tensor.
Run send/recv message passing for a graph neural network.

import paddle
import pgl

num_nodes = 5
edges = paddle.to_tensor([ (0, 1), (1, 2), (3, 4)])
feature = paddle.randn(shape=[5, 100])
edge_feature = paddle.randn(shape=[3, 100])
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges,
            node_feat={
                "feature": feature
            },
            edge_feat={
                "edge_feature": edge_feature
            })

model = pgl.nn.GCNConv(100, 100)
out = model(graph, graph.node_feat["feature"])

property adj_dst_index: Return an EdgeIndex object for dst.

property adj_src_index: Return an EdgeIndex object for src.

static batch(graph_list): This is an alias of pgl.Graph.disjoint with merged_graph_index=False.

classmethod disjoint(graph_list, merged_graph_index=False)

This method merges a list of graphs into one big graph made of disjoint components.

Parameters

graph_list (Graph List) – A list of Graphs.
merged_graph_index – Whether to merge the graph index. If False, each node keeps the id of the graph it belongs to; if True, all nodes are assigned to a single graph with id 0.

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges)
joint_graph = pgl.Graph.disjoint([graph, graph], merged_graph_index=False)
print(joint_graph.graph_node_id)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(joint_graph.num_graph)
# 2

joint_graph = pgl.Graph.disjoint([graph, graph], merged_graph_index=True)
print(joint_graph.graph_node_id)
# [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(joint_graph.num_graph)
# 1
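To make the merging step concrete, here is a minimal sketch of a disjoint union in plain Python. It assumes node ids of each input graph are shifted by the number of nodes that precede it, which is the usual construction. `disjoint_union` is a hypothetical helper that works on (num_nodes, edges) pairs and is not pgl's implementation.

```python
def disjoint_union(graphs, merged_graph_index=False):
    """Merge (num_nodes, edges) pairs into one graph.

    Returns (total_num_nodes, merged_edges, graph_node_id).
    """
    merged_edges, graph_node_id = [], []
    offset = 0
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        # Shift node ids so every input graph gets its own id range.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        # With merged_graph_index=True every node is assigned to graph 0.
        owner = 0 if merged_graph_index else graph_id
        graph_node_id.extend([owner] * num_nodes)
        offset += num_nodes
    return offset, merged_edges, graph_node_id


g = (5, [(0, 1), (1, 2), (3, 4)])
total_nodes, joint_edges, node_ids = disjoint_union([g, g])
print(joint_edges)
print(node_ids)
```

The second copy's edges land on ids 5 to 9, so the two components never share a node.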

dump(path)

Dump the graph into a directory.

This function dumps the graph information into the given directory path. The graph can be read back with pgl.Graph.load.

Parameters: path – The directory for the storage of the graph.

property edge_feat: Return a dictionary of edge features.

property edges: Return all edges in numpy.ndarray or paddle.Tensor with shape (num_edges, 2).

property graph_edge_id

Return a numpy.ndarray or paddle.Tensor with shape [num_edges] that indicates which graph each edge belongs to.

Examples:

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges)
joint_graph = pgl.Graph.batch([graph, graph])
print(joint_graph.graph_edge_id)
# [0, 0, 0, 1, 1, 1]

property graph_node_id

Return a numpy.ndarray or paddle.Tensor with shape [num_nodes] that indicates which graph each node belongs to.

Examples:

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
            edges=edges)
joint_graph = pgl.Graph.batch([graph, graph])
print(joint_graph.graph_node_id)
# [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
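One common use of such a per-node graph index is graph-level pooling: node values are reduced into one value per graph. The following plain-Python sketch illustrates the idea with a sum. `sum_pool` and the sample values are hypothetical and not part of pgl.

```python
def sum_pool(node_values, graph_node_id, num_graph):
    """Sum one scalar per node into one scalar per graph."""
    pooled = [0.0] * num_graph
    for value, graph_id in zip(node_values, graph_node_id):
        pooled[graph_id] += value
    return pooled


# Index taken from the batched example above: two graphs with 5 nodes each.
graph_node_id = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
node_values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0, 40.0, 50.0]
pooled = sum_pool(node_values, graph_node_id, num_graph=2)
print(pooled)  # [15.0, 150.0]
```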

indegree(nodes=None)

Return the indegree of the given nodes.

Parameters: nodes – The nodes whose indegree is returned. If nodes is None, return the indegree of all nodes.
Returns: A numpy.ndarray or paddle.Tensor holding the indegree of the given nodes.
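As a reminder of what is being counted, the indegree of a node is the number of edges that end at it, and the outdegree is the number of edges that start from it. A minimal plain-Python sketch follows; `degrees` is a hypothetical helper, not the pgl method.

```python
def degrees(edges, num_nodes):
    """Count incoming and outgoing edges for every node."""
    indeg = [0] * num_nodes
    outdeg = [0] * num_nodes
    for src, dst in edges:
        outdeg[src] += 1  # the edge leaves src
        indeg[dst] += 1   # the edge arrives at dst
    return indeg, outdeg


indeg, outdeg = degrees([(0, 1), (1, 2), (3, 4)], num_nodes=5)
print(indeg)   # [0, 1, 1, 0, 1]
print(outdeg)  # [1, 1, 0, 1, 0]
```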

is_tensor(): Return whether the Graph is in paddle.Tensor format (as opposed to numpy format).

classmethod load(path, mmap_mode='r')

Load a Graph from path and return it in numpy format.

Parameters

path – The directory path of the stored Graph.
mmap_mode – Default mmap_mode="r". If not None, memory-map the graph.

node_batch_iter(batch_size, shuffle=True)

Node batch iterator.

Iterate over all nodes in batches.

Parameters

batch_size – The number of nodes in each batch.
shuffle – Whether to shuffle the nodes.

Returns

Batch iterator
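The batching pattern can be sketched as a small generator in plain Python. This is an illustration under stated assumptions: the `seed` argument is hypothetical, and letting the last batch be smaller than batch_size is an assumption rather than documented pgl behavior.

```python
import random


def node_batch_iter(num_nodes, batch_size, shuffle=True, seed=None):
    """Yield lists of node ids, batch_size at a time."""
    nodes = list(range(num_nodes))
    if shuffle:
        # A private Random instance keeps the global RNG state untouched.
        random.Random(seed).shuffle(nodes)
    for start in range(0, num_nodes, batch_size):
        # The final slice may hold fewer than batch_size nodes.
        yield nodes[start:start + batch_size]


batches = list(node_batch_iter(5, batch_size=2, shuffle=False))
print(batches)  # [[0, 1], [2, 3], [4]]
```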

property node_feat: Return a dictionary of node features.

property nodes: Return all node ids, from 0 to num_nodes - 1.

property num_edges: Return the number of edges.

property num_graph: Return the number of graphs.

property num_nodes: Return the number of nodes.

numpy(inplace=True)

Convert the Graph into numpy format.

In numpy format, the graph edges and node features are numpy.ndarray objects. Note that send and recv cannot be used on a numpy graph.

Parameters: inplace – (Default True) Whether to convert the graph into numpy format in place.

outdegree(nodes=None)

Return the outdegree of the given nodes.

Parameters: nodes – The nodes whose outdegree is returned. If nodes is None, return the outdegree of all nodes.
Returns: A numpy.ndarray or paddle.Tensor holding the outdegree of the given nodes.

predecessor(nodes=None, return_eids=False)

Find the predecessors of the given nodes.

Parameters

nodes – The nodes whose predecessors are returned. If nodes is None, return the predecessors of all nodes.
return_eids – If True, return the predecessors together with the corresponding edge ids (eids).

Returns

A list of numpy.ndarray, where each numpy.ndarray holds the predecessor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect the node to its predecessors.

Example

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
        edges=edges)
pred, pred_eid = graph.predecessor(return_eids=True)

This gives the following output.

pred:
      [[],
       [0],
       [1],
       [],
       [3]]

pred_eid:
      [[],
       [0],
       [1],
       [],
       [2]]
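The output above can be reproduced by hand: every edge (src, dst) with edge id eid contributes src to the predecessor list of dst. A minimal plain-Python sketch follows; `predecessors` is a hypothetical helper that uses lists instead of numpy.ndarray and is not the pgl method.

```python
def predecessors(edges, num_nodes):
    """Build per-node predecessor lists and the matching edge ids."""
    pred = [[] for _ in range(num_nodes)]
    pred_eid = [[] for _ in range(num_nodes)]
    for eid, (src, dst) in enumerate(edges):
        pred[dst].append(src)      # src points at dst
        pred_eid[dst].append(eid)  # position of the edge in the edge list
    return pred, pred_eid


pred, pred_eid = predecessors([(0, 1), (1, 2), (3, 4)], num_nodes=5)
print(pred)      # [[], [0], [1], [], [3]]
print(pred_eid)  # [[], [0], [1], [], [2]]
```

Swapping the roles of src and dst in the loop yields the successor lists instead.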

recv(reduce_func, msg, recv_mode='dst')

Receive messages and aggregate them with reduce_func.

The user-defined function (UDF) reduce_func should have the following format.

def reduce_func(msg):
    '''
        Args:

            msg: A LodTensor or a dictionary of LodTensor whose batch_size
                 is equal to the number of unique dst nodes.

        Return:

            It should return a tensor with shape (batch_size, out_dims). The
            batch size should be the same as msg.
    '''
    pass

Parameters

msg – A tensor or a dictionary of tensors created by the send function.
reduce_func – A callable UDF reduce function.

Returns

A tensor with shape (num_nodes, out_dims). The output for nodes with no message will be zeros.
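The aggregation step can be pictured as grouping per-edge messages by destination node and reducing each group. The sketch below does this in plain Python with a sum reduction and zero rows for nodes that receive nothing, matching the return description above. `recv_sum` and its arguments are hypothetical and do not mirror pgl's tensor-based interface.

```python
def recv_sum(dst_ids, messages, num_nodes, out_dims):
    """Sum per-edge message vectors into one vector per destination node."""
    # Start from zeros so nodes with no incoming message stay at zero.
    out = [[0.0] * out_dims for _ in range(num_nodes)]
    for dst, msg in zip(dst_ids, messages):
        for k in range(out_dims):
            out[dst][k] += msg[k]
    return out


dst_ids = [1, 2, 2]  # destination node of each edge
messages = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
out = recv_sum(dst_ids, messages, num_nodes=4, out_dims=2)
print(out)  # [[0.0, 0.0], [1.0, 2.0], [8.0, 10.0], [0.0, 0.0]]
```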

sample_predecessor(nodes, max_degree, return_eids=False, shuffle=False)

Sample predecessors of the given nodes.

Parameters

nodes – The nodes whose predecessors will be sampled.
max_degree – The maximum number of predecessors sampled for each node.
return_eids – Whether to return the corresponding eids.

Returns

A list of numpy.ndarray, where each numpy.ndarray holds the sampled predecessor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect the node to its predecessors.
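Neighbor sampling caps the number of predecessors kept per node. A minimal plain-Python sketch of the idea follows, assuming sampling without replacement and that nodes with at most max_degree predecessors keep all of them. Both points are assumptions, and `sample_predecessor_sketch` with its `seed` argument is hypothetical.

```python
import random


def sample_predecessor_sketch(edges, nodes, max_degree, seed=None):
    """Return up to max_degree predecessors for each requested node."""
    rng = random.Random(seed)
    pred = {}
    for src, dst in edges:
        pred.setdefault(dst, []).append(src)
    sampled = []
    for node in nodes:
        candidates = pred.get(node, [])
        if len(candidates) > max_degree:
            # Sample without replacement (an assumption of this sketch).
            candidates = rng.sample(candidates, max_degree)
        sampled.append(list(candidates))
    return sampled


edges = [(0, 3), (1, 3), (2, 3), (3, 4)]
sampled = sample_predecessor_sketch(edges, nodes=[3, 4, 0], max_degree=2, seed=0)
print(sampled)
```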

sample_successor(nodes, max_degree, return_eids=False, shuffle=False)

Sample successors of the given nodes.

Parameters

nodes – The nodes whose successors will be sampled.
max_degree – The maximum number of successors sampled for each node.
return_eids – Whether to return the corresponding eids.

Returns

A list of numpy.ndarray, where each numpy.ndarray holds the sampled successor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect the node to its successors.

send(message_func, src_feat=None, dst_feat=None, edge_feat=None, node_feat=None)

Send messages from all src nodes to dst nodes.

The user-defined message function should have the following format.

def message_func(src_feat, dst_feat, edge_feat):
    '''
        Args:
            src_feat: the node feat dict attached to the src nodes.
            dst_feat: the node feat dict attached to the dst nodes.
            edge_feat: the edge feat dict attached to the
                       corresponding (src, dst) edges.

        Return:
            It should return a tensor or a dictionary of tensor. And each tensor
            should have a shape of (num_edges, dims).
    '''
    return {'msg': src_feat['h']}

Parameters

message_func – UDF function.
src_feat – a dict {name: tensor,} to build src node feat
dst_feat – a dict {name: tensor,} to build dst node feat
node_feat – a dict {name: tensor,} to build both src and dst node feat
edge_feat – a dict {name: tensor,} to build edge feat

Returns

A dictionary of tensor representing the message. Each of the values in the dictionary has a shape (num_edges, dim) which should be collected by recv function.
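Conceptually, sending means looking up the features of each edge's endpoints and handing them to the message function, which produces one message row per edge. The plain-Python sketch below illustrates that gather step with nested lists. `send_sketch` is hypothetical and only loosely mirrors the signature above.

```python
def send_sketch(message_func, edges, node_feat, edge_feat):
    """Gather endpoint features per edge and call message_func on them."""
    src_ids = [u for u, _ in edges]
    dst_ids = [v for _, v in edges]
    # One feature row per edge, taken from the edge's src or dst node.
    src_feat = {name: [rows[i] for i in src_ids] for name, rows in node_feat.items()}
    dst_feat = {name: [rows[i] for i in dst_ids] for name, rows in node_feat.items()}
    return message_func(src_feat, dst_feat, edge_feat)


def copy_src(src_feat, dst_feat, edge_feat):
    # Same shape of UDF as in the format above: forward the src feature 'h'.
    return {"msg": src_feat["h"]}


edges = [(0, 1), (1, 2), (3, 4)]
node_feat = {"h": [[0.0], [1.0], [2.0], [3.0], [4.0]]}
msg = send_sketch(copy_src, edges, node_feat, edge_feat={})
print(msg)  # {'msg': [[0.0], [1.0], [3.0]]}
```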

send_recv(feature, reduce_func='sum')

This method combines the send and recv functions.

Currently, this method supports only the default copy send function and the built-in receive functions (‘sum’, ‘mean’, ‘max’, ‘min’).

Parameters

feature (Tensor | Tensor List) – the node feature of a graph.
reduce_func (str) – the built-in receive function to use: ‘sum’, ‘mean’, ‘max’ or ‘min’.
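The combined operation can be sketched in plain Python on scalar node features: each edge copies its source feature to the destination, and each destination reduces what it received. `send_recv_sketch` and `REDUCERS` are hypothetical names, and returning 0.0 for nodes without incoming edges is an assumption carried over from the recv description.

```python
REDUCERS = {
    "sum": sum,
    "mean": lambda values: sum(values) / len(values),
    "max": max,
    "min": min,
}


def send_recv_sketch(edges, feature, reduce_func="sum"):
    """Copy src features along edges, then reduce them per dst node."""
    inbox = [[] for _ in feature]
    for src, dst in edges:
        inbox[dst].append(feature[src])  # copy-send
    reduce = REDUCERS[reduce_func]
    # Nodes that received nothing get 0.0 (assumed, see lead-in).
    return [reduce(values) if values else 0.0 for values in inbox]


edges = [(0, 2), (1, 2), (3, 4)]
feature = [1.0, 3.0, 5.0, 7.0, 9.0]
print(send_recv_sketch(edges, feature, "sum"))   # [0.0, 0.0, 4.0, 0.0, 7.0]
print(send_recv_sketch(edges, feature, "mean"))  # [0.0, 0.0, 2.0, 0.0, 7.0]
```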

sorted_edges(sort_by='src')

Return sorted edges with different strategies.

This function returns the edges sorted by src nodes if sort_by="src", and by dst nodes otherwise.

Parameters: sort_by – The key used to sort the edges (“src” or “dst”).
Returns: A tuple of (sorted_src, sorted_dst, sorted_eid).
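The returned triple can be understood as the edge list reordered by one endpoint, together with the original edge ids in that new order. A minimal plain-Python sketch follows; `sorted_edges_sketch` is hypothetical, and using a stable sort for ties is an assumption.

```python
def sorted_edges_sketch(edges, sort_by="src"):
    """Return (sorted_src, sorted_dst, sorted_eid) for a list of (u, v) edges."""
    key_col = 0 if sort_by == "src" else 1
    # sorted() is stable, so edges with equal keys keep their original order.
    order = sorted(range(len(edges)), key=lambda eid: edges[eid][key_col])
    sorted_src = [edges[eid][0] for eid in order]
    sorted_dst = [edges[eid][1] for eid in order]
    return sorted_src, sorted_dst, order


edges = [(3, 4), (0, 1), (1, 0)]
by_src = sorted_edges_sketch(edges, sort_by="src")
by_dst = sorted_edges_sketch(edges, sort_by="dst")
print(by_src)  # ([0, 1, 3], [1, 0, 4], [1, 2, 0])
print(by_dst)  # ([1, 0, 3], [0, 1, 4], [2, 1, 0])
```

Keeping sorted_eid alongside the endpoints lets edge features be reordered to match the sorted edges.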

successor(nodes=None, return_eids=False)

Find the successors of the given nodes.

Parameters

nodes – The nodes whose successors are returned. If nodes is None, return the successors of all nodes.
return_eids – If True, return the successors together with the corresponding edge ids (eids).

Returns

A list of numpy.ndarray, where each numpy.ndarray holds the successor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect the node to its successors.

Example

import numpy as np
import pgl

num_nodes = 5
edges = [ (0, 1), (1, 2), (3, 4)]
graph = pgl.Graph(num_nodes=num_nodes,
        edges=edges)
succ, succ_eid = graph.successor(return_eids=True)

This gives the following output.

succ:
      [[1],
       [2],
       [],
       [4],
       []]

succ_eid:
      [[0],
       [1],
       [],
       [2],
       []]

tensor(inplace=True)

Convert the Graph into paddle.Tensor format.

In paddle.Tensor format, the graph edges and node features are paddle.Tensor objects. send and recv can be used on a paddle.Tensor graph.

Parameters: inplace – (Default True) Whether to convert the graph into paddle.Tensor format in place.

to_mmap(path='./tmp'): Turn the Graph into memmap mode, which can share memory between processes.