pgl.graph¶
This package implements the Graph structure for handling graph data.
-
class
pgl.graph.
Graph
(edges, num_nodes=None, node_feat=None, edge_feat=None, **kwargs)[source]¶ Bases:
object
Implementation of graph interface in pgl.
This is a simple implementation of the graph structure in pgl. pgl.Graph is an alias of pgl.graph.Graph.
- Parameters
edges – list of (u, v) tuples, 2D numpy.ndarray or 2D paddle.Tensor
num_nodes (optional) – int, numpy or paddle.Tensor: number of nodes in the graph. If not provided, the number of nodes will be inferred from edges.
node_feat (optional) – a dict of numpy arrays as node features
edge_feat (optional) – a dict of numpy arrays as edge features (must be in the same order as edges)
Examples 1:
Create a graph with numpy.
Convert it into paddle.Tensor .
Do send recv for graph neural network.
    import numpy as np
    import pgl

    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    feature = np.random.randn(5, 100).astype(np.float32)
    edge_feature = np.random.randn(3, 100).astype(np.float32)
    graph = pgl.Graph(num_nodes=num_nodes,
                      edges=edges,
                      node_feat={"feature": feature},
                      edge_feat={"edge_feature": edge_feature})
    graph.tensor()  # convert the graph to paddle.Tensor

    model = pgl.nn.GCNConv(100, 100)
    out = model(graph, graph.node_feat["feature"])
Examples 2:
Create a graph with paddle.Tensor.
Do send recv for graph neural network.
    import paddle
    import pgl

    num_nodes = 5
    edges = paddle.to_tensor([(0, 1), (1, 2), (3, 4)])
    feature = paddle.randn(shape=[5, 100])
    edge_feature = paddle.randn(shape=[3, 100])
    graph = pgl.Graph(num_nodes=num_nodes,
                      edges=edges,
                      node_feat={"feature": feature},
                      edge_feat={"edge_feature": edge_feature})

    model = pgl.nn.GCNConv(100, 100)
    out = model(graph, graph.node_feat["feature"])
-
property
adj_dst_index
¶ Return an EdgeIndex object for dst.
-
property
adj_src_index
¶ Return an EdgeIndex object for src.
-
classmethod
disjoint
(graph_list, merged_graph_index=False)[source]¶ Merge a list of graphs into one big graph made of disjoint components.
- Parameters
graph_list (Graph List) – A list of Graphs.
merged_graph_index – whether to merge the graph index. If False, graph_node_id keeps the id of the graph each node came from; if True, all nodes are assigned to a single graph.
    import numpy as np
    import pgl

    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    graph = pgl.Graph(num_nodes=num_nodes, edges=edges)

    joint_graph = pgl.Graph.disjoint([graph, graph], merged_graph_index=False)
    print(joint_graph.graph_node_id)
    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(joint_graph.num_graph)
    # 2

    joint_graph = pgl.Graph.disjoint([graph, graph], merged_graph_index=True)
    print(joint_graph.graph_node_id)
    # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print(joint_graph.num_graph)
    # 1
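The bookkeeping behind a disjoint merge can be sketched in plain Python. This is only an illustration of the idea, not pgl's implementation: disjoint_union is a hypothetical helper, graphs are represented as (num_nodes, edges) pairs so the snippet runs without pgl, and it assumes that node ids of later graphs are shifted by the number of nodes that precede them.

```python
def disjoint_union(graphs):
    """Merge (num_nodes, edges) pairs into one graph with shifted node ids.

    Hypothetical helper for illustration only; not part of pgl.
    """
    merged_edges = []
    graph_node_id = []
    graph_edge_id = []
    offset = 0
    for graph_id, (num_nodes, edges) in enumerate(graphs):
        # Shift node ids so they do not collide with earlier graphs.
        merged_edges.extend((u + offset, v + offset) for u, v in edges)
        # Remember which input graph every node and edge came from.
        graph_node_id.extend([graph_id] * num_nodes)
        graph_edge_id.extend([graph_id] * len(edges))
        offset += num_nodes
    return offset, merged_edges, graph_node_id, graph_edge_id


graph = (5, [(0, 1), (1, 2), (3, 4)])
num_nodes, edges, graph_node_id, graph_edge_id = disjoint_union([graph, graph])
print(edges)          # [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (8, 9)]
print(graph_node_id)  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(graph_edge_id)  # [0, 0, 0, 1, 1, 1]
```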
-
dump
(path)[source]¶ Dump the graph into a directory.
This function will dump the graph information into the given directory path. The graph can be read back with pgl.Graph.load.
- Parameters
path – The directory for the storage of the graph.
-
property
edge_feat
¶ Return a dictionary of edge features.
-
property
edges
¶ Return all edges in numpy.ndarray or paddle.Tensor with shape (num_edges, 2).
-
property
graph_edge_id
¶ Return a numpy.ndarray or paddle.Tensor with shape [num_edges] that indicates which graph each edge belongs to.
Examples:
    import numpy as np
    import pgl

    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    graph = pgl.Graph(num_nodes=num_nodes, edges=edges)
    joint_graph = pgl.Graph.batch([graph, graph])
    print(joint_graph.graph_edge_id)
    # [0, 0, 0, 1, 1, 1]
-
property
graph_node_id
¶ Return a numpy.ndarray or paddle.Tensor with shape [num_nodes] that indicates which graph each node belongs to.
Examples:
    import numpy as np
    import pgl

    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    graph = pgl.Graph(num_nodes=num_nodes, edges=edges)
    joint_graph = pgl.Graph.batch([graph, graph])
    print(joint_graph.graph_node_id)
    # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
-
indegree
(nodes=None)[source]¶ Return the indegree of the given nodes.
- Parameters
nodes – The nodes whose indegree is returned. If nodes is None, the indegree of all nodes is returned.
- Returns
A numpy.ndarray or paddle.Tensor containing the indegree of the given nodes.
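As a rough illustration of what the indegree means, the sketch below counts incoming edges from a plain list of (src, dst) tuples. The standalone indegree function is a hypothetical stand-in written in plain Python so it runs without pgl; it is not how the library computes the value.

```python
def indegree(num_nodes, edges, nodes=None):
    """Count incoming edges per node (illustrative stand-in, not pgl code)."""
    counts = [0] * num_nodes
    for _, dst in edges:
        # Every edge (src, dst) adds one to the indegree of dst.
        counts[dst] += 1
    if nodes is None:
        return counts
    return [counts[n] for n in nodes]


edges = [(0, 1), (1, 2), (3, 4)]
print(indegree(5, edges))          # [0, 1, 1, 0, 1]
print(indegree(5, edges, [2, 3]))  # [1, 0]
```

The outdegree follows the same pattern with src in place of dst.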
-
classmethod
load
(path, mmap_mode='r')[source]¶ Load Graph from path and return a Graph in numpy.
- Parameters
path – The directory path of the stored Graph.
mmap_mode – Default mmap_mode="r". If not None, memory-map the graph.
-
node_batch_iter
(batch_size, shuffle=True)[source]¶ Node batch iterator.
Iterate over all nodes in batches.
- Parameters
batch_size – The batch size of each batch of nodes.
shuffle – Whether to shuffle the nodes.
- Returns
Batch iterator
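The behaviour described above can be pictured with a small generator. This is a sketch under assumptions, not the library's code: the standalone function, its num_nodes argument and the seed parameter are hypothetical, and it assumes the final batch may be smaller than batch_size.

```python
import random


def node_batch_iter(num_nodes, batch_size, shuffle=True, seed=None):
    """Yield lists of node ids, batch_size at a time (illustrative only)."""
    nodes = list(range(num_nodes))
    if shuffle:
        # seed is a hypothetical extra argument to make runs repeatable.
        random.Random(seed).shuffle(nodes)
    for start in range(0, num_nodes, batch_size):
        yield nodes[start:start + batch_size]


batches = list(node_batch_iter(5, batch_size=2, shuffle=False))
print(batches)  # [[0, 1], [2, 3], [4]]
```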
-
property
node_feat
¶ Return a dictionary of node features.
-
property
nodes
¶ Return all node ids, from 0 to num_nodes - 1.
-
property
num_edges
¶ Return the number of edges.
-
property
num_graph
¶ Return the number of graphs.
-
property
num_nodes
¶ Return the number of nodes.
-
numpy
(inplace=True)[source]¶ Convert the Graph into numpy format.
In numpy format, the graph edges and node features are stored as numpy.ndarray. Note that send and recv cannot be used on a numpy graph.
- Parameters
inplace – (Default True) Whether to convert the graph into numpy format in place.
-
outdegree
(nodes=None)[source]¶ Return the outdegree of the given nodes.
- Parameters
nodes – The nodes whose outdegree is returned. If nodes is None, the outdegree of all nodes is returned.
- Returns
A numpy.ndarray or paddle.Tensor containing the outdegree of the given nodes.
-
predecessor
(nodes=None, return_eids=False)[source]¶ Find the predecessors of the given nodes.
- Parameters
nodes – The nodes whose predecessors are returned. If nodes is None, the predecessors of all nodes are returned.
return_eids – If True, return the predecessors together with the corresponding edge ids (eids).
- Returns
A list of numpy.ndarray, where each numpy.ndarray holds the predecessor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect a node to its predecessors.
Example
    import numpy as np
    import pgl

    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    graph = pgl.Graph(num_nodes=num_nodes, edges=edges)
    pred, pred_eid = graph.predecessor(return_eids=True)
This will give the following output.
    pred: [[], [0], [1], [], [3]]
    pred_eid: [[], [0], [1], [], [2]]
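To make the relation between edges, predecessors and eids explicit, the following plain-Python sketch rebuilds the output above from the edge list. The standalone predecessor function is a hypothetical illustration that returns Python lists rather than numpy.ndarray; it assumes an edge's eid is its position in the edge list, which is consistent with the example output.

```python
def predecessor(num_nodes, edges, nodes=None):
    """Collect predecessor ids and edge ids per node (illustrative only)."""
    pred = [[] for _ in range(num_nodes)]
    pred_eid = [[] for _ in range(num_nodes)]
    for eid, (src, dst) in enumerate(edges):
        # src is a predecessor of dst, reached through edge number eid.
        pred[dst].append(src)
        pred_eid[dst].append(eid)
    if nodes is None:
        nodes = range(num_nodes)
    return [pred[n] for n in nodes], [pred_eid[n] for n in nodes]


pred, pred_eid = predecessor(5, [(0, 1), (1, 2), (3, 4)])
print(pred)      # [[], [0], [1], [], [3]]
print(pred_eid)  # [[], [0], [1], [], [2]]
```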
-
recv
(reduce_func, msg, recv_mode='dst')[source]¶ Receive messages and aggregate them with reduce_func.
The UDF reduce_func should have the following format.
    def reduce_func(msg):
        '''
        Args:
            msg: A LodTensor or a dictionary of LodTensor whose batch_size
                 is equal to the number of unique dst nodes.

        Return:
            It should return a tensor with shape (batch_size, out_dims).
            The batch size should be the same as msg.
        '''
        pass
- Parameters
msg – A tensor or a dictionary of tensors created by the send function.
reduce_func – A callable UDF reduce function.
- Returns
A tensor with shape (num_nodes, out_dims). The output for nodes with no message will be zeros.
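The shape of the result, including the zero rows for nodes that receive no message, can be illustrated with a fixed sum reduction in plain Python. This is a simplified sketch, not pgl's recv: recv_sum is a hypothetical helper, messages are lists of floats with one row per edge, and the reduction is hard-coded to a sum instead of calling a user-supplied reduce_func.

```python
def recv_sum(num_nodes, edges, msg):
    """Sum per-edge messages into their dst nodes (illustrative only)."""
    out_dims = len(msg[0]) if msg else 0
    # Nodes that receive no message keep their all-zero row.
    out = [[0.0] * out_dims for _ in range(num_nodes)]
    for (_, dst), row in zip(edges, msg):
        for k, value in enumerate(row):
            out[dst][k] += value
    return out


edges = [(0, 2), (1, 2), (3, 4)]
msg = [[1.0, 2.0], [10.0, 20.0], [5.0, 5.0]]  # one row per edge
out = recv_sum(5, edges, msg)
print(out[2])  # [11.0, 22.0]
print(out[0])  # [0.0, 0.0]
```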
-
sample_predecessor
(nodes, max_degree, return_eids=False, shuffle=False)[source]¶ Sample predecessors of the given nodes.
- Parameters
nodes – The nodes whose predecessors will be sampled.
max_degree – The maximum number of predecessors sampled for each node.
return_eids – Whether to return the corresponding eids.
- Returns
A list of numpy.ndarray, where each numpy.ndarray holds the sampled predecessor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect a node to its predecessors.
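One plausible reading of max_degree is sketched below in plain Python. The details are assumptions rather than documented behaviour: the standalone function and its seed parameter are hypothetical, sampling is done without replacement, and nodes with at most max_degree predecessors keep all of them.

```python
import random


def sample_predecessor(edges, nodes, max_degree, seed=None):
    """Sample at most max_degree predecessors per node (illustrative only)."""
    rng = random.Random(seed)  # seed is a hypothetical extra argument
    pred = {}
    for src, dst in edges:
        pred.setdefault(dst, []).append(src)
    sampled = []
    for node in nodes:
        candidates = pred.get(node, [])
        if len(candidates) > max_degree:
            # Assumption: draw without replacement when there are too many.
            candidates = rng.sample(candidates, max_degree)
        sampled.append(list(candidates))
    return sampled


edges = [(0, 3), (1, 3), (2, 3), (3, 4)]
result = sample_predecessor(edges, nodes=[3, 4, 0], max_degree=2, seed=0)
print([len(p) for p in result])  # [2, 1, 0]
```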
-
sample_successor
(nodes, max_degree, return_eids=False, shuffle=False)[source]¶ Sample successors of the given nodes.
- Parameters
nodes – The nodes whose successors will be sampled.
max_degree – The maximum number of successors sampled for each node.
return_eids – Whether to return the corresponding eids.
- Returns
A list of numpy.ndarray, where each numpy.ndarray holds the sampled successor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect a node to its successors.
-
send
(message_func, src_feat=None, dst_feat=None, edge_feat=None, node_feat=None)[source]¶ Send messages from all src nodes to dst nodes.
The UDF message function should have the following format.
    def message_func(src_feat, dst_feat, edge_feat):
        '''
        Args:
            src_feat: the node feat dict attached to the src nodes.
            dst_feat: the node feat dict attached to the dst nodes.
            edge_feat: the edge feat dict attached to the
                       corresponding (src, dst) edges.

        Return:
            It should return a tensor or a dictionary of tensors. Each
            tensor should have a shape of (num_edges, dims).
        '''
        return {'msg': src_feat['h']}
- Parameters
message_func – UDF function.
src_feat – a dict {name: tensor,} to build src node feat
dst_feat – a dict {name: tensor,} to build dst node feat
node_feat – a dict {name: tensor,} to build both src and dst node feat
edge_feat – a dict {name: tensor,} to build edge feat
- Returns
A dictionary of tensors representing the messages. Each value in the dictionary has shape (num_edges, dim) and should be collected by the recv function.
-
send_recv
(feature, reduce_func='sum')[source]¶ This method combines the send and recv functions.
Currently, this method supports only the default copy send function and the built-in receive functions (‘sum’, ‘mean’, ‘max’, ‘min’).
- Parameters
feature (Tensor | Tensor List) – the node feature of a graph.
reduce_func (str) – the built-in receive function to use: ‘sum’, ‘mean’, ‘max’ or ‘min’.
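The combination of a copy send with a named reducer can be sketched as follows. This is a plain-Python illustration, not the library's implementation: the standalone send_recv function is hypothetical, it uses one scalar feature per node for brevity, and it assumes that nodes without incoming edges get 0.0.

```python
def send_recv(num_nodes, edges, feature, reduce_func="sum"):
    """Copy src features along edges, then reduce per dst (illustrative)."""
    reducers = {
        "sum": sum,
        "mean": lambda xs: sum(xs) / len(xs),
        "max": max,
        "min": min,
    }
    reducer = reducers[reduce_func]
    inbox = [[] for _ in range(num_nodes)]
    for src, dst in edges:
        # Copy send: the message is simply the src node's feature.
        inbox[dst].append(feature[src])
    # Assumption: nodes that receive nothing get 0.0.
    return [reducer(msgs) if msgs else 0.0 for msgs in inbox]


edges = [(0, 2), (1, 2), (3, 4)]
feature = [1.0, 3.0, 0.0, 7.0, 0.0]
print(send_recv(5, edges, feature, "sum"))   # [0.0, 0.0, 4.0, 0.0, 7.0]
print(send_recv(5, edges, feature, "mean"))  # [0.0, 0.0, 2.0, 0.0, 7.0]
```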
-
sorted_edges
(sort_by='src')[source]¶ Return sorted edges with different strategies.
This function returns the edges sorted according to sort_by. If sort_by="src", the edges are sorted by src nodes; otherwise they are sorted by dst nodes.
- Parameters
sort_by – The type for sorted edges. (“src” or “dst”)
- Returns
A tuple of (sorted_src, sorted_dst, sorted_eid).
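The returned triple can be pictured with a small plain-Python sketch. The standalone sorted_edges function below is a hypothetical illustration working on a list of (src, dst) tuples; it assumes that an eid is the edge's position in the original edge list and that ties keep their original order, neither of which is stated above.

```python
def sorted_edges(edges, sort_by="src"):
    """Return (sorted_src, sorted_dst, sorted_eid) (illustrative only)."""
    if sort_by not in ("src", "dst"):
        raise ValueError("sort_by must be 'src' or 'dst'")
    column = 0 if sort_by == "src" else 1
    # sorted() is stable, so edges with equal keys keep their original order.
    eids = sorted(range(len(edges)), key=lambda eid: edges[eid][column])
    sorted_src = [edges[eid][0] for eid in eids]
    sorted_dst = [edges[eid][1] for eid in eids]
    return sorted_src, sorted_dst, eids


edges = [(3, 4), (0, 1), (1, 0)]
print(sorted_edges(edges, "src"))  # ([0, 1, 3], [1, 0, 4], [1, 2, 0])
print(sorted_edges(edges, "dst"))  # ([1, 0, 3], [0, 1, 4], [2, 1, 0])
```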
-
successor
(nodes=None, return_eids=False)[source]¶ Find the successors of the given nodes.
- Parameters
nodes – The nodes whose successors are returned. If nodes is None, the successors of all nodes are returned.
return_eids – If True, return the successors together with the corresponding edge ids (eids).
- Returns
A list of numpy.ndarray, where each numpy.ndarray holds the successor ids of one of the given nodes. If return_eids=True, an additional list of numpy.ndarray is returned, where each numpy.ndarray holds the eids of the edges that connect a node to its successors.
Example
    import numpy as np
    import pgl

    num_nodes = 5
    edges = [(0, 1), (1, 2), (3, 4)]
    graph = pgl.Graph(num_nodes=num_nodes, edges=edges)
    succ, succ_eid = graph.successor(return_eids=True)
This will give the following output.
    succ: [[1], [2], [], [4], []]
    succ_eid: [[0], [1], [], [2], []]