struc2vec: Learning Node Representations from Structural Identity

Structural identity is a concept of symmetry in which network nodes are identified according to the network structure and their relationship to other nodes. The struc2vec paper proposes a novel and flexible framework for learning latent representations of the structural identity of nodes. This example reproduces the struc2vec algorithm in PGL.

Dataset

The paper uses an air-traffic network to validate the struc2vec algorithm. Nodes are airports, and each edge indicates that a flight exists between the two airports. The task is to predict each airport's level of activity from the connections between airports. The following dataset is used to validate the accuracy of the algorithm. The data was collected from the Bureau of Transportation Statistics from January to October, 2016. The network has 1,190 nodes and 13,599 edges (diameter 8). The dataset consists of two files:

  • usa-airports.edgelist

  • labels-usa-airports.txt
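As a minimal sketch of how these two files might be read: the code below assumes the edge list holds one whitespace-separated node pair per line and the label file holds a header line followed by node/label pairs. Both layouts are assumptions about the files, not something this README states, and the helpers `load_edgelist` and `load_labels` are hypothetical names that are not part of PGL.

```python
from collections import defaultdict


def load_edgelist(lines):
    """Build an undirected adjacency map from lines of the form 'u v'."""
    adj = defaultdict(set)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        u, v = int(parts[0]), int(parts[1])
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def load_labels(lines, has_header=True):
    """Map node id -> integer label, optionally skipping a header line."""
    labels = {}
    for i, line in enumerate(lines):
        parts = line.split()
        if (has_header and i == 0) or len(parts) < 2:
            continue
        labels[int(parts[0])] = int(parts[1])
    return labels


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real files.
    edges = ["0 1", "1 2", "2 0", "2 3"]
    labels = ["node label", "0 1", "1 1", "2 0", "3 2"]
    adj = load_edgelist(edges)
    num_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    print(len(adj), "nodes,", num_edges, "edges")
    print(load_labels(labels))
```

Both helpers accept any iterable of lines, so an open file handle for data/usa-airports.edgelist or data/labels-usa-airports.txt can be passed directly. If the assumed layout holds, the node and edge counts should come close to the figures quoted above.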

Dependencies

To use the struc2vec model in PGL, install gensim, pathos, and fastdtw in addition to paddlepaddle and pgl:

  • paddlepaddle>=1.6

  • pgl

  • gensim

  • pathos

  • fastdtw
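A quick way to confirm that the packages above are importable before running the example is sketched below. The helper `missing_modules` is a hypothetical name. The sketch checks import names only (paddlepaddle is imported as `paddle`) and does not verify the `>=1.6` version requirement.

```python
import importlib.util

# Import names, not pip names: the paddlepaddle package is imported as "paddle".
REQUIRED = ["paddle", "pgl", "gensim", "pathos", "fastdtw"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```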

How to use

For example, to train and validate the struc2vec model on the American airport dataset:

python struc2vec.py --edge_file data/usa-airports.edgelist --label_file data/labels-usa-airports.txt --train True --valid True --opt2 True
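When the script is driven from another Python program, the same command line can be assembled as an argument list. This is only a sketch: `build_command` is a hypothetical helper, and it passes `True` only for flags that should be enabled, mirroring the command above, because how the script interprets other values has not been verified here.

```python
import sys


def build_command(edge_file, label_file, enabled_flags=("train", "valid")):
    """Assemble the struc2vec.py command line as a list of arguments."""
    cmd = [
        sys.executable, "struc2vec.py",
        "--edge_file", edge_file,
        "--label_file", label_file,
    ]
    for flag in enabled_flags:
        # Each enabled flag is passed as "--flag True", as in the README command.
        cmd += ["--" + flag, "True"]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "data/usa-airports.edgelist",
        "data/labels-usa-airports.txt",
        enabled_flags=("train", "valid", "opt2"),
    )
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the directory that contains struc2vec.py.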

Hyperparameters

| Args | Meaning |
| --- | --- |
| edge_file | Input file name for edges |
| label_file | Input file name for node labels |
| emb_file | File name for the node embeddings |
| walk_depth | The number of steps in each random walk |
| opt1 | Flag to enable optimization 1 to reduce time cost |
| opt2 | Flag to enable optimization 2 to reduce time cost |
| w2v_emb_size | Dimension of the word2vec output embedding |
| w2v_window_size | Context window length of word2vec |
| w2v_epoch | Number of epochs to train the word2vec model |
| train | Flag to run the struc2vec algorithm and produce the word2vec embedding |
| valid | Flag to use the word2vec embedding to validate the classification result |
| num_class | Number of classes in the classification model to be trained |
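To give a feel for what the two optimization flags trade off, the sketch below follows the struc2vec paper's description of structural distance. For each hop count k, the sorted degree sequences of the nodes exactly k hops from u and from v are compared with dynamic time warping (DTW), using the element cost max(a, b) / min(a, b) - 1, and the per-hop costs are summed. In the paper, optimization 1 compresses each degree sequence into (degree, count) tuples so the DTW runs on shorter inputs, and optimization 2 reduces the number of node pairs that are compared (not shown here). All function names are hypothetical, and a plain pure-Python DTW is swapped in for the fastdtw package. Whether the PGL code matches this step for step is an assumption.

```python
from collections import deque


def ring_degrees(adj, source, k):
    """Sorted degrees of the nodes exactly k hops from source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond hop k
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return sorted(len(adj[n]) for n, d in dist.items() if d == k)


def compress(seq):
    """OPT1-style compression: sorted degrees -> [(degree, count), ...]."""
    out = []
    for d in seq:
        if out and out[-1][0] == d:
            out[-1] = (d, out[-1][1] + 1)
        else:
            out.append((d, 1))
    return out


def cost(a, b):
    """Element cost from the paper; assumes degrees are at least 1."""
    return max(a, b) / min(a, b) - 1.0


def cost_compressed(a, b):
    """Cost between (degree, count) tuples, weighted by the larger count."""
    return cost(a[0], b[0]) * max(a[1], b[1])


def dtw(s, t, cost_fn):
    """Classic O(len(s) * len(t)) dynamic time warping distance."""
    inf = float("inf")
    prev = [0.0] + [inf] * len(t)
    for x in s:
        cur = [inf]
        for j, y in enumerate(t, start=1):
            cur.append(cost_fn(x, y) + min(prev[j], prev[j - 1], cur[j - 1]))
        prev = cur
    return prev[-1]


def structural_distance(adj, u, v, max_k, use_opt1=False):
    """Sum of per-hop DTW costs for hops 0..max_k, stopping at an empty ring."""
    total = 0.0
    for k in range(max_k + 1):
        ru, rv = ring_degrees(adj, u, k), ring_degrees(adj, v, k)
        if not ru or not rv:
            break
        if use_opt1:
            total += dtw(compress(ru), compress(rv), cost_compressed)
        else:
            total += dtw(ru, rv, cost)
    return total


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3: the two end nodes are structurally identical.
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(structural_distance(path, 0, 3, max_k=2))  # 0.0
    print(structural_distance(path, 0, 1, max_k=1))  # 2.0
```

The compressed variant is an approximation of the uncompressed distance, which is consistent with the small accuracy differences between the optimized and unoptimized runs reported in this README.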

Experiment results

| Dataset | Model | Metric | PGL Result | Paper repo Result |
| --- | --- | --- | --- | --- |
| American airport dataset | Struc2vec without time cost optimization | ACC | 0.6483 | 0.6340 |
| American airport dataset | Struc2vec with optimization 1 | ACC | 0.6466 | 0.6242 |
| American airport dataset | Struc2vec with optimization 2 | ACC | 0.6252 | 0.6241 |
| American airport dataset | Struc2vec with optimization 1 & 2 | ACC | 0.6226 | 0.6083 |
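For a quick comparison, the gap between the PGL result and the paper repo result in each configuration can be computed directly from the accuracy figures above. The dictionary keys and the helper name `accuracy_gaps` are illustrative choices. The numbers are copied from the results above.

```python
# (PGL accuracy, paper repo accuracy) per configuration, from the results above.
RESULTS = {
    "no optimization": (0.6483, 0.6340),
    "optimization 1": (0.6466, 0.6242),
    "optimization 2": (0.6252, 0.6241),
    "optimization 1 & 2": (0.6226, 0.6083),
}


def accuracy_gaps(table):
    """Return PGL accuracy minus paper repo accuracy, rounded to 4 decimals."""
    return {name: round(pgl - paper, 4) for name, (pgl, paper) in table.items()}


if __name__ == "__main__":
    for name, gap in accuracy_gaps(RESULTS).items():
        print(f"{name}: {gap:+.4f}")
```

In every configuration the PGL result is slightly higher than the paper repo result, with gaps between 0.0011 and 0.0224.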