node2vec: Scalable Feature Learning for Networks

node2vec is an algorithmic framework for representation learning on graphs. Given any graph, it learns continuous feature representations for the nodes, which can then be used for various downstream machine learning tasks. Using PGL, we reproduce the node2vec algorithm and reach results comparable to those reported in the paper.
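To make the idea concrete, here is a minimal sketch of the second-order biased random walk that node2vec uses to generate training sequences. It follows the paper's return parameter p and in-out parameter q; the function name `node2vec_walk` and the plain adjacency dictionary are illustrative assumptions and are not PGL's API or the code in `node2vec.py`.

```python
import random


def node2vec_walk(adj, start, walk_length, p=1.0, q=1.0, rng=None):
    """Simulate one biased random walk on an unweighted graph.

    adj maps each node to the set of its neighbours. This structure is
    an assumption made for illustration; it is not how PGL stores graphs.
    """
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < walk_length:
        cur = walk[-1]
        neighbors = sorted(adj[cur])
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so sample uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in adj[prev]:
                weights.append(1.0)      # stay at distance 1 from prev
            else:
                weights.append(1.0 / q)  # move outward, away from prev
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Tiny toy graph, used only to exercise the function.
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
walk = node2vec_walk(adj, start=0, walk_length=10, p=0.25, q=4.0,
                     rng=random.Random(0))
print(walk)
```

With p = q = 1 the walk is unbiased, which corresponds to the deepwalk baseline listed in the results. The resulting walks are then treated as sentences for a skip-gram model that produces the node embeddings.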

Datasets

The datasets comprise two networks: BlogCatalog and ArXiv.

Dependencies

  • paddlepaddle>=1.4

  • pgl

How to run

For example, use a GPU to train node2vec on the BlogCatalog or ArXiv dataset, then evaluate the learned embeddings.

# multiclass task example
python node2vec.py --use_cuda --dataset BlogCatalog --save_path ./tmp/node2vec_BlogCatalog/ --offline_learning --epoch 400

python multi_class.py --use_cuda --ckpt_path ./tmp/node2vec_BlogCatalog/paddle_model --epoch 1000

# link prediction task example
python node2vec.py --use_cuda --dataset ArXiv --save_path ./tmp/node2vec_ArXiv --offline_learning --epoch 10

python link_predict.py --use_cuda --ckpt_path ./tmp/node2vec_ArXiv/paddle_model --epoch 400

Hyperparameters

  • dataset: The dataset to use, either “BlogCatalog” or “ArXiv”.

  • use_cuda: Run on the GPU when this flag is set.
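As a rough illustration of how such flags behave, the sketch below parses the options that appear in the commands above with `argparse`. The flag names come from those commands; the defaults, help strings, and the `build_parser` helper are assumptions made for this example, and the real parser in `node2vec.py` may differ.

```python
import argparse


def build_parser():
    """Build an illustrative parser; defaults here are assumptions."""
    parser = argparse.ArgumentParser(description="node2vec (illustrative)")
    parser.add_argument("--dataset", type=str, default="BlogCatalog",
                        choices=["BlogCatalog", "ArXiv"],
                        help="dataset to train on")
    # store_true flags are False unless given on the command line.
    parser.add_argument("--use_cuda", action="store_true",
                        help="run on the GPU")
    parser.add_argument("--offline_learning", action="store_true")
    parser.add_argument("--save_path", type=str, default="./tmp/node2vec")
    parser.add_argument("--epoch", type=int, default=400)
    return parser


args = build_parser().parse_args(
    ["--use_cuda", "--dataset", "ArXiv", "--epoch", "10"])
print(args.use_cuda, args.dataset, args.epoch)  # True ArXiv 10
```

Because `use_cuda` is a boolean switch, omitting it leaves training on the CPU in this sketch.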

Experiment results

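| Dataset     | Model    | Task                       | Metric  | PGL Result | Reported Result |
|-------------|----------|----------------------------|---------|------------|-----------------|
| BlogCatalog | deepwalk | multi-label classification | MacroF1 | 0.250      | 0.211           |
| BlogCatalog | node2vec | multi-label classification | MacroF1 | 0.262      | 0.258           |
| ArXiv       | deepwalk | link prediction            | AUC     | 0.9538     | 0.9340          |
| ArXiv       | node2vec | link prediction            | AUC     | 0.9541     | 0.9366          |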
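For readers unfamiliar with the two metrics, the following sketch shows one common way to compute them in plain Python. The function names, the set-based label format, and the choice to score a label with no true or predicted instances as 0 are assumptions for illustration; the evaluation in `multi_class.py` and `link_predict.py` may be implemented differently.

```python
def macro_f1(y_true, y_pred, num_labels):
    """Average the per-label F1 over all labels (multi-label setting).

    y_true and y_pred are lists of sets of label ids, one set per sample.
    """
    f1_scores = []
    for label in range(num_labels):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if label in t and label in p)
        fp = sum(1 for t, p in pairs if label not in t and label in p)
        fn = sum(1 for t, p in pairs if label in t and label not in p)
        denom = 2 * tp + fp + fn
        # Assumed convention: a label that never appears scores 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / num_labels


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of
    scores, which is fine for a small illustration.
    """
    wins = 0.0
    for ps in pos_scores:
        for ns in neg_scores:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Toy inputs, not taken from BlogCatalog or ArXiv.
f1 = macro_f1([{0}, {1}, {0, 1}], [{0}, {0}, {1}], num_labels=2)
score = auc([0.9, 0.8, 0.4], [0.7, 0.3])
print(round(f1, 4), round(score, 4))  # 0.5833 0.8333
```

In link prediction, the positive scores would come from held-out true edges and the negative scores from sampled non-edges.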