LINE: Large-scale Information Network Embedding
LINE is an algorithmic framework for embedding very large-scale information networks. It applies to many kinds of networks: directed or undirected, with binary or weighted edges. Using PGL, we reproduce the LINE algorithms and obtain results close to those reported in the paper.
Datasets
The Flickr network is a social network with 1,715,256 nodes and 22,613,981 edges.
You can download the data from here.
The Flickr dataset consists of four files:
flickr-groupmemberships.txt.gz
flickr-groups.txt.gz
flickr-links.txt.gz
flickr-users.txt.gz
After downloading the data, uncompress the files into a directory such as ./data/flickr/. Note that the current directory is the root directory of the LINE model.
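The decompression step can also be scripted. The following is a minimal sketch using only the Python standard library. The helper name `gunzip_all` is hypothetical and not part of the LINE example, and the sketch assumes the four `.gz` archives have already been placed in the target directory.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_all(data_dir):
    """Decompress every *.gz file in data_dir next to its archive.

    Hypothetical helper for illustration; returns the paths it wrote.
    """
    written = []
    for archive in sorted(Path(data_dir).glob("*.gz")):
        # Drop only the final suffix: flickr-links.txt.gz -> flickr-links.txt
        target = archive.with_suffix("")
        with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)  # stream, so large files are not held in memory
        written.append(target)
    return written
```

Calling `gunzip_all("./data/flickr/")` from the root directory of the LINE model would leave the four `.txt` files beside their archives.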
Then run the following command to preprocess the data:
python data_process.py
This produces three files in the ./data/flickr/ directory:
nodes.txt
edges.txt
nodes_label.txt
Dependencies
paddlepaddle>=1.6
pgl
How to run
For example, to train LINE on the Flickr dataset using a GPU:
# multi-label classification example
python line.py --use_cuda --order first_order --data_path ./data/flickr/ --save_dir ./checkpoints/model/
python multi_class.py --ckpt_path ./checkpoints/model/model_epoch_20 --percent 0.5
Hyperparameters
--use_cuda: Train on GPU if this flag is set.
--order: Which proximity LINE optimizes, first_order or second_order.
--percent: The fraction of the data used as training data (for example, 0.5).
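To make the two choices for the order option concrete, here is a small sketch of the quantities as defined in the LINE paper. First-order proximity models the probability of an edge between two nodes as the sigmoid of the dot product of their embeddings. Second-order proximity models the probability of a context node given a source node as a softmax over context embeddings. The function names and toy vectors are illustrative assumptions. This is not the PGL implementation, and it computes the full softmax, whereas the paper approximates it with negative sampling during training.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def first_order_prob(u_i, u_j):
    """p1(v_i, v_j) = sigmoid(u_i . u_j): joint probability of an edge."""
    return 1.0 / (1.0 + math.exp(-dot(u_i, u_j)))


def second_order_prob(u_i, contexts, j):
    """p2(v_j | v_i): softmax over context vectors c_k of (c_k . u_i), read at index j."""
    scores = [dot(c, u_i) for c in contexts]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return exps[j] / sum(exps)


# Toy 2-dimensional embeddings (made-up values, for illustration only).
u0, u1 = [1.0, 0.0], [0.0, 1.0]
contexts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

p1 = first_order_prob(u0, u1)  # orthogonal vectors give sigmoid(0)
p2 = [second_order_prob(u0, contexts, j) for j in range(len(contexts))]
```

The first-order value is symmetric in its two arguments, which is why the paper restricts it to undirected networks. The second-order values form a distribution over context nodes and sum to 1.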
Experiment results
| Dataset | Model | Task | Metric | PGL Result | Reported Result |
|---|---|---|---|---|---|
| Flickr | LINE with first_order | multi-label classification | MacroF1 | 0.626 | 0.627 |
| Flickr | LINE with first_order | multi-label classification | MicroF1 | 0.637 | 0.639 |
| Flickr | LINE with second_order | multi-label classification | MacroF1 | 0.615 | 0.621 |
| Flickr | LINE with second_order | multi-label classification | MicroF1 | 0.630 | 0.635 |
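The two metrics above aggregate per-label scores differently. MacroF1 averages the F1 score of each label, while MicroF1 pools true positives, false positives, and false negatives across all labels before computing a single F1. Below is a minimal pure-Python sketch on made-up multi-label predictions. The function names and data are illustrative and are not taken from multi_class.py.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """y_true, y_pred: lists of equal-length 0/1 label vectors, one per node."""
    n_labels = len(y_true[0])
    counts = []
    for k in range(n_labels):
        pairs = [(t[k], p[k]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        counts.append((tp, fp, fn))
    # Macro: average the per-label F1 scores, so every label counts equally.
    macro = sum(f1(*c) for c in counts) / n_labels
    # Micro: pool the counts first, so frequent labels dominate.
    micro = f1(*(sum(col) for col in zip(*counts)))
    return macro, micro


# Four nodes, two labels (made-up data).
y_true = [[1, 0], [1, 0], [1, 1], [0, 1]]
y_pred = [[1, 0], [1, 0], [1, 0], [1, 1]]
macro, micro = macro_micro_f1(y_true, y_pred)
```

On this toy data the per-label F1 scores are 6/7 and 2/3, so the macro average is 16/21 (about 0.762), while the pooled counts give a micro score of 0.8.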