Do you have any API or DEMO for PageRank OR GNN #9

Closed
cfangplus opened this issue May 4, 2023 · 8 comments

Comments

@cfangplus

hi,

I used the BVGraph API to load a graph with nearly a hundred billion edges, and the memory cost was only about 10 GB. That's amazing.
As you know, graph processing such as PageRank and graph neural networks such as GCN have become very popular, so my question is:
Could WebGraph be used as the storage layer for graph processing and graph neural networks? Have you done any work in this area as part of your analysis of web graphs and social networks? Thanks.
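To make the storage-layer question concrete: a GCN-style layer mostly needs, for every node, the list of its neighbors. The sketch below is a simplified, mean-normalized propagation step (no learned weight matrix, no nonlinearity, and not the symmetric normalization of the original GCN formulation). `MeanAggregation` is a hypothetical name, and the plain `int[][]` successor lists stand in for whatever graph store would supply them.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: one mean-aggregation step in the style of a GCN layer.
 * The graph is a plain array of successor lists standing in for a real graph store.
 */
final class MeanAggregation {
    private MeanAggregation() {}

    /**
     * Returns, for every node x, the mean of x's own feature vector and the
     * feature vectors of its successors (a self-loop is added, as in GCN).
     */
    static double[][] aggregate(int[][] successors, double[][] features) {
        int n = successors.length;
        if (n == 0) return new double[0][];
        int dim = features[0].length;
        double[][] out = new double[n][];
        for (int x = 0; x < n; x++) {
            double[] acc = Arrays.copyOf(features[x], dim); // self-loop contribution
            for (int y : successors[x]) {
                for (int d = 0; d < dim; d++) acc[d] += features[y][d];
            }
            int count = successors[x].length + 1;
            for (int d = 0; d < dim; d++) acc[d] /= count;
            out[x] = acc;
        }
        return out;
    }
}
```

The inner loop over `successors[x]` is the per-node neighbor access that a compressed graph representation would have to serve for this kind of workload.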

@vigna
Owner

vigna commented May 4, 2023

You can certainly use the framework for that, albeit an EFGraph might be more appropriate. We never worked on that, though...
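A minimal sketch of PageRank by power iteration that relies only on per-node access to outdegree and successors. `SuccessorGraph` and `ArrayGraph` are hypothetical stand-ins whose method names mirror `numNodes()`, `outdegree(int)` and `successorArray(int)` of WebGraph's `ImmutableGraph`, so the loop should adapt to a loaded graph with little change. This is an illustration under those assumptions, not a PageRank implementation shipped with WebGraph.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the accessors used below; names mirror WebGraph's ImmutableGraph. */
interface SuccessorGraph {
    int numNodes();
    int outdegree(int x);
    /** Only the first outdegree(x) entries are meaningful. */
    int[] successorArray(int x);
}

/** Hypothetical in-memory implementation, for testing only. */
final class ArrayGraph implements SuccessorGraph {
    private final int[][] succ;
    ArrayGraph(int[][] succ) { this.succ = succ; }
    @Override public int numNodes() { return succ.length; }
    @Override public int outdegree(int x) { return succ[x].length; }
    @Override public int[] successorArray(int x) { return succ[x]; }
}

final class PageRankSketch {
    private PageRankSketch() {}

    /**
     * Power iteration with damping factor alpha and a uniform teleportation
     * vector; the rank held by dangling nodes is redistributed uniformly.
     */
    static double[] pageRank(SuccessorGraph g, double alpha, int iterations) {
        int n = g.numNodes();
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            double dangling = 0;
            for (int x = 0; x < n; x++) {
                int d = g.outdegree(x);
                if (d == 0) { dangling += rank[x]; continue; }
                int[] succ = g.successorArray(x);
                double share = rank[x] / d; // x splits its rank evenly among successors
                for (int i = 0; i < d; i++) next[succ[i]] += share;
            }
            // Teleportation plus dangling mass, both spread uniformly; ranks keep summing to 1.
            double base = (1 - alpha) / n + alpha * dangling / n;
            for (int y = 0; y < n; y++) next[y] = base + alpha * next[y];
            rank = next;
        }
        return rank;
    }
}
```

Because every iteration is a full sequential pass, with a real `ImmutableGraph` it is often cheaper to scan with `nodeIterator()` than to call per-node random access; treat the adapter above as a placeholder.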

@cfangplus
Author

We found that many papers cite the layered label propagation (LLP) paper, among them graph processing systems such as GraphX, Gemini, and PowerGraph, and graph neural network systems such as P3. In particular, many of them use datasets like uk-2007-05. However, those that released their source code, like GraphX and Gemini, convert the .graph file into a <source id, target id> text format using the ArcListASCIIGraph class. After this conversion, the graph data loaded into memory is no longer compressed, so a large graph may cause out-of-memory errors. Yet we thought that what WebGraph/LLP really aims to do is solve exactly this memory problem for very large graphs. That really confuses me. Do you have any suggestions?
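What the conversion described above typically ends up holding in memory can be sketched as follows, assuming (hypothetically) one whitespace-separated `source target` pair per line and node IDs in `[0, numNodes)`. The parser builds a CSR layout (offsets plus a flat target array); the two size helpers give the uncompressed footprint of an edge list (COO) and of CSR with 32-bit node IDs. Either way the cost is a fixed number of bytes per arc, independent of how good the node ordering is, which is what a compressed representation avoids. All names here are illustrative.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: turns "source target" text lines into CSR arrays.
 * Arcs are buffered in memory first, which a loader for billions of arcs
 * would avoid; int offsets also limit this sketch to fewer than 2^31 arcs.
 */
final class ArcListToCsr {
    final int[] offsets; // successors of x are targets[offsets[x] .. offsets[x + 1])
    final int[] targets;

    private ArcListToCsr(int[] offsets, int[] targets) {
        this.offsets = offsets;
        this.targets = targets;
    }

    static ArcListToCsr parse(Iterable<String> lines, int numNodes) {
        int[] src = new int[16], dst = new int[16];
        int m = 0;
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] parts = line.split("\\s+");
            if (m == src.length) { // grow the arc buffers
                src = Arrays.copyOf(src, 2 * m);
                dst = Arrays.copyOf(dst, 2 * m);
            }
            src[m] = Integer.parseInt(parts[0]);
            dst[m] = Integer.parseInt(parts[1]);
            m++;
        }
        // Counting sort by source: degrees, then prefix sums, then placement.
        int[] offsets = new int[numNodes + 1];
        for (int i = 0; i < m; i++) offsets[src[i] + 1]++;
        for (int x = 0; x < numNodes; x++) offsets[x + 1] += offsets[x];
        int[] targets = new int[m];
        int[] cursor = Arrays.copyOf(offsets, numNodes);
        for (int i = 0; i < m; i++) targets[cursor[src[i]]++] = dst[i];
        return new ArcListToCsr(offsets, targets);
    }

    /** Uncompressed edge list (COO): two 32-bit IDs per arc. */
    static long cooBytes(long arcs) { return 8L * arcs; }

    /** Uncompressed CSR: n + 1 offsets of 64 bits plus one 32-bit ID per arc. */
    static long csrBytes(long nodes, long arcs) { return 8L * (nodes + 1) + 4L * arcs; }
}
```

For example, at one billion arcs the COO layout alone takes 8 GB before any per-object overhead.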

@vigna
Owner

vigna commented May 5, 2023

We distribute data in the WebGraph format, and a lot of people use the data, but the same people do not want to use the WebGraph interface, so they convert it to ASCII. Sad, but apparently it works for them.

@cfangplus
Author

cfangplus commented May 6, 2023

We also looked at an experiment from the paper on the graph processing system Gemini; the figure is below.
There they evaluate their so-called chunk-based partitioning method, which partitions the graph with load balancing. Although they use the <source id, target id> text format rather than the .graph format, the results show that the locality of the uk-2007-05 graph is very good, which helps performance. So I think that although the <source id, target id> text format is not compressed, the node ordering still preserves locality, right? Sorry, I have not fully understood the theory behind WebGraph or LLP, so I need your help to check this reasoning.
[Figure: screenshot of the chunk-based partitioning results from the Gemini paper]
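A simplified sketch of contiguous, load-balanced chunking in the spirit of what is described above: nodes keep their IDs, each chunk is a range of consecutive IDs, and a range is closed once its cost reaches an equal share of the total. The cost model `alpha * nodes + arcs` and all names are assumptions for illustration; Gemini's actual cost model and parameters may differ. The second method measures the fraction of arcs that cross chunks, which stays low when the ordering clusters related nodes.

```java
/** Hypothetical sketch of range-based chunking over a fixed node ordering. */
final class ChunkPartition {
    private ChunkPartition() {}

    /**
     * Splits nodes 0..n-1 into `parts` contiguous chunks so that each chunk's
     * cost, alpha * (#nodes) + (#out-arcs), is roughly total / parts.
     * Returns boundaries b, with chunk i covering [b[i], b[i + 1]).
     */
    static int[] partition(int[] outdegree, int parts, double alpha) {
        int n = outdegree.length;
        double total = 0;
        for (int d : outdegree) total += alpha + d;
        int[] bounds = new int[parts + 1];
        bounds[parts] = n;
        double acc = 0;
        int chunk = 1;
        for (int x = 0; x < n && chunk < parts; x++) {
            acc += alpha + outdegree[x];
            // Close the current chunk once cumulative cost reaches its share.
            if (acc >= total * chunk / parts) bounds[chunk++] = x + 1;
        }
        // Boundaries never reached (tiny graphs) collapse to n, leaving empty chunks.
        while (chunk < parts) bounds[chunk++] = n;
        return bounds;
    }

    /** Fraction of arcs whose endpoints fall in different chunks. */
    static double crossChunkFraction(int[][] successors, int[] bounds) {
        int n = successors.length;
        int[] owner = new int[n];
        for (int c = 0; c + 1 < bounds.length; c++)
            for (int x = bounds[c]; x < bounds[c + 1]; x++) owner[x] = c;
        long arcs = 0, cross = 0;
        for (int x = 0; x < n; x++) {
            for (int y : successors[x]) {
                arcs++;
                if (owner[x] != owner[y]) cross++;
            }
        }
        return arcs == 0 ? 0 : (double) cross / arcs;
    }
}
```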

@vigna
Owner

vigna commented May 6, 2023

Totally. The ordering of the graph is a form of clustering. Chunks of consecutive nodes have dense connections.
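One rough way to put a number on "chunks of consecutive nodes have dense connections", sketched under assumptions: the average of log2(|x − y| + 1) over all arcs x → y, computed before and after renaming the nodes with a permutation. Small values mean arcs mostly connect nearby IDs. This is only a proxy for the cost of gap-coding successor lists, not the actual BV encoding, which also exploits similarity between successor lists and runs of consecutive IDs. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helpers for measuring how "local" a node ordering is. */
final class OrderingLocality {
    private OrderingLocality() {}

    /** Average over all arcs x -> y of log2(|x - y| + 1); self-loops count as 0. */
    static double averageLogGap(int[][] successors) {
        double sum = 0;
        long arcs = 0;
        for (int x = 0; x < successors.length; x++) {
            for (int y : successors[x]) {
                sum += Math.log(Math.abs((long) x - y) + 1) / Math.log(2);
                arcs++;
            }
        }
        return arcs == 0 ? 0 : sum / arcs;
    }

    /** Renames node x to perm[x]; each new successor list is sorted by ID. */
    static int[][] relabel(int[][] successors, int[] perm) {
        int n = successors.length;
        int[][] out = new int[n][];
        for (int x = 0; x < n; x++) {
            int[] s = new int[successors[x].length];
            for (int i = 0; i < s.length; i++) s[i] = perm[successors[x][i]];
            Arrays.sort(s);
            out[perm[x]] = s;
        }
        return out;
    }
}
```

Passing a random permutation instead of a hand-picked one is a quick way to check how much of a dataset's locality comes from its ordering.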

@cfangplus
Author

Great, thanks.

@cfangplus
Author

Just one idea: I noticed that the BVGraph/EFGraph/ASCIIGraph classes provide many load and store methods. Maybe WebGraph could provide more methods for fast random access, such as fast access to the neighbors of a node or to the subgraph around a node. If so, people might be more inclined to use the WebGraph interface rather than converting to ASCII to get an uncompressed COO format.

@vigna
Owner

vigna commented May 9, 2023

I think you really misunderstood what WebGraph does. Please have a look at ImmutableGraph.
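For GNN workloads, the access pattern discussed here is per-node neighbor lookup. Below is a minimal sketch of uniform neighbor sampling without replacement (GraphSAGE-style) that needs only random access to a node's outdegree and successors. `RandomAccessGraph` and `ArrayRandomAccessGraph` are hypothetical; their method names mirror `outdegree(int)` and `successorArray(int)` of `ImmutableGraph`, and with WebGraph this pattern assumes a graph that supports random access (see `ImmutableGraph.randomAccess()`).

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical interface; method names mirror WebGraph's ImmutableGraph. */
interface RandomAccessGraph {
    int outdegree(int x);
    /** Only the first outdegree(x) entries are meaningful. */
    int[] successorArray(int x);
}

/** Hypothetical in-memory implementation, for testing only. */
final class ArrayRandomAccessGraph implements RandomAccessGraph {
    private final int[][] succ;
    ArrayRandomAccessGraph(int[][] succ) { this.succ = succ; }
    @Override public int outdegree(int x) { return succ[x].length; }
    @Override public int[] successorArray(int x) { return succ[x]; }
}

final class NeighborSampler {
    private NeighborSampler() {}

    /** Returns up to k successors of x, sampled uniformly without replacement. */
    static int[] sample(RandomAccessGraph g, int x, int k, Random rnd) {
        int d = g.outdegree(x);
        // Copy the meaningful prefix so the graph's own array is never modified.
        int[] pool = Arrays.copyOf(g.successorArray(x), d);
        int m = Math.min(k, d);
        // Partial Fisher-Yates shuffle: positions 0..m-1 end up holding a uniform sample.
        for (int i = 0; i < m; i++) {
            int j = i + rnd.nextInt(d - i);
            int t = pool[i];
            pool[i] = pool[j];
            pool[j] = t;
        }
        return Arrays.copyOf(pool, m);
    }
}
```

Only the sampled node's successor list is touched, so the cost per call depends on that node's degree rather than on the size of the graph.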
