some questions about the kg extraction process #26

Open
cd86254081 opened this issue Mar 28, 2020 · 2 comments

Comments

@cd86254081

I followed the KB4Rec process and extracted a movie-domain subgraph for MovieLens-1M, keeping only the 10 most important relations. However, although the dataset contains only 3,000+ items, the number of entities in my extracted first-order subgraph reached 100,000+. In most papers, the number of entities in the MovieLens subgraph is relatively small, perhaps about 20,000.
Could you please share how you process the subgraph so that the number of entities stays small?

@xiangwang1223
Owner

Hi, in my experience, a 10-core (or 5-core) setting is usually adopted to ensure data quality, i.e., retaining only entities that appear in at least ten (or five) triples. Moreover, when extracting KG entities from the original KG, you can restrict the extraction to at most k-hop (say, 2-hop) neighbors of the seed items, i.e., retaining only entities within two hops of the MovieLens seed items.
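
A rough sketch of these two filters in Python, in case it is useful. This is only an illustration and not the exact script used for any published dataset. The function names, the `(head, relation, tail)` tuple layout, treating triples as undirected edges, always keeping the seed items, and repeating the core filter until nothing changes are all assumptions.

```python
from collections import defaultdict, deque


def k_hop_entities(triples, seeds, k=2):
    """Entities within k hops of the seeds (triples treated as undirected edges)."""
    adj = defaultdict(set)
    for h, _, t in triples:
        adj[h].add(t)
        adj[t].add(h)
    dist = {s: 0 for s in seeds}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        if dist[node] == k:
            continue  # do not expand beyond k hops
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def core_filter(triples, seeds, min_triples=10):
    """Drop non-seed entities that appear in fewer than min_triples triples.

    Assumption: the filter is repeated until stable, because removing a
    triple can push another entity below the threshold.
    """
    triples, seeds = list(triples), set(seeds)
    while True:
        count = defaultdict(int)
        for h, _, t in triples:
            count[h] += 1
            if t != h:
                count[t] += 1
        kept = [
            (h, r, t) for h, r, t in triples
            if (h in seeds or count[h] >= min_triples)
            and (t in seeds or count[t] >= min_triples)
        ]
        if len(kept) == len(triples):
            return kept
        triples = kept


def extract_subgraph(triples, seeds, k=2, min_triples=10):
    """k-hop restriction first, then the core filter on what remains."""
    reachable = k_hop_entities(triples, seeds, k)
    local = [(h, r, t) for h, r, t in triples if h in reachable and t in reachable]
    return core_filter(local, seeds, min_triples)
```

Here `seeds` would be the KG entity IDs that the MovieLens items are mapped to, and `min_triples` would be set to 10 or 5 for the 10-core or 5-core setting.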

Hope it can help you.

@cd86254081
Author

It helps a lot, thank you so much.
