some questions about the kg extraction process #26

cd86254081 · 2020-03-28T03:16:15Z

I followed the KB4Rec process and extracted a movie-domain subgraph for MovieLens-1M, keeping only the 10 most important relations. However, although the dataset contains only 3000+ items, the number of entities in my extracted first-order subgraph reached 100000+. In most papers that use MovieLens, the number of entities in the subgraph is relatively small, around 20000.
Could you please share how you process the subgraph so that the number of entities stays small?

xiangwang1223 · 2020-03-28T12:56:06Z

Hi, in my experience a 10-core (or 5-core) setting is usually adopted to guarantee data quality, i.e., retaining only entities that appear in at least ten (or five) triples. Moreover, when extracting entities from the original KG, you can restrict the extraction to the k-hop (say, 2-hop) neighbors of the seed items, i.e., retaining only entities at most two hops away from the MovieLens seed items.

Hope it can help you.
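
The two suggestions above (k-hop extraction around seed items, then a minimum-triple filter) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the preprocessing code actually used by the repository: the function names `k_hop_subgraph` and `min_degree_filter`, the `(head, relation, tail)` tuple format, the undirected treatment of edges, the iterative pruning, and the choice to protect seed items from removal are all hypothetical.

```python
from collections import Counter, defaultdict


def k_hop_subgraph(triples, seeds, k=2):
    """Keep triples that touch an entity fewer than k hops from a seed.

    Edges are treated as undirected (an assumption), so every entity in
    the result is at most k hops away from some seed item.
    """
    incident = defaultdict(list)
    for h, r, t in triples:
        incident[h].append((h, r, t))
        incident[t].append((h, r, t))

    visited = set(seeds)
    frontier = set(seeds)
    kept = set()
    for _ in range(k):
        next_frontier = set()
        for entity in frontier:
            for h, r, t in incident.get(entity, ()):
                kept.add((h, r, t))
                for neighbor in (h, t):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.add(neighbor)
        frontier = next_frontier
    return kept


def min_degree_filter(triples, min_triples=10, protected=()):
    """Repeatedly drop entities that appear in fewer than min_triples triples.

    Entities in `protected` (e.g. the seed items) are never dropped; this
    is an assumption so that sparse items keep their item-entity mapping.
    """
    triples = set(triples)
    protected = set(protected)
    while True:
        degree = Counter()
        for h, _, t in triples:
            degree[h] += 1
            if t != h:
                degree[t] += 1
        weak = {e for e, d in degree.items()
                if d < min_triples and e not in protected}
        if not weak:
            return triples
        # Removing weak entities can lower other degrees, so loop again.
        triples = {(h, r, t) for h, r, t in triples
                   if h not in weak and t not in weak}
```

A call such as `min_degree_filter(k_hop_subgraph(all_triples, seed_items, k=2), min_triples=10, protected=seed_items)` would apply both steps, where `all_triples` and `seed_items` stand for your own KG triples and the entity IDs mapped to MovieLens items. Pruning iteratively is stricter than a single pass; a single pass may be closer to what some papers do.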

cd86254081 · 2020-03-28T13:14:28Z

It helps a lot, thank you so much.
