Hi, I am new to word2vec. I am preparing a corpus of sentences from a Wikipedia
dump. However, the dump is pre-split into paragraphs, which seem to need
further processing into sentences.
My question is:
is it possible to train directly on paragraphs instead of sentences? Or must
word2vec (the skip-gram model) work with sentences?
Since the algorithm trains on the data through a context window, I don't see much
difference in letting the window extend across sentence boundaries within the same paragraph.
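To make the question concrete, here is a minimal sketch of what I mean by training directly on paragraphs, assuming gensim's Word2Vec (parameter names follow gensim 4.x, which uses vector_size instead of the older size; the tokenized paragraphs below are made up for illustration):

```python
from gensim.models import Word2Vec

# Hypothetical corpus: each inner list is one tokenized paragraph,
# passed to the model as if it were a single long "sentence".
paragraphs = [
    ["anarchism", "is", "a", "political", "philosophy", "that", "rejects",
     "hierarchy", "it", "calls", "for", "the", "abolition", "of", "the", "state"],
    ["the", "term", "anarchism", "entered", "english", "in", "the",
     "seventeenth", "century"],
]

# sg=1 selects the skip-gram model; window=5 is the context window radius.
# Feeding whole paragraphs means the window can span sentence boundaries
# inside a paragraph, which only adds a few extra (word, context) pairs
# around each boundary.
model = Word2Vec(
    paragraphs,
    vector_size=100,
    window=5,
    min_count=1,   # keep rare tokens in this tiny toy corpus
    sg=1,
    epochs=5,
)

print(model.wv.most_similar("anarchism", topn=3))
```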
Original issue reported on code.google.com by yel...@gmail.com on 24 Feb 2015 at 9:35