turbo/medium125k


medium125k

This is a data set of more than 125,000 titles and subtitles of articles published on medium.com. The data is formatted as CSV, and the columns are:


The data set should only contain English posts, but a foreign-language title (e.g. Chinese) sometimes slipped through Medium's language detector when the input was ambiguous, as in this mixed Chinese/English title:

> 我等香港人:海外研究生就法院有關雨傘佔領者判決之聲明 We Hong Kongers: Statement by Overseas Graduate Students on the.

Only a few of these are present in the set, so manual cleanup shouldn't take long (if you do this, please open a PR).
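Before cleaning up by hand, a quick heuristic pass can surface candidate rows. The sketch below is a hypothetical helper, not part of this repository. It assumes a UTF-8 CSV with a header row and a column named `title` (the actual column names are not listed above, so adjust `column` to match the real header). It only flags titles containing characters from the listed CJK, kana, and Hangul Unicode ranges.

```python
import csv
import re
from typing import Dict, Iterable, Iterator, List, Optional

# Heuristic: any character in these ranges marks a title as likely non-English.
CJK_PATTERN = re.compile(
    "["
    "\u3040-\u30ff"  # Hiragana and Katakana
    "\u3400-\u4dbf"  # CJK Unified Ideographs Extension A
    "\u4e00-\u9fff"  # CJK Unified Ideographs (covers most Chinese characters)
    "\uac00-\ud7af"  # Hangul syllables
    "\uff00-\uffef"  # Halfwidth and fullwidth forms
    "]"
)


def looks_non_english(text: Optional[str]) -> bool:
    """Return True if the text contains at least one CJK/kana/Hangul character."""
    return bool(text) and CJK_PATTERN.search(text) is not None


def flag_rows(rows: Iterable[Dict[str, str]], column: str = "title") -> Iterator[Dict[str, str]]:
    """Yield only the rows whose `column` value looks non-English.

    `column` defaults to "title", which is an assumed column name.
    """
    for row in rows:
        if looks_non_english(row.get(column)):
            yield row


def flag_csv(path: str, column: str = "title") -> List[Dict[str, str]]:
    """Read a UTF-8 CSV with a header row and return the flagged rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(flag_rows(csv.DictReader(f), column))
```

For example, `flag_csv("medium_titles.csv")` (a hypothetical file name) returns the rows to review by hand. Titles in other Latin-script languages, such as German or Spanish, will not be caught by this check. Catching them would need a proper language-identification step.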

Available on Kaggle.

About

Dataset of 125,000 Medium Blog Post Titles and Subtitles (with Categories)
