WikiCat

Code to train and evaluate a Wikipedia page categorizer

Code Structure

Run WikiCatBuild.py category_uri_file [options] to scrape a list of Categories

Options:

- `category_uri_file`: File containing a newline-separated list of URIs to Category pages
- `-h`: Print this
- `-v [0,1,2,3]`: Set verbosity level. Defaults to 1.
- `-r`: Root directory where the model, representer, cache, and GloVe vectors will be stored. Make sure there's at least 3GB available for the GloVe vectors.
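As a rough illustration of the two preconditions above (a newline-separated URI file and roughly 3GB of free space under the root directory), here is a minimal sketch. The helper names `read_category_uris` and `has_free_space` are hypothetical and are not part of WikiCat; the check that each URI is an absolute http(s) address is an assumption about what the scraper expects, and the example URIs are purely illustrative.

```python
import shutil
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def read_category_uris(path):
    """Return the non-empty lines of a newline-separated URI file.

    Hypothetical helper: rejecting anything that is not an absolute
    http(s) URI is an assumption, not documented WikiCat behaviour.
    """
    uris = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        parsed = urlparse(line)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URI: {line!r}")
        uris.append(line)
    return uris


def has_free_space(root, required_bytes=3 * 1024**3):
    """Check that the filesystem holding `root` has the required free bytes."""
    return shutil.disk_usage(root).free >= required_bytes


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        cat_file = Path(root) / "my_cats.txt"
        # Illustrative category URIs only.
        cat_file.write_text(
            "https://en.wikipedia.org/wiki/Category:Physics\n"
            "https://en.wikipedia.org/wiki/Category:Chemistry\n",
            encoding="utf-8",
        )
        print(read_category_uris(cat_file))
        print("enough space for GloVe vectors:", has_free_space(root))
```

Running a check like this before WikiCatBuild.py is optional; it only catches a malformed file or a full disk before a long scrape or download starts.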

Afterwards, run WikiCatClassify.py uri [options] to obtain a list of category probabilities for the given page.

Options:

- `uri`: URI of the page to classify
- `-h`: Print this
- `-v [0,1,2,3]`: Set verbosity level
- `-r`: Root directory containing the representer and model.
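To make the build-then-classify order concrete, the sketch below assembles the two command lines from Python. `build_command` is a hypothetical helper, not something WikiCat provides. It assumes the flags are passed as separate tokens (`-v 2`, `-r test`), which is inferred from the usage text above rather than confirmed, and the page URI is only an example.

```python
import sys


def build_command(script, target, verbosity=1, root=None):
    """Assemble an argv list for one of the WikiCat scripts.

    Hypothetical helper; the "-v N" / "-r DIR" token layout is an
    assumption based on the usage text in this README.
    """
    if verbosity not in (0, 1, 2, 3):
        raise ValueError("verbosity must be 0, 1, 2, or 3")
    command = [sys.executable, script, target, "-v", str(verbosity)]
    if root is not None:
        command += ["-r", root]
    return command


# Step 1: scrape the categories and train, storing everything under ./test
build_cmd = build_command("WikiCatBuild.py", "example_cats.txt", root="test")

# Step 2: classify a single page with the model stored under the same root
classify_cmd = build_command(
    "WikiCatClassify.py",
    "https://en.wikipedia.org/wiki/Python_(programming_language)",  # example page
    verbosity=2,
    root="test",
)

print(" ".join(build_cmd))
print(" ".join(classify_cmd))

# To actually run them from the repository root:
# import subprocess
# subprocess.run(build_cmd, check=True)
# subprocess.run(classify_cmd, check=True)
```

Both commands point at the same `-r` directory, since the classifier reads the representer and model that the build step wrote there.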

IPython Notebooks. Run WikiCatBuild.py with -r test at least once to use these notebooks unmodified.

- Scraper No Scraping!: This is the notebook I used to prototype the scraping process
- Exploratory: This is the notebook I used to prototype the process of finding a decent classifier for the data
- Analysis: This is a notebook going through some properties of the data and (to a lesser extent) the learned classifier

## Installation

Note: So far I can only support Python 3.

git clone https://github.com/zmjjmz/WikiCat.git
cd WikiCat/
pip install -r requirements.txt

The scripts and notebooks should then be usable as outlined above.
