Finding documents that use proprietary formats on UK government websites

tlocke/robodoc

RoboDoc

This is the software for the RoboDoc project. It crawls UK government websites and lists the documents published in proprietary file formats.
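
As a rough illustration of the idea, and not the project's actual code, the sketch below flags URLs whose path ends in an extension treated as proprietary. The function names, the `example.gov.uk` URLs and the extension set are all assumptions made for this example; the real crawler may decide differently, for instance by inspecting content types.

```python
import posixpath
from urllib.parse import urlparse

# Assumption: an illustrative set of extensions, not RoboDoc's real list.
PROPRIETARY_EXTENSIONS = {".doc", ".xls", ".ppt"}


def is_proprietary(url):
    """Return True if the URL's path ends in an assumed proprietary extension."""
    path = urlparse(url).path  # drops the query string and fragment
    _, ext = posixpath.splitext(path)
    return ext.lower() in PROPRIETARY_EXTENSIONS


def find_proprietary(urls):
    """Keep only the URLs that look like proprietary documents."""
    return [url for url in urls if is_proprietary(url)]


if __name__ == "__main__":
    # Hypothetical URLs, for demonstration only.
    found = find_proprietary([
        "https://www.example.gov.uk/files/report.DOC?download=1",
        "https://www.example.gov.uk/files/report.odt",
    ])
    for url in found:
        print(url)
```

Matching on the file extension is cheap but only approximate, since a server can serve any format under any URL.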

Suggestions, contributions or bug reports are very welcome. Please open a new issue on GitHub or send a pull request.

Installing and running the RoboDoc software

  • Make sure Python 3 is installed.
  • Make sure Scrapy is installed.
  • Create a virtual environment: virtualenv --python=python3 venv
  • Activate the virtual environment: source venv/bin/activate
  • Install via git: pip install -e git+https://github.com/tlocke/robodoc.git
  • Run with: ./run.sh
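
The first two prerequisites above can be checked from Python before running the crawler. This is a minimal sketch and not part of RoboDoc: the helper name `missing_modules` is hypothetical, and it assumes Scrapy is importable under the module name `scrapy`.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report the interpreter version so a Python 2 interpreter is easy to spot.
    print("Python version:", sys.version.split()[0])
    # Assumption: Scrapy is the only third-party prerequisite worth checking here.
    for name in missing_modules(["scrapy"]):
        print("Not installed:", name)
```

Run it inside the activated virtual environment so that it inspects the same interpreter the crawler will use.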

Running Tests

  • Install pytest: pip install pytest
  • Run the tests: pytest test_robodoc.py
