
virtualenv venv
. venv/bin/activate
pip install -r requirements.txt

Getting the data

Choose a Guardian Content API search to use as <url>, e.g. http://content.guardianapis.com/search?tag=profile%2Fpollytoynbee&show-fields=body

Then:

wget <url> -O example.json
./json2txt.py < example.json > example.txt
./txt2tags.py < example.txt > example.tags
./rules.py < example.tags > example.rules

Or the quick way:

wget <url> -O - | ./json2txt.py | ./txt2tags.py | ./rules.py  > example.rules

`json2txt.py`

Extracts the article body text from the results of a Guardian Content API search.
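As a rough illustration of this step, the sketch below pulls the `body` field out of a search response and strips the markup. It assumes the usual Content API layout (`response.results[].fields.body`, returned when `show-fields=body` is requested). The function name `extract_bodies` and the regex-based tag stripping are assumptions made for the example, not necessarily what `json2txt.py` does.

```python
import json
import re
from html import unescape


def extract_bodies(payload):
    """Return the plain-text body of each result in a search response."""
    results = payload.get("response", {}).get("results", [])
    texts = []
    for result in results:
        body = result.get("fields", {}).get("body", "")
        # Crude tag stripping (an assumption); an HTML parser would be more robust.
        text = unescape(re.sub(r"<[^>]+>", " ", body))
        # Collapse the whitespace left behind by the removed tags.
        texts.append(" ".join(text.split()))
    return texts


# A hand-written sample in the assumed response shape, not real API output.
sample = json.loads(
    '{"response": {"results": ['
    '{"fields": {"body": "<p>Hello &amp; welcome.</p><p>Bye.</p>"}}]}}'
)
print(extract_bodies(sample))
```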

`txt2tags.py`

Reads a body of text, tags each word with its part of speech, and writes one word and its tag per line.

Analysing the data

Text concordance checker: `tcc.py`

Context-free grammar parser: `cfg.py`

Parses a set of CFG rules and generates text by expanding a given base rule. Heavily based on SCIgen (http://pdos.csail.mit.edu/scigen/).

./cfg.py [-l] <base rule> < example.rules

-l lists the rules related to <base rule> instead of generating text

Rules file format

non-terminal <terminal/non-terminal>...

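Each line gives one expansion of a non-terminal; repeat the non-terminal on several lines to give it alternative expansions. A non-terminal must match [A-Z][A-Z0-9_]*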

e.g.

SENTENCE This is WORD

WORD easy
WORD hard
WORD another WORD2

WORD2 level

./cfg.py SENTENCE < eg.rules
This is easy
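The rules format can be illustrated with a minimal generator. This is a sketch under stated assumptions, not the code in `cfg.py`: the names `parse_rules` and `generate` are hypothetical, alternatives are chosen uniformly at random, and any token without a rule of its own is treated as a terminal.

```python
import random
import re

# Non-terminal names as described in the rules file format.
NON_TERMINAL = re.compile(r"[A-Z][A-Z0-9_]*")


def parse_rules(text):
    """Map each non-terminal to its list of alternative expansions."""
    rules = {}
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2 or not NON_TERMINAL.fullmatch(tokens[0]):
            continue  # skip blank or malformed lines
        rules.setdefault(tokens[0], []).append(tokens[1:])
    return rules


def generate(rules, symbol, rng=random):
    """Expand `symbol` by picking one alternative at random, recursively."""
    if symbol not in rules:
        return symbol  # no rule for this token, so emit it as a terminal
    expansion = rng.choice(rules[symbol])
    return " ".join(generate(rules, token, rng) for token in expansion)


EXAMPLE = """\
SENTENCE This is WORD

WORD easy
WORD hard
WORD another WORD2

WORD2 level
"""

rules = parse_rules(EXAMPLE)
print(generate(rules, "SENTENCE"))
```

With the example rules, the output is one of "This is easy", "This is hard" or "This is another level". A grammar whose rules refer back to themselves can recurse indefinitely in this sketch, so a real implementation would likely want a depth limit.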

Generate rules based on text concordance: `rules.py`

Generates rules from the most common proper nouns, adjectives and verbs in the given text.

./rules.py < example.tags > example.rules
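One way such rules could be derived is by counting tagged words and emitting the most frequent ones as expansions. The sketch below assumes each input line holds a word and a Penn Treebank style tag separated by whitespace; that layout, the rule names `PROPER_NOUN`, `ADJECTIVE` and `VERB`, and the `top_n` cutoff are all assumptions for illustration and may differ from what `rules.py` actually reads and writes.

```python
from collections import Counter

# Hypothetical mapping from tag prefixes to rule names (an assumption).
TAG_TO_RULE = {"NNP": "PROPER_NOUN", "JJ": "ADJECTIVE", "VB": "VERB"}


def rules_from_tags(lines, top_n=3):
    """Emit one rule line per frequent word in each category of interest."""
    counts = {name: Counter() for name in TAG_TO_RULE.values()}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip lines that are not "word tag"
        word, tag = parts
        for prefix, name in TAG_TO_RULE.items():
            if tag.startswith(prefix):
                counts[name][word] += 1
                break
    rules = []
    for name, counter in counts.items():
        for word, _ in counter.most_common(top_n):
            rules.append(f"{name} {word}")
    return rules


# Hand-written sample input in the assumed "word tag" layout.
tagged = ["Polly NNP", "writes VBZ", "Polly NNP", "sharp JJ", "the DT", "London NNP"]
print("\n".join(rules_from_tags(tagged, top_n=1)))
```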

Use `rules2json.py` to convert the rules to a JSON object the app can use.

wpf500/guarbot

Folders and files

Latest commit

History

Repository files navigation

Getting the data

json2txt.py

txt2tags.py

Analysing the data

Text concordance checker: tcc.py

Context-free grammar parser: cfg.py

Rules file format

Generate rules based on text concordance: rules.py

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

`json2txt.py`

`txt2tags.py`

Text concordance checker: `tcc.py`

Context-free grammar parser: `cfg.py`

Generate rules based on text concordance: `rules.py`

Packages