
Wikigraph

The file wikigraph.py implements classes for finding paths between Wikipedia articles, along with related helper functions, using the Wikimedia API. A path is built by following the links each article contains, just like in the Wikipedia game. See the blog post https://winstonjay.github.io/posts/homunculus for more on the project's motivation.
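The README does not say which search strategy find_path uses. If we assume a breadth-first search, which returns a path with the fewest links, the idea looks roughly like the sketch below. `find_path_bfs` and the toy `TOY_LINKS` graph are hypothetical. The `get_links` argument stands in for a call that would fetch an article's outgoing links from the Wikimedia API.

```python
from collections import deque
from typing import Callable, Dict, List, Optional


def find_path_bfs(start: str, end: str,
                  get_links: Callable[[str], List[str]]) -> Optional[List[str]]:
    """Hypothetical BFS: return the shortest list of titles from start to end, or None."""
    if start == end:
        return [start]
    parents: Dict[str, Optional[str]] = {start: None}  # title -> title it was reached from
    queue = deque([start])
    while queue:
        title = queue.popleft()
        for link in get_links(title):
            if link in parents:
                continue  # already discovered by an equal or shorter route
            parents[link] = title
            if link == end:
                # Walk the parent pointers back to start, then reverse.
                path = [link]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(link)
    return None  # end is unreachable from start


# Toy link graph standing in for the Wikimedia API (made-up data).
TOY_LINKS = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["E"],
    "E": [],
}

print(find_path_bfs("A", "E", lambda t: TOY_LINKS.get(t, [])))  # ['A', 'C', 'E']
```

With a real API, each `get_links` call would make a network request. That cost is why caching the lookups matters.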

The main method, find_path, works best in an interactive shell session or as part of a batch collection. It memoizes results, so searches speed up the longer the session runs, and fewer requests are sent to the Wikimedia API.

Example session:

>>> import wikigraph
>>> w = wikigraph.WikiGraph()
>>> path = w.find_path(start="Tom Hanks", end="Kevin Bacon")
>>> print(path)
<wikigraph.Path: Tom Hanks -> Kevin Bacon>
>>> print(path.info)
Path:
        Path:        Tom Hanks -> Kevin Bacon
        Separation:  1 steps
        Time Taken:  0.578131 seconds
        Requests:    2

>>> path.data
{'start': 'Tom Hanks', 'end': 'Kevin Bacon', 'path': 'Tom Hanks->Kevin Bacon', 'degree': 1}
>>> print(path.json(indent=2))
{
  "start": "Tom Hanks",
  "end": "Kevin Bacon",
  "path": "Tom Hanks->Kevin Bacon",
  "degree": 1
}
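The memoization mentioned above can be shown with a small cache around a link-fetching function. This sketch illustrates the general idea and is not the project's own code. `MemoizedLinks` and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real Wikimedia API request.

```python
from typing import Callable, Dict, List


class MemoizedLinks:
    """Cache link lookups so each title is fetched at most once per session."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, List[str]] = {}
        self.requests = 0  # number of calls that actually reached fetch()

    def __call__(self, title: str) -> List[str]:
        if title not in self._cache:
            self.requests += 1
            self._cache[title] = self._fetch(title)
        return self._cache[title]


# Stand-in for an API call (hypothetical data, no network access).
def fake_fetch(title: str) -> List[str]:
    return {"A": ["B"], "B": ["A"]}.get(title, [])


links = MemoizedLinks(fake_fetch)
for title in ["A", "B", "A", "B"]:
    links(title)
print(links.requests)  # 2: the repeat lookups were served from the cache
```

Because the cache lives as long as the object does, a long-running shell or batch job benefits most: later searches reuse links fetched by earlier ones.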

collectbatch.py

For a given sample of start articles, collectbatch.py finds a path from each one to a central end article and saves the results to a given CSV file. If no start list is specified, the program defaults to a random sample of k articles generated by the Wikimedia API. The command-line arguments are described below.
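The README does not document the CSV layout that collectbatch.py writes. As an assumption, the sketch below uses one row per path with the same keys as the `path.data` example. `path_record` and `write_records` are hypothetical helpers. `degree` is taken as the number of links followed, which matches the `'degree': 1` shown for a two-article path.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

Record = Dict[str, Union[str, int]]


def path_record(titles: List[str]) -> Record:
    """Build a dict with the same keys as the path.data example (assumed layout)."""
    return {
        "start": titles[0],
        "end": titles[-1],
        "path": "->".join(titles),
        "degree": len(titles) - 1,  # number of links followed
    }


def write_records(records: List[Record], out: TextIO) -> None:
    """Write one CSV row per path, with a header row first."""
    writer = csv.DictWriter(out, fieldnames=["start", "end", "path", "degree"])
    writer.writeheader()
    writer.writerows(records)


buf = io.StringIO()  # a real run would open the --outfile path instead
write_records([path_record(["Tom Hanks", "Kevin Bacon"])], buf)
print(buf.getvalue())
```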

usage:

-h, --help            show this help message and exit
-o OUTFILE, --outfile OUTFILE
                        Filename to save the results to.
-x CENTER, --center CENTER
                        Title of valid wiki page to center all nodes on
-k SAMPLE_SIZE, --sample_size SAMPLE_SIZE
                        Sample size of k pages to search from. (Only applies
                        when sample source is not given)
-s SAMPLE_SOURCE, --sample_source SAMPLE_SOURCE
                        Filename containing newline delimited list of valid
                        wiki article titles if not specified sample defaults
                        to random selection from wikimedia api.
-v                    add to display titles of page requests made.
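The options above could be declared with Python's `argparse` along the lines of the sketch below. It also shows the random-sample fallback, built as a MediaWiki `list=random` query. The script's real defaults, its `-v` destination name (`verbose` here), and the API endpoint (English Wikipedia here) are assumptions. `build_parser`, `random_sample_url`, and `titles_from_response` are hypothetical helpers. No network request is made.

```python
import argparse
from urllib.parse import urlencode

# Assumed endpoint: English Wikipedia's MediaWiki Action API.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed above (real defaults unknown)."""
    p = argparse.ArgumentParser(description="Find paths from sample articles to a center article.")
    p.add_argument("-o", "--outfile", help="Filename to save the results to.")
    p.add_argument("-x", "--center", help="Title of valid wiki page to center all nodes on.")
    p.add_argument("-k", "--sample_size", type=int,
                   help="Number of random pages to sample when -s is not given.")
    p.add_argument("-s", "--sample_source",
                   help="File with one article title per line.")
    p.add_argument("-v", dest="verbose", action="store_true",
                   help="Display titles of page requests made.")
    return p


def random_sample_url(k: int) -> str:
    """Build a MediaWiki list=random query for k main-namespace pages."""
    params = {
        "action": "query",
        "list": "random",
        "rnnamespace": 0,  # namespace 0 = articles
        "rnlimit": k,      # how many random pages to return
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


def titles_from_response(data: dict) -> list:
    """Extract page titles from a parsed list=random JSON response."""
    return [page["title"] for page in data["query"]["random"]]


args = build_parser().parse_args(["-o", "out.csv", "-x", "Kevin Bacon", "-k", "3"])
if args.sample_source is None:
    # No start list given: fall back to a random sample from the API.
    print(random_sample_url(args.sample_size))
```

Keeping URL construction separate from the request makes the sampling step easy to check offline. The resulting URL would then be fetched with `requests` and the titles read out of the JSON response.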

Requirements: the requests library (pip install requests)
