Wikiprojects parser sanity tests #1
Conversation
codez266 commented on Sep 6, 2017
- Add tests for methods used by WikiProjects parser
- Add testfiles/ with sample wikitext containing WikiProjects for comparison in tests
- Modify parser from iterative to recursive
- Minor improvements to regexes
Remove session as parameter to functions
    excluded_keys = ['topics', 'root_url', 'index', 'name', 'url']

    def test_WikiProjectsFromTable():
Functions should be snake case: https://www.python.org/dev/peps/pep-0008/#function-names
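As an aside, the rename can be mechanized. This is a hypothetical helper, not part of this PR; the identifiers fed to it are taken from the diffs in this conversation:

```python
import re

def to_snake_case(name: str) -> str:
    """Convert a mixedCase identifier to snake_case: insert an underscore
    before each upper-case letter that follows a lower-case letter or
    digit, then lower-case the whole string."""
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', name).lower()

print(to_snake_case("test_WikiProjectsFromTable"))  # test_wiki_projects_from_table
print(to_snake_case("subPageSections"))             # sub_page_sections
```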
fixed.
@@ -61,17 +61,13 @@ def main(argv=None):
        r'\[\[Wikipedia:WikiProject Council/Directory/([A-Za-z_, ]+)\|([A-Za-z_, ]+)\]\]='
    WPListingRegex =\
        r'See the full listing \[\[Wikipedia:WikiProject Council/Directory/([A-Za-z_,/ ]+)'
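For context, here is what those two regexes (copied verbatim from the diff) match. The sample wikitext strings are invented for illustration; they follow the shape of WikiProject Council directory pages:

```python
import re

# Regexes copied from the diff above.
wp_section_regex = r'\[\[Wikipedia:WikiProject Council/Directory/([A-Za-z_, ]+)\|([A-Za-z_, ]+)\]\]='
wp_listing_regex = r'See the full listing \[\[Wikipedia:WikiProject Council/Directory/([A-Za-z_,/ ]+)'

# Invented sample wikitext in the style of the directory pages.
section = '==[[Wikipedia:WikiProject Council/Directory/Culture|Culture]]=='
listing = 'See the full listing [[Wikipedia:WikiProject Council/Directory/Culture/Arts]]'

m = re.search(wp_section_regex, section)
print(m.group(1), m.group(2))  # Culture Culture

m = re.search(wp_listing_regex, listing)
print(m.group(1))              # Culture/Arts
```

Note the second character class also allows `/`, so listing links can point at nested subpages.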
"Wikipedia:WikiProject Council/Directory" should be a parameter or a static variable
fixed. The only thing left is to figure out how to parameterize the complex regex, wp_section_regex.
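One possible shape for that parameterization, as a sketch: hoist the hard-coded directory title into a module constant and build both regexes from it with `re.escape`. The constant name is illustrative, not from the PR:

```python
import re

# Hypothetical module-level constant, per the review suggestion.
WPD_ROOT = 'Wikipedia:WikiProject Council/Directory'

# Rebuild the regexes from the constant; the capture groups are unchanged
# from the diff above.
wp_section_regex = (
    r'\[\[' + re.escape(WPD_ROOT) + r'/([A-Za-z_, ]+)\|([A-Za-z_, ]+)\]\]='
)
wp_listing_regex = (
    r'See the full listing \[\[' + re.escape(WPD_ROOT) + r'/([A-Za-z_,/ ]+)'
)

m = re.search(wp_section_regex,
              '==[[Wikipedia:WikiProject Council/Directory/History|History]]==')
print(m.group(1))  # History
```

`re.escape` keeps the constant safe to embed even though the title contains `:` and `/`, which happen to be harmless here.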
                wp[sec['line']]['url'])

        subPageSections = self.getSections(wp[sec['line']]['url'])
Variable names should be snake case (lower_case_with_underscores) https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles
        prevTopic = None
        self.logger.info("Index:{}, Level:{}".format(index, level))
        idx = index
        while idx < len(sections):
Would `for section in sections:` be better here? You can also do something like:

    for i, section in enumerate(sections):
        ...
I would have done that, but the recursion inherently depends on fast-forwarding the loop variable past already-parsed sections when a call exits, to avoid repeated calls to parse the same sections.
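The fast-forwarding pattern described here can be sketched minimally. The section structure and names below are invented for illustration, not taken from the PR: each recursive call consumes a run of deeper-nested sections and returns the next unconsumed index, which the caller jumps to; a plain `enumerate` loop cannot skip ahead like that.

```python
def parse_sections(sections, index=0, level=1):
    """sections: list of (level, title) tuples in document order.
    Returns (parsed_tree, next_index)."""
    parsed = []
    idx = index
    while idx < len(sections):
        sec_level, title = sections[idx]
        if sec_level < level:
            break  # belongs to an ancestor level; let the caller handle it
        if sec_level == level:
            node = {'title': title, 'children': []}
            # Recurse for this section's children; the call returns the
            # index just past everything it consumed, so we fast-forward.
            node['children'], idx = parse_sections(sections, idx + 1, level + 1)
            parsed.append(node)
        else:
            idx += 1  # defensive: skip an unexpectedly deep section
    return parsed, idx

tree, nxt = parse_sections([(1, 'A'), (2, 'A1'), (2, 'A2'), (1, 'B')])
print([n['title'] for n in tree])  # ['A', 'B']
```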
    wpTopicsCompare(actualTopics, topics)

    def test_getWikiProjectsFromSectionIntroText():
        parser = WikiProjectsParser(WPDPage)
This parser is actually making an HTTP connection. That means the test could fail if you don't have a connection to the internet. It's better to download the page content and include the data in the test so that an internet connection is not necessary.
Only one test, test_get_sub_categories, makes API calls; all the others use offline data.
Even that test has been made to cache data offline in case tests are repeated. However, I'll still think about how we can mock API calls, since the method under test makes repeated API calls for sections rather than fetching the page in a single call.
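One way to mock those repeated calls is `unittest.mock.patch.object` on the network-facing method. The class and methods below are stand-ins invented for illustration, since the real WikiProjectsParser isn't shown here:

```python
from unittest import mock

class Parser:
    """Stand-in for WikiProjectsParser; the real class fetches sections
    over HTTP, which is exactly what tests should avoid."""

    def get_sections(self, url):
        raise RuntimeError("network call -- should be mocked in tests")

    def count_sections(self, url):
        # Calls the network-facing method once per invocation.
        return len(self.get_sections(url))

# Canned fixture data standing in for a cached API response.
FIXTURE = [{'line': 'Culture'}, {'line': 'History'}]

def test_count_sections_offline():
    parser = Parser()
    # Replace the network-facing method with the fixture for the duration
    # of the with-block; no HTTP connection is ever made.
    with mock.patch.object(Parser, 'get_sections', return_value=FIXTURE):
        assert parser.count_sections('ignored-url') == 2

test_count_sections_offline()
print("offline test passed")
```

Because the patch targets the method rather than the transport, it also covers the repeated-call case: every call inside the block returns the fixture.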
Generally, I'm concerned that the WikiProjectsParser constructs a session and makes API calls internally.
Then you'd write tests for parse_section() but not extract_directory, which would only be responsible for making calls to the API iteratively. In the end, this works. I'd like to, at minimum, get it PEP 8 compliant before we merge. We can worry about restructuring later.
I agree with your point that we should have a session variable in the top-level function extract_directory and make API calls only there. But because of certain nesting in the structure of the WikiProjects directory, making API calls only in extract_directory was becoming impossible. I will discuss this in detail and think about how to restructure. The above approach would work if I had fetched the entire page once and called helper methods subsequently to parse sections. However, that seemed difficult because:
This looks good! I ran flake8.
Thanks, Adam, for your suggestions regarding flake8.