Bad request when continuations query for zh.wikipedia #162

Open
@arbalest339

Hello, I have a problem when querying zh.wikipedia. Here are my code and the console output.

import wptools
page = wptools.page('西安', lang='zh')
page.get_query(proxy='http://127.0.0.1:1080')   # local proxy
zh.wikipedia.org (query) 西安
zh.wikipedia.org (query) 西安市 (&plcontinue=7536|0|炮里街道)
Traceback (most recent call last):
  File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 199, in _load_response
    data = utils.json_loads(response)
  File "d:\software\Anaconda\lib\site-packages\wptools\utils.py", line 95, in json_loads
    return json.loads(data, encoding='utf-8')
  File "d:\software\Anaconda\lib\json\__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "d:\software\Anaconda\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "d:\software\Anaconda\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "d:\software\Anaconda\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "d:\software\Anaconda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy\__main__.py", line 45, in <module>
    cli.main()
  File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 444, in main
    run()
  File "c:\Users\lzk\.vscode\extensions\ms-python.python-2020.12.424452561\pythonFiles\lib\python\debugpy/..\debugpy\server\cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "d:\software\Anaconda\lib\runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "d:\software\Anaconda\lib\runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "d:\software\Anaconda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "d:\项目代码\wiki\wikitools_exp.py", line 3, in <module>
    page.get_query(proxy='http://127.0.0.1:1080')   # local proxy
  File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 641, in get_query
  File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 183, in _get
    self._set_data(action)
  File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 200, in _set_data
    self._set_query_data(action)
  File "d:\software\Anaconda\lib\site-packages\wptools\page.py", line 295, in _set_query_data
    data = self._load_response(action)
  File "d:\software\Anaconda\lib\site-packages\wptools\core.py", line 201, in _load_response
    raise ValueError(_query)
ValueError: https://zh.wikipedia.org/w/api.php?action=query&exintro&formatversion=2&inprop=url|watchers&list=random&pithumbsize=240&pllimit=500&ppprop=disambiguation|wikibase_item&prop=extracts|info|links|pageassessments|pageimages|pageprops|pageterms|redirects&redirects&rdlimit=500&rnlimit=1&rnnamespace=0&titles=%E8%A5%BF%E5%AE%89%E5%B8%82&plcontinue=7536|0|炮里街道

I noticed that after the query for "西安" finished, wptools went on to send a second request carrying plcontinue=7536|0|炮里街道, which is not what I needed. Reading the source further, page.py line 640 appears to make wptools issue more queries whenever the response contains a "continue" field.

Issue 57 says that continuation is newly added support, but I would expect it to be an option implemented in the function "get_querymore" (line 645). Instead, the function "get_query" now follows continuations too. I believe this is a small bug that should be fixed.
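To make the request concrete, here is a minimal sketch of what opt-in continuation could look like. It is not wptools code: run_query, fake_fetch, and the follow_continue flag are hypothetical names, and the stub response only imitates the shape of a MediaWiki "continue" object.

```python
def run_query(fetch, params, follow_continue=False):
    """Send one query; follow "continue" only when the caller asks for it.

    fetch is any callable that takes a dict of request parameters and
    returns the decoded JSON response as a dict (hypothetical interface).
    """
    responses = []
    request = dict(params)
    while True:
        data = fetch(request)
        responses.append(data)
        cont = data.get("continue")
        if not follow_continue or not cont:
            return responses
        # Continuation works by merging the "continue" values
        # into the original request parameters.
        request = {**params, **cont}


def fake_fetch(request):
    """Stand-in for the HTTP call: first response asks for a continuation."""
    if "plcontinue" not in request:
        return {"continue": {"plcontinue": "7536|0|炮里街道", "continue": "||"},
                "query": {"batch": 1}}
    return {"query": {"batch": 2}}


single = run_query(fake_fetch, {"titles": "西安市"})
everything = run_query(fake_fetch, {"titles": "西安市"}, follow_continue=True)
print(len(single), len(everything))
```

With a flag like this, a plain query would stop after the first response, and only callers that want the full result set would pay for the extra requests.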

Although redundant, the extra requests still work for en.wikipedia. When querying zh.wikipedia, however, something seems to be wrong with the URL: pycurl always returns "Bad request" (core.py line 175), which is not JSON and therefore cannot be parsed by json.loads. So I believe this is a second bug.
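One possible cause, judging only from the URL in the ValueError above: the titles parameter is percent-encoded, while the plcontinue value still contains raw Chinese characters. I have not confirmed that this is what triggers the "Bad request", but the sketch below shows how the continuation value could be encoded the same way using the standard library. encode_param is a hypothetical helper, not part of wptools.

```python
from urllib.parse import quote, unquote


def encode_param(value: str) -> str:
    """Percent-encode non-ASCII text as UTF-8, leaving '|' separators as-is."""
    return quote(value, safe="|")


title = encode_param("西安市")
plcontinue = encode_param("7536|0|炮里街道")

print(title)       # same encoding as the titles= value in the failing URL
print(plcontinue)  # ASCII-only continuation value
```

The encoded value decodes back to the original string with unquote, so the server should receive the same continuation token.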

For now I have simply deleted lines 640-641 of page.py, and that works well for me. Looking forward to your reply.
