
JSONDecodeError #93

Closed
eimis41 opened this issue Feb 19, 2018 · 12 comments
eimis41 commented Feb 19, 2018

When running this:
twitterscraper "ethereum OR eth" -bd 2018-01-02 -ed 2018-01-15 -o bitcoin_tweets.json

I get this error:
ERROR:root:Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-949063965027569664-949067891449634816&q=ethereum%20OR%20eth%20since%3A2018-01-04%20until%3A2018-01-05&l=None".
Traceback (most recent call last):
File "c:\users....\appdata\local\programs\python\python36-32\lib\site-packages\twitterscraper\query.py", line 38, in query_single_page
json_resp = response.json()
File "c:\users....\appdata\local\programs\python\python36-32\lib\site-packages\requests\models.py", line 866, in json
return complexjson.loads(self.text, **kwargs)
File "c:\users....\appdata\local\programs\python\python36-32\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "c:\users....\appdata\local\programs\python\python36-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\users....\appdata\local\programs\python\python36-32\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Any idea why?

@tredmill

I have the same issue, will update if I can figure it out.

taspinar added a commit that referenced this issue Feb 25, 2018
This change ensures that nothing is done if the response does not have status code 200.
Previously, the .json() method was called on an empty response.

Fixes #93
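The fix that commit describes can be sketched roughly like this (the function name is hypothetical, not the actual twitterscraper code):

```python
import json

def parse_timeline_page(status_code, body):
    """Illustrative sketch: only attempt JSON decoding when the request
    actually succeeded, so an empty or error response is skipped
    instead of raising JSONDecodeError."""
    if status_code != 200 or not body:
        return None  # previously .json() was called even on an empty body
    return json.loads(body)
```

With this guard, a 429 or empty reply simply yields `None` rather than crashing the scrape loop.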
@taspinar (Owner)

Hi @tredmill @eimis41
I believe this PR #96 should fix this problem. Can you have a look at it?

@tredmill

Thank you so much, this seems to fix the issue. I have not looked at the output thoroughly; the JSON error still re-appears regularly, but the script continues anyway. I am not yet sure whether the extracted data is complete. I will leave any additional remarks here.

eimis41 (Author) commented Mar 4, 2018

@tredmill @taspinar Thank you so much!!!


hengruo commented Mar 5, 2018

Hi @taspinar
I ran into this problem and found that ujson could decode the string correctly. Maybe you could import the ujson library and change response.json() into ujson.loads(response.text).
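For reference, ujson exposes the same loads() interface as the stdlib json module, so the swap is a one-liner. A sketch with a stdlib fallback, in case ujson is not installed:

```python
import json

try:
    import ujson as json_impl  # drop-in: same loads() signature
except ImportError:
    json_impl = json  # fall back to the stdlib parser

# Instead of response.json(), decode the raw text yourself:
payload = json_impl.loads('{"min_position": "", "items_html": "<li></li>"}')
```

Note that ujson alone does not fix the underlying problem: an empty or HTML body still fails to decode with either parser.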

taspinar (Owner) commented Mar 5, 2018

@hengruo Thanks for the tip. I'll implement it in the bugfix_jsondecodeerror branch and add it to the PR, but with the json library instead of ujson, since that one is already imported.

taspinar added a commit that referenced this issue Mar 21, 2018
If a request which is supposed to return non-HTML (JSON) content returns an empty or HTML response,
its serialization leads to an error. This error is now caught with a try / except.

This fixes issue #93
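The try/except described in that commit amounts to something like the following (illustrative helper name, not the actual twitterscraper code):

```python
import json

def decode_response_text(text):
    """Return the decoded JSON payload, or None when the body is empty
    or HTML -- both of which make json.loads raise JSONDecodeError."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

This keeps the scraper running across the occasional bad page instead of aborting on the first undecodable response.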
@taspinar (Owner)

This issue should be fixed with the latest version (0.6.2).

@evaezekwem

I just experienced the same issue.
When running the query:
twitterscraper "@RealDonaldTrump OR @Potus -@ewarren -@SenWarren -@BernieSanders -@SenSanders -@JoeBiden since:2019-01-01 until:2020-01-26" -o "..\data\raw\0_@realDonaldTrump.json" -l 10000000'

I get the error:
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAwLyRwdbQ8iEWgMC0nf2N0_IhEjUAFQAlAFUAFQAA&q=@realDonaldTrump%20OR%20@POTUS%20-@ewarren%20-@SenWarren%20-@BernieSanders%20-@SenSanders%20-@JoeBiden%20since%3A2019-01-01%20until%3A2020-01-26%20since%3A2019-05-18%20until%3A2020-01-26&l=None"
Traceback (most recent call last):
File "c:\anaconda\lib\site-packages\twitterscraper\query.py", line 99, in query_single_page
json_resp = response.json()
File "c:\anaconda\lib\site-packages\requests\models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "c:\anaconda\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "c:\anaconda\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\anaconda\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The query did generate the JSON file with some tweets, but I don't know whether it returned all the tweets.

@panoptikum

Same issue here.

twollnik (Contributor) commented May 29, 2020

Having the same issue. I found out that the requests for which this error occurs return status code 429 (Too Many Requests). Sleeping for a random number of seconds (between 1 and 20) after each request and setting the poolsize to 1 fixed the issue for me.

I am also using a different proxy provider, proxyscrape.com, to get a larger pool of proxies: I iterate over about 300 different proxies instead of the roughly 20 (I think) that vanilla twitterscraper uses.
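The workaround described above can be sketched as a small retry helper. All names here are hypothetical; `get` stands in for whatever callable issues the HTTP request:

```python
import random
import time

def fetch_politely(get, url, max_retries=5, sleeper=time.sleep):
    """Retry on HTTP 429, sleeping a random 1-20 s between attempts.
    `get(url)` must return an object with a `.status_code` attribute
    (e.g. a requests.Response)."""
    response = get(url)
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        sleeper(random.uniform(1, 20))  # be nice to twitter
        response = get(url)
    return response
```

The `sleeper` parameter exists only so the delay can be stubbed out in tests; in real use the default `time.sleep` applies.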

@deqncho2

Hi @twollnik, how do you sleep between the requests? There's no option for it; it's just a single command (I can see the poolsize option).

twollnik (Contributor) commented May 31, 2020

I added the following code to query.py right after the get request is issued. On my machine (Ubuntu 18.04 LTS) twitterscraper is installed under /home/USERNAME/.local/lib/python3.6/site-packages/twitterscraper/, so I just edited the file there. Note that you have to import sleep and numpy: from time import sleep; import numpy as np. Depending on where twitterscraper is installed and on your compute environment (maybe you are working on a shared server), you may need root privileges to edit the package source files.

Anyway, here is the code that I inserted into query.py right after the twitter get request is issued:

        # be nice to twitter
        # (needs: from time import sleep; import numpy as np)
        delays = [7, 4, 6, 2, 10, 19, 1]
        delay = np.random.choice(delays)
        sleep(delay)

meticulousfan added a commit to meticulousfan/scraping-site that referenced this issue Aug 19, 2022
If a request which is supposed to return non-HTML (JSON) content returns an empty or HTML response,
its serialization leads to an error. This error is now caught with a try / except.

This fixes issue taspinar/twitterscraper#93