
JSONDecodeError #93

Closed
eimis41 opened this issue Feb 19, 2018 · 12 comments
eimis41 commented Feb 19, 2018

When running this:
twitterscraper "ethereum OR eth" -bd 2018-01-02 -ed 2018-01-15 -o bitcoin_tweets.json

I get this error:
ERROR:root:Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-949063965027569664-949067891449634816&q=ethereum%20OR%20eth%20since%3A2018-01-04%20until%3A2018-01-05&l=None".
Traceback (most recent call last):
File "c:\users....\appdata\local\programs\python\python36-32\lib\site-packages\twitterscraper\query.py", line 38, in query_single_page
json_resp = response.json()
File "c:\users....\appdata\local\programs\python\python36-32\lib\site-packages\requests\models.py", line 866, in json
return complexjson.loads(self.text, **kwargs)
File "c:\users....\appdata\local\programs\python\python36-32\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "c:\users....\appdata\local\programs\python\python36-32\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\users....\appdata\local\programs\python\python36-32\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Any idea why?

@tredmill

I have the same issue, will update if I can figure it out.

taspinar added a commit that referenced this issue Feb 25, 2018
This change ensures that nothing is done if the response does not have status code 200.
Previously, the .json() method was called on an empty response.

Fixes #93
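The fix that commit describes can be sketched roughly like this (the function name is hypothetical, not the actual twitterscraper code):

```python
import json

def parse_timeline_page(status_code, body):
    """Illustrative sketch: only attempt JSON decoding when the request
    actually succeeded, so an empty or error response is skipped
    instead of raising JSONDecodeError."""
    if status_code != 200 or not body:
        return None  # previously .json() was called even on an empty body
    return json.loads(body)
```

With this guard, a 429 or empty reply simply yields `None` rather than crashing the scrape loop.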
@taspinar (Owner)

Hi @tredmill @eimis41
I believe this PR #96 should fix this problem. Can you have a look at it?

@tredmill

Thank you so much, this seems to fix the issue. I have not looked at the output thoroughly; the JSON error still re-appears regularly, but the script continues anyway. I am not yet sure whether the extracted data is complete. I will leave any additional remarks here.

eimis41 (Author) commented Mar 4, 2018

@tredmill @taspinar Thank you so much!!!


hengruo commented Mar 5, 2018

Hi @taspinar
I ran into this problem and found that ujson could decode the string correctly. Maybe you could import the ujson library and change response.json() into ujson.loads(response.text).
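For reference, ujson exposes the same loads() interface as the stdlib json module, so the swap is a one-liner. A sketch with a stdlib fallback, in case ujson is not installed:

```python
import json

try:
    import ujson as json_impl  # drop-in: same loads() signature
except ImportError:
    json_impl = json  # fall back to the stdlib parser

# Instead of response.json(), decode the raw text yourself:
payload = json_impl.loads('{"min_position": "", "items_html": "<li></li>"}')
```

Note that ujson alone does not fix the underlying problem: an empty or HTML body still fails to decode with either parser.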

taspinar (Owner) commented Mar 5, 2018

@hengruo Thanks for the tip. I'll implement it in the bugfix_jsondecodeerror branch and add it to the PR, but with the json library instead of ujson, since that one is already imported.

taspinar added a commit that referenced this issue Mar 21, 2018
If a request which is supposed to return non-HTML (JSON) content returns an empty or HTML response,
its serialization leads to an error. This error is now caught with a try / except.

This fixes issue #93
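The try/except described in that commit amounts to something like the following (illustrative helper name, not the actual twitterscraper code):

```python
import json

def decode_response_text(text):
    """Return the decoded JSON payload, or None when the body is empty
    or HTML -- both of which make json.loads raise JSONDecodeError."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None
```

This keeps the scraper running across the occasional bad page instead of aborting on the first undecodable response.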
@taspinar (Owner)

This issue should be fixed with the latest version (0.6.2).

@evaezekwem

I just experienced the same issue.
When running the query:
twitterscraper "@RealDonaldTrump OR @Potus -@ewarren -@SenWarren -@BernieSanders -@SenSanders -@JoeBiden since:2019-01-01 until:2020-01-26" -o "..\data\raw\0_@realDonaldTrump.json" -l 10000000'

I get the error:
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=thGAVUV0VFVBaAwLyRwdbQ8iEWgMC0nf2N0_IhEjUAFQAlAFUAFQAA&q=@realDonaldTrump%20OR%20@POTUS%20-@ewarren%20-@SenWarren%20-@BernieSanders%20-@SenSanders%20-@JoeBiden%20since%3A2019-01-01%20until%3A2020-01-26%20since%3A2019-05-18%20until%3A2020-01-26&l=None"
Traceback (most recent call last):
File "c:\anaconda\lib\site-packages\twitterscraper\query.py", line 99, in query_single_page
json_resp = response.json()
File "c:\anaconda\lib\site-packages\requests\models.py", line 892, in json
return complexjson.loads(self.text, **kwargs)
File "c:\anaconda\lib\json\__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "c:\anaconda\lib\json\decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\anaconda\lib\json\decoder.py", line 357, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

The query did generate the JSON file with some tweets, but I don't know whether it returned all the tweets.

@panoptikum

Same issue here.

twollnik (Contributor) commented May 29, 2020

Having the same issue. I found out that the requests for which this error occurs return status code 429 (Too Many Requests). Sleeping for a random number of seconds (between 1 and 20) after each request and setting the poolsize to 1 fixed the issue for me.

I am also using a different proxy provider, proxyscrape.com, to get a larger pool of proxies: I iterate over about 300 different proxies instead of the roughly 20 (I think) that vanilla twitterscraper uses.
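The workaround described above can be sketched as a small retry helper. All names here are hypothetical; `get` stands in for whatever callable issues the HTTP request:

```python
import random
import time

def fetch_politely(get, url, max_retries=5, sleeper=time.sleep):
    """Retry on HTTP 429, sleeping a random 1-20 s between attempts.
    `get(url)` must return an object with a `.status_code` attribute
    (e.g. a requests.Response)."""
    response = get(url)
    for _ in range(max_retries):
        if response.status_code != 429:
            break
        sleeper(random.uniform(1, 20))  # be nice to twitter
        response = get(url)
    return response
```

The `sleeper` parameter exists only so the delay can be stubbed out in tests; in real use the default `time.sleep` applies.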

@deqncho2

Hi @twollnik, how do you sleep between the requests? There's no option for it; it's just a single command (I can see the poolsize option).

twollnik (Contributor) commented May 31, 2020

I added the following code to query.py right after the get request is issued. On my machine (Ubuntu 18.04 LTS) twitterscraper is installed under /home/USERNAME/.local/lib/python3.6/site-packages/twitterscraper/, so I just edited the file there. Note that you have to import sleep and numpy: from time import sleep; import numpy as np. Depending on where twitterscraper is installed and on your compute environment (maybe you are working on a shared server), you may need root privileges to edit the package source files.

Anyway, here is the code that I inserted into query.py right after the twitter get request is issued:

        # be nice to twitter
        # (needs: from time import sleep; import numpy as np)
        delays = [7, 4, 6, 2, 10, 19, 1]
        delay = np.random.choice(delays)
        sleep(delay)

meticulousfan added a commit to meticulousfan/scraping-site that referenced this issue Aug 19, 2022
If a request which is supposed to return non-HTML (JSON) content returns an empty or HTML response,
its serialization leads to an error. This error is now caught with a try / except.

This fixes issue taspinar/twitterscraper#93