no results... #115
Comments
I'm seeing this problem now, though I wasn't getting it previously. Currently looking into it; Twitter probably introduced changes that break this. I'm aware of other software that Twitter's recent changes broke as well. Full traceback:
Worth mentioning that results exist some of the time, and this is a warning, not an error that halts the program.
@lapp0 Do you mean that Twitter is putting measures in place to block crawler bots?
What I've found so far is that some of the time, Twitter responds to such requests with an HTML page (some kind of 404 / error page) instead of the expected JSON. I have created a new branch where the separate try / except statement for JSONDecodeError is removed. This fix should result in retrying the same request with a recursive call.
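The fix described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function and parameter names here are hypothetical, and the retry budget is an assumption.

```python
import json
from json import JSONDecodeError

# Hypothetical sketch of the fix: instead of catching JSONDecodeError
# separately and exiting, let a bad (HTML) response trigger the same
# retry path as network errors, via a recursive call with a retry budget.
def fetch_timeline(fetch, retries=10):
    """fetch() returns the raw response body; all names are illustrative."""
    if retries < 0:
        return None  # out of retries: give up quietly instead of crashing
    try:
        return json.loads(fetch())
    except JSONDecodeError:
        # Twitter sometimes answers with an HTML error page instead of JSON;
        # retry the same request recursively rather than exiting.
        return fetch_timeline(fetch, retries - 1)
```

With a flaky source that serves HTML twice before returning JSON, the third attempt succeeds and the parsed object is returned.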
I got a similar error and I wonder if this is related to the one above? I ran this search
FYI, I downloaded and installed the jsondecodeerror_bugfix branch. I did get some results, but nowhere near the quantity I would have expected. Running the command below returned no results...
Can you, in addition, force the user agent to the one specified in issue #90?
Sure, how do I do this? Is it something to add to the command, or do I need to edit a file?
Here's what you need to do to make this work for now:
1. If you installed this with pip, uninstall it.
2. Clone this repo and check out this branch.
3. Modify twitterscraper/twitterscraper/query.py by changing the HEADER_LIST to:
4. Then install it.
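The HEADER_LIST edit above amounts to pinning a hardcoded list of browser user agents in query.py. A minimal sketch, assuming a structure like the project's; the user-agent string below is a placeholder, not the exact value from issue #90:

```python
import random

# Illustrative only: the actual fix edits HEADER_LIST in
# twitterscraper/query.py. This user-agent string is a placeholder,
# not the exact one recommended in issue #90.
HEADER_LIST = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/60.0.3112.78 Safari/537.36",
]

def random_headers():
    # Each request picks one of the pinned user agents.
    return {"User-Agent": random.choice(HEADER_LIST)}
```

The idea is simply that every entry in the list is a user agent Twitter currently serves JSON to, rather than one generated by fake_useragent.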
I had a similar issue: sometimes I wouldn't get any results, sometimes I'd get only 20 or 60 messages in addition to that JSON error. Forcing headers (as per @bengarvey's instructions) fixed the issue.
Well, that worked yesterday. Not today :)
Hmmm... it's still working for me on low-volume searches...
Sad news... changing the header list doesn't work anymore. Edit: sorry, it still works very well!!
Discussion about using the new useragent upstream: fake-useragent/fake-useragent#68. The branch I'm working off of, applying @bengarvey's fix: https://github.com/lapp0/twitterscraper/tree/jsondecodeerror_bugfix_new_chrome_headers
Edit: ignore the below, it's probably not important. I just realized that taspinar's retry functionality is working and this is a non-deterministic failure, which usually is fine. Here is the
Does forcing the UA to be
@taspinar setting that as my HEADER_LIST seems to have worked.
Worked all day yesterday, but stopped working again this morning. |
@bengarvey still working for me. What's your query? What's your error?
No results returned for even the simplest query using the HEADER_LIST you posted
Seems you've had it stop working for you, then start working again, multiple times. Is there some common factor in the times it didn't work for you? Can you try running from the cloud? I'm getting results for your query, btw.
Not sure. It worked for a few days when I started using
I made a tiny change to the header and it's working again. I think my header/IP was blocked, maybe? |
Interesting. Could you post the
I believe that most of the 'not being able to get all tweets' issues are caused by the useragent provided by fake_useragent. Once this PR is merged, the newest versions should no longer have these issues. Can you guys have a look at the PR?
* query.py: remove fake_useragent, move separate try / except
  - Useragents will no longer be generated with the fake_useragent package. At the moment, seven of the most popular useragent strings are hardcoded in query.py. In later stages this should be done via a separate module.
  - By removing the separate try / except for the JSONDecodeError, each process now also continues by retrying to get the data instead of exiting. This fixes #118, #115, #90
Good day, I have the following problem. I just updated to version 0.7.0.1 and executed the following query: twitterscraper Trump -l 100 -bd 2017-01-01 -ed 2017-06-01 -o tweets2.json It throws the following errors: Traceback (most recent call last): Yesterday it was fetching tweets and today it isn't, so I updated to the new version.
@CamiloVeloz this is due to my pull request here, #117. I didn't realize it was incompatible with Python 2. You can fix it by installing Python 3, or by reverting the update. @taspinar is this project intended to retain Python 2 compatibility? If so, I could fix it so it works with Python 2.
@CamiloVeloz I fixed the python 3 problem like this... First I uninstalled the old version
Then I got the new one going like so...
Having run my test search
Hmm, you should have gotten a log line saying It should retry 10 times by default. I'm assuming you didn't omit any log lines. Can you check whether you are on the latest master?
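The retry behaviour being discussed can be sketched as a loop with a default budget of 10 attempts that logs each retry. This is an illustrative sketch only; the function name and log wording are assumptions, not the project's exact code.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Illustrative sketch of the default retry budget discussed above:
# keep calling request() until it yields a result or retries run out,
# logging a line on each failed attempt.
def with_retries(request, retries=10):
    """Call request() until it returns a non-None value or retries run out."""
    for attempt in range(retries):
        result = request()
        if result is not None:
            return result
        logger.info("Retrying... (attempt %d of %d)", attempt + 1, retries)
    return None
```

If the log lines from those retries never appeared, that would suggest an older version without this retry path.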
I may not have upgraded successfully on my test machine but on my live machines, it's working much much better... |
@lapp0, if it is not too much trouble...
@taspinar this should be closed
Hi @taspinar, I don't know if you have received any messages regarding this error lately, but I have been having close to the same issues as stated above. I was going to try to manually input the user agent, but as far as I can tell, the variable has been changed in a later version. I have created a pastebin with the results of running a command from the documentation. I have removed duplicates, as the original paste exceeded the free tier limits of Pastebin. Link. Best regards
Update: I've tried changing my IP, knowing that Twitter has some odd ways of blocking. Seemingly it changed the number of tweets I have been receiving; however, after a short while I still end up getting 0. INFO: Got 120 tweets (120 new).
Update: Tested on two devices running Windows 10 1809. No difference. Also tried running with limits/no limits, and with an additional pool size/without a set pool size.
My main issue with this is not so much that I do not get a large enough data pool, but that the data pool is scattered extremely unevenly between dates. For instance, given a dataset of 100,000 tweets over 100 days, I will have 3 days of 33,000 tweets and 97 days of nothing.
hello,
@lubhaniagarwal I think you should use taspinar's branch; mine was last updated in 2018, and the changes were already merged into taspinar's.
Hey,
Can you please help me with the steps for how to proceed?
It would really be helpful for me.
Thank you in advance.
@lubhaniagarwal
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-838177224989753344-838177234682773505&q=trump%20since%3A2016-07-25%20until%3A2017-03-05&l=None"
I don't know why suddenly I'm getting into this problem.