Only getting a small amount of data before midnight #126
What is the command you're running?
twitterscraper Bitcoin -p 100 --csv -bd 2017-10-13 -ed 2017-12-31 --lang en -o Bitcoin.csv
What version are you running? Others have experienced issues with missing results prior to the fix applied to … If you're running …
My version is 0.7.1.
Okay, please run it twice and share the results; I will compare them with mine and attempt to debug.
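One way to compare two runs of the same command is to diff the sets of tweet IDs each run produced. This is a minimal sketch, assuming both runs were saved with `--csv` and that the file has an `id` column; that column name is an assumption, so adjust it to whatever header your output actually contains.

```python
import csv
import io


def load_ids(csv_text, id_column="id"):
    """Return the set of tweet IDs found in one run's CSV output.

    The default column name "id" is an assumption, not confirmed here.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column] for row in reader}


def compare_runs(first_csv, second_csv, id_column="id"):
    """Report which tweet IDs appear in only one of two runs of a query."""
    first = load_ids(first_csv, id_column)
    second = load_ids(second_csv, id_column)
    return {
        "only_in_first": sorted(first - second),
        "only_in_second": sorted(second - first),
        "in_both": len(first & second),
    }
```

For example, `compare_runs(open("run1.csv").read(), open("run2.csv").read())`, where the file names are hypothetical. A handful of IDs missing from a later run can simply be deleted tweets, while a large gap points at the scraping problem itself.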
I am now running "twitterscraper Bitcoin -p 100 --csv -bd 2017-10-13 -ed 2017-10-20 --lang en -o Bitcoin.csv". Sometimes it hits a JSON parsing error, and then it stops scraping that day, returning only the tweets from the last few minutes before midnight.
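A defensive option, until the root cause is found, is to retry a request whose body fails to parse as JSON instead of abandoning the rest of the day. This is a minimal sketch, not twitterscraper's actual code: `fetch_json_with_retry` and its `fetch` callable are hypothetical names, and the retry count and delay are arbitrary choices.

```python
import json
import time


def fetch_json_with_retry(fetch, retries=3, delay=1.0):
    """Call fetch() and parse its body as JSON, retrying on parse errors.

    fetch is any zero-argument callable returning the raw response body
    as a string (a hypothetical stand-in for the scraper's HTTP request).
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        body = fetch()
        try:
            return json.loads(body)
        except json.JSONDecodeError as err:
            last_error = err
            # Wait a little longer after each failed attempt.
            time.sleep(delay * (attempt + 1))
    # Every attempt returned something that was not valid JSON.
    raise last_error
```

Retrying only helps if the malformed responses are transient; if the same request fails every time, the problem is in the request itself (URL or headers) rather than in parsing.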
I can confirm that this is an issue on my end as well. I am getting non-deterministic failures. Looking into it.
I guess this is due to Twitter blocking scraping requests?
I don't think Twitter is blocking scraping requests. I think we're using an API in a way it isn't intended to be used, and they're not making any effort to support our use case. Anyway, I have found a workaround for two issues.

Issue 1
Summary of issue: Sometimes too few or no results are returned for a query.
Observations: Headers …
Fix: remove bad headers.

Issue 2
Summary of issue: For some queries involving date spans greater than a day (e.g. …)
Observations: For the command …
Note: the second run having 5 fewer results is likely due to deleted tweets, and I'm not going to worry about that.
Fix: currently debugging.
@taspinar could you shed some light on this? Why did you set the … ? Is this from some API or client you used? When I run …
So the issue is that it doesn't like the reload URL generated from the tweets given. However, when I run it in a browser and capture the requests, it shows me these: …
These requests give me the desired responses, but for some reason your URL doesn't. Either because …
Solutions to this are …
Observation: these user agents work for me: …
However, none of the headers currently in …
Very strange... how can …
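The "remove bad headers" workaround amounts to sending a minimal header set rather than a long list of copied headers. The exact headers and user agents discussed here were not preserved, so this sketch uses the standard library's `urllib.request` with a placeholder user agent; `build_request` and `USER_AGENT` are hypothetical names, and twitterscraper itself may use a different HTTP client.

```python
import urllib.request

# Hypothetical placeholder: substitute a user agent string you have
# confirmed works; the values from this thread were not preserved.
USER_AGENT = "Mozilla/5.0 (placeholder)"


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request that sends only a User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Usage (performs a real network call):
#   with urllib.request.urlopen(build_request(url)) as resp:
#       body = resp.read().decode("utf-8")
```

Starting from a single header and adding others back one at a time is a simple way to find out which header, if any, causes the empty or truncated responses.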
@shenyifan17 please try this pull request; it should fix it: #126
@lapp0 This is the standard request issued by Twitter to fetch a new batch of tweets (see image). I don't think there is anything wrong with the RELOAD_URL.
How many tweets do you get in your browser if you go to https://twitter.com/search?q=Bitcoin%20since%3A2016-10-20%20until%3A2016-10-21&src=typd ? PS: When I search for https://twitter.com/search?q=Bitcoin%20since%3A2016-10-20%20until%3A2016-10-22&src=typd I get about 2506 results, and when I search for https://twitter.com/search?q=Bitcoin%20since%3A2016-10-21%20until%3A2016-10-22&src=typd I get about 2500 results. So I think you picked a day (2016-10-20) on which there were only five tweets about Bitcoin.
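To check counts day by day in a browser, as suggested here, you can generate one search URL per day with the same shape as the links above. This is a sketch: `daily_search_urls` is a hypothetical helper, and it only builds URLs; it does not query Twitter.

```python
from datetime import date, timedelta
from urllib.parse import quote


def daily_search_urls(keyword, begin, end):
    """Yield one search URL per day in the half-open range [begin, end)."""
    day = begin
    while day < end:
        next_day = day + timedelta(days=1)
        query = f"{keyword} since:{day.isoformat()} until:{next_day.isoformat()}"
        # safe="" makes quote() encode ':' as %3A as well as ' ' as %20.
        yield f"https://twitter.com/search?q={quote(query, safe='')}&src=typd"
        day = next_day


for url in daily_search_urls("Bitcoin", date(2016, 10, 20), date(2016, 10, 22)):
    print(url)
```

Comparing each day's browser count with the scraper's output for that day separates days that genuinely had few tweets from days that were cut short.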
I am trying to scrape tweets about Bitcoin from November to April. However, the data I obtained only contains tweets from just before midnight; it looks like this:
which misses the majority of the tweets.
I wonder whether anyone else has run into the same issue.
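To confirm this symptom across a long date range, you can group the scraped timestamps by calendar day and flag days whose tweets cover only a short window. This is a sketch, assuming you have already parsed the timestamp column of the CSV into `datetime` objects; the function names and the 12-hour threshold are hypothetical choices, not part of twitterscraper.

```python
from collections import defaultdict
from datetime import datetime


def daily_coverage(timestamps):
    """Map each calendar day to (earliest, latest, count) of scraped tweets."""
    by_day = defaultdict(list)
    for ts in timestamps:
        by_day[ts.date()].append(ts)
    return {
        day: (min(times), max(times), len(times))
        for day, times in sorted(by_day.items())
    }


def suspicious_days(coverage, min_span_hours=12):
    """Return days whose tweets span less than min_span_hours.

    A day that was cut short after a parsing error shows up here with its
    earliest tweet close to midnight instead of near 00:00.
    """
    return [
        day
        for day, (first, last, _count) in coverage.items()
        if (last - first).total_seconds() < min_span_hours * 3600
    ]


sample = [
    datetime(2017, 10, 13, 23, 50),
    datetime(2017, 10, 13, 23, 58),
    datetime(2017, 10, 14, 0, 5),
    datetime(2017, 10, 14, 22, 40),
]
print(suspicious_days(daily_coverage(sample)))
```

A flagged day is not proof of a bug: as noted above for 2016-10-20, some days really do have very few tweets, so compare flagged days against a browser search before concluding that data is missing.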