no results... #115

Closed
usajameskwon opened this issue May 24, 2018 · 38 comments

Comments

@usajameskwon

ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-838177224989753344-838177234682773505&q=trump%20since%3A2016-07-25%20until%3A2017-03-05&l=None"

I don't know why I'm suddenly running into this problem.
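
For debugging, the failing URL from the log line can be unpacked with the standard library to see which search terms, date window and pagination cursor were being requested. This is only a small sketch; the helper name `describe_failed_request` is made up for illustration and is not part of twitterscraper.

```python
from urllib.parse import urlsplit, parse_qs


def describe_failed_request(url):
    """Return the decoded search query and pagination cursor of a search URL."""
    params = parse_qs(urlsplit(url).query)
    return {
        "query": params.get("q", [""])[0],
        "max_position": params.get("max_position", [""])[0],
    }


# URL copied from the error message above.
url = (
    "https://twitter.com/i/search/timeline?f=tweets&vertical=default"
    "&include_available_features=1&include_entities=1"
    "&reset_error_state=false&src=typd"
    "&max_position=TWEET-838177224989753344-838177234682773505"
    "&q=trump%20since%3A2016-07-25%20until%3A2017-03-05&l=None"
)
info = describe_failed_request(url)
print(info["query"])
print(info["max_position"])
```

Comparing the decoded `since:`/`until:` window with the one passed on the command line shows which slice of the date range was being fetched when the failure occurred.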

@lapp0
Collaborator

lapp0 commented May 24, 2018

I'm seeing this problem too, and I wasn't before. I'm currently looking into it; Twitter probably introduced changes that break this. I'm aware of other software that Twitter's recent changes broke as well.

Full traceback:

May 24 03:41:32 harvester harvester[1648]: ERROR:root:Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&includ>
May 24 03:41:32 harvester harvester[1648]: Traceback (most recent call last):
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/mxwfji6aas89jx86ilgplb2pkc68jaxq-python3.6-twitterscraper-nix-0.7.0/lib/python3.6/site-packages/twitterscraper/query.py", line 49, in query_single_page
May 24 03:41:32 harvester harvester[1648]:     json_resp = json.loads(response.text)
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/__init__.py", line 354, in loads
May 24 03:41:32 harvester harvester[1648]:     return _default_decoder.decode(s)
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/decoder.py", line 339, in decode
May 24 03:41:32 harvester harvester[1648]:     obj, end = self.raw_decode(s, idx=_w(s, 0).end())
May 24 03:41:32 harvester harvester[1648]:   File "/nix/store/ljhgdba6n8ag6f8clpi4m9zizm7b8mx3-python3-3.6.5/lib/python3.6/json/decoder.py", line 357, in raw_decode
May 24 03:41:32 harvester harvester[1648]:     raise JSONDecodeError("Expecting value", s, err.value) from None
May 24 03:41:32 harvester harvester[1648]: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Worth mentioning that results exist some of the time, and this is a warning, not an error that halts the program.

@SpellOnYou

SpellOnYou commented May 25, 2018

@lapp0 Do you mean that Twitter is introducing measures that block crawler bots?

@taspinar
Owner

What I have found out so far is that, some of the time, Twitter responds to requests like the one above with an HTML page (some kind of 404 / error page), although the response should contain only JSON.
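
To make the symptom concrete, here is a minimal sketch of telling the two kinds of response body apart. The function names are hypothetical and are not taken from twitterscraper's `query.py`; the only library call used is the standard `json.loads`.

```python
import json


def parse_timeline_response(text):
    """Return the decoded JSON value, or None if the body is not JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


def looks_like_html(text):
    """Cheap check for an HTML page where JSON was expected."""
    head = text.lstrip()[:100].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")
```

An HTML error page starts with `<`, which is why the decoder gives up at "line 1 column 1 (char 0)".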

I have created a new branch where the separate try / except statement for JSONDecodeError is removed. With this change, a failed request should be retried through a recursive call to query_single_page instead of breaking the process. Usually, the second time Twitter does return the correct response.
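
The retry idea can be sketched roughly as follows. This is not the code from the branch: it uses a bounded loop rather than the recursive call described above, and the `fetch` parameter and the default of 10 attempts are assumptions made so the example is self-contained.

```python
import json


def fetch_json_with_retry(fetch, url, retries=10):
    """Call fetch(url) until the body parses as JSON or retries run out.

    `fetch` is any callable returning the response body as a string,
    for example: lambda u: requests.get(u).text
    """
    for _ in range(retries):
        body = fetch(url)
        try:
            return json.loads(body)
        except ValueError:  # includes json.JSONDecodeError
            continue  # non-JSON body (e.g. an HTML error page): ask again
    return None  # the caller decides what to do after repeated failures
```

Capping the number of attempts avoids looping forever if Twitter keeps returning the HTML page for a given request.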

@3ruce

3ruce commented May 25, 2018

I got a similar error and I wonder if this is related to the one above?

I ran this search, which did return some tweets:

twitterscraper bread -bd 2018-04-23 -ed 2018-05-23 --output=tweets-01.json

Running this search afterwards returned zero tweets:

twitterscraper bread -bd 2018-03-23 -ed 2018-05-23 -output=csv-01.csv

ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-997990030411956226-997990270372274177&q=bread%20since%3A2018-05-18%20until%3A2018-05-20&l=None"
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twitterscraper/query.py", line 49, in query_single_page
    json_resp = json.loads(response.text)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

@3ruce

3ruce commented May 25, 2018

FYI, I downloaded and installed the jsondecodeerror_bugfix branch. I did get some results, but nowhere near the quantity I would have expected. Running the command below returned no results...

twitterscraper trump -bd 2018-05-23 -ed 2018-05-24 --output=tweets-01.json

@taspinar
Owner

Can you, in addition, force the user agent to the one specified in issue #90?

@3ruce

3ruce commented May 25, 2018

Sure, how do I do this? Is it something to add to the command, or do I need to edit a file?

@bengarvey

bengarvey commented May 27, 2018

Here's what you need to do to make this work for now.

First, if you installed this with pip, uninstall it.

pip uninstall twitterscraper

Clone this repo and check out this branch:

git clone git@github.com:taspinar/twitterscraper.git
cd twitterscraper
git checkout jsondecodeerror_bugfix

Modify twitterscraper/twitterscraper/query.py by changing HEADERS_LIST to

HEADERS_LIST = ['Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36']

Then install it

python setup.py install
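
For context, this is a sketch of how a list like `HEADERS_LIST` is typically turned into a request header. How `query.py` actually consumes the list is not shown in this thread, so `build_headers` is a hypothetical helper and the `requests.get` call in the comment is only an illustration.

```python
import random

# User-agent string quoted in the steps above. Which strings Twitter
# accepts appears to change over time.
HEADERS_LIST = [
    "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/40.0.2214.93 Safari/537.36",
]


def build_headers(user_agents=HEADERS_LIST):
    """Pick one user agent at random and wrap it in a headers dict."""
    return {"User-Agent": random.choice(user_agents)}


headers = build_headers()
# e.g. requests.get(url, headers=headers)
```

With a single entry in the list, every request is sent with the same user agent, which is what "forcing" the header amounts to.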

@jotbasan

I had a similar issue: sometimes I wouldn't get any results, and sometimes I'd get only 20 or 60 messages in addition to that JSON error. Forcing headers (as per @bengarvey's instructions) fixed the issue.

@bengarvey

Well, that worked yesterday. Not today :)

@3ruce

3ruce commented May 28, 2018

hmmm.... it's still working for me on low volume searches...

@usajameskwon
Author

usajameskwon commented May 28, 2018

sad news... changing header list doesn't work anymore..

Sorry, it still works very well!!

@lapp0
Collaborator

lapp0 commented May 29, 2018

discussion about using the new useragent upstream: fake-useragent/fake-useragent#68

The branch I'm working from, which applies @bengarvey's fix: https://github.com/lapp0/twitterscraper/tree/jsondecodeerror_bugfix_new_chrome_headers

@lapp0
Collaborator

lapp0 commented May 29, 2018

Edit: ignore below, it's probably not important. I just realized that taspinar's retry functionality is working and this is a non-deterministic failure which usually is fine

Here is the response.text that twitterscraper is currently attempting to decode as JSON.

<!DOCTYPE html>
<html dir="ltr" lang="en">
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,user-scalable=0" />
<link rel="preconnect" href="//abs-0.twimg.com" />
<link rel="preconnect" href="//api.twitter.com" />
<link rel="preconnect" href="//o.twimg.com" />
<link rel="preconnect" href="//pbs.twimg.com" />
<link rel="preconnect" href="//t.co" />
<link rel="preconnect" href="//video.twimg.com" />
<link rel="dns-prefetch" href="//abs-0.twimg.com" />
<link rel="dns-prefetch" href="//api.twitter.com" />
<link rel="dns-prefetch" href="//o.twimg.com" />
<link rel="dns-prefetch" href="//pbs.twimg.com" />
<link rel="dns-prefetch" href="//t.co" />
<link rel="dns-prefetch" href="//video.twimg.com" />
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/web/ltr/runtime.2…
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/we…
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/r…
<link rel="preload" as="script" crossorigin="anonymous" href="https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f…
<meta property="fb:app_id" content="2231777543" />
<meta property="og:site_name" content="Twitter" />
<meta name="google-site-verification" content="V0yIS0Ec_o3Ii9KThrCoMCkwTYMMJ_JYx_RSaGhFYvw" />
<link rel="manifest" href="/manifest.json" />
<link rel="icon" sizes="192x192" href="https://abs-0.twimg.com/responsive-web/web/ltr/icon-default.882fa4ccf6539401.png" />
<link rel="apple-touch-icon" sizes="192x192" href="https://abs-0.twimg.com/responsive-web/web/ltr/icon-ios.a9cd885bccbcaf2f.png" />
<meta name="mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-title" content="Twitter Lite" />
<meta name="apple-mobile-web-app-status-bar-style" content="white" />
<meta name="theme-color" content="#ffffff" />
<body>
  <noscript>
    <form action="https://mobile.twitter.com/i/nojs_router?path=%2Fi%2Fsearch%2Ftimeline%3Ff%3Dtweets%26vertical%3Ddefault%26include_available_features%3D1%26include_entities%3D1%26reset_error_state%3Dfalse%26src%3Dtypd%26max_position%3DTWEET-525762382-605336192%26q%3Dfoobar%2520since%253A2007-06-09%2520until%253A2008-01-17%26l%3D" method="POST" style="background-color: #fff; position: fixed; top: 0; left: 0; right: 0; bottom: 0; z-index: 9999;">
      <div style="font-size: 18px; font-family: Helvetica,sans-serif; line-height: 24px; margin: 10%; width: 80%;">
        <p>We've detected that JavaScript is disabled in your browser. Would you like to proceed to legacy Twitter?</p>
        <p style="margin: 20px 0;">
          <button type="submit" style="background-color: #1da1f2; border-radius: 100px; border: none; box-shadow: none; color: #fff; cursor: pointer; font-size: 14px; font-weight: bold; line-height: 20px; padding: 6px 16px;">Yes</button>
        </p>
      </div>
    </form>
  </noscript>
  <div id="react-root" style="height:100%"><div><div aria-label="Loading…" style="background-color:#fff;top:0;left:0;right:0;bottom:0;position:fixed"><svg style="display:inline-block;fill:currentcolor;height:72px;max-width:100%;position:absolute;user-select:none;vertical-align:text-bottom;color:#1da1f2;width:72px;top:0;left:0;right:0;bottom:0;margin:auto" viewBox="0 0 24 24"><g><path d="M23.643 4.937c-.835.37-1.732.62-2.675.733a4.67 4.67 0 0 0 2.048-2.578 9.3 9.3 0 0 1-2.958 1.13 4.66 4.66 0 0 0-7.938 4.25 13.229 13.229 0 0 1-9.602-4.868c-.4.69-.63 1.49-.63 2.342A4.66 4.66 0 0 0 3.96 9.824a4.647 4.647 0 0 1-2.11-.583v.06a4.66 4.66 0 0 0 3.737 4.568 4.692 4.692 0 0 1-2.104.08 4.661 4.661 0 0 0 4.352 3.234 9.348 9.348 0 0 1-5.786 1.995 9.5 9.5 0 0 1-1.112-.065 13.175 13.175 0 0 0 7.14 2.093c8.57 0 13.255-7.098 13.255-13.254 0-.2-.005-.402-.014-.602a9.47 9.47 0 0 0 2.323-2.41z"></path></g></svg></div><div id="failureMessage" style="background-color:#F5F8FA;top:0;left:0;right:0;bottom:0;position:fixed;display:none;z-index:2"><div style="position:absolute;height:200px;top:0;left:0;right:0;bottom:0;margin:auto;text-align:center;line-height:1.3125;font-size:14px;color:#14171A;font-family:-apple-system, BlinkMacSystemFont, &quot;Segoe UI&quot;, Roboto, Ubuntu, &quot;Helvetica Neue&quot;, sans-serif"><svg style="display:block;fill:currentcolor;height:72px;max-width:100%;position:relative;user-select:none;vertical-align:text-bottom;color:#657786;width:72px;margin:0 auto 24px" viewBox="0 0 24 24"><g><circle cx="12.025" cy="16.437" r="1.281"></circle><path d="M14.39 7.194a.495.495 0 0 0-.4-.2h-3.928a.494.494 0 0 0-.4.2.496.496 0 0 0-.08.442l1.814 6.098a.5.5 0 0 0 .48.357h.298a.501.501 0 0 0 .48-.356l1.813-6.098a.495.495 0 0 0-.077-.442z"></path><path d="M12 22.75C6.072 22.75 1.25 17.928 1.25 12S6.072 1.25 12 1.25 22.75 6.072 22.75 12 17.928 22.75 12 22.75zm0-20C6.9 2.75 2.75 6.9 2.75 12S6.9 21.25 12 21.25s9.25-4.15 9.25-9.25S17.1 2.75 12 2.75z"></path></g></svg><p>A problem 
was encountered trying to load the page.</p><a href="javascript:reloadFromScriptError()" style="background-color:#1DA1F2;border-radius:0.3em;border:0;padding:0.5em 1em;height:1.5em;display:inline-block;margin:1em 0;color:#fff;text-decoration:none;font-weight:bold"><svg style="display:inline-block;fill:currentcolor;height:1.5em;max-width:100%;position:relative;user-select:none;vertical-align:middle" viewBox="0 0 24 24"><g><path d="M12 2C6.486 2 2 6.486 2 12a.75.75 0 0 0 1.5 0c0-4.687 3.813-8.5 8.5-8.5s8.5 3.813 8.5 8.5-3.813 8.5-8.5 8.5c-2.886 0-5.576-1.5-7.13-3.888l2.983.55a.75.75 0 1 0 .274-1.474l-4.663-.86a.746.746 0 0 0-.88.647l-.57 4.706a.749.749 0 1 0 1.488.181l.32-2.63C5.673 20.36 8.728 22 12 22c5.514 0 10-4.486 10-10S17.514 2 12 2z"></path></g></svg> Retry</a></div></div></div></div>
<script>
  window.__INITIAL_STATE__ = {"optimist":[],"toasts":[],"entities":{"users":{"entities":{},"errors":{},"fetchStatus":{}}},"session":{"country":"XX","guestId":"152763591683681529","language":"en","oneFactorLoginEligibility":{"fetchStatus":"none"}},"analytics":{},"featureSwitch":{"config":{"account_country_setting_countries_whitelist":{"value":["ad","ae","af","ag","ai","al","am","ao","ar","as","at","au","aw","ax","az","ba","bb","bd","be","bf","bg","bh","bi","bj","bl","bm","bn","bo","bq","br","bs","bt","bv","bw","by","bz","ca","cc","cd","cf","cg","ch","ci","ck","cl","cm","co","cr","cu","cv","cw","cx","cy","cz","de","dj","dk","dm","do","dz","ec","ee","eg","er","es","et","fi","fj","fk","fm","fo","fr","ga","gb","gd","ge","gf","gg","gh","gi","gl","gm","gn","gp","gq","gr","gs","gt","gu","gw","gy","hk","hn","hr","ht","hu","id","ie","il","im","in","io","iq","ir","is","it","je","jm","jo","jp","ke","kg","kh","ki","km","kn","kr","kw","ky","kz","la","lb","lc","li","lk","lr","ls","lt","lu","lv","ly","ma","mc","md","me","mf","mg","mh","mk","ml","mn","mo","mp","mq","mr","ms","mt","mu","mv","mw","mx","my","mz","na","nc","ne","nf","ng","ni","nl","no","np","nr","nu","nz","om","pa","pe","pf","pg","ph","pk","pl","pm","pn","pr","ps","pt","pw","py","qa","re","ro","rs","ru","rw","sa","sb","sc","se","sg","sh","si","sk","sl","sm","sn","so","sr","st","sv","sx","sz","tc","td","tf","tg","th","tj","tk","tl","tm","tn","to","tr","tt","tv","tw","tz","ua","ug","us","uy","uz","va","vc","ve","vi","vn","vu","wf","ws","xk","ye","yt","za","zm","zw"]},"live_event_hero_description_fields_enabled":{"value":true},"live_event_hero_ugm_attribution_enabled":{"value":false},"live_event_timeline_default_refresh_rate_interval_seconds":{"value":30},"live_event_timeline_minimum_refresh_rate_interval_seconds":{"value":10},"live_event_timeline_server_controlled_refresh_rate_enabled":{"value":true},"moment_annotations_enabled":{"value":false},"responsive_web_allow_switch_to_ms":{"value":false},"responsive_web_birthdays_
enabled":{"value":false},"responsive_web_broadcasts_page_enabled":{"value":false},"responsive_web_composer_v2_enabled":{"value":false},"responsive_web_composer_v2_modal_compose_enabled":{"value":false},"responsive_web_desktop_bookmarks_enabled":{"value":false},"responsive_web_dm_livepipeline_enabled":{"value":false},"responsive_web_dm_reporting_enabled":{"value":false},"responsive_web_dm_typing_indicator_enabled":{"value":false},"responsive_web_eu_countries":{"value":["at","be","bg","ch","cy","cz","de","dk","ee","es","fi","fr","gb","gr","hr","hu","ie","is","it","li","lt","lu","lv","mt","nl","no","pl","pt","ro","se","si","sk"]},"responsive_web_event_card_enabled":{"value":false},"responsive_web_explore_feedback_actions_enabled":{"value":false},"responsive_web_feedback_link":{"value":""},"responsive_web_fetch_hashflags_on_boot":{"value":true},"responsive_web_gdpr_age_gate":{"value":true},"responsive_web_gdpr_twitter_archive":{"value":false},"responsive_web_gdpr_periscope_archive":{"value":false},"responsive_web_gdpr_logged_out_banner":{"value":true},"responsive_web_gdpr_change_country_ocf_flow":{"value":true},"responsive_web_graphql_verify_credentials_enabled":{"value":false},"responsive_web_graphql_verify_credentials_server_enabled":{"value":false},"responsive_web_htl_compose_prompt":{"value":false},"responsive_web_inline_video_player_enabled":{"value":false},"responsive_web_microsoft_jump_links":{"value":false},"responsive_web_ntab_verified_mentions_vit_internal_dogfood":{"value":false},"responsive_web_ocf_enabled":{"value":true},"responsive_web_report_page_not_found":{"value":false},"responsive_web_reported_tweet_tombstones_enabled":{"value":false},"responsive_web_search_filters_enabled":{"value":true},"responsive_web_settings_contacts_dashboard_enabled":{"value":false},"responsive_web_settings_email_notifications_enabled":{"value":true},"responsive_web_settings_facebook_connect_enabled":{"value":false},"responsive_web_settings_login_verification_enabled":{"value":
false},"responsive_web_settings_notif_v2_push":{"value":true},"responsive_web_settings_notif_v2_sms":{"value":true},"responsive_web_settings_nsfw_user_enabled":{"value":true},"responsive_web_settings_password_applications_info_enabled":{"value":false},"responsive_web_settings_sessions_dashboard_enabled":{"value":false},"responsive_web_settings_u2f_security_key_enabled":{"value":false},"responsive_web_settings_trends_enabled":{"value":true},"responsive_web_transform_virtual_scroller":{"value":true},"responsive_web_tweet_source_timeline_enabled":{"value":false},"responsive_web_tweet_source_tweet_detail_enabled":{"value":false},"responsive_web_unified_cards":{"value":"no"},"responsive_web_urt_list_tweets_enabled":{"value":true},"responsive_web_urt_show_cover_enabled":{"value":false},"responsive_web_verification_v2_enabled":{"value":false},"responsive_web_windows_oauth_login":{"value":"auto_login"},"scribe_api_error_sample_size":{"value":0},"scribe_api_sample_size":{"value":100},"scribe_cdn_host_list":{"value":["si0.twimg.com","si1.twimg.com","si2.twimg.com","si3.twimg.com","a0.twimg.com","a1.twimg.com","a2.twimg.com","a3.twimg.com","abs.twimg.com","amp.twimg.com","o.twimg.com","pbs.twimg.com","pbs-eb.twimg.com","pbs-ec.twimg.com","pbs-v6.twimg.com","pbs-h1.twimg.com","pbs-h2.twimg.com","video.twimg.com","platform.twitter.com","cdn.api.twitter.com","ton.twimg.com","v.cdn.vine.co","mtc.cdn.vine.co","edge.vncdn.co","mid.vncdn.co"]},"scribe_cdn_sample_size":{"value":50},"scribe_enabled":{"value":true},"traffic_redirect_5347_hostmap":{"value":[]},"user_display_name_max_limit":{"value":50},"responsive_web_logged_out_homepage_6952":{"value":"treatment"},"responsive_web_night_mode_7836":{"value":"enabled"},"responsive_web_smart_lock_7159":{"value":"all"}},"impressions":{},"featureSetToken":"9dee8851ebe378f146b9ea25f610a73259e043ba","isLoaded":true,"isLoading":false,"keysRead":{},"settingsVersion":"ac9b6bc39ab73f6a95ad5c2c5765b7d5"},"typeaheadUsers":{"fetchStatus":"none","users
":{},"blacklist":{},"lastUpdated":0,"index":{}},"blockedUsers":{"userIds":[],"fetchStatus":"none"},"settings":{"local":{"nextPushCheckin":0,"shouldAutoPlayGif":false},"remote":{"settings":{"display_sensitive_media":false},"fetchStatus":"none"},"dataSaver":{},"transient":{"dtabBarInfo":{"dtabAll":null,"dtabRweb":null,"hide":false},"loginPromptShown":false}},"devices":{"browserPush":{"fetchStatus":"none","pushNotificationsPrompt":{"count":0,"dismissed":false,"fetchStatus":"none"},"subscribed":false,"supported":null},"devices":{"data":{"emails":[],"phone_numbers":[]},"fetchStatus":"none"},"notificationSettings":{"_pushSettings":{},"_pushSettingsTemplate":{},"_smsSettings":{"error":null,"fetchStatus":"none"},"_smsSettingsTemplate":{}}}};
  window.__META_DATA__ = {"env":"prod","isLoggedIn":false,"isRTL":false};
</script>
<script>
  document.cookie = decodeURIComponent("gt=1001603747062124544; Max-Age=10800; Domain=.twitter.com; Path=/");
</script>
<script>
  window.webpackChunkManifest = {"bundle.AccessInterstitial":"bundle.AccessInterstitial.0a886e55d07fd9d1.js","loader.DashMenu":"loader.DashMenu.f41af6a8aed64560.js","loader.SearchBox":"loader.SearchBox.f33c1d8a9d2f0779.js","loader.WideLayout":"loader.WideLayout.0e65294f86f6c39b.js","loader.PushNotificationsPrompt":"loader.PushNotificationsPrompt.c31086d53bc3ae52.js","bundle.Account":"bundle.Account.0d737ba8cea6cc52.js","bundle.Bookmarks":"bundle.Bookmarks.9ddaf670ca48ccc1.js","loader.AppModules":"loader.AppModules.54a034314566236b.js","loader.BroadcastCard":"loader.BroadcastCard.812462a4ae302154.js","loader.BroadcastPlayer":"loader.BroadcastPlayer.ed281794d61fd3ef.js","loader.VideoPlayer":"loader.VideoPlayer.8ef5e67114d25f39.js","hls.js":"hls.js.bf79fb6a401c797b.js","loader.EntryTombstone":"loader.EntryTombstone.0e0a5df360bd3f79.js","loader.FeedbackSheet":"loader.FeedbackSheet.2e826a17f3a963e0.js","loader.SignupModule":"loader.SignupModule.b458c376fac2e64c.js","loader.TimelineGap":"loader.TimelineGap.d8413106cdcee4e3.js","loader.Trends":"loader.Trends.3b820fa3ca99729b.js","loader.TweetCurationActionSheet":"loader.TweetCurationActionSheet.a8ce51a07d0ef679.js","loader.TweetPhotos":"loader.TweetPhotos.0a542d35ee03636c.js","loader.UnifiedCard":"loader.UnifiedCard.faf7b0d5d441aa83.js","ondemand.InlinePlayer":"ondemand.InlinePlayer.8b36fc73ffde364e.js","ondemand.AccessInterstitial":"ondemand.AccessInterstitial.bbe69b7e2f0224f8.js","bundle.Broadcast":"bundle.Broadcast.eeee777949be1b78.js","bundle.Collection":"bundle.Collection.177ed6760829194f.js","bundle.Compose":"bundle.Compose.3513c1ab3c6aca9f.js","ondemand.MicrosoftInterface":"ondemand.MicrosoftInterface.b114bf15c8508df4.js","bundle.ComposeV2":"bundle.ComposeV2.1e157c35b6434dcb.js","bundle.Conversation":"bundle.Conversation.ac83b1b8865b8ef1.js","bundle.ConversationParticipants":"bundle.ConversationParticipants.1dd98c6532c8d394.js","bundle.CredentialsPicker":"bundle.CredentialsPicker.b1e7a33cdc26f1ac.js","bundle.DirectM
essages":"bundle.DirectMessages.f5c056761af48bd2.js","bundle.Download":"bundle.Download.f76c3abf3976b2d6.js","bundle.Explore":"bundle.Explore.357bc5750c112cdb.js","bundle.FollowerRequests":"bundle.FollowerRequests.5639c30d710f4035.js","bundle.FoundMedia":"bundle.FoundMedia.a8dc044ba72a1267.js","bundle.GenericTimeline":"bundle.GenericTimeline.865e036862a39389.js","bundle.Highlights":"bundle.Highlights.dcbcc3a9740565dc.js","bundle.HomeTimeline":"bundle.HomeTimeline.bb1cf156aecd8221.js","bundle.LiveEvent":"bundle.LiveEvent.44b0f8690a218e4b.js","bundle.LoggedOutHome":"bundle.LoggedOutHome.7cebe2d639e6baac.js","bundle.LoggedOutHomeV2":"bundle.LoggedOutHomeV2.aba2615d7d501496.js","bundle.Login":"bundle.Login.c040f59c0b88e5ca.js","bundle.Moment":"bundle.Moment.b5633aa8e8e2dd6c.js","bundle.NetworkInstrument":"bundle.NetworkInstrument.a64aa24912a1ea7d.js","bundle.NotificationDetail":"bundle.NotificationDetail.8626735cc1c716e4.js","bundle.Notifications":"bundle.Notifications.572bd2c0db1af32b.js","bundle.Ocf":"bundle.Ocf.43c617118e85cc40.js","bundle.Report":"bundle.Report.83a2bfca7c7f8d61.js","bundle.RichTextCompose":"bundle.RichTextCompose.33b589c3fbe06d23.js","bundle.Search":"bundle.Search.78ed436e150afbce.js","bundle.Settings":"bundle.Settings.5d9a181a3b3d62fa.js","bundle.SettingsInternals":"bundle.SettingsInternals.d70c97d527dca3b0.js","ondemand.countries-ar":"ondemand.countries-ar.2fb9a1ec172f64d5.js","ondemand.countries-bg":"ondemand.countries-bg.e35c4de062b13441.js","ondemand.countries-bn":"ondemand.countries-bn.fdc22fd7449c8e5e.js","ondemand.countries-ca":"ondemand.countries-ca.0c5f5fd3a6a19cd2.js","ondemand.countries-cs":"ondemand.countries-cs.1f9e5d57c71ac33b.js","ondemand.countries-da":"ondemand.countries-da.ee1520cf42140f35.js","ondemand.countries-de":"ondemand.countries-de.1c92da67cba0f7ef.js","ondemand.countries-el":"ondemand.countries-el.f6fed22f9c4b446a.js","ondemand.countries-en":"ondemand.countries-en.4340fa379bfcd96c.js","ondemand.countries-en-GB":"ondemand.
countries-en-GB.0ddb82ed14613896.js","ondemand.countries-es":"ondemand.countries-es.a81afc755f1b2715.js","ondemand.countries-eu":"ondemand.countries-eu.52e072afc9fd8d81.js","ondemand.countries-fa":"ondemand.countries-fa.d39b25a43b80d4e1.js","ondemand.countries-fi":"ondemand.countries-fi.8619573e4c56f488.js","ondemand.countries-fil":"ondemand.countries-fil.86108f2daf1ac62f.js","ondemand.countries-fr":"ondemand.countries-fr.3ad8cb5c65a08975.js","ondemand.countries-ga":"ondemand.countries-ga.21cabd7b5f5fd189.js","ondemand.countries-gl":"ondemand.countries-gl.4e5f386bddc27df1.js","ondemand.countries-gu":"ondemand.countries-gu.a0ea66135f7a5b0e.js","ondemand.countries-he":"ondemand.countries-he.550a1106d3083b7d.js","ondemand.countries-hi":"ondemand.countries-hi.b65f7e584a50308e.js","ondemand.countries-hr":"ondemand.countries-hr.95b04f3a9334fe47.js","ondemand.countries-hu":"ondemand.countries-hu.80376d4679383ea3.js","ondemand.countries-id":"ondemand.countries-id.8e2c5175cd66aa65.js","ondemand.countries-it":"ondemand.countries-it.e9a06233169f19b2.js","ondemand.countries-ja":"ondemand.countries-ja.f766fe1300c540a4.js","ondemand.countries-kn":"ondemand.countries-kn.c2e2138da8f0261e.js","ondemand.countries-ko":"ondemand.countries-ko.606f5d220279b29c.js","ondemand.countries-mr":"ondemand.countries-mr.2ac9b3be671767b0.js","ondemand.countries-ms":"ondemand.countries-ms.75b244ff50f9cb0a.js","ondemand.countries-nb":"ondemand.countries-nb.968cba5a0112717e.js","ondemand.countries-nl":"ondemand.countries-nl.8928a1c2a78f816d.js","ondemand.countries-pl":"ondemand.countries-pl.b93f0a3117b687a2.js","ondemand.countries-pt":"ondemand.countries-pt.1033685135644792.js","ondemand.countries-ro":"ondemand.countries-ro.5f9a77a9943738d1.js","ondemand.countries-ru":"ondemand.countries-ru.c523c152294ff17a.js","ondemand.countries-sk":"ondemand.countries-sk.e5ba5e5a30d63a26.js","ondemand.countries-sr":"ondemand.countries-sr.8436803b7ca45970.js","ondemand.countries-sv":"ondemand.countries-sv.d3d68607f0
06f364.js","ondemand.countries-ta":"ondemand.countries-ta.9924444018bd6fd6.js","ondemand.countries-th":"ondemand.countries-th.97d27d91f1e8b1f9.js","ondemand.countries-tr":"ondemand.countries-tr.5e21112431856457.js","ondemand.countries-uk":"ondemand.countries-uk.fae009d4afa54dbe.js","ondemand.countries-zh":"ondemand.countries-zh.6ef733aab4c67996.js","ondemand.countries-zh-Hant":"ondemand.countries-zh-Hant.fea4ecfac68c55b0.js","bundle.SettingsProfile":"bundle.SettingsProfile.ab283edb8316db9e.js","bundle.SettingsTransparency":"bundle.SettingsTransparency.bfc5a5b693efdecb.js","bundle.SmsLogin":"bundle.SmsLogin.83386fa187f3751f.js","bundle.Stickers":"bundle.Stickers.44fa69df5bb970b2.js","bundle.Topics":"bundle.Topics.f031b702b5662037.js","bundle.Trends":"bundle.Trends.7d8fb7840bb1695d.js","bundle.TweetActivity":"bundle.TweetActivity.74d0c23f81ca5985.js","bundle.TweetMediaDetail":"bundle.TweetMediaDetail.8a914737d032edde.js","bundle.TweetMediaTags":"bundle.TweetMediaTags.7ac2030e881605c8.js","bundle.Twitterversary":"bundle.Twitterversary.cd6142decdce8223.js","bundle.UserAvatar":"bundle.UserAvatar.4ff8f77f9835ddbf.js","bundle.UserFollowLists":"bundle.UserFollowLists.761911bc2abed421.js","bundle.UserLists":"bundle.UserLists.7d92464eff372ac0.js","bundle.UserMoments":"bundle.UserMoments.2dd22a822fb08c4e.js","bundle.UserOnboarding":"bundle.UserOnboarding.dae2f08bc6ce1e63.js","bundle.UserProfile":"bundle.UserProfile.a7a78f92a1d13554.js","bundle.UserProfileTimelines":"bundle.UserProfileTimelines.44603ddd0d35316a.js","ondemand.ProfileSidebar":"ondemand.ProfileSidebar.59898236d7f06fe0.js","bundle.UserProfileSuspended":"bundle.UserProfileSuspended.d311eb8da64b059a.js","bundle.UserRedirect":"bundle.UserRedirect.870c393f79eed48a.js","bundle.shared.Compose":"bundle.shared.Compose.4cca3cba8d45cd5a.js","ondemand.SettingsInternals":"ondemand.SettingsInternals.7615b372ac35bdbb.js","shared":"shared.18a216f7e10b8cd4.js"};
</script>
<script>
 window.showFailureMessage = function (source) { window.Raven && window.Raven.captureMessage( 'Failed to load source', { level: 'error', extra: { source: source } } ); document.getElementById('failureMessage').style.display = 'block'; }; window.reloadFromScriptError = function () { window.location.reload(); }; </script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/runtime.270230779ff8aae3.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/runtime.270230779ff8aae3.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/vendor.c66168a4671f3557.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/vendor.c66168a4671f3557.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/i18n/en.eeb53accf85c34c7.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/i18n/en.eeb53accf85c34c7.js"></script>
<script crossorigin="anonymous" onerror="showFailureMessage('https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f6bdacc4ceca3.js');" src="https://abs-0.twimg.com/responsive-web/web/ltr/main.d53f6bdacc4ceca3.js"></script>

@taspinar
Owner

Does forcing the user agent to be Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 also help?

@bengarvey

@taspinar setting that as my HEADER_LIST seems to have worked.

@bengarvey

Worked all day yesterday, but stopped working again this morning.

@lapp0
Collaborator

lapp0 commented Jun 3, 2018

@bengarvey still working for me. What's your query? What's your error?

@bengarvey

No results returned for even the simplest query using the HEADER_LIST you posted

 twitterscraper Trump --limit 100

@lapp0
Collaborator

lapp0 commented Jun 3, 2018

Seems you've had it stop working for you, then start working again multiple times.

Is there some common factor in those times you've had it not work for you? Can you try running from the cloud?

I'm getting results for your query btw.

@bengarvey

Not sure. It worked for a few days when I started using Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.181 Safari/537.36 for the HEADER_LIST, but stopped suddenly this morning.

@bengarvey

I made a tiny change to the header and it's working again. I think my header/IP was blocked, maybe?

@lapp0
Collaborator

lapp0 commented Jun 5, 2018

Interesting. Could you post the response.text when you run it with the bad header? Perhaps there's some header-changing logic which can be applied based on detection of this specific failure by looking at response.text to fix this.
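
One way such header-changing logic might look is sketched below: try each candidate user agent in turn and keep the first one whose response parses as JSON. It is only an illustration of the idea. The function name, the `fetch(url, headers)` callable and the use of a JSON parse failure as the detection signal are all assumptions, since the exact failure signature in response.text has not been posted.

```python
import json


def fetch_with_ua_rotation(fetch, url, user_agents):
    """Try each user agent in turn until the body parses as JSON.

    `fetch(url, headers)` is any callable returning the body as a string.
    Returns (parsed_json, user_agent), or (None, None) if every agent failed.
    """
    for ua in user_agents:
        body = fetch(url, {"User-Agent": ua})
        try:
            return json.loads(body), ua
        except ValueError:  # includes json.JSONDecodeError
            continue  # HTML or other non-JSON body: try the next agent
    return None, None
```

Returning the user agent that worked lets the caller reuse it for subsequent pages instead of rotating on every request.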

taspinar added a commit that referenced this issue Jun 13, 2018
useragents will no longer be generated with the fake_useragent package.
At the moment, seven of the most popular useragent strings are hardcoded in query.py.
In later stages this should be done via a separate module.

This fixes #118, #115, #90

@taspinar
Owner

I believe that most of the 'not being able to get all tweets' issues are caused by the useragent provided by fake_useragent.
I have removed fake_useragent as a dependency altogether. See PR #119

Once this PR is merged, the newest versions should no longer have these issues.

Can you guys have a look at the PR?

taspinar added a commit that referenced this issue Jun 20, 2018
* query.py : remove fake_useragent, move separate try / except

- useragents will no longer be generated with the fake_useragent package.
At the moment, seven of the most popular useragent strings are hardcoded in query.py.
In later stages this should be done via a separate module.
- By removing the separate try / except for the JsonDecodeError, each process now also continues with retrying to get the data instead of exiting.

This fixes #118, #115, #90

@LesmesWeb

Good day, I have the following problem. I have now updated to version 0.7.0.1 and ran the following command:

twitterscraper Trump -l 100 -bd 2017-01-01 -ed 2017-06-01 -o tweets2.json

It throws me the following errors:

Traceback (most recent call last):
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/bin/twitterscraper", line 7, in <module>
    from twitterscraper.main import main
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/local/lib/python2.7/site-packages/twitterscraper/__init__.py", line 13, in <module>
    from twitterscraper.query import query_tweets
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/local/lib/python2.7/site-packages/twitterscraper/query.py", line 10, in <module>
    from twitterscraper.logging import logger
  File "/webapp/1-ProyectoDjango/ScraperEntorno/Entorno/local/lib/python2.7/site-packages/twitterscraper/logging.py", line 4, in <module>
    logger = logging.getLogger('twitterscraper')
AttributeError: 'module' object has no attribute 'getLogger'

Yesterday I was able to fetch tweets but today I could not, so I updated to the new version.

@lapp0
Collaborator

lapp0 commented Jun 20, 2018

@CamiloVeloz this is due to my pull request here #117. I didn't realize it was incompatible with python2. You can fix it by installing python3, or reverting the update.

@taspinar is this project intended to retain python2 compatibility? If so, I could fix it so it works with python2.
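
For context, a sketch of one possible python2-compatible fix. It assumes the error comes from python2's implicit relative imports: inside twitterscraper/logging.py, a plain `import logging` would resolve to that file itself rather than the standard library, which would explain the missing `getLogger` attribute. This is an inference from the traceback, not necessarily what the eventual fix does.

```python
# twitterscraper/logging.py (sketch)

# Must appear before any other import. On python2 it makes `import logging`
# refer to the standard library instead of this file; on python3 it is a no-op.
from __future__ import absolute_import

import logging

logger = logging.getLogger('twitterscraper')
```

Renaming the module so that it no longer shares a name with the standard library would be another way to avoid the clash.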

@3ruce

3ruce commented Jun 21, 2018

@CamiloVeloz I fixed the python 3 problem like this...

First I uninstalled the old version

sudo pip uninstall twitterscraper

Then I got the new one going like so...

cd /home/me/myinstalldirectory
git clone https://github.com/taspinar/twitterscraper.git
cd twitterscraper
sudo python3 setup.py install

Having run my test search twitterscraper trump -bd 2018-05-23 -ed 2018-05-24 --output=tweets-01.json, which before this fix only returned 675 tweets, I got 1481 this time, but the process died with this error:

INFO: Querying trump since:2018-05-23 until:2018-05-24
ERROR: Failed to parse JSON "Expecting value: line 1 column 1 (char 0)" while requesting "https://twitter.com/i/search/timeline?f=tweets&vertical=default&include_available_features=1&include_entities=1&reset_error_state=false&src=typd&max_position=TWEET-999439049721970688-999439834686087168&q=trump%20since%3A2018-05-23%20until%3A2018-05-24&l=None"
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/twitterscraper-0.7.1-py3.5.egg/twitterscraper/query.py", line 53, in query_single_page
    json_resp = json.loads(response.text)
  File "/usr/lib/python3.5/json/__init__.py", line 319, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.5/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.5/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
INFO: Got 1481 tweets for trump%20since%3A2018-05-23%20until%3A2018-05-24.

@lapp0
Collaborator

lapp0 commented Jun 21, 2018

Hmm, you should have gotten a log line saying Retrying... (Attempts left <attempts left here>) as per https://github.com/taspinar/twitterscraper/blob/master/twitterscraper/query.py#L81

It should retry 10 times by default. I'm assuming you didn't omit any log lines.
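
The expected behaviour can be sketched roughly as follows. This is an illustrative loop, not the code in query.py: `query_with_retry`, `fetch`, and the exact log wording are assumptions made for this example.

```python
import json


def query_with_retry(fetch, retries=10, log=print):
    """Call fetch() until its result parses as JSON or attempts run out.

    `fetch` is any zero-argument callable returning the response body as text.
    """
    for attempts_left in range(retries, 0, -1):
        try:
            return json.loads(fetch())
        except ValueError as exc:  # covers json.JSONDecodeError
            log("Failed to parse JSON: {}".format(exc))
            if attempts_left > 1:
                log("Retrying... (attempts left: {})".format(attempts_left - 1))
    log("Giving up.")
    return None
```

With this shape, a single unparseable response produces a retry log line instead of ending the process, so a run that dies without one probably took a different code path.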

Can you check whether you are on latest master?

@3ruce

3ruce commented Jun 22, 2018

I may not have upgraded successfully on my test machine but on my live machines, it's working much much better...

@taspinar
Owner

@lapp0 , If it is not too much trouble...

@lapp0
Collaborator

lapp0 commented Jun 24, 2018

@taspinar done #123

@lapp0
Collaborator

lapp0 commented Jul 17, 2018

@taspinar this should be closed

@taspinar taspinar closed this as completed Nov 4, 2018
@PulpyJuice

Hi @taspinar,

I don't know if you have received any messages regarding this error lately, but I have been having nearly the same issues as described above. I was going to try setting the user agent manually, but as far as I can tell the variable was changed in a later version.

I have created a pastebin with the results of running a command from the documentation. I removed duplicate lines because the original paste exceeded the free trial limits of PasteBin.

Link
https://pastebin.com/raw/bkyAgGsK

Best regards

@PulpyJuice

PulpyJuice commented Jul 18, 2019

Update: I've tried changing my IP, knowing that Twitter has some odd ways of blocking. That seemed to change the number of tweets I receive; however, after a short while I still end up getting 0 new tweets.

INFO: Got 120 tweets (120 new).
INFO: Got 240 tweets (120 new).
INFO: Got 360 tweets (120 new).
INFO: Got 480 tweets (120 new).
INFO: Got 600 tweets (120 new).
INFO: Got 720 tweets (120 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).
INFO: Got 720 tweets (0 new).

Update: Tested on two devices running Windows 10 version 1809. No difference. I also tried running with and without limits, and with and without an explicit pool size. My main issue is not so much that the data pool is too small, but that it is spread extremely unevenly across dates. For instance, given a dataset of 100,000 tweets over 100 days, I will have 3 days of 33,000 tweets and 97 days of nothing.
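
One way to narrow down which dates come back empty is to issue a separate query per day. The sketch below only builds the query strings, using the `since:`/`until:` form that appears in the log output earlier in this thread. The helper names `daily_windows` and `per_day_queries` are hypothetical, and this does not address any blocking on Twitter's side.

```python
from datetime import date, timedelta


def daily_windows(begin, end):
    """Yield (since, until) date pairs covering [begin, end) one day at a time."""
    day = begin
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def per_day_queries(term, begin, end):
    """Build one 'term since:YYYY-MM-DD until:YYYY-MM-DD' string per day."""
    return [
        "{} since:{} until:{}".format(term, since.isoformat(), until.isoformat())
        for since, until in daily_windows(begin, end)
    ]


queries = per_day_queries("trump", date(2018, 5, 23), date(2018, 5, 25))
```

Counting the results of each per-day query separately would show whether the gaps fall on specific dates or appear after a certain number of requests.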

@lubhaniagarwal

@bengarvey still working for me. What's your query? What's your error?

Hello,
I am getting 0 tweets. I cloned https://github.com/lapp0/twitterscraper.git
and ran python setup.py install,
but it still shows 0 tweets.
Please help! Thanks.


@lapp0
Collaborator

lapp0 commented Jun 4, 2020

@lubhaniagarwal I think you should use taspinar's branch; mine was last updated in 2018, and its changes have already been merged into taspinar's.

@lubhaniagarwal

lubhaniagarwal commented Jun 4, 2020 via email

@lapp0
Collaborator

lapp0 commented Jun 4, 2020

@lubhaniagarwal git clone https://github.com/taspinar/twitterscraper.git

meticulousfan added a commit to meticulousfan/scraping-site that referenced this issue Aug 19, 2022